Embedding Space for Images with Multiple Text Labels
Abstract
Embedding space for images with multiple text labels is described. Both text labels and image regions are embedded in the embedding space. The embedded text labels describe semantic concepts that can be exhibited in image content. The embedding space is trained to semantically relate the embedded text labels, so that labels like “sun” and “sunset” are more closely related than “sun” and “bird”. Training the embedding space also includes mapping representative images, having image content that exemplifies the semantic concepts, to the respective text labels. Unlike conventional techniques that embed an entire training image into the embedding space for each text label associated with the training image, the techniques described herein process a training image to generate regions that correspond to its multiple text labels. The regions of the training image are then embedded into the embedding space in a manner that maps the regions to their corresponding text labels.
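The joint embedding described in the abstract can be sketched in a few lines. This is a minimal illustration, not the patented method: the label vectors and the region vector below are made-up toy values, and cosine similarity stands in for whatever distance the trained space actually uses. It shows the two properties the abstract names: related labels ("sun", "sunset") sit closer together than unrelated ones ("sun", "bird"), and an image region is labeled by the nearest embedded text label.

```python
from math import sqrt

# Hypothetical label embeddings in a shared space (vectors are illustrative).
label_embeddings = {
    "sun":    [0.9, 0.8, 0.1],
    "sunset": [0.8, 0.9, 0.2],
    "bird":   [0.1, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

# Semantically related labels sit closer together in the embedding space:
assert cosine(label_embeddings["sun"], label_embeddings["sunset"]) > \
       cosine(label_embeddings["sun"], label_embeddings["bird"])

def nearest_label(region_vector, embeddings):
    """Map an image-region vector to the closest embedded text label."""
    return max(embeddings, key=lambda lbl: cosine(region_vector, embeddings[lbl]))

# A region whose features land near "sunset" is labeled accordingly.
print(nearest_label([0.75, 0.95, 0.25], label_embeddings))  # -> sunset
```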
20 Claims
1. A method implemented by a computing device to annotate individual images with multiple text labels to describe content of the images, the method comprising:
processing a training image having multiple text labels to generate a set of image regions that correspond to the respective multiple text labels;

embedding, within an embedding space that is configured to embed both text labels and image regions mapped to the text labels, the set of image regions based, in part, on positions at which the multiple text labels that correspond to the image regions of the training image are embedded in the embedding space;

learning a mapping function that maps image regions to the text labels embedded in the embedding space, said learning based, in part, on said embedding the set of image regions within the embedding space;

discovering text labels that correspond to image regions of a query image by mapping the image regions of the query image to the embedding space using the learned mapping function; and

annotating the query image with at least two of the discovered text labels.

- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
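The pipeline that claim 1 recites can be illustrated end to end with toy feature vectors. This is only a sketch under simplifying assumptions: region features are hand-written 2-vectors, the "embedding" of labeled regions is reduced to a per-label mean prototype, and the "learned mapping function" is reduced to nearest-prototype assignment. All names and data are illustrative, not from the patent.

```python
# Minimal sketch of the claimed pipeline with toy region features.

def embed_training_regions(training_regions):
    """Embed each labeled training-image region near its text label,
    here simplified to a mean-feature prototype per label."""
    grouped = {}
    for label, features in training_regions:
        grouped.setdefault(label, []).append(features)
    return {lbl: [sum(col) / len(col) for col in zip(*vecs)]
            for lbl, vecs in grouped.items()}

def map_region(features, prototypes):
    """Stand-in for the learned mapping function: assign a region to the
    nearest label prototype by squared Euclidean distance."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda lbl: dist(features, prototypes[lbl]))

def annotate(query_regions, prototypes):
    """Discover a text label per query-image region and annotate the image
    with the set of discovered labels."""
    return sorted({map_region(r, prototypes) for r in query_regions})

# A training image tagged "sun" and "ocean" contributes one region per label.
training = [("sun", [1.0, 0.1]), ("ocean", [0.1, 1.0]),
            ("sun", [0.9, 0.2]), ("ocean", [0.2, 0.8])]
prototypes = embed_training_regions(training)

# The query image's regions map to at least two discovered labels.
print(annotate([[0.95, 0.1], [0.1, 0.9]], prototypes))  # -> ['ocean', 'sun']
```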
15. A system to annotate individual images with multiple text labels to describe content of the images, the system comprising:
one or more processors; and

computer-readable storage media having stored thereon instructions that are executable by the one or more processors to perform operations comprising:

training an embedding space in which both images and text labels are embedded, said training semantically relating text labels configured to describe semantic concepts exhibited in image content and mapping representative images that have image content which exemplifies the semantic concepts to respective text labels;

learning a mapping function based on the training that maps image regions to the text labels embedded in the embedding space;

obtaining an image to annotate;

determining a set of regions of the image using at least one region proposal technique which determines image regions capable of being mapped to corresponding text labels embedded in the embedding space;

mapping the set of regions of the image to corresponding text labels in the embedding space according to the mapping function, the corresponding text labels describing semantic concepts exhibited in image content of the set of regions of the image; and

annotating the image with at least two of the corresponding text labels.

- View Dependent Claims (16, 17)
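Claim 15 leaves the "region proposal technique" open. As one concrete stand-in, a fixed sliding-window grid is the simplest such technique: it emits candidate boxes that could then be mapped into the embedding space. Real systems would typically use a learned or saliency-based proposal method; this grid is only an assumed, illustrative placeholder.

```python
def grid_region_proposals(width, height, window, stride):
    """Toy region-proposal technique: slide a fixed square window over the
    image and emit candidate boxes as (x, y, w, h) tuples."""
    boxes = []
    for y in range(0, height - window + 1, stride):
        for x in range(0, width - window + 1, stride):
            boxes.append((x, y, window, window))
    return boxes

# A 4x4 image with a 2x2 window and stride 2 yields four candidate regions.
print(grid_region_proposals(4, 4, 2, 2))
# -> [(0, 0, 2, 2), (2, 0, 2, 2), (0, 2, 2, 2), (2, 2, 2, 2)]
```

Each emitted box would be featurized and passed through the learned mapping function to find its corresponding text label.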
18. A method implemented by a computing device to annotate individual images with multiple text labels to describe content of the images, the method comprising:
training an embedding space in which both images and text labels are embedded, said training semantically relating text labels configured to describe semantic concepts exhibited in image content and mapping representative images that have image content which exemplifies the semantic concepts to respective text labels;

discovering at least two text labels in the trained embedding space that describe image content of an input image, the at least two text labels discovered describing the image content of at least two respective regions of the input image; and

associating the at least two text labels with the input image.

- View Dependent Claims (19, 20)
Specification