Topic association and tagging for dense images
First Claim
1. A computer system comprising:
one or more processors; and
one or more computer storage media storing computer-useable instructions that, when used by the one or more processors, cause the one or more processors to:
receive a plurality of images, each image of the plurality of images being associated with a plurality of tags and each image of the plurality of images being comprised of a plurality of regions, each region of each image comprising less than an entirety of the image it comprises;
for each region of each image of the plurality of images, generate an image feature vector from one or more visual features;
for each image of the plurality of images, generate a weighted word vector from the associated plurality of tags;
for each image, compute a heat map corresponding thereto by aligning the image feature vector for each region of a given image and the weighted word feature vector into a common embedding space utilizing cosine similarity loss, wherein a plurality of regions of the heat map corresponds to the plurality of regions of the given image and wherein at least one region of the plurality of regions of the heat map corresponds to each of the plurality of tags; and
provide an image of the plurality of images to be presented based on the computed heat map for the image.
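The heat map step recited above can be illustrated with a minimal sketch. It assumes that the region feature vectors and the weighted word vector have already been projected into the same embedding space, and that the heat map value for a region is simply the cosine similarity between that region's vector and the word vector. The function name `cosine_heat_map` and the array layout are illustrative assumptions and do not come from the claim.

```python
import numpy as np


def cosine_heat_map(region_embeddings, word_vector, eps=1e-12):
    """Return one cosine-similarity score per image region.

    region_embeddings: array of shape (rows, cols, dim), one embedded
        feature vector per region (hypothetical layout).
    word_vector: array of shape (dim,), the weighted word vector for the
        image, assumed to live in the same embedding space.
    """
    regions = np.asarray(region_embeddings, dtype=float)
    word = np.asarray(word_vector, dtype=float)

    # Normalize each region vector to unit length; eps guards zero vectors.
    region_norms = np.linalg.norm(regions, axis=-1, keepdims=True)
    unit_regions = regions / np.maximum(region_norms, eps)

    # Normalize the word vector the same way.
    unit_word = word / max(np.linalg.norm(word), eps)

    # Dot product of unit vectors is the cosine similarity, shape (rows, cols).
    return unit_regions @ unit_word
```

Each entry of the returned array lies in [-1, 1]; regions whose embeddings point in the same direction as the word vector score closest to 1.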
Abstract
A framework is provided for associating dense images with topics. The framework is trained utilizing images, each having multiple regions, multiple visual characteristics and multiple keyword tags associated therewith. For each region of each image, visual features are computed from the visual characteristics utilizing a convolutional neural network, and an image feature vector is generated from the visual features. The keyword tags are utilized to generate a weighted word vector for each image by calculating a weighted average of word vector representations representing keyword tags associated with the image. The image feature vector and the weighted word vector are aligned in a common embedding space and a heat map is computed for the image. Once trained, the framework can be utilized to automatically tag images and rank the relevance of images with respect to queried keywords based upon associated heat maps.
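The abstract states that images can be ranked against a queried keyword based on their heat maps, but it does not say how a heat map is reduced to a single relevance score. The sketch below assumes, purely for illustration, that the score is the maximum heat map value over all regions; `relevance_score` and `rank_images` are hypothetical names.

```python
import numpy as np


def relevance_score(heat_map):
    """Collapse a heat map to one number.

    Assumption: the best-matching region decides relevance (max pooling).
    The source does not specify the pooling rule.
    """
    return float(np.max(np.asarray(heat_map, dtype=float)))


def rank_images(heat_maps_by_image):
    """Return image ids ordered from most to least relevant.

    heat_maps_by_image: dict mapping an image id to the heat map computed
        for that image with respect to the queried keyword.
    """
    return sorted(
        heat_maps_by_image,
        key=lambda image_id: relevance_score(heat_maps_by_image[image_id]),
        reverse=True,
    )
```

Mean pooling or a thresholded area could be substituted for the maximum; the choice changes whether a small, strongly matching region outranks a large, weakly matching one.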
19 Claims
1. A computer system comprising:
one or more processors; and
one or more computer storage media storing computer-useable instructions that, when used by the one or more processors, cause the one or more processors to:
receive a plurality of images, each image of the plurality of images being associated with a plurality of tags and each image of the plurality of images being comprised of a plurality of regions, each region of each image comprising less than an entirety of the image it comprises;
for each region of each image of the plurality of images, generate an image feature vector from one or more visual features;
for each image of the plurality of images, generate a weighted word vector from the associated plurality of tags;
for each image, compute a heat map corresponding thereto by aligning the image feature vector for each region of a given image and the weighted word feature vector into a common embedding space utilizing cosine similarity loss, wherein a plurality of regions of the heat map corresponds to the plurality of regions of the given image and wherein at least one region of the plurality of regions of the heat map corresponds to each of the plurality of tags; and
provide an image of the plurality of images to be presented based on the computed heat map for the image.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer-implemented method for tagging images, the method comprising:
receiving an image associated with a plurality of user-provided tags, the image being comprised of a plurality of regions, each region comprising less than the entire image;
generating an embedded image feature vector for each of the plurality of regions;
generating an image-specific weighted word vector from the plurality of user-provided tags;
computing a first heat map corresponding to the image by aligning the embedded image feature vector for each region of the image and the image-specific weighted word vector into a common embedding space using cosine similarity loss, wherein a plurality of regions of the first heat map corresponds to the plurality of regions of the image, and wherein at least one region of the first heat map corresponds to each of the plurality of user-provided tags; and
providing the image to be presented based on the computed first heat map.
Dependent claims: 12, 13, 14, 15, 16, 17, 18
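The claim refers to a cosine similarity loss without giving its formula. A common form, assumed here only for illustration, is one minus the cosine similarity between the two vectors, so that perfectly aligned vectors incur zero loss. The function name and the averaging over regions are assumptions, not details taken from the claim.

```python
import numpy as np


def cosine_similarity_loss(image_vector, word_vector, eps=1e-12):
    """Loss of 1 - cos(image_vector, word_vector); assumed form of the loss."""
    a = np.asarray(image_vector, dtype=float)
    b = np.asarray(word_vector, dtype=float)
    # eps keeps the division defined when either vector is all zeros.
    denominator = max(np.linalg.norm(a) * np.linalg.norm(b), eps)
    return 1.0 - float(a @ b) / denominator


def mean_region_loss(region_vectors, word_vector):
    """Average the loss over every region vector of one image (assumption)."""
    regions = np.asarray(region_vectors, dtype=float)
    # Flatten any grid layout to a list of vectors of length dim.
    flat = regions.reshape(-1, regions.shape[-1])
    return float(np.mean([cosine_similarity_loss(r, word_vector) for r in flat]))
```

Under this form the loss ranges from 0 (same direction) to 2 (opposite direction), and minimizing it during training pulls the embedded region vectors toward the image-specific weighted word vector.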
19. A computing system comprising:
means for generating an image embedding vector for each of the plurality of regions of an image utilizing a convolutional neural network;
means for generating a soft topic feature vector for the image by calculating a weighted average of a plurality of word vector representations, each of the plurality of word vector representations being generated for a different one of a plurality of tags associated with the image; and
means for computing a heat map corresponding to the image by aligning the image embedding vector for each region of the image and the soft topic feature vector into a common embedding space utilizing cosine similarity loss, wherein a plurality of regions of the heat map corresponds to the plurality of regions of the image, and wherein at least one region of the heat map corresponds to each of the plurality of tags associated with the image, and wherein the image is provided for presentation based on the computed heat map.
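The soft topic feature vector recited above is a weighted average of per-tag word vectors. The sketch below shows that arithmetic only. How the weights are chosen is not stated in this claim, so the `tag_weights` argument is a caller-supplied assumption, and `soft_topic_vector` is a hypothetical name.

```python
import numpy as np


def soft_topic_vector(tag_vectors, tag_weights):
    """Weighted average of word vectors, one vector per tag.

    tag_vectors: array of shape (num_tags, dim).
    tag_weights: array of shape (num_tags,), non-negative. The weighting
        scheme itself is an assumption left to the caller.
    """
    vectors = np.asarray(tag_vectors, dtype=float)
    weights = np.asarray(tag_weights, dtype=float)
    if vectors.ndim != 2 or weights.shape != (vectors.shape[0],):
        raise ValueError("expected one weight per tag vector")
    if np.any(weights < 0) or weights.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    # Sum of weight * vector over tags, divided by the total weight.
    return weights @ vectors / weights.sum()
```

With equal weights the result reduces to the plain mean of the tag vectors; unequal weights let some tags contribute more to the image's topic representation than others.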
Specification