Refining image relevance models
First Claim
1. A method comprising:
- receiving a trained image relevance model that generates relevance measures of images to a query, wherein the trained image relevance model has been trained based on content feature values of a set of training images, the query being a unique set of one or more query terms received by a search system as a query input; and
re-training the image relevance model, the re-training comprising:
generating a first re-trained image relevance model based on content feature values of first images of a first portion of training images in the set of training images;
receiving, from the first re-trained image relevance model, image relevance scores for second images of a second portion of the set of training images;
removing, from the set of training images, at least some of the second images of the second portion of the set of training images identified as outlier images, the outlier images being training images for which the image relevance score received from the first re-trained image relevance model is below a threshold score;
generating an aggregation of near duplicate images among the set of training images;
associating image selection data of the aggregated near duplicate images with the aggregation of the near duplicate images; and
generating a second re-trained image relevance model based on content feature values of the first images of the first portion, the second images of the second portion that remain following removal of the at least some of the second images of the second portion of the set of training images, and the image selection data.
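The near-duplicate aggregation step of the claim can be illustrated with a minimal sketch. The claim does not say how near duplicates are detected or what form the image selection data takes, so the following are assumptions made only for illustration: near duplicates are images whose feature vectors lie within a Euclidean distance `tolerance` of a group's first member, selection data is a per-image click count, and the function name `aggregate_near_duplicates` is hypothetical.

```python
import math

def aggregate_near_duplicates(images, tolerance):
    """Group near-duplicate images and pool their selection data.

    images: list of (feature_vector, selection_count) pairs.
    Assumption: two images are near duplicates when the Euclidean
    distance between their feature vectors is <= tolerance.
    Assumption: selection data is a simple click count.
    """
    groups = []
    for feature, selections in images:
        for group in groups:
            # Compare against the group's first member (greedy grouping).
            if math.dist(group["representative"], feature) <= tolerance:
                group["members"].append(feature)
                # Associate the member's selection data with the aggregation.
                group["selections"] += selections
                break
        else:
            # No existing group is close enough: start a new aggregation.
            groups.append({
                "representative": feature,
                "members": [feature],
                "selections": selections,
            })
    return groups
```

Pooling the counts in this way lets the aggregation, rather than each individual copy, carry the selection signal into the second re-training pass. A production system would likely use a more robust duplicate detector than this greedy grouping.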
Abstract
Methods, systems and apparatus for refining image relevance models. In general, one aspect of the subject matter described in this specification can be implemented in methods that include re-training an image relevance model by generating a first re-trained model based on content feature values of first images of a first portion of training images in a set of training images, receiving, from the first re-trained model, image relevance scores for second images of a second portion of the set of training images, removing, from the set of training images, some of the second images identified as outlier images for which the image relevance score received from the first re-trained model is below a threshold score, and generating a second re-trained model based on content feature values of the first images of the first portion and the second images of the second portion that remain following removal of the outlier images.
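The two-pass re-training described in the abstract can be sketched as follows. The abstract does not specify the model type or the scoring function, so this sketch substitutes stand-ins that are assumptions, not the patented method: the "model" is the centroid of the training feature vectors, and the relevance score is cosine similarity to that centroid. The names `train`, `score`, and `retrain_without_outliers` are hypothetical.

```python
import math

def train(features):
    """Stand-in model (assumption): the centroid of the feature vectors."""
    n = len(features)
    dim = len(features[0])
    return [sum(f[i] for f in features) / n for i in range(dim)]

def score(model, feature):
    """Stand-in relevance score (assumption): cosine similarity to the centroid."""
    dot = sum(m * x for m, x in zip(model, feature))
    norm = math.sqrt(sum(m * m for m in model)) * math.sqrt(sum(x * x for x in feature))
    return dot / norm if norm else 0.0

def retrain_without_outliers(first_portion, second_portion, threshold):
    """Two-pass re-training with outlier removal.

    1. Train a first model on the first portion only.
    2. Score the second portion with that model.
    3. Drop second-portion images scoring below the threshold (outliers).
    4. Train a second model on the first portion plus the remaining images.
    """
    first_model = train(first_portion)
    kept = [f for f in second_portion if score(first_model, f) >= threshold]
    second_model = train(first_portion + kept)
    return second_model, kept
```

Scoring the second portion with a model trained only on the first portion means an outlier cannot influence the model that judges it, which is the point of holding the second portion out during the first pass.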
18 Claims
1. A method comprising:
receiving a trained image relevance model that generates relevance measures of images to a query, wherein the trained image relevance model has been trained based on content feature values of a set of training images, the query being a unique set of one or more query terms received by a search system as a query input; and
re-training the image relevance model, the re-training comprising:
generating a first re-trained image relevance model based on content feature values of first images of a first portion of training images in the set of training images;
receiving, from the first re-trained image relevance model, image relevance scores for second images of a second portion of the set of training images;
removing, from the set of training images, at least some of the second images of the second portion of the set of training images identified as outlier images, the outlier images being training images for which the image relevance score received from the first re-trained image relevance model is below a threshold score;
generating an aggregation of near duplicate images among the set of training images;
associating image selection data of the aggregated near duplicate images with the aggregation of the near duplicate images; and
generating a second re-trained image relevance model based on content feature values of the first images of the first portion, the second images of the second portion that remain following removal of the at least some of the second images of the second portion of the set of training images, and the image selection data.
Dependent claims: 2, 3, 4, 5, 6
7. A system, comprising:
a data processing apparatus; and
a memory coupled to the data processing apparatus having instructions stored thereon which, when executed by the data processing apparatus, cause the data processing apparatus to perform operations comprising:
receiving a trained image relevance model that generates relevance measures of images to a query, wherein the trained image relevance model has been trained based on content feature values of a set of training images, the query being a unique set of one or more query terms received by a search system as a query input; and
re-training the image relevance model, the re-training comprising:
generating a first re-trained image relevance model based on content feature values of first images of a first portion of training images in the set of training images;
receiving, from the first re-trained image relevance model, image relevance scores for second images of a second portion of the set of training images;
removing, from the set of training images, at least some of the second images of the second portion of the set of training images identified as outlier images, the outlier images being training images for which the image relevance score received from the first re-trained image relevance model is below a threshold score;
generating an aggregation of near duplicate images among the set of training images;
associating image selection data of the aggregated near duplicate images with the aggregation of the near duplicate images; and
generating a second re-trained image relevance model based on content feature values of the first images of the first portion and the second images of the second portion that remain following removal of the at least some of the second images of the second portion of the set of training images, and the image selection data.
Dependent claims: 8, 9, 10, 11, 12
13. A non-transitory computer readable media storing software comprising instructions executable by a processing device and upon such execution cause the processing device to perform operations comprising:
receiving a trained image relevance model that generates relevance measures of images to a query, wherein the trained image relevance model has been trained based on content feature values of a set of training images, the query being a unique set of one or more query terms received by a search system as a query input; and
re-training the image relevance model, the re-training comprising:
generating a first re-trained image relevance model based on content feature values of first images of a first portion of training images in the set of training images;
receiving, from the first re-trained image relevance model, image relevance scores for second images of a second portion of the set of training images;
removing, from the set of training images, at least some of the second images of the second portion of the set of training images identified as outlier images, the outlier images being training images for which the image relevance score received from the first re-trained image relevance model is below a threshold score;
generating an aggregation of near duplicate images among the set of training images;
associating image selection data of the aggregated near duplicate images with the aggregation of the near duplicate images; and
generating a second re-trained image relevance model based on content feature values of the first images of the first portion and the second images of the second portion that remain following removal of the at least some of the second images of the second portion of the set of training images, and the image selection data.
Dependent claims: 14, 15, 16, 17, 18
Specification