Method and apparatus for detecting near-duplicate images using content adaptive hash lookups
Abstract
A scalable, high-performance near-duplicate image search method that uses short hashes improves performance over existing methods. By leveraging the shortness of the hashes, the search algorithm analyzes the reliability of each hash bit and performs content adaptive hash lookups, adaptively adjusting the “range” of each hash bit based on its reliability. Matched features are post-processed to determine the final match results. The method can detect cropped, resized, print-scanned and re-encoded images, as well as pieces of images, among thousands of images.
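The abstract credits the shortness of the hashes with making adaptive lookups affordable. The arithmetic below is an illustration of that cost argument, not figures from the patent: the 32-bit hash length and the number `k` of unreliable bits are assumptions. Blindly probing every index within Hamming distance `r` grows combinatorially, whereas flipping only the `k` least reliable bits costs the first index plus its 2^k - 1 flip patterns.

```python
from math import comb


def hamming_ball_probes(bits: int, radius: int) -> int:
    # Number of indexes within Hamming distance `radius` of a `bits`-bit hash.
    return sum(comb(bits, r) for r in range(radius + 1))


def adaptive_probes(k: int) -> int:
    # First index plus every non-empty flip pattern over the k least reliable bits.
    return 2 ** k


print(hamming_ball_probes(32, 2), hamming_ball_probes(32, 4), adaptive_probes(4))
```

For a 32-bit hash, covering radius 2 already takes 529 probes and radius 4 takes 41,449, while perturbing only the four least reliable bits takes 16.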
19 Claims
1. A method of generating a plurality of indexes based on content in a query image to detect related content in at least one stored image, wherein each index identifies a location within a hash table corresponding to each stored image, the hash table storing a plurality of feature vectors corresponding to the stored image, the method comprising the steps of:
identifying at least one interest point in the query image;
generating a feature vector for the query image by defining a data value corresponding to a numeric representation of a feature of the interest point and of a plurality of additional points within a predetermined distance of the interest point;
generating a first index by quantizing the data values of the feature vector for the query image;
generating a reliability vector including a plurality of reliability values, wherein the reliability vector is of the same length as the feature vector and each reliability value corresponds to one of the data values of the feature vector and provides a numerical weighting indicating which of the data values of the feature vector are more likely to match content in the at least one stored image;
selecting a portion of the data values in the feature vector according to the corresponding reliability values of each data value; and
generating the plurality of indexes corresponding to the selected portion of the data values.
Dependent claims: 2, 3, 4, 5, 6, 7.
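The steps of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The descriptor (mean-removed intensities in a square window), the one-bit-per-value quantizer with threshold 0, the reliability measure (distance of each value from that threshold), the choice to perturb the `k` least reliable values, and all function names are hypothetical stand-ins.

```python
from itertools import combinations


def feature_vector(image, y, x, radius=1):
    # Hypothetical descriptor: intensities of the interest point (y, x) and of every
    # point within `radius` pixels (square window), with the window mean subtracted.
    values = [image[r][c]
              for r in range(y - radius, y + radius + 1)
              for c in range(x - radius, x + radius + 1)]
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def quantize(fv):
    # Assumed quantizer: one bit per data value, 1 if above threshold 0, else 0.
    # The first data value becomes the most significant bit of the first index.
    index = 0
    for v in fv:
        index = (index << 1) | (1 if v > 0 else 0)
    return index


def reliability(fv):
    # Assumed reliability: distance of each value from the quantization threshold.
    # Values near 0 could flip sign under small distortions, so they are less reliable.
    return [abs(v) for v in fv]


def candidate_indexes(fv, k=2):
    # Returns the first index and the extra indexes obtained by flipping every
    # non-empty subset of the bits belonging to the k least reliable data values.
    first = quantize(fv)
    rel = reliability(fv)
    n = len(fv)
    weak = sorted(range(n), key=lambda i: rel[i])[:k]
    indexes = []
    for size in range(1, len(weak) + 1):
        for subset in combinations(weak, size):
            idx = first
            for i in subset:
                idx ^= 1 << (n - 1 - i)  # bit that encodes data value i
            indexes.append(idx)
    return first, indexes


print(candidate_indexes([3.0, -0.1, -2.0, 0.2]))
```

For example, the four-value feature vector `[3.0, -0.1, -2.0, 0.2]` quantizes to the first index `0b1001` (9). Its two values closest to the threshold, `-0.1` and `0.2`, yield the additional indexes 13, 8 and 12.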
8. A method of identifying related images, wherein at least one stored image is defined by a plurality of feature vectors stored in a hash table and wherein a position of each feature vector within the hash table is defined by an index into the hash table, the method comprising the steps of:
receiving the query image from an input device operatively connected to a processor;
identifying at least one interest point in the query image;
generating a feature vector for each interest point by defining a data value corresponding to a numeric representation of a feature of the interest point and of a plurality of additional points within a predetermined distance of the interest point;
generating a reliability vector including a plurality of reliability values, wherein the reliability vector is of the same length as the feature vector and each reliability value corresponds to one of the data values of the feature vector and provides a numerical weighting indicating which of the data values of the feature vector are more likely to match content in the at least one stored image;
generating a first index by passing the feature vector for the query image through a hash function;
selecting a portion of the data values in the feature vector according to the corresponding reliability values of each data value;
generating a plurality of indexes corresponding to the selected portion of the data values in the feature vector;
comparing the feature vector of the query image to the feature vector located at the first index and at each of the plurality of indexes within the hash table for each stored image; and
identifying at least one image from the plurality of other images related to the query image when at least one feature vector of the stored image matches one of the feature vectors of the query image.
Dependent claims: 9, 10, 11, 12, 13, 14, 15, 16.
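The lookup-and-compare steps of claim 8 might look like this in a toy in-memory setting. The following are assumptions, not taken from the patent: the hash table is a Python dict mapping each index to a list of `(image_id, feature_vector)` entries, the hash function is the same one-bit-per-value quantizer, and two feature vectors "match" when their Euclidean distance is at most a hypothetical `max_dist`. The class and method names are illustrative only.

```python
import math
from collections import defaultdict


def quantize(fv):
    # Assumed hash function: one bit per data value (1 if the value is > 0).
    index = 0
    for v in fv:
        index = (index << 1) | (1 if v > 0 else 0)
    return index


class FeatureHashTable:
    """Hypothetical hash table: index -> [(image_id, feature_vector), ...]."""

    def __init__(self):
        self.buckets = defaultdict(list)

    def add(self, image_id, fv):
        # Store each feature vector of a stored image at its quantized index.
        self.buckets[quantize(fv)].append((image_id, list(fv)))

    def lookup(self, query_fv, probe_indexes, max_dist=1.0):
        # Compare the query vector with every stored vector at each probed index
        # (the first index plus the reliability-derived indexes) and collect the
        # ids of stored images that have at least one matching feature vector.
        matches = set()
        for idx in probe_indexes:
            for image_id, stored in self.buckets.get(idx, ()):
                if math.dist(query_fv, stored) <= max_dist:
                    matches.add(image_id)
        return matches


table = FeatureHashTable()
table.add("a", [3.0, -0.1, -2.0, 0.2])
table.add("b", [-3.0, -3.0, -3.0, -3.0])
query = [3.0, 0.1, -2.0, 0.2]
print(table.lookup(query, [13]), table.lookup(query, [13, 9]))
```

Here the stored vector for image `a` hashes to index 9, but the query hashes to index 13 because its second value crossed the threshold. Probing only the first index misses the match. Adding index 9, obtained by flipping the query's least reliable bit, recovers it.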
17. A system for identifying related images, comprising:
an input device configured to receive a query image;
at least one memory device storing a plurality of instructions and a plurality of images;
a processor operatively connected to the memory device, the processor configured to execute the plurality of instructions to:
identify at least one interest point from the query image;
generate a feature vector for the query image by defining a data value corresponding to a numeric representation of a feature of the interest point and of a plurality of additional points within a predetermined distance of the interest point;
generate a reliability vector including a plurality of reliability values, wherein the reliability vector is of the same length as the feature vector and each reliability value corresponds to one of the data values of the feature vector and provides a numerical weighting indicating which of the data values of the feature vector are more likely to match content in the at least one stored image;
compare the query image to the plurality of images;
generate a first index as a function of the feature vector;
select a portion of the data values in the feature vector according to the corresponding reliability values of each data value;
generate a plurality of indexes corresponding to the selected data values, wherein the first index and each of the plurality of indexes identifies a location within a hash table corresponding to each stored image, the hash table stored in the at least one memory device and including a plurality of feature vectors corresponding to the stored image; and
identify at least one image related to the query image from the plurality of images as a function of the plurality of indexes.
Dependent claims: 18, 19.
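Claim 17 identifies related images "as a function of the plurality of indexes," and the abstract says matched features are post-processed, but neither passage specifies how. A simple stand-in, assumed here purely for illustration, is to let each query interest point vote for the stored images its probes matched and to keep images with enough votes. The function `identify_related` and its `min_votes` parameter are hypothetical.

```python
from collections import Counter


def identify_related(matches_per_point, min_votes=2):
    """Hypothetical post-processing step.

    matches_per_point: one collection of matched stored-image ids per query
    interest point (e.g. the result of probing the first index and the
    reliability-derived indexes for that point).
    Returns image ids supported by at least `min_votes` interest points,
    most-voted first.
    """
    votes = Counter()
    for matched in matches_per_point:
        votes.update(set(matched))  # each interest point votes at most once per image
    return [img for img, n in votes.most_common() if n >= min_votes]


print(identify_related([{"a", "b"}, {"a"}, {"c"}, {"a", "c"}]))
```

Requiring several agreeing interest points suppresses isolated accidental feature matches. A real system might add geometric verification of the matched points, which this sketch does not attempt.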
Specification