Method and apparatus for detecting near-duplicate images using content adaptive hash lookups

US 9,047,534 B2
Filed: 08/10/2012
Issued: 06/02/2015
Est. Priority Date: 08/11/2011
Status: Active Grant

Abstract

A scalable and high performance near-duplicate image search method utilizing short hashes improves performance over existing methods. By leveraging the shortness of the hashes, the search algorithm analyzes the reliability of each bit of a hash and performs content adaptive hash lookups by adaptively adjusting the “range” of each hash bit based on reliability. Matched features are post-processed to determine the final match results. The method can detect cropped, resized, print-scanned and re-encoded images and pieces from images among thousands of images.

19 Claims

1. A method of generating a plurality of indexes based on content in a query image to detect related content in at least one stored image, wherein each index identifies a location within a hash table corresponding to each stored image, the hash table storing a plurality of feature vectors corresponding to the stored image, the method comprising the steps of:
- identifying at least one interest point in the query image;
  
  generating a feature vector for the query image by defining a data value corresponding to a numeric representation of a feature of the interest point and of a plurality of additional points within a predetermined distance of the interest point;
  
  generating a first index by quantizing the data values of the feature vector for the query image;
  
  generating a reliability vector including a plurality of reliability values, wherein the reliability vector is of the same length as the feature vector and each reliability value corresponds to one of the data values of the feature vector and provides a numerical weighting indicating which of the data values of the feature vector are more likely to match content in the at least one stored image;
  
  selecting a portion of the data values in the feature vector according to the corresponding reliability values of each data value; and
  
  generating the plurality of indexes corresponding to the selected portion of the data values.
- Dependent claims (2, 3, 4, 5, 6, 7):
  - 2. The method of claim 1 further comprising the step of normalizing the feature vector prior to generating the first index, wherein normalizing the feature vector includes resizing the feature vector from a plurality of data values over the predetermined distance to a plurality of data values over a normalized distance.
  - 3. The method of claim 1 wherein the first index is generated using binary quantization of the feature vector.
  - 4. The method of claim 3 wherein the reliability values are linearly proportional to an absolute magnitude of each of the data values in the feature vector.
  - 5. The method of claim 3 wherein selecting the portion of the data values in the feature vector includes identifying at most five unreliable bits in the first index that result from quantizing the data values and that have a reliability value less than a predetermined threshold and wherein generating the plurality of indexes includes inverting at least one unreliable bit and generating a unique index for each combination of the unreliable bits.
  - 6. The method of claim 1 wherein a plurality of interest points from each stored image is stored in a table, the table including an index and data corresponding to the index for each of the interest points, the method further comprising the steps of:
    - retrieving the data for the stored image corresponding to each of the plurality of indexes;
      
      comparing the data corresponding to the interest point in the stored image to the data corresponding to one of the interest points identified in the query image; and
      
      identifying a matching interest point if the data corresponding to the interest point identified in the query image matches the data corresponding to one of the interest points in the stored image.
  - 7. The method of claim 6 wherein the content of the query image is identified as matching the content of one of the stored images if at least three matching interest points are identified.
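
The index generation recited in claims 1 and 3 to 5 can be illustrated with a minimal sketch. It is not the patented implementation: the function names (`quantize`, `bits_to_index`, `probe_indexes`), the sign-based quantization rule, the use of the absolute magnitude itself as the reliability value, and the threshold are all assumptions made for illustration. The first returned value plays the role of the first index; the remaining values come from inverting each combination of the least reliable bits, so a cap of five unreliable bits yields at most 2^5 = 32 indexes in total.

```python
from itertools import combinations


def quantize(feature):
    """Binary quantization (assumed rule): 1 if the data value is positive, else 0."""
    return [1 if v > 0 else 0 for v in feature]


def bits_to_index(bits):
    """Pack a list of bits, most significant first, into an integer index."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return index


def probe_indexes(feature, threshold, max_unreliable=5):
    """Return the first index followed by one index per combination of flipped unreliable bits."""
    bits = quantize(feature)
    # Reliability vector: same length as the feature vector, here simply |value|.
    reliability = [abs(v) for v in feature]
    # Positions below the threshold, least reliable first, capped at max_unreliable.
    unreliable = sorted(
        (i for i, r in enumerate(reliability) if r < threshold),
        key=lambda i: reliability[i],
    )[:max_unreliable]

    indexes = [bits_to_index(bits)]  # the first index
    for k in range(1, len(unreliable) + 1):
        for subset in combinations(unreliable, k):
            flipped = list(bits)
            for i in subset:
                flipped[i] ^= 1  # invert one unreliable bit
            indexes.append(bits_to_index(flipped))
    return indexes


if __name__ == "__main__":
    # Two values are close to zero, so their bits are treated as unreliable.
    print(probe_indexes([0.9, -0.05, 0.7, 0.1, -0.8], threshold=0.2))
```

Values near zero are the ones most likely to change sign when an image is resized or re-encoded, which is why this sketch treats small magnitudes as unreliable.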

8. A method of identifying related images, wherein at least one stored image is defined by a plurality of feature vectors stored in a hash table and wherein a position of each feature vector within the hash table is defined by an index into the hash table, the method comprising the steps of:
- receiving the query image from an input device operatively connected to a processor;
  
  identifying at least one interest point in the query image;
  
  generating a feature vector for each interest point by defining a data value corresponding to a numeric representation of a feature of the interest point and of a plurality of additional points within a predetermined distance of the interest point;
  
  generating a reliability vector including a plurality of reliability values, wherein the reliability vector is of the same length as the feature vector and each reliability value corresponds to one of the data values of the feature vector and provides a numerical weighting indicating which of the data values of the feature vector are more likely to match content in the at least one stored image;
  
  generating a first index by passing the feature vector for the query image through a hash function;
  
  selecting a portion of the data values in the feature vector according to the corresponding reliability values of each data value;
  
  generating a plurality of indexes corresponding to the selected portion of the data values in the feature vector;
  
  comparing the feature vector of the query image to the feature vector located at the first index and at each of the plurality of indexes within the hash table for each stored image; and
  
  identifying at least one image from the plurality of other images related to the query image when at least one feature vector of the stored image matches one of the feature vectors of the query image.
- Dependent claims (9, 10, 11, 12, 13, 14, 15, 16):
  - 9. The method of claim 8, wherein the step of generating the plurality of indexes includes quantizing the feature vector.
  - 10. The method of claim 9 further comprising the initial steps of:
    - identifying at least one interest point from the plurality of stored images;
      
      generating a feature vector for each interest point for the plurality of stored images as a function of the interest point;
      
      quantizing each feature vector for the plurality of images; and
      
      storing image data of the plurality of images in a database as a function of the quantized feature vectors.
  - 11. The method of claim 10 wherein the step of quantizing the feature vector for the plurality of images generates an index to a table in the database and the image data is stored in the database according to the index.
  - 12. The method of claim 11 wherein the step of identifying at least one image further includes the step of generating a plurality of indexes to the table as a function of the reliability vector and the quantized value of each feature vector of the query image.
  - 13. The method of claim 8 further comprising the step of normalizing a scale space of the image around the interest point prior to generating the feature vector.
  - 14. The method of claim 8 wherein at least three points of interest of the query image are related to corresponding points of interest in one of the plurality of other images.
  - 15. The method of claim 14 wherein image data of the query image located within a first triangle defined by the three points of interest of the query image is related to image data of the located image within a second triangle defined by the corresponding three points of interest for the other image.
  - 16. The method of claim 8 wherein at least a portion of the query image relates to a portion of the other image.
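
The storage and comparison steps of claims 8, 10 and 11 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the method as implemented by the inventors: the helper names (`quantize_to_index`, `build_table`, `lookup`), the sign-based quantization, and the Euclidean distance test with a `max_distance` parameter are hypothetical choices, since the claims do not specify how two feature vectors are judged to match. The list of indexes to probe is taken as an input here.

```python
from collections import defaultdict


def quantize_to_index(feature):
    """Assumed quantization: one bit per data value (1 if positive), packed into an int."""
    index = 0
    for v in feature:
        index = (index << 1) | (1 if v > 0 else 0)
    return index


def build_table(stored_features):
    """Store each (image_id, feature_vector) pair under the index of its quantized vector."""
    table = defaultdict(list)
    for image_id, feature in stored_features:
        table[quantize_to_index(feature)].append((image_id, feature))
    return table


def lookup(table, query_feature, indexes, max_distance):
    """Compare the query vector with the vectors stored at each probed index.

    Returns the ids of stored images whose vector lies within max_distance
    (Euclidean distance, an assumed matching criterion).
    """
    matches = []
    for index in indexes:
        for image_id, feature in table.get(index, []):
            dist = sum((q - s) ** 2 for q, s in zip(query_feature, feature)) ** 0.5
            if dist <= max_distance:
                matches.append(image_id)
    return matches


if __name__ == "__main__":
    table = build_table([("a", [0.9, 0.1, -0.7]), ("b", [-0.5, 0.6, 0.4])])
    query = [0.8, -0.05, -0.75]  # middle value changed sign relative to image "a"
    print(lookup(table, query, [quantize_to_index(query)], 0.3))  # first index only
    print(lookup(table, query, [4, 6], 0.3))  # first index plus one flipped bit
```

In the usage example the query's middle value has drifted across zero, so the first index alone misses image "a", while probing the index with that bit inverted finds it.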

17. A system for identifying related images, comprising:
- an input device configured to receive a query image;
  
  at least one memory device storing a plurality of instructions and a plurality of images;
  
  a processor operatively connected to the memory device, the processor configured to execute the plurality of instructions to:
  
  identify at least one interest point from the query image;
  
  generate a feature vector for the query image by defining a data value corresponding to a numeric representation of a feature of the interest point and of a plurality of additional points within a predetermined distance of the interest point;
  
  generate a reliability vector including a plurality of reliability values, wherein the reliability vector is of the same length as the feature vector and each reliability value corresponds to one of the data values of the feature vector and provides a numerical weighting indicating which of the data values of the feature vector are more likely to match content in the at least one stored image;
  
  compare the query image to the plurality of images; and
  
  generate a first index as a function of the feature vector;
  
  select a portion of the data values in the feature vector according to the corresponding reliability values of each data value; and
  
  generate a plurality of indexes corresponding to the selected data values wherein the first index and each of the plurality of indexes identifies a location within a hash table corresponding to each stored image, the hash table stored in the at least one memory device and including a plurality of feature vectors corresponding to the stored image; and
  
  identify at least one image related to the query image from the plurality of images as a function of the plurality of indexes.
- Dependent claims (18, 19):
  - 18. The system of claim 17 wherein the first index is a binary value having forty or less bits and the processor is further configured to execute the plurality of instructions to:
    - identify at most five unreliable bits in the first index; and
      
      generate the plurality of indexes as a function of the unreliable bits.
  - 19. The system of claim 17 wherein the processor is further configured to identify the image related to the query image from the plurality of images if at least three matching interest points are identified.
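
The decision rule of claims 7 and 19, under which an image is reported once at least three matching interest points are identified, could be sketched as a simple vote count. The function name `related_images` and its input layout (one collection of matched image ids per query interest point) are assumptions for illustration; the geometric consistency check over triangles described in claim 15 is not modeled here.

```python
from collections import Counter


def related_images(matches_per_point, min_points=3):
    """Return ids of stored images matched by at least min_points query interest points.

    matches_per_point holds, for each query interest point, the ids of the
    stored images whose feature vectors matched that point.
    """
    votes = Counter()
    for image_ids in matches_per_point:
        # Count an image at most once per interest point, even if several
        # probed indexes returned it.
        votes.update(set(image_ids))
    return sorted(image_id for image_id, n in votes.items() if n >= min_points)


if __name__ == "__main__":
    print(related_images([["a", "b"], ["a"], ["a", "a", "c"], ["b"]]))
```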

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Anvato, Inc. (Alphabet Inc.)
Inventors: Harmanci, Oztan; Haritaoglu, Ismail
Primary Examiner(s): Le, Vu
Assistant Examiner(s): WOLDEMARIAM, AKLILU K
Application Number: US13/572,075
Publication Number: US 20130039584A1
Time in Patent Office: 1,026 Days
Field of Search: 382/130, 382/190, 382/195, 382/197, 382/219, 382/220, 382/305, 382/321, 382/227, 382/128
US Class Current: 1/1
CPC Class Codes:
G06F 16/51   Indexing; Data structures t...
G06F 16/583   using metadata automaticall...
G06V 10/28   Quantising the image, e.g. ...
G06V 10/462   Salient features, e.g. scal...
