Near duplicate images

US 9,063,954 B2
Filed: 03/15/2013
Issued: 06/23/2015
Est. Priority Date: 10/15/2012
Status: Active Grant

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining image search results. One of the methods includes generating a plurality of feature vectors for each image in a collection of images, wherein each feature vector is associated with an image tile of an image, wherein each feature vector corresponds to one of a plurality of predetermined visual words. All images in the collection of images that share at least a threshold number of matching visual words associated with matching image tiles are classified as near-duplicate images.
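
As a rough illustration of the method summarized above, the following Python sketch quantizes each feature vector to its nearest visual word, records the tile that contains the feature region, and counts the (visual word, tile) pairs two images share. The vocabulary, the fixed 2x2 grid, the nearest-neighbour quantizer, the normalized coordinates, and the threshold values are all assumptions made for illustration; the abstract does not specify any of them.

```python
from typing import List, Sequence, Set, Tuple

# A feature is (x, y, vector): the centre of the feature region, with x and y
# normalized to [0, 1), plus the feature vector computed from that region.
Feature = Tuple[float, float, Sequence[float]]


def quantize(vector: Sequence[float], vocabulary: List[Sequence[float]]) -> int:
    """Return the index of the nearest visual word.

    Squared Euclidean distance is an assumed choice of metric.
    """
    def sq_dist(word: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(vector, word))

    return min(range(len(vocabulary)), key=lambda i: sq_dist(vocabulary[i]))


def tile_of(x: float, y: float, grid: int = 2) -> int:
    """Return the index of the tile containing (x, y) in a grid x grid tiling."""
    col = min(int(x * grid), grid - 1)
    row = min(int(y * grid), grid - 1)
    return row * grid + col


def word_tile_pairs(features: List[Feature],
                    vocabulary: List[Sequence[float]],
                    grid: int = 2) -> Set[Tuple[int, int]]:
    """Associate each quantized visual word with the tile of its feature region.

    Using a set collapses repeated (word, tile) pairs; that is a simplification.
    """
    return {(quantize(vec, vocabulary), tile_of(x, y, grid))
            for x, y, vec in features}


def are_near_duplicates(pairs_a: Set[Tuple[int, int]],
                        pairs_b: Set[Tuple[int, int]],
                        threshold: int) -> bool:
    """True when the images share at least `threshold` matching pairs."""
    return len(pairs_a & pairs_b) >= threshold


# Toy data: a three-word vocabulary and two images with three features each.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
image_a = [(0.10, 0.10, (0.1, 0.0)), (0.80, 0.20, (0.9, 1.0)), (0.60, 0.70, (0.1, 0.9))]
image_b = [(0.15, 0.12, (0.0, 0.1)), (0.75, 0.25, (1.0, 0.9)), (0.20, 0.90, (0.0, 1.0))]

pairs_a = word_tile_pairs(image_a, vocabulary)
pairs_b = word_tile_pairs(image_b, vocabulary)
print(are_near_duplicates(pairs_a, pairs_b, threshold=2))
```

In this toy data the third feature of each image quantizes to the same visual word but lands in a different tile, so it does not count as a match; only the word and the tile together form a matching element.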

18 Claims

1. A computer-implemented method comprising:
- generating a plurality of feature vectors for each image in a collection of images, wherein each feature vector is associated with an image tile of an image, wherein each feature vector corresponds to one of a plurality of predetermined visual words and wherein generating a feature vector for a particular image in the collection of images comprises:
  - determining a feature region in the particular image;
  - computing the feature vector from the feature region in the particular image;
  - quantizing the feature vector to one of the plurality of visual words;
  - determining an image tile to which the feature region is located;
  - associating the visual word with the image tile for the feature region; and
- classifying as near-duplicate images all images in the collection of images that share at least a threshold number of matching visual words associated with matching image tiles.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The method of claim 1, further comprising:
    - determining a different image tiling for each visual word in the plurality of predetermined visual words, wherein an image tiling partitions an image into a plurality of distinct tiles.
  - 3. The method of claim 2, wherein determining a different image tiling for each visual word in the plurality of predetermined visual words comprises computing an offset based on an index number of the visual word.
  - 4. The method of claim 1, further comprising generating a feature descriptor for each image including encoding each element of the feature descriptor using each visual word and associated image tile of the image.
  - 5. The method of claim 1, further comprising:
    - determining an image type for each image in the collection of images; and
    - determining the threshold number of matching visual words between images based on the image type of the images.
  - 6. The method of claim 1, further comprising:
    - receiving an image query;
    - obtaining the collection of images as image search results for the image query; and
    - removing one or more near-duplicate images from the collection of images as image search results for the image query.
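
Claims 2 and 3 recite a different tiling for each visual word, obtained from an offset based on the word's index number, but they do not give a formula. The Python sketch below shows one hypothetical way to do this: the tile grid is shifted by a fraction of a tile width that depends on the index, and coordinates wrap around the image edge. The function name `tile_for_word`, the parameters `grid` and `num_offsets`, and the wrap-around behaviour are all assumptions, not details taken from the claims.

```python
from typing import Tuple


def tile_for_word(x: float, y: float, word_index: int,
                  grid: int = 4, num_offsets: int = 8) -> Tuple[int, int]:
    """Return (row, col) of the tile containing (x, y) under the tiling
    used for the visual word with index `word_index`.

    x and y are assumed to be normalized to [0, 1).
    """
    tile_size = 1.0 / grid
    # Hypothetical offset rule: shift the grid by a fraction of one tile width
    # selected by the visual word's index number.
    shift = (word_index % num_offsets) / num_offsets * tile_size
    # Wrap shifted coordinates back into [0, 1) (an assumed edge policy),
    # then clamp to guard against floating-point round-off.
    col = min(int(((x + shift) % 1.0) * grid), grid - 1)
    row = min(int(((y + shift) % 1.0) * grid), grid - 1)
    return row, col


# Two nearby points that straddle a tile boundary in the unshifted tiling.
print(tile_for_word(0.24, 0.24, word_index=0), tile_for_word(0.26, 0.26, word_index=0))
print(tile_for_word(0.24, 0.24, word_index=4), tile_for_word(0.26, 0.26, word_index=4))
```

In this sketch the two points fall into different tiles under the tiling for word 0 but into the same tile under the shifted tiling for word 4, which shows that each visual word partitions the image differently.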

7. A computer-implemented method comprising:
- receiving a query image;
- obtaining a set of image search results for the query image;
- generating a plurality of feature vectors for the query image, wherein each feature vector is associated with an image tile of the query image, wherein each feature vector corresponds to one of a plurality of predetermined visual words;
- generating a plurality of feature vectors for each image identified by the image search results, wherein each feature vector is associated with an image tile of an image, and wherein generating a feature vector for a particular image identified by the image search results comprises:
  - determining a feature region in the particular image;
  - computing the feature vector from the feature region in the particular image;
  - quantizing the feature vector to one of the plurality of visual words;
  - determining an image tile to which the feature region is located; and
  - associating the visual word with the image tile for the feature region;
- determining that one or more images in the image search results that share at least a threshold number of matching visual words associated with matching image tiles with the query image are near-duplicate images of the query image; and
- removing one or more near-duplicate images of the query image from the set of image search results.
- Dependent claims (8, 9):
  - 8. The method of claim 7, further comprising:
    - determining a different image tiling for each visual word in the plurality of predetermined visual words, wherein an image tiling partitions an image into a plurality of distinct tiles.
  - 9. The method of claim 8, wherein determining a different image tiling for each visual word in the plurality of predetermined visual words comprises computing an offset based on an index number of the visual word.
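
The filtering step of claim 7 can be sketched as follows, assuming the (visual word, tile) pairs for the query image and for each search result have already been computed. The function name, the use of string identifiers for results, and the choice to drop every near-duplicate (the claim requires only "one or more") are assumptions made for this illustration.

```python
from typing import Dict, List, Set, Tuple

WordTile = Tuple[int, int]  # (visual word index, tile index)


def remove_near_duplicates_of_query(query_pairs: Set[WordTile],
                                    results: List[str],
                                    result_pairs: Dict[str, Set[WordTile]],
                                    threshold: int) -> List[str]:
    """Keep only results sharing fewer than `threshold` pairs with the query.

    Result order is preserved. A result with no recorded pairs is kept.
    """
    kept = []
    for result_id in results:
        shared = len(query_pairs & result_pairs.get(result_id, set()))
        if shared < threshold:  # below the threshold: not a near-duplicate
            kept.append(result_id)
    return kept


# Toy data: hypothetical result identifiers and precomputed pairs.
query = {(0, 0), (1, 1), (2, 3)}
pairs_by_result = {
    "a.jpg": {(0, 0), (1, 1), (2, 3)},   # shares 3 pairs with the query
    "b.jpg": {(0, 0), (5, 2)},           # shares 1 pair
    "c.jpg": {(1, 1), (2, 3), (7, 0)},   # shares 2 pairs
}
print(remove_near_duplicates_of_query(query, ["a.jpg", "b.jpg", "c.jpg"],
                                      pairs_by_result, threshold=2))
```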

10. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  - generating a plurality of feature vectors for each image in a collection of images, wherein each feature vector is associated with an image tile of an image, wherein each feature vector corresponds to one of a plurality of predetermined visual words, and wherein generating feature vector for a particular image in the collection of images comprises:
    - determining a feature region in the particular image;
    - computing the feature vector from the feature region in the particular image;
    - quantizing the feature vector to one of the plurality of visual words;
    - determining an image tile to which the feature region is located; and
    - associating the visual word with the image tile for the feature region; and
  - classifying as near-duplicate images all images in the collection of images that share at least a threshold number of matching visual words associated with matching image tiles.
- Dependent claims (11, 12, 13, 14, 15):
  - 11. The system of claim 10, wherein the operations further comprise:
    - determining a different image tiling for each visual word in the plurality of predetermined visual words, wherein an image tiling partitions an image into a plurality of distinct tiles.
  - 12. The system of claim 11, wherein determining a different image tiling for each visual word in the plurality of predetermined visual words comprises computing an offset based on an index number of the visual word.
  - 13. The system of claim 10, wherein the operations further comprise generating a feature descriptor for each image including encoding each element of the feature descriptor using each visual word and associated image tile of the image.
  - 14. The system of claim 10, wherein the operations further comprise:
    - determining an image type for each image in the collection of images; and
    - determining the threshold number of matching visual words between images based on the image type of the images.
  - 15. The system of claim 10, wherein the operations further comprise:
    - receiving an image query;
    - obtaining the collection of images as image search results for the image query; and
    - removing one or more near-duplicate images from the collection of images as image search results for the image query.
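
Claims 13 and 14 (like claims 4 and 5) mention a per-image feature descriptor whose elements encode a visual word together with its tile, and a match threshold that depends on image type. The Python sketch below is one possible reading: each element packs the word index and tile index into a single integer, and the threshold comes from a lookup table. The packing formula, the image type names, the threshold values, and the rule of taking the stricter threshold for mixed types are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def encode_descriptor(pairs: Iterable[Tuple[int, int]],
                      tiles_per_image: int) -> List[int]:
    """Pack each (visual word, tile) pair into one integer element.

    Assumes 0 <= tile < tiles_per_image, so distinct pairs get distinct codes.
    """
    return sorted({word * tiles_per_image + tile for word, tile in pairs})


def count_matches(desc_a: List[int], desc_b: List[int]) -> int:
    """Number of descriptor elements two images have in common."""
    return len(set(desc_a) & set(desc_b))


# Hypothetical image types and thresholds; the claims name neither.
THRESHOLD_BY_TYPE: Dict[str, int] = {"photo": 5, "line_drawing": 3}


def threshold_for(type_a: str, type_b: str, default: int = 5) -> int:
    """Pick a threshold from the image types (stricter value if they differ)."""
    return max(THRESHOLD_BY_TYPE.get(type_a, default),
               THRESHOLD_BY_TYPE.get(type_b, default))


desc_a = encode_descriptor([(0, 0), (1, 1), (2, 3)], tiles_per_image=4)
desc_b = encode_descriptor([(0, 0), (1, 1), (2, 2)], tiles_per_image=4)
is_near_duplicate = (count_matches(desc_a, desc_b)
                     >= threshold_for("line_drawing", "line_drawing"))
print(desc_a, desc_b, is_near_duplicate)
```

Packing both indices into one integer means a match on an element is a match on both the visual word and the tile, so comparing descriptors reduces to a set intersection.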

16. A system comprising:
- one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
  - receiving a query image;
  - obtaining a set of image search results for the query image;
  - generating a plurality of feature vectors for the query image, wherein each feature vector is associated with an image tile of the query image, wherein each feature vector corresponds to one of a plurality of predetermined visual words;
  - generating a plurality of feature vectors for each image identified by the image search results, wherein each feature vector is associated with an image tile of an image, and wherein generating a feature vector for a particular image identified by the image search results comprises:
    - determining a feature region in the particular image;
    - computing the feature vector from the feature region in the particular image;
    - quantizing the feature vector to one of the plurality of visual words;
    - determining an image tile to which the feature region is located; and
    - associating the visual word with the image tile for the feature region;
  - determining that one or more images in the image search results that share at least a threshold number of matching visual words associated with matching image tiles with the query image are near-duplicate images of the query image; and
  - removing one or more near-duplicate images of the query image from the set of image search results.
- Dependent claims (17, 18):
  - 17. The system of claim 16, wherein the operations further comprise:
    - determining a different image tiling for each visual word in the plurality of predetermined visual words, wherein an image tiling partitions an image into a plurality of distinct tiles.
  - 18. The system of claim 17, wherein determining a different image tiling for each visual word in the plurality of predetermined visual words comprises computing an offset based on an index number of the visual word.

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google Inc. (Alphabet Inc.)
Inventors: Ioffe, Sergey; Aly, Mohamed; Rosenberg, Charles J.
Primary Examiner(s): Mehta, Bhavesh
Assistant Examiner(s): Dunphy, David F

Application Number: US13/832,122
Publication Number: US 20140105505A1
Time in Patent Office: 830 Days
Field of Search: None
US Class Current: 1/1
CPC Class Codes:
G06F 16/583   using metadata automaticall...
G06F 16/5838   using colour
G06V 10/464   using a plurality of salien...
G06V 10/751   Comparing pixel values or l...
