Detecting duplicate images using hash code grouping
First Claim
1. A computer system with a processor for detecting similar images, comprising:
a generate code component that generates a code for an image, the code having elements whose values are derived from features of the image, the elements having varying levels of significance such that similar images have similar codes with the most significant elements having the same values, the code being generated by creating a feature vector of a certain dimension that represents the image, reduces the dimension of the feature vector, and creates a code from the reduced feature vector with the reduced dimension;
an image table that groups images based on having the same values for the most significant elements; and
a detection component that uses the generate code component to generate a code for a target image, that identifies the group of images with the same values for the most significant elements as the generated code, and that identifies, from the images within the identified group, images whose codes vary less than a threshold amount from the code of the target image.
Abstract
A duplicate image detection system generates an image table that maps hash codes of images to their corresponding images. The image table may group images according to their group identifiers generated from the most significant elements of the hash codes based on significance of the elements in representing an image. The image table thus segregates images by their group identifiers. To detect a duplicate image of a target image, the detection system generates a target hash code for the target image. The detection system then identifies the group of the target image based on the group identifier of the target hash code. After identifying the group identifier, the detection system searches the corresponding group table to identify hash codes that have values that are similar to the target hash code. The detection system then selects the images associated with those similar hash codes as being duplicates of the target image.
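The grouping and lookup described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes hash codes are fixed-width integers, that the group identifier is simply the top `GROUP_BITS` bits, and that "similar" means a Hamming distance below a threshold. The names `ImageTable`, `group_id`, `hamming`, `CODE_BITS`, and `GROUP_BITS` are hypothetical.

```python
from collections import defaultdict
from typing import List

CODE_BITS = 32   # assumed total width of a hash code
GROUP_BITS = 8   # assumed number of most significant bits used as the group identifier


def group_id(code: int) -> int:
    """Return the most significant GROUP_BITS of a CODE_BITS-wide hash code."""
    return code >> (CODE_BITS - GROUP_BITS)


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


class ImageTable:
    """Maps group identifiers to the (hash code, image id) pairs in that group."""

    def __init__(self) -> None:
        self._groups = defaultdict(list)

    def add(self, image_id: str, code: int) -> None:
        self._groups[group_id(code)].append((code, image_id))

    def find_duplicates(self, target_code: int, threshold: int = 3) -> List[str]:
        # Only the group sharing the target's most significant bits is searched.
        candidates = self._groups.get(group_id(target_code), [])
        return [
            image_id
            for code, image_id in candidates
            if hamming(code, target_code) < threshold
        ]


table = ImageTable()
table.add("a.jpg", 0xAB123456)
table.add("b.jpg", 0xAB123457)  # same group as a.jpg, differs in one low bit
table.add("c.jpg", 0xCD123456)  # different group identifier
table.add("d.jpg", 0xAB000000)  # same group as a.jpg, but many bits differ

print(table.find_duplicates(0xAB123456))
```

Restricting the search to one group trades some recall for speed: an image whose most significant bits differ from the target's is never compared, which matches the abstract's description of segregating images by group identifier.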
13 Claims
1. A computer system with a processor for detecting similar images, comprising:
a generate code component that generates a code for an image, the code having elements whose values are derived from features of the image, the elements having varying levels of significance such that similar images have similar codes with the most significant elements having the same values, the code being generated by creating a feature vector of a certain dimension that represents the image, reduces the dimension of the feature vector, and creates a code from the reduced feature vector with the reduced dimension;
an image table that groups images based on having the same values for the most significant elements; and
a detection component that uses the generate code component to generate a code for a target image, that identifies the group of images with the same values for the most significant elements as the generated code, and that identifies, from the images within the identified group, images whose codes vary less than a threshold amount from the code of the target image.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer-readable storage medium storing computer-executable instructions for controlling a computer to detect similar images by a method comprising:
generating a code for an image by
generating a feature vector of a certain dimension that represents the image;
reducing the dimensions of the feature vector; and
creating a code from the feature vector with the reduced dimensions, the code having an element for each of the reduced dimensions of the feature vector, each element having a value of 0 or 1 that is derived from the corresponding feature of that reduced dimension, the elements having varying levels of significance such that similar images have similar codes with the most significant elements having the same values;
generating an image table that groups images by the most significant elements of their codes; and
identifying images that are similar to a target image by
generating a target code for the target image;
identifying from the image table a group of images with the same values for the most significant elements of their codes as the target code;
identifying, from the images within the identified group, images whose codes vary less than a threshold amount from the target code of the target image wherein the identified images represent images that are similar to the target image.
Dependent claims: 9, 10, 11, 12, 13
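The code-generation steps recited in claim 8 (feature vector, dimension reduction, one 0/1 element per reduced dimension) can be illustrated with a small sketch. The claim does not say how the dimensions are reduced or how each element is derived, so two assumptions are made here purely for illustration: reduction is a linear projection whose rows are ordered from most to least significant, and each element is 1 when the reduced value exceeds a per-dimension threshold. The function names and the sample numbers are hypothetical.

```python
from typing import List, Sequence


def reduce_dimensions(
    features: Sequence[float], projection: Sequence[Sequence[float]]
) -> List[float]:
    """Project a feature vector onto fewer dimensions (assumed: linear projection)."""
    return [sum(w * x for w, x in zip(row, features)) for row in projection]


def binarize(reduced: Sequence[float], thresholds: Sequence[float]) -> List[int]:
    """Emit one 0/1 element per reduced dimension (assumed: simple thresholding)."""
    return [1 if value > t else 0 for value, t in zip(reduced, thresholds)]


def generate_code(
    features: Sequence[float],
    projection: Sequence[Sequence[float]],
    thresholds: Sequence[float],
) -> List[int]:
    """Feature vector -> reduced vector -> binary code, most significant element first."""
    return binarize(reduce_dimensions(features, projection), thresholds)


def pack_bits(bits: Sequence[int]) -> int:
    """Pack the elements into an integer, with the first element as the highest bit."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


# A 4-dimensional feature vector reduced to 2 dimensions (illustrative values only).
features = [0.9, 0.1, 0.4, 0.6]
projection = [
    [1.0, 0.0, 0.0, 0.0],   # assumed most significant direction
    [0.0, 0.0, 1.0, -1.0],  # assumed less significant direction
]
thresholds = [0.5, 0.0]

code = generate_code(features, projection, thresholds)
print(code, pack_bits(code))
```

Placing the most significant element first means that a prefix of the packed code can serve directly as the group key when building the image table.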
Specification