Efficient image matching for large sets of images
Abstract
A system and method to detect similarities between images. The system and method allow comparisons between a query image and one or more catalog images in a manner that is resilient to scanning, scaling, rotating, cropping and other distortions of the query image. The system includes an image processing module that determines and/or calculates principal features of a catalog image and constructs a feature vector using one or more of the principal features. The system also includes a matching module that matches a query image to one or more catalog images. The system finds matches based on a distance measure of features present in the query image and features present in the catalog images.
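As a rough illustration of matching "based on a distance measure of features", the sketch below counts how many query feature vectors have a close neighbor among a catalog image's feature vectors. The Euclidean metric, the `max_distance` cutoff, and all function names are assumptions made for illustration; the abstract does not specify them.

```python
import math

def count_feature_matches(query_vectors, catalog_vectors, max_distance):
    """Count query vectors whose nearest catalog vector lies within max_distance."""
    if not catalog_vectors:
        return 0
    matches = 0
    for q in query_vectors:
        # Euclidean distance is an assumed metric, not one named in the source.
        nearest = min(math.dist(q, c) for c in catalog_vectors)
        if nearest <= max_distance:
            matches += 1
    return matches

def score_catalog(query_vectors, catalog, max_distance=0.5):
    """Return {image_id: number of matched features} for every catalog image."""
    return {
        image_id: count_feature_matches(query_vectors, vectors, max_distance)
        for image_id, vectors in catalog.items()
    }
```

A brute-force scan like this costs one distance computation per query/catalog vector pair, which is the cost that the inverted index recited in the claims is meant to avoid on large catalogs.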
26 Claims
1. A method in a computing system for matching a query image against a catalog of images to identify semi-identical images, the method comprising:
maintaining an inverted index to a catalog of images, each image in the catalog of images characterized by vectors associated with principal feature points of the image, the inverted index comprised of hash values of the vectors associated with each image, wherein the hash values of the vectors are calculated using a k-d tree;
receiving a query image that is to be searched against the catalog of images;
characterizing the received query image by:
extracting principal feature points from the query image;
creating vectors characterizing the extracted principal feature points; and
generating hash values for each of the vectors characterizing the query image; and
searching the catalog of images to find semi-identical images to the query image by:
comparing the query image hash values with the inverted index to the catalog of images;
identifying a set of catalog images having a predetermined number of hash values in common with the query image hash values;
identifying a number of geometric inliers in each image in the identified set of catalog images; and
identifying, from the set of catalog images, a set of near-identical images based on images that have a total number of geometric inliers that exceed a threshold value.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
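The indexing and candidate-selection steps of claim 1 can be sketched as follows. The claim states only that hash values are calculated using a k-d tree; using the left/right path to a leaf as the hash, the midpoint split rule, the `max_depth` value, and every function name here are assumptions made for illustration, not the patented implementation.

```python
from collections import defaultdict

def build_kd_tree(vectors, depth=0, max_depth=3):
    """Split on one coordinate per level, midway between the two median values."""
    if depth == max_depth or len(vectors) < 2:
        return None  # leaf
    axis = depth % len(vectors[0])
    ordered = sorted(vectors, key=lambda v: v[axis])
    mid = len(ordered) // 2
    return {
        "axis": axis,
        "threshold": (ordered[mid - 1][axis] + ordered[mid][axis]) / 2,
        "left": build_kd_tree(ordered[:mid], depth + 1, max_depth),
        "right": build_kd_tree(ordered[mid:], depth + 1, max_depth),
    }

def kd_hash(tree, vector):
    """Assumed hashing scheme: the hash is the path of left/right turns to a leaf."""
    path = ""
    while tree is not None:
        go_left = vector[tree["axis"]] < tree["threshold"]
        path += "0" if go_left else "1"
        tree = tree["left"] if go_left else tree["right"]
    return path

def build_inverted_index(tree, catalog):
    """Map each hash value to the set of catalog images containing it."""
    index = defaultdict(set)
    for image_id, vectors in catalog.items():
        for v in vectors:
            index[kd_hash(tree, v)].add(image_id)
    return index

def candidate_images(tree, index, query_vectors, min_shared):
    """Images sharing at least min_shared distinct hash values with the query."""
    votes = defaultdict(int)
    for h in {kd_hash(tree, v) for v in query_vectors}:
        for image_id in index.get(h, ()):
            votes[image_id] += 1
    return {image_id for image_id, n in votes.items() if n >= min_shared}
```

Here `min_shared` plays the role of the "predetermined number of hash values in common"; the surviving candidates would then go on to the geometric inlier check.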
9. A non-transitory computer-readable medium containing instructions that, when executed on a processor, cause the processor to implement a method for matching a query image against a catalog of images to identify semi-identical images, the method comprising:
maintaining an inverted index to a catalog of images, each image in the catalog of images characterized by vectors associated with principal feature points of the image, the inverted index comprised of hash values of the vectors associated with each image, wherein the hash values of the vectors are calculated using a k-d tree;
receiving a query image that is to be searched against the catalog of images;
characterizing the received query image by:
extracting principal feature points from the query image;
creating vectors characterizing the extracted principal feature points; and
generating hash values for each of the vectors characterizing the query image; and
searching the catalog of images to find semi-identical images to the query image by:
comparing the query image hash values with the inverted index to the catalog of images;
identifying a set of catalog images having a predetermined number of hash values in common with the query image hash values;
identifying a number of geometric inliers in each image in the identified set of catalog images; and
identifying, from the set of catalog images, a set of near-identical images based on images that have a total number of geometric inliers that exceed a threshold value.
Dependent claims: 10, 11, 12, 13, 14, 15
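The final two steps, counting geometric inliers and thresholding on that count, might look like the sketch below. The claim does not name a geometric model, so a 2-D similarity transform (rotation, uniform scale, translation) is assumed here, and hypotheses are enumerated over all pairs of correspondences rather than sampled at random. The `tolerance` value and the function names are likewise illustrative.

```python
def count_geometric_inliers(matches, tolerance=2.0):
    """matches: list of ((qx, qy), (cx, cy)) query-to-catalog point correspondences.

    Points are treated as complex numbers so that a similarity transform
    is simply c = a * q + b, with a encoding rotation and scale.
    """
    pairs = [(complex(*q), complex(*c)) for q, c in matches]
    best = 0
    for i, (p1, q1) in enumerate(pairs):
        for p2, q2 in pairs[i + 1:]:
            if p1 == p2:
                continue  # two identical source points cannot define a transform
            a = (q2 - q1) / (p2 - p1)  # rotation and scale
            b = q1 - a * p1            # translation
            inliers = sum(1 for p, q in pairs if abs(a * p + b - q) <= tolerance)
            best = max(best, inliers)
    return best

def near_identical(candidates, threshold):
    """Keep candidate images whose inlier count exceeds the threshold."""
    return [
        image_id
        for image_id, matches in candidates.items()
        if count_geometric_inliers(matches) > threshold
    ]
```

Enumerating every pair is quadratic in the number of correspondences; a randomized scheme such as RANSAC would be the usual substitute when a candidate image has many matches.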
16. A method in a computing system for matching a query image against a catalog of images to identify semi-identical images, the method comprising:
maintaining an inverted index to a catalog of images, each image in the catalog of images characterized by vectors associated with principal feature points of the image, the inverted index comprised of hash values of the vectors associated with each image;
pre-processing the catalog of images to remove unwanted aspects from images in the catalog, wherein the catalog of images is pre-processed by comparison with a catalog of unwanted features to remove unwanted features from an image;
receiving a query image that is to be searched against the catalog of images;
characterizing the received query image by:
extracting principal feature points from the query image;
creating vectors characterizing the extracted principal feature points; and
generating hash values for each of the vectors characterizing the query image; and
searching the catalog of images to find semi-identical images to the query image by:
comparing the query image hash values with the inverted index to the catalog of images;
identifying a set of catalog images having a predetermined number of hash values in common with the query image hash values;
identifying a number of geometric inliers in each image in the identified set of catalog images; and
identifying, from the set of catalog images, a set of near-identical images based on images that have a total number of geometric inliers that exceed a threshold value.
Dependent claims: 17, 18, 19, 20, 21, 22
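One plausible reading of pre-processing "by comparison with a catalog of unwanted features" is to drop any image feature vector that lies close to a known unwanted vector before the image is indexed. The sketch below takes that reading; the Euclidean metric, the `max_distance` cutoff, and the function name are assumptions, since the claim does not say how the comparison is made.

```python
import math

def remove_unwanted_features(image_vectors, unwanted_vectors, max_distance=0.1):
    """Keep only vectors that are farther than max_distance from every unwanted vector."""
    return [
        v
        for v in image_vectors
        if all(math.dist(v, u) > max_distance for u in unwanted_vectors)
    ]
```

Applied to every catalog image before the inverted index is built, a filter of this kind would keep shared clutter (for example, features that recur across many unrelated images) from generating spurious hash matches.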
23. A non-transitory computer-readable medium containing instructions that, when executed on a processor, cause the processor to implement a method for matching a query image against a catalog of images to identify semi-identical images, the method comprising:
maintaining an inverted index to a catalog of images, each image in the catalog of images characterized by vectors associated with principal feature points of the image, the inverted index comprised of hash values of the vectors associated with each image;
pre-processing the catalog of images to remove unwanted aspects from images in the catalog, wherein the catalog of images is pre-processed by comparison with a catalog of unwanted features to remove unwanted features from an image, or wherein the catalog of images is pre-processed using a support vector machine (SVM) to remove principal feature points associated with an image;
receiving a query image that is to be searched against the catalog of images;
characterizing the received query image by:
extracting principal feature points from the query image;
creating vectors characterizing the extracted principal feature points; and
generating hash values for each of the vectors characterizing the query image; and
searching the catalog of images to find semi-identical images to the query image by:
comparing the query image hash values with the inverted index to the catalog of images;
identifying a set of catalog images having a predetermined number of hash values in common with the query image hash values;
identifying a number of geometric inliers in each image in the identified set of catalog images; and
identifying, from the set of catalog images, a set of near-identical images based on images that have a total number of geometric inliers that exceed a threshold value.
Dependent claims: 24, 25, 26
Specification