Image matching and retrieval by multi-access redundant hashing
Abstract
An improved document matching and retrieval system is disclosed in which an input document is matched against a database of documents using a descriptor database. The descriptor database lists descriptors and, for each descriptor, points to a list of the documents containing the features from which that descriptor is derived. The descriptors are selected to be invariant to distortions caused by digitizing the documents, or to differences between the input document and its match in the document database. An array of accumulators is used to accumulate votes for each document in the document database as the descriptor database is scanned: a vote is added to a document's accumulator if that document is on the list for a descriptor which is also found in the input document. The document which accumulates the most votes is returned as the matching document, or alternatively the documents with more than a threshold number of votes are returned.
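The voting scheme described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation. It assumes that descriptors have already been derived, that they are hashable values, and that reference documents are numbered 0 to N-1. Instead of scanning the whole descriptor database, it looks up each input descriptor. When each input descriptor is counted once, this gives the same vote counts. The function names and sample descriptors are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set

def build_descriptor_database(
    reference_docs: Sequence[Set[Hashable]],
) -> Dict[Hashable, List[int]]:
    """Map each reference descriptor to the list of documents it was derived from."""
    db: Dict[Hashable, List[int]] = defaultdict(list)
    for doc_id, descriptors in enumerate(reference_docs):
        for d in descriptors:
            db[d].append(doc_id)
    return dict(db)

def accumulate_votes(
    db: Dict[Hashable, List[int]],
    input_descriptors: Set[Hashable],
    num_docs: int,
) -> List[int]:
    """Array of accumulators: a matched descriptor votes for every document on its list."""
    accumulators = [0] * num_docs
    for d in input_descriptors:
        for doc_id in db.get(d, []):
            accumulators[doc_id] += 1
    return accumulators

def best_match(accumulators: List[int]) -> int:
    """Document with the most votes (the lowest id wins a tie)."""
    return max(range(len(accumulators)), key=lambda i: accumulators[i])

def matches_above(accumulators: List[int], threshold: int) -> List[int]:
    """All documents whose vote count exceeds the threshold."""
    return [i for i, count in enumerate(accumulators) if count > threshold]

# Hypothetical example: three reference documents described by string descriptors.
refs = [{"a", "b", "c", "d"}, {"c", "d", "e", "f"}, {"f", "g", "h"}]
db = build_descriptor_database(refs)
votes = accumulate_votes(db, {"b", "c", "d", "x"}, len(refs))
print(votes)                    # [3, 2, 0]
print(best_match(votes))        # 0
print(matches_above(votes, 2))  # [0]
```

The descriptor "x" in the input appears in no reference document, so it casts no votes. Descriptors shared by several documents, such as "c" and "d", vote for every document on their lists.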
18 Claims
1. An apparatus for matching an input document to a reference document in a document database, comprising:
a document database, wherein reference descriptors are derived from content of reference documents in said document database;
a descriptor database, identifying, for each reference descriptor, a list of reference documents which include content from which said each reference descriptor is derived, the descriptor database including, for each reference document, a plurality of redundant reference descriptors for said each reference document, a reference descriptor being redundant in that said each reference document is identifiable from less than all of said plurality of redundant reference descriptors for said each reference document;
input means for inputting content of an input document to be matched against said reference documents of said document database;
descriptor derivation means, coupled to said input means, for deriving input descriptors from said content of said input document;
accumulation means, coupled to said descriptor database and said descriptor derivation means, for accumulating votes for reference documents in said document database by matching said input descriptors with said reference descriptors, said accumulation means accumulating a vote for each reference document in a list of reference documents associated with a particular reference descriptor when the particular reference descriptor matches an input descriptor; and
output means, coupled to said accumulation means, for outputting an indication of at least one matching reference document with a count of accumulated votes larger than a threshold count or larger than a count of accumulated votes for a nonmatching reference document.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
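Claim 1 defines redundancy to mean that a reference document can be identified from fewer than all of its descriptors. One way to illustrate this is to query with a degraded copy of a document. In this sketch the descriptors are hypothetical integers and the document names are hypothetical strings. The query keeps only 4 of document B's 10 descriptors and adds two spurious ones that match nothing. B still collects the most votes.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, List

def build_db(reference_docs: Dict[str, Iterable[Hashable]]) -> Dict[Hashable, List[str]]:
    """Descriptor database: descriptor -> list of reference documents containing it."""
    db: Dict[Hashable, List[str]] = defaultdict(list)
    for name, descriptors in reference_docs.items():
        for d in set(descriptors):
            db[d].append(name)
    return dict(db)

def vote(db: Dict[Hashable, List[str]], input_descriptors: Iterable[Hashable]) -> Counter:
    """Add one vote to every document listed under each matching input descriptor."""
    votes: Counter = Counter()
    for d in set(input_descriptors):
        votes.update(db.get(d, []))
    return votes

# Hypothetical reference documents, each with 10 redundant descriptors.
reference = {
    "A": list(range(0, 10)),
    "B": list(range(10, 20)),
    "C": list(range(5, 15)),  # shares descriptors 10..14 with B
}
db = build_db(reference)

# Degraded input: every third descriptor of B (4 of 10) plus two that match nothing.
query = reference["B"][::3] + [100, 101]
votes = vote(db, query)
print(votes.most_common(1))  # [('B', 4)]
```

Document C overlaps with B, but in this example it receives only the 2 votes from the shared descriptors 10 and 13. That leaves B clearly ahead even though most of its descriptors are missing from the input.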
11. A method for matching an input document to a matching document which is a reference document stored in a document database, comprising the steps of:
deriving reference descriptors from content of the reference documents in the document database, wherein a plurality of reference descriptors derived from a given reference document includes redundant reference descriptors, whereby said given reference document is identifiable from less than all of said plurality of reference descriptors;
storing, for each reference descriptor derived, a list of reference documents which include content from which said each reference descriptor is derived;
inputting content of an input document to be matched against said reference documents of said document database;
identifying features of said input document;
normalizing descriptions of said features if said descriptions are not already invariant to transformations which are present between said input document and said matching document;
deriving input descriptors from said features;
accumulating votes for reference documents of said document database, which includes said matching document, by increasing a vote count for each reference document in a list of reference documents associated with a reference descriptor which matches an input descriptor;
comparing counts of accumulated votes for reference documents having accumulated votes; and
outputting an indication of at least one matching reference document which has a count of accumulated votes larger than a threshold count or larger than a count for a nonmatching reference document.
Dependent claims: 12, 13, 14
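Claim 11 normalizes feature descriptions only when they are not already invariant to the transformations between the input document and its match. The claim does not fix a particular feature or descriptor. As an illustrative assumption, this sketch treats each document as a sequence of positive 1-D measurements (hypothetically, the widths of successive words on a text line). It takes the transformation to be a uniform change of scale, such as a different scanning resolution. Dividing each window of n measurements by its sum removes the scale factor, and quantizing the result yields a hashable descriptor. Because the windows overlap, the descriptors are redundant by construction. The names `derive_descriptors`, `n` and `levels` are hypothetical.

```python
from typing import List, Sequence, Tuple

def normalize_window(window: Sequence[float]) -> List[float]:
    """Divide each measurement by the window total, removing a uniform scale factor."""
    total = sum(window)
    return [w / total for w in window]

def quantize(values: Sequence[float], levels: int) -> Tuple[int, ...]:
    """Map each normalized value in [0, 1] to one of `levels` integer bins."""
    return tuple(min(int(v * levels), levels - 1) for v in values)

def derive_descriptors(
    measurements: Sequence[float], n: int = 3, levels: int = 8
) -> List[Tuple[int, ...]]:
    """One descriptor per overlapping window of n consecutive measurements."""
    descriptors = []
    for i in range(len(measurements) - n + 1):
        window = measurements[i:i + n]
        descriptors.append(quantize(normalize_window(window), levels))
    return descriptors

# Hypothetical word widths, and the same line scanned at twice the resolution.
widths = [12, 30, 18, 25, 9, 40]
rescanned = [2 * w for w in widths]
print(derive_descriptors(widths) == derive_descriptors(rescanned))  # True
print(len(derive_descriptors(widths)))                              # 4
```

Quantization is what allows exact lookups in the descriptor database, and it involves a trade-off. Fewer `levels` tolerate more measurement noise, but more reference documents then share each descriptor.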
15. An apparatus for identifying a matching document from a plurality of reference documents, the matching document matching an input document more closely than a nonmatching document from the plurality of reference documents, comprising:
a descriptor database of reference descriptors for the plurality of reference documents, wherein a redundant number of reference descriptors are related to a reference document of the plurality of reference documents, the redundant number of reference descriptors being such that less than all the reference descriptors for the reference document are needed to identify the reference document, wherein a given reference descriptor is related to a given reference document when a feature described by the given reference descriptor is found in a content of the given reference document;
input means for inputting content of the input document;
descriptor derivation means for deriving descriptors from a content of a document, wherein the descriptor derivation means derives input descriptors from the content of the input document;
accumulation means, coupled to the descriptor database and the descriptor derivation means, for accumulating votes for reference documents from the plurality of reference documents, a vote being accumulated for each candidate reference document related to a reference descriptor which matches an input descriptor; and
output means, coupled to the accumulation means, for outputting an indication of the matching document by outputting an indication of the candidate reference document having an accumulated vote of more than a threshold count or a count for the nonmatching document.
Dependent claims: 16, 17, 18
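The output means of claim 15 reports a candidate whose accumulated vote exceeds either a threshold count or the count of a nonmatching document. Below is a minimal sketch of both criteria. It assumes the votes arrive as a dictionary mapping hypothetical document names to counts. When two documents tie for the top count, the sketch returns no match; that behavior is an assumption of this sketch and is not specified by the claim.

```python
from typing import Dict, List, Optional

def select_matches(votes: Dict[str, int], threshold: Optional[int] = None) -> List[str]:
    """Pick matching documents from accumulated vote counts.

    With a threshold: every document whose count is larger than the threshold.
    Without one: the top document, but only if its count is larger than the
    count of every other (nonmatching) document.
    """
    if threshold is not None:
        return [doc for doc, count in votes.items() if count > threshold]
    ranked = sorted(votes.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return []
    if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
        return [ranked[0][0]]
    return []  # tie at the top (assumption): no document beats all others

votes = {"A": 2, "B": 7, "C": 3}
print(select_matches(votes))               # ['B']
print(select_matches(votes, threshold=2))  # ['B', 'C']
print(select_matches({"A": 5, "B": 5}))    # []
```

The threshold form can return several candidates, which suits retrieval. The comparison form returns at most one document, which suits identification.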
Specification