Identifying a document by performing spectral analysis on the contents of the document
Abstract
A system and method for identifying a document based on a spectral analysis of the text of the document are described. In some examples, the system generates a document identifier for a rendered document by assigning values to words in the rendered document, such as values associated with the frequency of use of the word in the rendered document, the absolute or relative position of the word in the rendered document, and so on. The system may use the document identifier to generate a group of documents having similar document identifiers, and choose a likely match from the group of documents based on predictive analysis.
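As a rough illustration of the idea in the abstract, the following Python sketch builds a per-word index and folds it into a fixed-length identifier. It is not the patented method: the choice of relative frequency and relative first position as the word values, the use of feature hashing with CRC32 to form the identifier, and the names `build_index` and `build_identifier` are all assumptions made for this example.

```python
import re
import zlib
from collections import Counter


def build_index(text):
    """Map each word to (relative frequency, relative position of first use).

    Both values describe a word relative to the other words in the same
    document. The specific pair of values is an assumption of this sketch.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    total = len(words)
    if total == 0:
        return {}
    counts = Counter(words)
    first_seen = {}
    for pos, word in enumerate(words):
        first_seen.setdefault(word, pos)  # keep only the first position
    return {w: (counts[w] / total, first_seen[w] / total) for w in counts}


def build_identifier(index, bins=16):
    """Fold the index into a fixed-length vector of summed frequencies.

    Feature hashing with CRC32 is a stand-in technique chosen here so that
    similar documents yield similar vectors; the source does not specify it.
    """
    vector = [0.0] * bins
    for word, (freq, _pos) in index.items():
        vector[zlib.crc32(word.encode("utf-8")) % bins] += freq
    return tuple(round(v, 4) for v in vector)
```

Because the identifier is a numeric vector rather than a cryptographic hash, two documents with similar word statistics end up with nearby identifiers, which is what makes grouping by similarity possible in this sketch.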
20 Claims
1. A method in a computing system for generating a signature for a rendered document, the method comprising:
- creating an index for the rendered document that stores entries associated with words in the rendered document, wherein each of the entries contains a value for the word associated with the entry, the value representing the word relative to other words in the rendered document; and
- building an identifier for the rendered document based on the representative values contained in the index.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computer-readable medium, whose contents, when executed by a computing system, cause the computing system to generate a signature for a rendered document, the method comprising:
- creating an index for the rendered document that stores entries associated with words in the rendered document, wherein each of the entries contains a value for the word associated with the entry, the value representing the word relative to other words in the rendered document; and
- building an identifier for the rendered document based on the representative values contained in the index.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. A system for identifying a document, the system comprising:
- a capture component, wherein the capture component is programmed to capture a subset of text from a rendered document;
- an index component, wherein the index component is programmed to generate an index for the rendered document using the captured text, the generated index including values representing the text in rendered document relative to other text in the rendered document;
- an identifier component, wherein the identifier component is programmed to generate an identifier that is based on the values in the generated index;
- a document identification component, wherein the document identification component is programmed to identify a group of candidate documents that have identifiers similar to the generated identifier;
- a constraint component, wherein the constraint component is programmed to apply one or more constraints to the group of candidate documents; and
- a document selection component, wherein the document selection component is programmed to select one of the candidate documents based on the applied constraints.
Dependent claims: 18, 19, 20
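The last three components recited in claim 17 (candidate identification, constraints, selection) can be pictured as a small pipeline. The Python sketch below is illustrative only: cosine similarity, the `threshold` parameter, the metadata dictionaries, and the function names `candidate_group`, `apply_constraints`, and `select_document` are assumptions of this example, since the claim does not say how similarity is measured or what form the constraints take.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def candidate_group(query_id, corpus, threshold=0.8):
    """Return {name: score} for stored identifiers similar to the query.

    `corpus` maps a document name to its identifier vector (hypothetical
    storage layout for this sketch).
    """
    scores = {name: cosine(query_id, ident) for name, ident in corpus.items()}
    return {name: s for name, s in scores.items() if s >= threshold}


def apply_constraints(candidates, metadata, constraints):
    """Keep only candidates whose metadata satisfies every constraint.

    Each constraint is a predicate over a metadata dict (an assumption).
    """
    return {
        name: score
        for name, score in candidates.items()
        if all(check(metadata[name]) for check in constraints)
    }


def select_document(candidates):
    """Pick the highest-scoring remaining candidate, or None if empty."""
    return max(candidates, key=candidates.get) if candidates else None
```

In this sketch the constraints act purely as filters, so the final choice is the most similar document among those that pass; a real system might instead use constraints to re-rank candidates.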
Specification