Using gestalt information to identify locations in printed information
Abstract
A facility for identifying a location in a printed document is described. The facility obtains an image of the printed document, and extracts gestalt information from text occurring in the image of the printed document. The facility compares the extracted gestalt information to an index of documents and, based upon this comparison, identifies a document that includes the gestalt information.
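The comparison step in the abstract can be pictured as a lookup against an inverted index. The following is only a minimal sketch under stated assumptions, not the facility's actual design: gestalt information is assumed to be reducible to hashable feature values, the index is assumed to map each feature to the documents containing it, and the names `build_index` and `identify` are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(documents):
    """Map each gestalt feature to the set of document ids that contain it.

    `documents` is a hypothetical mapping: doc_id -> iterable of hashable features.
    """
    index = defaultdict(set)
    for doc_id, features in documents.items():
        for feature in features:
            index[feature].add(doc_id)
    return index


def identify(index, extracted_features):
    """Return the document id sharing the most features with the image, or None."""
    votes = Counter()
    for feature in extracted_features:
        for doc_id in index.get(feature, ()):
            votes[doc_id] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

Simple vote counting is used here only for brevity; a real index would likely need tolerance for noisy or partially matching features.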
21 Claims
1. A computer-implemented method comprising:
obtaining an image of a rendered document that includes text;
determining a two dimensional geometric shape based at least on a first location in the image of the rendered document of a first space between a first pair of consecutive words, a second location in the image of the rendered document of a second space between a second pair of consecutive words, and a third location in the image of the rendered document of a third space between a third pair of consecutive words, wherein the first space, the second space, and the third space are not all included on a same line of text in the rendered document;
generating a document signature based on the two dimensional geometric shape; and
generating a query for an electronic document that is a counterpart to the rendered document, based at least on the document signature.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
obtaining an image of a rendered document that includes text;
determining a two dimensional geometric shape based at least on a first location in the image of the rendered document of a first space between a first pair of consecutive words, a second location in the image of the rendered document of a second space between a second pair of consecutive words, and a third location in the image of the rendered document of a third space between a third pair of consecutive words, wherein the first space, the second space, and the third space are not all included on a same line of text in the rendered document;
generating a document signature based on the two dimensional geometric shape; and
generating a query for an electronic document that is a counterpart to the rendered document, based at least on the document signature.
Dependent claims: 11, 12, 13, 14, 15
16. A non-transitory computer-readable medium storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
obtaining an image of a rendered document that includes text;
determining a two dimensional geometric shape based at least on a first location in the image of the rendered document of a first space between a first pair of consecutive words, a second location in the image of the rendered document of a second space between a second pair of consecutive words, and a third location in the image of the rendered document of a third space between a third pair of consecutive words, wherein the first space, the second space, and the third space are not all included on a same line of text in the rendered document;
generating a document signature based on the two dimensional geometric shape; and
generating a query for an electronic counterpart to the rendered document, based at least on the document signature.
Dependent claims: 17, 18, 19, 20, 21
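The operations recited in the independent claims can be illustrated with a short sketch. It is one possible reading and not the claimed implementation. It assumes that the geometric shape is a triangle whose vertices are the centres of the three inter-word spaces, that the signature is the two smallest interior angles quantized into bins (so that it does not change with image scale), and that the query is a plain dictionary. The names `Space`, `triangle_signature`, and `build_query` are hypothetical.

```python
import math
from typing import NamedTuple


class Space(NamedTuple):
    x: float   # horizontal centre of the inter-word space, in image pixels
    y: float   # vertical centre of the inter-word space, in image pixels
    line: int  # index of the text line that contains the space


def triangle_signature(a, b, c, bins=36):
    """Quantize the two smallest interior angles of the triangle a-b-c."""
    if a.line == b.line == c.line:
        raise ValueError("the three spaces must not all lie on the same line of text")
    ab = math.dist((a.x, a.y), (b.x, b.y))
    bc = math.dist((b.x, b.y), (c.x, c.y))
    ca = math.dist((c.x, c.y), (a.x, a.y))
    if min(ab, bc, ca) == 0:
        raise ValueError("the three spaces must be at distinct locations")

    def angle(opposite, s1, s2):
        # Law of cosines; clamp to guard against floating-point drift.
        cos_v = (s1 * s1 + s2 * s2 - opposite * opposite) / (2 * s1 * s2)
        return math.degrees(math.acos(max(-1.0, min(1.0, cos_v))))

    angles = sorted([angle(bc, ab, ca), angle(ca, ab, bc), angle(ab, bc, ca)])
    step = 180 / bins
    # The third angle is implied by the other two, so only two are kept.
    return tuple(int(ang // step) for ang in angles[:2])


def build_query(signature):
    """Wrap the signature in a hypothetical query payload for a document index."""
    return {"signature": "-".join(str(part) for part in signature)}
```

For example, spaces at (0, 0) and (4, 0) on one line and (0, 3) on the next form a 3-4-5 right triangle. With the default 5-degree bins, its two smallest angles (about 36.9 and 53.1 degrees) fall into bins 7 and 10, and the same bins result if the image is scaled uniformly.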
Specification