Retrieving electronic documents by converting them to synthetic text
First Claim
1. A system comprising:
one or more processors;
a document storage configured to store a plurality of electronic documents; and
a memory coupled with the one or more processors, the memory configured to store modules for execution by the one or more processors, the modules including:
an indexing module configured to, in response to receiving an image patch, identify a first and second structure in the image patch, wherein the first and second structure have a vertical positional relationship, and generate a first one-dimensional text string that encodes two-dimensional location and size information for the first structure using a relative location and size of the second structure; and
a retrieval module configured to search a library for a second one-dimensional text string that is similar to the first one-dimensional text string and retrieve, from the document storage, an electronic document corresponding to the second one-dimensional text string that is similar to the first one-dimensional text string.
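The claim does not spell out how the first structure's location and size are written relative to the second. The Python sketch below shows one way to do it, assuming each structure (for example, a word) is available as a bounding box and that the reference structure is the nearest horizontally overlapping box directly above it. The `Box` type, the bin edges, and the four-digit token layout are hypothetical illustration choices, not the encoding disclosed in the patent.

```python
from dataclasses import dataclass
from typing import Optional

# HYPOTHETICAL quantization bins; the patent text above does not specify any values.
OFFSET_EDGES = [-1.0, -0.25, 0.25, 1.0]  # horizontal offset / width of reference structure
GAP_EDGES = [1.0, 1.5, 2.5, 4.0]         # vertical distance / height of reference structure
SIZE_EDGES = [0.5, 0.9, 1.1, 2.0]        # width or height ratio to reference structure


@dataclass(frozen=True)
class Box:
    """Bounding box of one structure (e.g. a word); y grows downward."""
    x: float
    y: float
    w: float
    h: float


def quantize(value: float, edges: list[float]) -> int:
    """Map a continuous ratio to a bin index in 0..len(edges)."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)


def encode_pair(first: Box, second: Box) -> str:
    """Encode first's 2D location and size relative to second (assumed to lie above first)."""
    dx = quantize((first.x - second.x) / second.w, OFFSET_EDGES)
    dy = quantize((first.y - second.y) / second.h, GAP_EDGES)
    dw = quantize(first.w / second.w, SIZE_EDGES)
    dh = quantize(first.h / second.h, SIZE_EDGES)
    return f"{dx}{dy}{dw}{dh}"


def vertical_neighbor(box: Box, boxes: list[Box]) -> Optional[Box]:
    """Nearest box entirely above `box` whose horizontal extent overlaps it, if any."""
    above = [b for b in boxes
             if b.y + b.h <= box.y and b.x < box.x + box.w and box.x < b.x + b.w]
    return max(above, key=lambda b: b.y, default=None)


def encode_patch(boxes: list[Box]) -> str:
    """One-dimensional synthetic text: one token per structure that has a neighbor above."""
    tokens = []
    for box in sorted(boxes, key=lambda b: (b.y, b.x)):
        ref = vertical_neighbor(box, boxes)
        if ref is not None:
            tokens.append(encode_pair(box, ref))
    return " ".join(tokens)
```

For example, a narrow word at `Box(30, 20, 20, 10)` under a wider word at `Box(0, 0, 50, 10)` encodes as the token `3202`. Because every value is a ratio to the reference structure, the tokens do not change when the whole patch is scaled, which is what makes a camera image and a rendered page comparable.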
1 Assignment
0 Petitions
Abstract
The present invention relies on the two-dimensional information in documents and encodes two-dimensional structures into a one-dimensional synthetic language so that two-dimensional documents can be searched at text search speed. The system comprises an indexing module, a retrieval module, an encoder, a quantization module, a retrieval engine, and a control module, coupled by a bus. Electronic documents are first indexed by the indexing module and stored as a synthetic text library. The retrieval module then converts an input image to synthetic text and searches the synthetic text library for matches. The matches can in turn be used to retrieve the corresponding electronic documents. In one or more embodiments, the present invention includes a method for comparing the synthetic text against documents that have been converted to synthetic text in order to find a match.
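Searching "at text search speed" suggests ordinary text-retrieval machinery. A minimal sketch, assuming each stored document has already been converted to space-separated synthetic tokens, is an inverted index from token n-grams to document ids that ranks documents by how many distinct query n-grams they contain. The class name, the n-gram length, and the voting scheme are assumptions for illustration; the patent's retrieval engine may work differently.

```python
from collections import Counter, defaultdict


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous n-token sequences in `tokens`."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class SyntheticTextLibrary:
    """HYPOTHETICAL synthetic text library: inverted index from token n-grams to doc ids."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.postings = defaultdict(set)  # n-gram tuple -> set of document ids

    def index(self, doc_id: str, synthetic_text: str) -> None:
        """Add every n-gram of a document's synthetic text to the postings lists."""
        for gram in ngrams(synthetic_text.split(), self.n):
            self.postings[gram].add(doc_id)

    def search(self, query_text: str, top_k: int = 5) -> list[str]:
        """Rank documents by the number of distinct query n-grams they contain."""
        votes = Counter()
        for gram in set(ngrams(query_text.split(), self.n)):
            # .get avoids inserting empty entries for n-grams never indexed
            for doc_id in self.postings.get(gram, ()):
                votes[doc_id] += 1
        return [doc_id for doc_id, _ in votes.most_common(top_k)]
```

Usage: after `lib.index("doc-a", "2222 3202 2120 1011")`, a query of `"3202 2120 1011"` on a library built with `n=2` returns `["doc-a"]`. Each lookup touches only the postings for the query's own n-grams, so its cost does not grow with the number of documents the way a pairwise image comparison would.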
91 Citations
20 Claims
8. A method comprising:
receiving, at one or more processors, an image patch;
identifying a first and second structure in the image patch, wherein the first and second structure have a vertical positional relationship;
generating a first one-dimensional text string that encodes two-dimensional location and size information for the first structure using a relative location and size of the second structure;
searching a library for a second one-dimensional text string that is similar to the first one-dimensional text string; and
retrieving, from a document storage, an electronic document corresponding to the second one-dimensional text string that is similar to the first one-dimensional text string.
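The claim does not define when two one-dimensional strings are "similar". One plausible test, shown here as an assumption rather than the patented criterion, slides the query across each stored string and scores the best-aligned window by token-level Levenshtein distance normalized to a 0 to 1 similarity. The function names and the 0.75 threshold are hypothetical.

```python
from typing import Optional


def token_edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance between two token sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ta in enumerate(a, start=1):
        curr = [i]
        for j, tb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ta
                            curr[j - 1] + 1,            # insert tb
                            prev[j - 1] + (ta != tb)))  # substitute, or match at no cost
        prev = curr
    return prev[-1]


def window_similarity(query: str, document: str) -> float:
    """Best (1 - normalized edit distance) between the query and any same-length window."""
    q, d = query.split(), document.split()
    if not q:
        return 0.0
    n = len(q)
    # A document shorter than the query yields a single window: the whole document.
    windows = [d[i:i + n] for i in range(max(1, len(d) - n + 1))]
    return max(1.0 - token_edit_distance(q, w) / max(n, len(w)) for w in windows)


def retrieve(query: str, library: dict[str, str], threshold: float = 0.75) -> Optional[str]:
    """Id of the most similar stored string, or None if no score reaches the threshold."""
    best_score, best_id = 0.0, None
    for doc_id, text in library.items():
        score = window_similarity(query, text)
        if score > best_score:
            best_score, best_id = score, doc_id
    return best_id if best_score >= threshold else None
```

Scoring windows rather than whole strings matters because an image patch covers only part of a page, so a short query should not be penalized for the rest of the page's synthetic text. The comparison is quadratic in query length per window, so an index lookup could narrow the candidate documents before this rescoring step.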
15. A non-transitory computer-readable medium carrying instructions which, when processed by one or more processors, cause:
receiving, at the one or more processors, an image patch;
identifying a first and second structure in the image patch, wherein the first and second structure have a vertical positional relationship;
generating a first one-dimensional text string that encodes two-dimensional location and size information for the first structure using a relative location and size of the second structure;
searching a library for a second one-dimensional text string that is similar to the first one-dimensional text string; and
retrieving, from a document storage, an electronic document corresponding to the second one-dimensional text string that is similar to the first one-dimensional text string.
Specification