Retrieving Electronic Documents by Converting Them to Synthetic Text
First Claim
1. A system for converting documents to synthetic text, the system comprising:
- an indexing module for receiving an electronic document and generating a synthetic representation of the electronic document, the indexing module also producing an association between the electronic document and the synthetic representation; and
- a data storage for storing the electronic document, the synthetic representation of the electronic document, and an association between the electronic document, a page of the electronic document and the synthetic representation.
1 Assignment
0 Petitions
Abstract
The present invention relies on the two-dimensional information in documents and encodes two-dimensional structures into a one-dimensional synthetic language such that two-dimensional documents can be searched at text search speed. The system comprises an indexing module, a retrieval module, an encoder, a quantization module, a retrieval engine, and a control module coupled by a bus. A number of electronic documents are first indexed by the indexing module and stored as a synthetic text library. The retrieval module then converts an input image to synthetic text and searches for matches to the synthetic text in the synthetic text library. The matches can in turn be used to retrieve the corresponding electronic documents. A plurality of matches and corresponding electronic documents may be retrieved and ranked in order of the similarity of the synthetic text. In one or more embodiments, the present invention includes a method for indexing documents by converting them to synthetic text, and a method for retrieving documents by converting an image to synthetic text and comparing the synthetic text to documents that have been converted to synthetic text for a match.
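The index-then-retrieve flow described in the abstract can be illustrated with a minimal sketch. It assumes each page has already been converted to a string of space-separated synthetic words, and it scores pages by counting shared words; the class name `SyntheticTextLibrary`, its methods, and the scoring rule are hypothetical choices made for illustration and are not taken from the patent.

```python
from collections import Counter, defaultdict


class SyntheticTextLibrary:
    """Toy inverted index over synthetic words (hypothetical, for illustration)."""

    def __init__(self):
        # synthetic word -> Counter mapping (doc_id, page) to occurrence count
        self._postings = defaultdict(Counter)

    def index_page(self, doc_id, page, synthetic_text):
        """Record the association between a document page and its synthetic text."""
        for word in synthetic_text.split():
            self._postings[word][(doc_id, page)] += 1

    def search(self, query_text, top_k=3):
        """Rank pages by how many query words they contain (assumed similarity)."""
        scores = Counter()
        for word in query_text.split():
            for key in self._postings.get(word, {}):
                scores[key] += 1  # one vote per query word found on the page
        return scores.most_common(top_k)


library = SyntheticTextLibrary()
library.index_page("doc-a", 1, "abba cddc abca ddab")
library.index_page("doc-a", 2, "ccda bbbb abba")
library.index_page("doc-b", 1, "dddd cccc")

# Synthetic text derived from an input image (assumed to exist already).
results = library.search("cddc abba zzzz")
for (doc_id, page), score in results:
    print(doc_id, page, score)
```

Because the library is an ordinary inverted index over words, any existing text search engine could stand in for this class; the point of the sketch is only that two-dimensional matching reduces to word lookup once pages are expressed as synthetic text.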
174 Citations
37 Claims
1. A system for converting documents to synthetic text, the system comprising:
- an indexing module for receiving an electronic document and generating a synthetic representation of the electronic document, the indexing module also producing an association between the electronic document and the synthetic representation; and
- a data storage for storing the electronic document, the synthetic representation of the electronic document, and an association between the electronic document, a page of the electronic document and the synthetic representation.
Dependent claims: 2-15
16. A method for generating synthetic text for an electronic document, the method comprising:
- receiving an electronic document;
- defining a two-dimensional structure for encoding;
- identifying a plurality of two-dimensional structures in the electronic document; and
- encoding at least one of the plurality of two-dimensional structures in the electronic document to create a one-dimensional structure.
Dependent claims: 17-24
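One way to picture the encoding step of claim 16 is the sketch below. It assumes the two-dimensional structure is a word bounding box together with its right-hand neighbor and its nearest neighbor on a lower line, and that each box width is quantized to one letter. The `Box` type, the eight-letter alphabet, the uniform bins, and the neighbor rules are all assumptions made for illustration; the claim itself does not fix any of them.

```python
from typing import List, NamedTuple, Optional


class Box(NamedTuple):
    x: float  # left edge
    y: float  # top edge (larger values are lower on the page)
    w: float  # width
    h: float  # height


ALPHABET = "abcdefgh"  # assumed quantization alphabet


def quantize(width: float, max_width: float) -> str:
    """Map a width onto one letter using uniform bins (an assumption)."""
    bin_index = min(int(width / max_width * len(ALPHABET)), len(ALPHABET) - 1)
    return ALPHABET[bin_index]


def right_neighbor(box: Box, boxes: List[Box]) -> Optional[Box]:
    """Closest box to the right on the same line, if any."""
    same_line = [b for b in boxes if b.y == box.y and b.x > box.x]
    return min(same_line, key=lambda b: b.x, default=None)


def below_neighbor(box: Box, boxes: List[Box]) -> Optional[Box]:
    """Closest box on a lower line, preferring similar horizontal position."""
    lower = [b for b in boxes if b.y > box.y]
    return min(lower, key=lambda b: (b.y - box.y, abs(b.x - box.x)), default=None)


def encode_page(boxes: List[Box], max_width: float = 100.0) -> str:
    """Turn each (box, right, below) structure into one synthetic word."""
    words = []
    for box in sorted(boxes, key=lambda b: (b.y, b.x)):
        structure = [box, right_neighbor(box, boxes), below_neighbor(box, boxes)]
        # A missing neighbor is written as "_" so every word has three symbols.
        words.append("".join(quantize(b.w, max_width) if b else "_" for b in structure))
    return " ".join(words)


page = [Box(0, 0, 30, 10), Box(40, 0, 60, 10), Box(0, 20, 90, 10)]
print(encode_page(page))  # prints "ceh e_h h__"
```

Each output word is a one-dimensional string that captures a small neighborhood of the page layout, which is what allows a text index to stand in for a two-dimensional comparison.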
25. A method for retrieving an electronic document in response to receipt of an image patch, the method comprising:
- receiving an image patch;
- defining a two-dimensional structure in the image patch;
- encoding at least one two-dimensional structure of the image patch into a one-dimensional structure;
- searching for any one-dimensional structure similar to the one two-dimensional structure of the image patch; and
- retrieving an electronic document corresponding to the one-dimensional structure similar to the one two-dimensional structure of the image patch.
Dependent claims: 26-31
32. A method for obtaining features of text documents, the method comprising:
- defining a plurality of bounding boxes around features of a text document;
- creating a graph between the plurality of bounding boxes; and
- generating a feature vector representing the bounding box.
Dependent claims: 33-37
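The three steps of claim 32 can be sketched as follows, under stated assumptions: bounding boxes are given as `(x, y, width, height)` tuples, the graph links each box to its `k` nearest boxes by center distance, and the feature vector holds the box's aspect ratio followed by a width ratio and an angle for each linked neighbor. The function names and the specific features are hypothetical; the claim does not say how the graph or the vector is built.

```python
import math


def center(box):
    """Center point of an (x, y, width, height) box."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def build_graph(boxes, k=2):
    """Link each box index to its k nearest boxes by center distance."""
    graph = {}
    for i, box in enumerate(boxes):
        origin = center(box)
        others = [j for j in range(len(boxes)) if j != i]
        others.sort(key=lambda j: math.dist(origin, center(boxes[j])))
        graph[i] = others[:k]
    return graph


def feature_vector(i, boxes, graph):
    """Aspect ratio of box i, then (width ratio, angle) for each neighbor."""
    _, _, w, h = boxes[i]
    cx, cy = center(boxes[i])
    vector = [w / h]
    for j in graph[i]:
        nx, ny = center(boxes[j])
        vector.append(boxes[j][2] / w)               # neighbor width relative to box i
        vector.append(math.atan2(ny - cy, nx - cx))  # direction to neighbor, in radians
    return vector


boxes = [(0, 0, 20, 10), (30, 0, 40, 10), (0, 20, 20, 10)]
graph = build_graph(boxes)
vector = feature_vector(0, boxes, graph)
print(graph)
print(vector)
```

Ratios and angles are used here rather than absolute pixel values so that the vector is less sensitive to image scale, which is a design choice of this sketch rather than something stated in the claim.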
Specification