Document matching using structural information
Abstract
A method and apparatus for document matching using structural information. The present invention provides a method and apparatus for identifying documents based on the visual structure of the document. Structural information describing a document is generated and used to search for matching stored documents. In one embodiment, images are converted to a point set and the point sets are compared. For example, an image of a document that is sought is converted to a point set and the point set is compared to point sets corresponding to stored documents. When point sets match within a predetermined tolerance, the documents match. In one embodiment, the Hausdorff measure is used to compare point sets.
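As a rough illustration of the comparison described in the abstract, the sketch below computes the classical symmetric Hausdorff distance between two point sets and treats two documents as matching when that distance is within a tolerance. The function names, the brute-force search, and the use of Euclidean distance are assumptions made for illustration; the abstract names only "the Hausdorff measure" and does not say which variant the invention uses.

```python
from math import hypot
from typing import Collection, Tuple

Point = Tuple[float, float]


def directed_hausdorff(a: Collection[Point], b: Collection[Point]) -> float:
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(hypot(ax - bx, ay - by) for bx, by in b) for ax, ay in a)


def hausdorff(a: Collection[Point], b: Collection[Point]) -> float:
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def matches(target: Collection[Point], stored: Collection[Point],
            tolerance: float) -> bool:
    """Hypothetical match test: distance within a predetermined tolerance."""
    return hausdorff(target, stored) <= tolerance


target = [(0, 0), (10, 0), (10, 10)]
stored = [(0, 1), (10, 0), (10, 10)]
print(hausdorff(target, stored))               # 1.0
print(matches(target, stored, tolerance=2.0))  # True
```

The brute-force version compares every pair of points, which is adequate for small point sets; a spatial index would be a natural replacement for larger ones.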
17 Claims
1. A method comprising:
generating structural information, including a point set, describing a target document;
comparing the structural information describing the target document to a set of structural information describing a set of stored electronic documents; and
retrieving one or more stored electronic documents from the set of stored electronic documents if the structural information describing the stored electronic documents matches the structural information describing the target document within a predetermined tolerance;
analyzing the target document to determine the structural information describing the target document, wherein the analyzing comprises:
receiving a raster image; and
removing text from the raster image.
2. The method of claim 1, wherein the analyzing further comprises:
determining end points for a line segment corresponding to remaining components of the raster image; and
generating the point set based on the end points.
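One way to picture the end-point step is sketched below: scan a binary raster for straight horizontal and vertical runs of foreground pixels and collect the two end points of each run into a point set. The raster representation (rows of 0/1 values), the restriction to axis-aligned segments, and the `min_length` parameter are all assumptions made for illustration; the claims do not specify how line segments are found.

```python
def horizontal_runs(raster, min_length):
    """Yield ((x0, y), (x1, y)) for each horizontal run of foreground
    pixels that is at least `min_length` pixels long."""
    for y, row in enumerate(raster):
        x, width = 0, len(row)
        while x < width:
            if row[x]:
                start = x
                while x < width and row[x]:
                    x += 1
                if x - start >= min_length:
                    yield (start, y), (x - 1, y)
            else:
                x += 1


def endpoint_point_set(raster, min_length=3):
    """Collect end points of horizontal and vertical runs as (x, y) pairs."""
    points = set()
    for p, q in horizontal_runs(raster, min_length):
        points.update((p, q))
    # Vertical runs are horizontal runs of the transposed raster;
    # swap the coordinates back when recording them.
    transposed = [list(column) for column in zip(*raster)]
    for (y0, x), (y1, _) in horizontal_runs(transposed, min_length):
        points.update(((x, y0), (x, y1)))
    return points


raster = [
    [1, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
print(sorted(endpoint_point_set(raster)))  # [(0, 0), (0, 2), (3, 0)]
```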
3. The method of claim 1, wherein comparing the structural information describing the target document to a set of structural information describing a set of stored electronic documents comprises comparing point sets using the Hausdorff Method.
4. The method of claim 1 further comprising generating a physical document based on the one or more electronic documents retrieved.
5. The method of claim 1, wherein the analyzing further comprises:
identifying corners corresponding to remaining components of the raster image; and
generating the point set based on the corners.
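The corner-based variant can be sketched in the same spirit. The toy detector below marks a foreground pixel as a corner when it has exactly one horizontal and exactly one vertical foreground neighbour, which identifies L-shaped junctions in one-pixel-wide, axis-aligned line work. This rule, the raster representation, and the function names are assumptions for illustration only; the claims do not say how corners are identified, and a production system would more likely use a general-purpose corner detector.

```python
def pixel(raster, x, y):
    """Return the pixel value at (x, y), treating out-of-bounds as background."""
    if 0 <= y < len(raster) and 0 <= x < len(raster[y]):
        return raster[y][x]
    return 0


def corner_point_set(raster):
    """Collect (x, y) positions where a horizontal and a vertical stroke meet."""
    corners = set()
    for y, row in enumerate(raster):
        for x, value in enumerate(row):
            if not value:
                continue
            horizontal = pixel(raster, x - 1, y) + pixel(raster, x + 1, y)
            vertical = pixel(raster, x, y - 1) + pixel(raster, x, y + 1)
            # Exactly one neighbour in each direction means an L-junction.
            if horizontal == 1 and vertical == 1:
                corners.add((x, y))
    return corners


box = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
print(sorted(corner_point_set(box)))  # [(0, 0), (0, 2), (3, 0), (3, 2)]
```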
6. A machine-readable medium having stored thereon sequences of instructions, which when executed by a processor cause the processor to perform the following comprising:
generating structural information, including a point set, describing a target document;
comparing the structural information describing the target document to a set of structural information describing a set of stored electronic documents; and
retrieving one or more stored electronic documents from the set of stored electronic documents if the structural information describing the stored electronic documents matches the structural information describing the target document within a predetermined tolerance; and
analyzing the target document to determine structural information describing the target document, comprising:
receiving a raster image; and
removing text from the raster image.
7. The machine-readable medium of claim 6, further comprising sequences of instructions that cause the processor to:
determine end points for a line segment corresponding to remaining components of the raster image; and
generate the point set based on the end points.
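The text-removal step recited in claim 6 could be approximated as follows. This sketch assumes that text shows up as small connected components in a binary raster, so it erases every component whose bounding box fits within `max_glyph_size` pixels in both dimensions. That heuristic, the parameter name, and the rectangular list-of-rows raster are assumptions made for illustration; the claims do not state how text is distinguished from other content.

```python
from collections import deque


def remove_small_components(raster, max_glyph_size):
    """Return a copy of `raster` with small (text-like) components erased."""
    height = len(raster)
    width = len(raster[0]) if raster else 0
    result = [list(row) for row in raster]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if not raster[y][x] or seen[y][x]:
                continue
            # Flood-fill one 8-connected component starting at (x, y).
            component = []
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                component.append((cx, cy))
                for dx in (-1, 0, 1):
                    for dy in (-1, 0, 1):
                        nx, ny = cx + dx, cy + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and raster[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            xs = [px for px, _ in component]
            ys = [py for _, py in component]
            box_w = max(xs) - min(xs) + 1
            box_h = max(ys) - min(ys) + 1
            # Erase components small enough to be a character.
            if box_w <= max_glyph_size and box_h <= max_glyph_size:
                for px, py in component:
                    result[py][px] = 0
    return result


page = [
    [1, 1, 1, 1, 1, 1],  # a ruled line, kept
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],  # a small glyph-like blob, removed
    [0, 1, 0, 0, 0, 0],
]
cleaned = remove_small_components(page, max_glyph_size=2)
print(cleaned[0], cleaned[2])  # [1, 1, 1, 1, 1, 1] [0, 0, 0, 0, 0, 0]
```

The remaining components (rules, boxes, and other layout lines) are what the later point-set steps would operate on.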
8. The machine-readable medium of claim 6, wherein the sequences of instructions cause the processor to compare the structural information describing the target document to a set of structural information describing a set of stored electronic documents by comparing point sets using the Hausdorff Method.
9. The machine-readable medium of claim 6 further comprising sequences of instructions that cause the processor to generate a physical document based on the one or more electronic documents retrieved.
10. The machine-readable medium of claim 6, wherein the sequences of instructions cause the processor to further perform the following comprising:
identifying corners corresponding to remaining components of the raster image; and
generating the point set based on the corners.
11. An apparatus, comprising:
means for generating structural information, including a point set, describing a target document;
means for comparing the structural information describing the target document to a set of structural information describing a set of stored electronic documents;
means for retrieving one or more stored electronic documents from the set of stored electronic documents if the structural information describing the stored electronic documents matches the structural information describing the target document within a predetermined tolerance;
means for analyzing the target document to determine structural information describing the target document;
means for receiving a raster image; and
means for removing text from the raster image.
12. The apparatus of claim 11, further comprising:
means for determining end points for a line segment corresponding to remaining components of the raster image; and
means for generating the point set based on the end points.
13. The apparatus of claim 11, wherein the means for comparing the structural information describing the target document to a set of structural information describing a set of stored electronic documents comprises means for comparing point sets using the Hausdorff Method.
14. The apparatus of claim 11 further comprising means for generating a physical document based on the one or more electronic documents retrieved.
17. The apparatus of claim 11, further comprising:
means for identifying corners corresponding to remaining components of the raster image; and
means for generating the point set based on the corners.
15. A document matching apparatus comprising:
a processor to generate structural information, including a point set, based on a target document;
a storage device coupled to the processor to store multiple electronic documents and structural information based on each of a set of one or more electronic documents;
wherein the processor is configured to compare the structural information based on the target document to the structural information based on the electronic documents to determine whether the structural information based on the target document and structural information of one or more electronic documents matches within a predetermined tolerance, and wherein the processor is configured to analyze the target document to determine structural information describing the target document, including receiving a raster image and removing text from the raster image.
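The storage-and-compare arrangement of claim 15 can be pictured as a small in-memory index that keeps a point set alongside each stored document and returns the documents whose point sets fall within the tolerance of a target. The class name `DocumentStore`, its methods, the linear scan, and the default Hausdorff distance are hypothetical choices for illustration, not details taken from the patent.

```python
from math import hypot


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(p, q):
        return max(min(hypot(px - qx, py - qy) for qx, qy in q) for px, py in p)
    return max(directed(a, b), directed(b, a))


class DocumentStore:
    """Hypothetical store pairing document ids with their point sets."""

    def __init__(self, tolerance, distance=hausdorff):
        self.tolerance = tolerance  # the predetermined tolerance
        self.distance = distance    # any point-set distance function
        self._entries = []          # list of (doc_id, point_set)

    def add(self, doc_id, point_set):
        self._entries.append((doc_id, list(point_set)))

    def retrieve(self, target_points):
        """Return ids of stored documents within tolerance, closest first."""
        scored = [(self.distance(target_points, points), doc_id)
                  for doc_id, points in self._entries]
        return [doc_id for dist, doc_id in sorted(scored)
                if dist <= self.tolerance]


store = DocumentStore(tolerance=1.5)
store.add("form-a", [(0, 0), (10, 0)])
store.add("form-b", [(0, 0), (50, 50)])
store.add("form-c", [(0, 1), (10, 0)])
print(store.retrieve([(0, 1), (10, 0)]))  # ['form-c', 'form-a']
```

Precomputing and storing the point sets means only the target document has to be analyzed at query time; the comparison itself is still a scan over every stored entry in this sketch.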
Specification