Systems and methods for automatic form segmentation for raster-based passive electronic documents
Abstract
Systems and methods for automatically extracting form information (document structure, elements, format, etc.) from electronic documents such as raster-based passive documents, and storing such form information in a file in accordance with a predetermined DTD (document type definition).
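As an illustration of storing form information in a file that follows a predetermined DTD, the sketch below serializes a few extracted form elements to XML with an inline DTD. The element names (`form`, `field`, `label`), the attributes, the `FORM_DTD` text, and the function `to_structured_document` are hypothetical and are not taken from the patent. Only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical DTD for illustration; the patent text here does not give one.
FORM_DTD = """<!DOCTYPE form [
<!ELEMENT form (field*)>
<!ELEMENT field (label)>
<!ATTLIST field kind CDATA #REQUIRED x CDATA #REQUIRED y CDATA #REQUIRED>
<!ELEMENT label (#PCDATA)>
]>"""


def to_structured_document(fields):
    """Serialize extracted fields (a list of dicts) as XML preceded by the DTD."""
    root = ET.Element("form")
    for f in fields:
        node = ET.SubElement(
            root, "field", kind=f["kind"], x=str(f["x"]), y=str(f["y"])
        )
        ET.SubElement(node, "label").text = f["label"]
    return FORM_DTD + "\n" + ET.tostring(root, encoding="unicode")


doc = to_structured_document([
    {"kind": "textbox", "x": 40, "y": 120, "label": "Name"},
    {"kind": "checkbox", "x": 40, "y": 160, "label": "Subscribe"},
])
print(doc)
```

Note that `xml.etree.ElementTree` does not validate against the DTD. A validating parser would be needed to enforce it.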
23 Claims
1. A method for processing electronic documents, comprising the steps of:
receiving as input an electronic document, wherein at least a portion of the electronic document is raster-based;
extracting form information from text portions and non-text portions of the electronic document; and
generating a structured document representing the extracted form information for the electronic document based on a predefined document type definition.
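The three steps of claim 1 can be sketched in miniature as follows. This is a toy illustration under strong assumptions, not the claimed implementation: the raster is a list of strings in which `#` marks a dark pixel, the non-text extraction only looks for horizontal rules (treated as fill-in fields), and the text portions are assumed to be already recognized and passed in as `ocr_words`. The names `find_rules`, `process`, and `ocr_words` are hypothetical, and the returned dictionary merely stands in for the structured document.

```python
def find_rules(raster, min_len=5):
    """Non-text step: find horizontal runs of dark pixels at least min_len long."""
    rules = []
    for y, row in enumerate(raster):
        x = 0
        while x < len(row):
            if row[x] == "#":
                start = x
                while x < len(row) and row[x] == "#":
                    x += 1
                if x - start >= min_len:
                    rules.append({"x": start, "y": y, "width": x - start})
            else:
                x += 1
    return rules


def process(raster, ocr_words):
    """Receive a raster page, extract form information, return a structured record."""
    fields = []
    for rule in find_rules(raster):
        # Text step (simplified): the label is the nearest recognized word
        # on the same row to the left of the rule, if there is one.
        candidates = [
            w for w in ocr_words if w["y"] == rule["y"] and w["x"] < rule["x"]
        ]
        label = max(candidates, key=lambda w: w["x"])["text"] if candidates else None
        fields.append({"kind": "textbox", "label": label, **rule})
    return {"type": "form", "fields": fields}


page = [
    "....................",
    ".......##########...",
    "....................",
    ".......######.......",
]
words = [{"text": "Name", "x": 1, "y": 1}, {"text": "Age", "x": 1, "y": 3}]
result = process(page, words)
print(result)
```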
12. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for processing electronic documents, the method steps comprising:
receiving as input an electronic document, wherein at least a portion of the electronic document is raster-based;
extracting form information from text portions and non-text portions of the electronic document; and
generating a structured document representing the extracted form information for the electronic document based on a predefined document type definition.
23. A system for processing electronic documents, comprising:
a text differentiation module for processing an electronic document to identify text portions and non-text portions of the electronic document, wherein at least a portion of the electronic document is raster-based;
a segmentation module for segmenting the identified text portions and non-text portions;
a text processing and pattern matching module and a form information extraction module, for processing the text portions of the electronic document and extracting form information from the text portions;
an image processing and object recognition module and a form information extraction module, for processing the non-text portions of the electronic document and extracting form information from the non-text segments; and
a file generator for combining the extracted form information for the text portions and non-text portions to generate a structured document representing the extracted form information for the electronic document based on a predefined document type definition.
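One way to picture how the modules of claim 23 fit together is the sketch below. It only shows the data flow between the claimed modules. Every class name, method name, and the `Region` type is hypothetical, and each module body is a stub: a real system would operate on pixel data and apply actual recognition techniques.

```python
from dataclasses import dataclass


@dataclass
class Region:
    kind: str     # "text" or "non-text"
    content: str  # stand-in for the region's pixel data


class TextDifferentiation:
    def identify(self, document):
        # Stub: the input is already a list of (kind, content) pairs.
        return [Region(kind, content) for kind, content in document]


class Segmentation:
    def segment(self, regions):
        # Split the identified regions into text and non-text segments.
        text = [r for r in regions if r.kind == "text"]
        non_text = [r for r in regions if r.kind == "non-text"]
        return text, non_text


class TextProcessing:
    def extract(self, regions):
        # Stub for text processing, pattern matching, and form extraction.
        return [{"source": "text", "label": r.content.strip()} for r in regions]


class ImageProcessing:
    def extract(self, regions):
        # Stub for image processing, object recognition, and form extraction.
        return [{"source": "non-text", "object": r.content} for r in regions]


class FileGenerator:
    def generate(self, text_info, image_info):
        # Combine both kinds of extracted form information into one record.
        return {"form": text_info + image_info}


def run_pipeline(document):
    regions = TextDifferentiation().identify(document)
    text, non_text = Segmentation().segment(regions)
    return FileGenerator().generate(
        TextProcessing().extract(text), ImageProcessing().extract(non_text)
    )


out = run_pipeline([("text", " Name "), ("non-text", "checkbox")])
print(out)
```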
Specification