Method and system for extracting information from a document
Abstract
A computer-implemented method for extracting information from a population of subject documents. The method includes modeling a document structure. The modeled document structure includes at least a document component hierarchy with at least one record type. Each record type includes at least one record part type, and at least one record part type comprises at least one data element type. For a subject document exhibiting at least a portion of the modeled document structure, preferred embodiments of the invention identify data of a type corresponding to at least one modeled data element type. Identified subject document data is then associated with the corresponding modeled data element type.
20 Claims
1. A computer-implemented method for extracting information from a population of one or more subject documents, the method comprising:
modeling a document structure representative of the population, the modeled document structure comprising a document component hierarchy, the document component hierarchy comprising at least one record type, each record type comprising at least one record part type, and at least one record part type comprising at least one data element type;
for a subject document exhibiting at least a portion of the modeled document structure, identifying subject document data of a type corresponding to at least one modeled data element type and associating the identified data with the corresponding modeled data element type.
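The hierarchy recited in claim 1 (record type, record part type, data element type) can be pictured as a nested data structure. The following is a minimal sketch only, not the patented implementation: the class names, the `extract` function, the sample record, and the use of regular expressions to identify data of a given type are all assumptions made for illustration, since the claim does not say how identification is performed.

```python
import re
from dataclasses import dataclass, field


@dataclass
class DataElementType:
    name: str
    pattern: str  # assumption: a regex stands in for the identification step


@dataclass
class RecordPartType:
    name: str
    element_types: list = field(default_factory=list)


@dataclass
class RecordType:
    name: str
    part_types: list = field(default_factory=list)


@dataclass
class DocumentModel:
    record_types: list = field(default_factory=list)


def extract(model, text):
    """Identify data matching each modeled element type and associate it
    with that type as (record, part, element, value) tuples."""
    found = []
    for record in model.record_types:
        for part in record.part_types:
            for element in part.element_types:
                for match in re.finditer(element.pattern, text):
                    found.append(
                        (record.name, part.name, element.name, match.group())
                    )
    return found


# Hypothetical model: one record type with one part and two element types.
model = DocumentModel([
    RecordType("transaction", [
        RecordPartType("header", [
            DataElementType("date", r"\d{2}/\d{2}/\d{4}"),
            DataElementType("amount", r"\$\d+\.\d{2}"),
        ]),
    ]),
])

results = extract(model, "01/15/2004 Payment received $125.00")
```

Here each extracted value carries the names of the element type, part type, and record type it was matched against, which is one simple way to express the "associating" step.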
14. A method for horizontally aligning a first region of a document with a second region of a document, each region characterized by a plurality of sub-regions, the method comprising:
determining a type for each of a plurality of sub-regions in each region;
determining an edit distance for each typed first region sub-region, typed second region sub-region pair;
determining a first region sub-region offset for those pairs characterized by an edit distance not greater than a threshold;
determining a first region offset as a function of the first region sub-region offsets; and
offsetting the first region by the offset.
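The alignment steps of claim 14 can be sketched as follows. This is an illustrative reading under stated assumptions, not the patented implementation: the character-class typing in `type_of`, the representation of a sub-region as an `(x_position, text)` pair, the default threshold of 1, and the use of the median as the "function of the sub-region offsets" are all hypothetical choices. Only the Levenshtein recurrence is standard.

```python
from statistics import median


def type_of(text):
    """Assumed typing: digits become '9', letters 'A', other characters kept."""
    return "".join(
        "9" if c.isdigit() else "A" if c.isalpha() else c for c in text
    )


def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def region_offset(first, second, threshold=1):
    """Collect offsets for sub-region pairs whose typed forms are within the
    threshold, then combine them (assumption: by taking the median)."""
    offsets = []
    for x1, t1 in first:
        for x2, t2 in second:
            if edit_distance(type_of(t1), type_of(t2)) <= threshold:
                offsets.append(x2 - x1)
    return median(offsets) if offsets else 0


def align(first, second, threshold=1):
    """Shift every sub-region of the first region by the region offset."""
    shift = region_offset(first, second, threshold)
    return [(x + shift, text) for x, text in first]


# Hypothetical regions: each sub-region is (horizontal position, text).
first = [(0, "01/15/2004"), (14, "$125.00")]
second = [(3, "02/20/2004"), (17, "$310.50")]
aligned = align(first, second)
```

Comparing typed forms rather than raw text lets a date in one region pair with a different date in the other, so the offset reflects layout rather than content.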
18. A computer program product for extracting information from a population of subject documents, the computer program product comprising:
a computer-readable medium;
a modeling module stored on the medium and operative to model a document structure, the modeled document structure comprising a document component hierarchy, the document component hierarchy comprising at least one record type, each record type comprising at least one record part type and at least one record part type comprising at least one data element type;
an identification module, operative to identify subject document data of a type corresponding to at least one modeled data element type; and
an association module, operative to associate the identified data with the corresponding modeled data element type.
Specification