Method and system for collecting data from a plurality of machine readable documents
Abstract
In a method and system for collection of data from documents present in machine-readable form, at least one already-processed document stored as a template and designated as a template document is associated with a document to be processed, designated as a read document. Fields for data to be extracted are defined in the template document. Data contained in the read document are automatically extracted from regions that correspond to the fields in the template document. If an error occurs during the automatic extraction of the data, or if no suitable template document has been associated, the read document is shown on a screen and the fields from which the data are to be extracted are manually input in the read document. After the manual input of the fields, the read document with its field specifications is stored as a new template document, or the previous template document is corrected according to the newly input fields.
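The workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Template`, `extract`, `collect` and `ask_user` are hypothetical, a field region is assumed to be a pair of character offsets in plain text, and an "error" is assumed to mean that a field region turns out to be empty.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Assumption: a region is a (start, end) pair of character offsets in the text.
Region = Tuple[int, int]


@dataclass
class Template:
    """Hypothetical template document: a name plus field-to-region definitions."""
    name: str
    fields: Dict[str, Region] = field(default_factory=dict)


def extract(text: str, template: Template) -> Dict[str, str]:
    """Read every field from its region; an empty region counts as an error."""
    data = {}
    for name, (start, end) in template.fields.items():
        value = text[start:end].strip()
        if not value:
            raise ValueError(f"field {name!r} is empty")
        data[name] = value
    return data


def collect(
    text: str,
    template: Optional[Template],
    store: List[Template],
    ask_user: Callable[[str], Dict[str, Region]],
) -> Dict[str, str]:
    """Extract automatically; on error or missing template, fall back to manual input."""
    if template is not None:
        try:
            return extract(text, template)
        except ValueError:
            pass  # fall through to manual input
    manual_fields = ask_user(text)  # stands in for showing the document on a screen
    if template is None:
        # No suitable template: store the read document's fields as a new template.
        template = Template(name=f"template-{len(store)}")
        store.append(template)
    # New template: fill in its fields. Existing template: correct its fields.
    template.fields.update(manual_fields)
    return extract(text, template)
```

In this sketch the first document of a given layout triggers the manual step once, and later documents with the same layout are extracted without user interaction.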
20 Claims
1. A method for collection of data from documents present in machine-readable form, the method performed by a computer system with a processor and memory, the method comprising the steps of:
associating at least one already-processed document stored as a template and subsequently designated as a template document with a document to be processed that is designated as a read document, fields for data to be extracted being defined in the template document, wherein associating the at least one already-processed document with the document to be processed is performed by the processor executing instructions stored in the memory;
automatically extracting data from the read document, the data contained in regions of the read document that correspond to the fields in the template document, wherein automatically extracting data is performed by the processor executing instructions stored in the memory; and
if an error occurs, or if no suitable template document is associated;
showing the read document on a screen and manually inputting fields in the read document from which the data are extracted; and
storing the read document with field specifications as a new template document, or correcting the at least one already-processed template document corresponding to the newly input fields.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A method for collection of data from documents present in machine-readable form, the method performed by a computer system with a processor and memory, the method comprising the steps of:
associating at least one already processed document stored as a template and designated as a template document with a document to be processed designated as a read document, where fields for data to be extracted are defined in the template document;
said associating occurring by a cost function with which a similarity between the read document and template documents is calculated and a template document with a best similarity is associated with the read document; and
automatically extracting data contained in the read document from regions that correspond to the fields in the template document, wherein automatically extracting data is performed by the processor executing instructions that are stored in the memory.
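The cost-function association in claim 9 can be sketched as follows. The claim does not specify the cost function, so this example swaps in a Jaccard distance over word sets purely for illustration. The names `cost` and `associate` and the `max_cost` threshold are assumptions, not part of the claim.

```python
from typing import Dict, Optional, Set


def cost(read_words: Set[str], template_words: Set[str]) -> float:
    """Jaccard distance (an assumed stand-in): 0.0 for identical sets, 1.0 for disjoint ones."""
    union = read_words | template_words
    if not union:
        return 0.0
    return 1.0 - len(read_words & template_words) / len(union)


def associate(
    read_text: str,
    templates: Dict[str, str],
    max_cost: float = 0.5,  # hypothetical cutoff for "no suitable template"
) -> Optional[str]:
    """Return the name of the template with the lowest cost, or None if none is below max_cost."""
    read_words = set(read_text.lower().split())
    best_name: Optional[str] = None
    best_cost = max_cost
    for name, template_text in templates.items():
        c = cost(read_words, set(template_text.lower().split()))
        if c < best_cost:  # strictly lower cost means better similarity
            best_name, best_cost = name, c
    return best_name
```

Here the lowest cost plays the role of the "best similarity". A real system would more likely compare layout features such as word positions, which this sketch leaves out.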
10. A computer system for collecting data from documents present in machine-readable form, the computer system having a processor and a memory, the memory containing computer readable instructions which, when executed by the processor, perform the following steps:
compare at least one already processed document that is stored as a template document and designated as a template document with a document to be processed designated as a read document, fields for data to be extracted being defined in the template document;
determine a similarity between the read document and the template document using a cost function;
associate a template document having a best similarity with the read document; and
automatically extract data contained in the read document from regions that correspond to the fields in the template document.
Dependent claims: 11
12. A method of transforming a template document using a read document, the method including:
associating a stored template document with a read document, wherein the stored template document includes at least one field, and wherein the read document includes data;
automatically extracting at least some of the data from the read document, wherein the extracted data is located in at least one region of the read document that corresponds to the at least one field of the stored template document;
displaying the read document on a display screen;
receiving an input that associates a region of the read document with a chosen field; and
generating a template document, wherein:
the template document is a new template document and is transformed to incorporate the chosen field associated with the read document, or
the template document is the stored template document and is transformed to incorporate the chosen field associated with the read document.
Dependent claims: 13, 14, 15, 16, 17, 18, 19
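The step of associating a region of the read document with a chosen field can be sketched as below. This is a minimal illustration under stated assumptions: a region is taken to be a rectangle in page coordinates, the read document is taken to be a list of recognized words with positions, and the names `Word`, `Template`, `read_region` and `with_field` are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple

# Assumption: a region is a rectangle (left, top, right, bottom) in page coordinates.
Box = Tuple[int, int, int, int]


@dataclass(frozen=True)
class Word:
    """Hypothetical recognized word with the position of its anchor point."""
    text: str
    x: int
    y: int


@dataclass(frozen=True)
class Template:
    """Hypothetical template document mapping field names to regions."""
    name: str
    fields: Dict[str, Box]


def read_region(words: List[Word], box: Box) -> str:
    """Join the words whose anchor point lies inside the region, in reading order."""
    left, top, right, bottom = box
    inside = [w for w in words if left <= w.x <= right and top <= w.y <= bottom]
    return " ".join(w.text for w in sorted(inside, key=lambda w: (w.y, w.x)))


def with_field(
    stored: Optional[Template],
    field_name: str,
    box: Box,
    new_name: str = "new-template",
) -> Template:
    """Return a template that incorporates the chosen field.

    With no stored template a new one is created; otherwise a transformed
    copy of the stored template is returned and the original is left unchanged.
    """
    if stored is None:
        return Template(new_name, {field_name: box})
    return replace(stored, fields={**stored.fields, field_name: box})
```

Returning a copy rather than mutating the stored template is a design choice of this sketch only. It keeps the earlier template available in case the user input has to be discarded.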
20. A method of generating a template document using a read document, the method performed by a computer system having a processor and memory, the method including:
receiving a read document, wherein the read document is received from a scanner;
comparing a stored template document with the read document, wherein the stored template document includes at least one field, wherein comparing the stored template document with the read document is performed by the processor executing instructions that are stored in the memory;
displaying the read document on a display screen;
receiving an input that associates a region of the read document with a chosen field; and
incorporating the chosen field into the stored template document or creating a new template document with the read document.
Specification