Coordinate-based document processing and data entry system and method
First Claim
1. A computer-implemented method for document processing and data extraction, over a network, the method comprising the steps of:
- receiving a first document at a processor, the first document containing data for extraction, wherein the processor performs at least the following;
outputting the document in a preferred document format;
receiving a selection of a first portion of the document;
recognizing a type of data contained in the selection, wherein the type of data includes at least one type of text and at least one type of image;
processing the first portion to extract a first layer of data and a second layer of data, wherein the first layer of data includes a plurality of text entries, and the second layer of data includes an image;
automatically generating, based on the step of processing, a plurality of coordinate sets for the plurality of text entries;
extracting, based on the plurality of coordinate sets, the plurality of text entries from the first layer;
extracting the image from the second layer, wherein each of the one or more layers is extracted separately based on the at least one user preference;
automatically generating and storing in computer memory, a structured data set that includes the extracted text data, the extracted text data being structured in the structured data set based on coordinates of the extracted text data;
automatically generating an extraction rule based on the generated plurality of coordinate sets and the structured data set, the extraction rule executable by the processor to extract text entries of the at least one type of text from a second document received after the first document;
receiving the second document from a computer system;
automatically matching the second document based on the step of automatically generating the extraction rule; and
extracting, based on the step of matching, data from the second document by executing the generated extraction rule, including based at least on a portion of the generated plurality of coordinate sets.
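The last four steps of claim 1 (generating an extraction rule, receiving a second document, matching it, and executing the rule) can be illustrated with a minimal sketch. The claim does not state how matching is performed. The sketch assumes that a second document matches when the static labels of the first document reappear near the same coordinates. `Entry`, `ExtractionRule`, `TOLERANCE`, and all function names are hypothetical and do not come from the patent.

```python
import math
from dataclasses import dataclass

TOLERANCE = 5.0  # hypothetical positional tolerance, in page units


@dataclass(frozen=True)
class Entry:
    """A text entry with the coordinates at which it was found."""
    text: str
    x: float
    y: float


@dataclass
class ExtractionRule:
    anchors: dict  # label text -> (x, y) in the first document
    fields: dict   # field name -> (x, y) of the extracted value


def near(a, b, tol=TOLERANCE):
    return math.dist(a, b) <= tol


def generate_rule(entries, structured):
    """Build a rule from the first document's entries and its structured data set.

    Assumption: entries that were not extracted as values are static labels
    and can serve as anchors for matching later documents.
    """
    values = set(structured.values())
    fields = {}
    for name, text in structured.items():
        entry = next(e for e in entries if e.text == text)
        fields[name] = (entry.x, entry.y)
    anchors = {e.text: (e.x, e.y) for e in entries if e.text not in values}
    return ExtractionRule(anchors, fields)


def matches(rule, entries):
    """A document matches if every anchor label reappears near its old position."""
    return all(
        any(e.text == label and near((e.x, e.y), pos) for e in entries)
        for label, pos in rule.anchors.items()
    )


def apply_rule(rule, entries):
    """Extract, for each field, the entry closest to the stored coordinates."""
    out = {}
    for name, pos in rule.fields.items():
        candidates = [e for e in entries if near((e.x, e.y), pos)]
        if candidates:
            best = min(candidates, key=lambda e: math.dist((e.x, e.y), pos))
            out[name] = best.text
    return out


first = [Entry("Invoice No.", 10, 10), Entry("INV-001", 100, 10),
         Entry("Total", 10, 200), Entry("42.00", 100, 200)]
rule = generate_rule(first, {"invoice_number": "INV-001", "total": "42.00"})

second = [Entry("Invoice No.", 11, 10), Entry("INV-002", 101, 11),
          Entry("Total", 10, 201), Entry("17.50", 99, 200)]
if matches(rule, second):
    print(apply_rule(rule, second))
```

A tolerance-based comparison is only one possible design. It lets slightly shifted scans still match, at the cost of possible false matches between similar layouts.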
Abstract
Embodiments of the presently disclosed invention are directed to a document processing system and method that facilitate the processing of documents and the extraction of data from them. The system and method receive at least one document, where the document may contain data for extraction. The document may then be converted into a preferred document format and output to a user interface. The system and method may then receive a selection of at least a portion of the document, wherein the selection contains data for extraction. Based on the selection, at least one coordinate set corresponding to the selection and associated with at least one data field of interest is generated. The data from the selection of the document is then extracted using the at least one coordinate set. Finally, a structured data set that includes the extracted data is generated and stored in computer memory.
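The flow described in the abstract (selection, coordinate set, extraction, structured data set) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: it assumes each word of the document is already available with a position, and that a selection is an axis-aligned rectangle tied to a field name. `Word`, `CoordinateSet`, and `extract` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    x: float
    y: float


@dataclass(frozen=True)
class CoordinateSet:
    """A rectangular selection associated with one data field of interest."""
    field: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, w: Word) -> bool:
        return self.x0 <= w.x <= self.x1 and self.y0 <= w.y <= self.y1


def extract(words, coordinate_sets):
    """Return a structured record: field name -> text found inside its rectangle."""
    record = {}
    for cs in coordinate_sets:
        # Read words top-to-bottom, then left-to-right.
        hits = sorted((w for w in words if cs.contains(w)), key=lambda w: (w.y, w.x))
        record[cs.field] = " ".join(w.text for w in hits)
    return record


words = [Word("Invoice", 10, 10), Word("No.", 60, 10), Word("INV-001", 100, 10),
         Word("Total", 10, 200), Word("42.00", 100, 200)]
selection = [CoordinateSet("invoice_number", 90, 0, 200, 20),
             CoordinateSet("total", 90, 190, 200, 210)]
record = extract(words, selection)
print(record)  # {'invoice_number': 'INV-001', 'total': '42.00'}
```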
20 Claims
1. A computer-implemented method for document processing and data extraction, over a network, the method comprising the steps listed in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 19
14. A system for document processing and data extraction using coordinate selection data over a network, the system comprising:
non-transitory memory; and
at least one computer processor accessing the non-transitory memory and executing instructions to perform steps including;
receiving a first document, the first document containing data for extraction;
outputting the document in a preferred document format;
receiving a selection of a first portion of the document;
recognizing a type of data contained in the selection, wherein the type of data includes at least one type of text and at least one type of image;
processing the first portion to extract a first layer of data and a second layer of data, wherein the first layer of data includes a plurality of text entries, and the second layer of data includes an image;
generating, based on the step of processing, a plurality of coordinate sets for the plurality of text entries;
extracting, based on the plurality of coordinate sets, the plurality of text entries from the first layer;
extracting the image from the second layer, wherein each of the one or more layers is extracted separately based on the at least one user preference;
automatically generating, and storing in the non-transitory memory, a structured data set that includes the extracted data, the extracted data being structured in the structured data set based on coordinates of the extracted data;
automatically generating an extraction rule based on the generated plurality of coordinate sets and the structured data set, the extraction rule executable by the at least one processor to extract text entries of the at least one type of text from a second document received after the first document;
receiving the second document from a computer system;
automatically matching the second document based on the step of automatically generating the extraction rule; and
extracting, based on the step of matching, data from the second document by executing the generated extraction rule, including based at least on a portion of the generated plurality of coordinate sets.
Dependent claims: 15, 20
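The independent claims recite separating a selection into a text layer and an image layer, each extracted separately according to a user preference. A minimal sketch of that separation is given here. The element classes and the preference keys `extract_text` and `extract_images` are assumptions made for illustration; the claims do not name them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextEntry:
    text: str
    x: float
    y: float


@dataclass(frozen=True)
class ImageRegion:
    name: str
    x: float
    y: float
    width: float
    height: float


def split_layers(elements, preferences):
    """Separate a selection into layers and keep only those the user asked for.

    Assumption: the type of each element has already been recognized,
    so it is represented here by the element's Python class.
    """
    text_layer = [e for e in elements if isinstance(e, TextEntry)]
    image_layer = [e for e in elements if isinstance(e, ImageRegion)]
    result = {}
    if preferences.get("extract_text", True):
        result["text"] = text_layer
    if preferences.get("extract_images", True):
        result["images"] = image_layer
    return result


elements = [TextEntry("Invoice No.", 10, 10),
            ImageRegion("logo", 300, 5, 64, 64),
            TextEntry("INV-001", 100, 10)]
both = split_layers(elements, {"extract_text": True, "extract_images": True})
text_only = split_layers(elements, {"extract_images": False})
```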
16. A system for document processing and data extraction using coordinate selection data over a network, the system comprising:
non-transitory memory; and
at least one computer processor accessing the non-transitory memory and executing instructions to perform steps including;
receiving a first document, the first document containing data for extraction;
receiving a selection of a first portion of the document;
recognizing a type of data contained in the selection, wherein the type of data includes at least one type of text and at least one type of image;
processing the first portion to extract a first layer of data and a second layer of data, wherein the first layer of data includes a plurality of text entries, and the second layer of data includes an image;
generating, based on the step of processing, a plurality of coordinate sets for the plurality of text entries;
extracting, based on the plurality of coordinate sets, the plurality of text entries from the first layer;
extracting the image from the second layer, wherein each of the one or more layers is extracted separately based on the at least one user preference;
determining, based on the plurality of coordinate sets and type of text of the plurality of text entries, that at least a portion of the plurality of text entries are associated with a structured data set;
automatically generating a structured data set including at least a portion of the plurality of text entries extracted from the document, the at least a portion of the plurality of text entries being structured in the structured data set based on coordinates of the at least a portion of the plurality of text entries;
storing the structured data set in a file; and
automatically generating an extraction rule based on the generated plurality of coordinate sets and the structured data set, the extraction rule executable by the at least one processor to extract text entries of the at least one type of text from a second document received after the first document;
receiving the second document from a computer system;
automatically matching the second document based on the step of automatically generating the extraction rule; and
extracting, based on the step of matching, data from the second document by executing the generated extraction rule, including based at least on a portion of the generated plurality of coordinate sets.
Dependent claims: 17, 18
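Claim 16 recites structuring text entries according to their coordinates and storing the result in a file. One way to picture this, offered as a sketch rather than as the claimed method, is to group entries with similar vertical positions into rows, order each row by horizontal position, and write the rows as CSV. The `row_tolerance` parameter and the function name are hypothetical, and the claim's use of the type of text is left out for brevity.

```python
import csv
import io


def group_rows(entries, row_tolerance=3.0):
    """Group (text, x, y) entries into table rows based on their coordinates.

    Entries whose y value lies within row_tolerance of the first entry
    of the current row are placed in that row; cells are ordered by x.
    """
    rows = []
    for text, x, y in sorted(entries, key=lambda e: e[2]):
        if rows and abs(rows[-1]["y"] - y) <= row_tolerance:
            rows[-1]["cells"].append((x, text))
        else:
            rows.append({"y": y, "cells": [(x, text)]})
    return [[text for _, text in sorted(row["cells"])] for row in rows]


entries = [("Qty", 120, 50), ("Item", 10, 50),
           ("Widget", 10, 70), ("2", 120, 71),
           ("Gadget", 10, 90), ("5", 120, 89)]
table = group_rows(entries)

# An in-memory buffer stands in for the file; an open file object works the same way.
buffer = io.StringIO()
csv.writer(buffer).writerows(table)
print(table)  # [['Item', 'Qty'], ['Widget', '2'], ['Gadget', '5']]
```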
Specification