Data capture from multi-page documents
Abstract
A method for processing a batch of scanned images is provided. The method comprises processing the scanned images into documents; for documents comprising multiple pages, maintaining a page-based coordinate system to specify a location of structures within a page, and joining the pages to form a multi-page sheet having a sheet-based coordinate system to specify a location of structures within the multi-page sheet; and performing a data extraction operation to extract data from each document. The data extraction operation comprises a page mode, wherein structures are detected on individual pages using the page-based coordinate system, and a document mode, wherein structures are detected within the entire document using the sheet-based coordinate system.
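As an illustration of the two coordinate systems, the following is a minimal sketch in Python. It assumes that the pages are joined top to bottom, so that a sheet-based coordinate differs from a page-based coordinate only by a vertical offset. The abstract does not fix the joining direction; that layout and the names `Page`, `page_offsets`, `page_to_sheet` and `sheet_to_page` are hypothetical choices made for this example only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Page:
    """A page image, described only by its size in pixels (hypothetical type)."""
    width: int
    height: int


def page_offsets(pages: List[Page]) -> List[int]:
    """Vertical offset of each page's top edge within the joined sheet.

    Assumption: pages are stacked top to bottom with no gap between them.
    """
    offsets, top = [], 0
    for page in pages:
        offsets.append(top)
        top += page.height
    return offsets


def page_to_sheet(pages: List[Page], page_index: int, x: int, y: int) -> Tuple[int, int]:
    """Convert a page-based location into a sheet-based location."""
    return x, page_offsets(pages)[page_index] + y


def sheet_to_page(pages: List[Page], x: int, y: int) -> Tuple[int, int, int]:
    """Convert a sheet-based location into (page index, page x, page y)."""
    for index, offset in enumerate(page_offsets(pages)):
        if offset <= y < offset + pages[index].height:
            return index, x, y - offset
    raise ValueError("y lies outside the multi-page sheet")
```

Under this assumption, a structure located at (10, 20) on the second page of a document whose first page is 200 pixels tall would be addressed as (10, 220) in the sheet-based coordinate system.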
25 Claims
1. A method for processing a batch of document images, the method comprising:
processing, by a computing device, the document images into one or more documents, wherein a document of the one or more documents includes multiple pages;
maintaining, by the computing device, a page-based coordinate system to specify a location of structures within individual pages of the document;
combining, by the computing device, the multiple pages to form a multi-page sheet, wherein a sheet-based coordinate system specifies a location of structures within the multi-page sheet; and
performing, by the computing device, a data extraction operation to extract data from the document, said data extraction operation including:
detecting the structures on individual pages using the page-based coordinate system;
defining a repeating group of fields, wherein the repeating group of fields is capable of flowing over from one page onto another page;
detecting whether all fields of an instance of the repeating group of fields are found on consecutive pages; and
depending on whether all fields of the instance of the repeating group of fields are found on consecutive pages, detecting structures within the document using the sheet-based coordinate system.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A non-transitory computer-readable medium embodying a set of instructions which, when executed by a computer, cause the computer to:
process document images into one or more documents, wherein a document of the one or more documents includes multiple pages;
maintain a page-based coordinate system to specify a location of structures within individual pages of the document;
combine the multiple pages to form a multi-page sheet, wherein a sheet-based coordinate system specifies a location of structures within the multi-page sheet; and
perform, by the computing device, a data extraction operation to extract data from the document, said data extraction operation including:
detecting the structures on the individual pages using the page-based coordinate system;
defining a repeating group of fields, wherein the repeating group of fields is capable of flowing over from one page onto another page;
detecting whether all fields of an instance of the repeating group of fields are found on consecutive pages of the document; and
depending on whether all fields of the instance of the repeating group of fields are found on consecutive pages, detecting structures within the document using the sheet-based coordinate system.
Dependent claims: 11, 12, 13, 14, 15, 16, 17
18. A system for capturing data from a document image, the system comprising:
an imaging component capable of capturing the document image of a document;
a processor; and
a memory coupled to the processor and in electronic communication with the imaging component, the memory configured with instructions for causing the processor to:
process document images into one or more documents, wherein a document of the one or more documents includes multiple pages;
maintain a page-based coordinate system to specify a location of structures within individual pages of the document;
combine the multiple pages to form a multi-page sheet, wherein a sheet-based coordinate system specifies a location of structures within the multi-page sheet; and
perform a data extraction operation to extract data from the document, said data extraction operation including:
detecting the structures on the individual pages using the page-based coordinate system;
defining a repeating group of fields, wherein the repeating group of fields is capable of flowing over from one page onto another page;
detecting whether all fields of an instance of the repeating group of fields are found on consecutive pages of the document; and
depending on whether all fields of the instance of the repeating group of fields are found on consecutive pages, detecting structures within the document using the sheet-based coordinate system.
Dependent claims: 19, 20, 21, 22, 23, 24, 25
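The last two steps of the data extraction operation recited in claims 1, 10 and 18 can be illustrated with a short Python sketch. It follows one possible reading of the claim language: an instance of the repeating group is treated as flowing over when its fields are found on two or more adjacent pages, and in that case detection switches to the sheet-based coordinate system. The function names, the field names and the "page" and "document" mode labels (the latter two borrowed from the abstract) are assumptions for this example, not terms defined by the claims.

```python
from typing import Dict


def found_on_consecutive_pages(field_pages: Dict[str, int]) -> bool:
    """Return True when the fields of one instance span adjacent pages.

    field_pages maps a field name to the index of the page on which that
    field was detected (hypothetical input format). The instance counts as
    flowing over when it touches at least two pages and no page between
    the first and the last is skipped.
    """
    pages = sorted(set(field_pages.values()))
    return len(pages) >= 2 and pages[-1] - pages[0] == len(pages) - 1


def choose_detection_mode(field_pages: Dict[str, int]) -> str:
    """Pick the coordinate system used to detect structures for an instance.

    "document" means the sheet-based coordinate system; "page" means the
    page-based coordinate system.
    """
    return "document" if found_on_consecutive_pages(field_pages) else "page"
```

For example, an invoice line whose description and quantity are found on page 0 and whose amount is found on page 1 would be handled in document mode, while a line found entirely on page 2 would stay in page mode.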
Specification