Data capture from multi-page documents
Abstract
A method for processing a batch of scanned images is disclosed. The method includes processing the scanned images into documents. For documents of multiple pages, the method maintains a page-based coordinate system to specify a location of structures within a page and joins the pages to form a multi-page sheet associated with a sheet-based coordinate system to specify a location of structures within the multi-page sheet. Data may be extracted from each document through a page mode wherein structures are detected on individual pages using the page-based coordinate system and a document mode wherein structures are detected within the entire document using the sheet-based coordinate system.
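The relationship between the two coordinate systems can be illustrated with a minimal sketch. It assumes that pages are joined top to bottom with no gaps, so a sheet-based y-coordinate is the page-based y-coordinate plus the combined height of all preceding pages; the abstract does not state how the pages are joined, and the `Sheet` class and its method names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


class Sheet:
    """Pages stacked top to bottom into one multi-page sheet (assumed layout)."""

    def __init__(self, page_heights):
        if not page_heights:
            raise ValueError("a sheet needs at least one page")
        self.page_heights = list(page_heights)
        # offsets[i] is the sheet y-coordinate of the top edge of page i.
        self.offsets = [0] + list(accumulate(self.page_heights))[:-1]

    def to_sheet(self, page_index, x, y):
        """Convert page-based (page_index, x, y) to sheet-based (x, y)."""
        if not 0 <= page_index < len(self.page_heights):
            raise IndexError("no such page")
        if not 0 <= y < self.page_heights[page_index]:
            raise ValueError("y lies outside the page")
        return x, self.offsets[page_index] + y

    def to_page(self, x, sheet_y):
        """Convert sheet-based (x, y) to page-based (page_index, x, y)."""
        total_height = self.offsets[-1] + self.page_heights[-1]
        if not 0 <= sheet_y < total_height:
            raise ValueError("y lies outside the sheet")
        # The last page whose top edge is at or above sheet_y contains the point.
        page_index = bisect_right(self.offsets, sheet_y) - 1
        return page_index, x, sheet_y - self.offsets[page_index]


sheet = Sheet([1000, 1000, 800])      # three pages, heights in pixels
print(sheet.to_sheet(1, 50, 200))     # a point on the second page
print(sheet.to_page(50, 1200))        # the same point, located from the sheet
```

Under this assumption, page mode would work with the `(page_index, x, y)` triples and document mode with the sheet-based pairs, and the two conversions are inverses of each other.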
19 Claims
1. A method for creating a flexible structure description for a multi-page document, the method comprising:
acquiring electronic images of pages of at least one multi-page sample document;
determining a minimum number of pages to be used in creating the flexible structure description from the electronic images of pages of the at least one multi-page sample document;
determining a maximum number of pages to be used in creating the flexible structure description from the electronic images of pages of the at least one multi-page sample document;
identifying a first page for the flexible structure description from the multi-page sample document;
identifying a last page for the flexible structure description from the multi-page sample document;
from the at least one multi-page sample document, determining a range of pages in which an element of the multi-page document may be detected; and
storing information related to at least one of the minimum number of pages, the maximum number of pages, the first page identification, the last page identification, and the range of pages for one or more elements for the flexible structure description.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A data capture system, comprising:
an optical sensor;
a processor; and
a memory configured with processor-executable instructions which, when executed by the processor, implement a method for creating a flexible structure description for a multi-page document, the method comprising:
acquiring electronic images of pages of a multi-page sample document;
identifying a first page for the flexible structure description from the multi-page sample document;
identifying a last page for the flexible structure description from the multi-page sample document;
from the multi-page sample document, determining a range of pages in which an element of the multi-page document may be detected in images of pages of other multi-page documents; and
electronically persisting in the memory the first page, the last page, and the range of pages for the element for the flexible structure description.
Dependent claims: 10, 11, 12, 13, 14, 15, 18, 19
16. A non-transitory computer-readable medium having stored thereon instructions, which when executed by a processing system, cause the processing system to implement steps to create a flexible structure description for a multi-page document, the steps comprising:
acquiring electronic images of pages of a multi-page sample document;
identifying a first page for the flexible structure description from the multi-page sample document;
identifying a last page for the flexible structure description from the multi-page sample document;
from the multi-page sample document, determining a range of pages in which an element of the multi-page document may be detected; and
electronically persisting in a storage an indication of the first page, the last page, and the range of pages for the element for the flexible structure description.
Dependent claims: 17
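The information that the independent claims store for a flexible structure description can be pictured as a small record. The sketch below is illustrative only and is not the patented implementation. It assumes each sample document is already reduced to a list of pages, with each page given as the set of element names detected on it. It also assumes the minimum and maximum page counts are taken from the shortest and longest samples, that a first or last page is identified by the elements common to that page across all samples, and that JSON serves as the storage format. The names `FlexibleStructureDescription`, `describe`, and `store` are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class FlexibleStructureDescription:
    min_pages: int
    max_pages: int
    first_page_elements: list  # elements seen on every sample's first page
    last_page_elements: list   # elements seen on every sample's last page
    element_page_ranges: dict = field(default_factory=dict)  # name -> [lo, hi]


def describe(samples):
    """Build a description from sample documents (lists of per-page element sets)."""
    if not samples or any(not pages for pages in samples):
        raise ValueError("need at least one non-empty sample document")
    lengths = [len(pages) for pages in samples]
    ranges = {}
    for pages in samples:
        for number, elements in enumerate(pages, start=1):  # 1-based page numbers
            for name in elements:
                lo, hi = ranges.get(name, (number, number))
                ranges[name] = (min(lo, number), max(hi, number))
    return FlexibleStructureDescription(
        min_pages=min(lengths),
        max_pages=max(lengths),
        first_page_elements=sorted(set.intersection(*(set(p[0]) for p in samples))),
        last_page_elements=sorted(set.intersection(*(set(p[-1]) for p in samples))),
        element_page_ranges={k: list(v) for k, v in sorted(ranges.items())},
    )


def store(description):
    """Serialize the description; JSON is an illustrative storage choice."""
    return json.dumps(asdict(description), sort_keys=True)


samples = [
    [{"title", "invoice_no"}, {"table"}, {"total", "signature"}],
    [{"title"}, {"table"}, {"table"}, {"total"}],
]
print(store(describe(samples)))
```

With these two made-up samples, the description would allow three to four pages, and the `table` element would be searched for on pages 2 through 3 of a new document rather than on every page.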
Specification