
Data capture from multi-page documents

  • US 8,547,589 B2
  • Filed: 05/21/2009
  • Issued: 10/01/2013
  • Est. Priority Date: 09/08/2008
  • Status: Active Grant
First Claim

1. A method for processing a batch of document images, the method comprising:

  • processing, by a computing device, the document images into one or more documents, wherein a document of the one or more documents includes multiple pages;

    maintaining, by the computing device, a page-based coordinate system to specify a location of structures within individual pages of the document;

    combining, by the computing device, the multiple pages to form a multi-page sheet, wherein a sheet-based coordinate system specifies a location of structures within the multi-page sheet; and

    performing, by the computing device, a data extraction operation to extract data from the document, said data extraction operation including:

    detecting the structures on individual pages using the page-based coordinate system;

    defining a repeating group of fields, wherein the repeating group of fields is capable of flowing over from one page onto another page;

    detecting whether all fields of an instance of the repeating group of fields are found on consecutive pages; and

    depending on whether all fields of the instance of the repeating group of fields are found on consecutive pages, detecting structures within the document using the sheet-based coordinate system.
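The claim does not say how page-based and sheet-based coordinates relate, or exactly how the consecutive-pages test selects a coordinate system. The Python sketch below illustrates one possible reading under explicit assumptions. First, it assumes pages are stacked vertically on the multi-page sheet, so a page-based box converts to sheet-based coordinates by adding the heights of all preceding pages to its y value. Second, it assumes a simple selection policy. An instance of a repeating group whose fields all lie on one page is measured in page-based coordinates. An instance whose fields span consecutive pages is measured in sheet-based coordinates. An instance with a gap between pages is rejected. The names `Box`, `Field`, `page_offsets`, `to_sheet`, `instance_boxes`, and `vertical_gap` are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle; (x, y) is the top-left corner, y grows downward."""
    x: float
    y: float
    w: float
    h: float


@dataclass(frozen=True)
class Field:
    """A detected field of a repeating-group instance (hypothetical model)."""
    name: str
    page: int  # zero-based page index within the document
    box: Box   # location in page-based coordinates


def page_offsets(page_heights: List[float]) -> List[float]:
    """Top edge of each page on the sheet, ASSUMING pages are stacked vertically."""
    offsets: List[float] = []
    total = 0.0
    for height in page_heights:
        offsets.append(total)
        total += height
    return offsets


def to_sheet(field: Field, offsets: List[float]) -> Box:
    """Map a page-based box to sheet-based coordinates by shifting it down."""
    b = field.box
    return Box(b.x, b.y + offsets[field.page], b.w, b.h)


def instance_boxes(
    fields: List[Field], offsets: List[float]
) -> Optional[Tuple[str, Dict[str, Box]]]:
    """Pick a coordinate system for one repeating-group instance (ASSUMED policy).

    One page -> page-based; consecutive pages -> sheet-based; page gap -> None.
    """
    pages = sorted({f.page for f in fields})
    if not pages:
        return None  # no fields detected for this instance
    if len(pages) == 1:
        return "page", {f.name: f.box for f in fields}
    if all(b - a == 1 for a, b in zip(pages, pages[1:])):
        return "sheet", {f.name: to_sheet(f, offsets) for f in fields}
    return None  # fields not on consecutive pages: not treated as one instance


def vertical_gap(upper: Box, lower: Box) -> float:
    """Distance from the bottom of `upper` to the top of `lower`."""
    return lower.y - (upper.y + upper.h)


if __name__ == "__main__":
    offsets = page_offsets([1000.0, 1000.0])
    row = [
        Field("description", 0, Box(50.0, 950.0, 300.0, 30.0)),  # bottom of first page
        Field("amount", 1, Box(400.0, 20.0, 100.0, 30.0)),       # top of second page
    ]
    result = instance_boxes(row, offsets)
    assert result is not None
    system, boxes = result
    print(system, boxes["amount"])  # sheet Box(x=400.0, y=1020.0, w=100.0, h=30.0)
    print(vertical_gap(boxes["description"], boxes["amount"]))  # 40.0
```

In this example, a description field ends at the bottom of the first page and an amount field starts at the top of the second page. Measured on the sheet, the gap between them is an ordinary 40-unit row spacing. In page-based coordinates, the same pair would give a meaningless negative gap (20 - 980). This difference is why the sketch switches to sheet-based coordinates when an instance flows over a page break.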

  • 5 Assignments