Performing optical character recognition using spatial information of regions within a structured document

  • US 10,013,643 B2
  • Filed: 07/26/2016
  • Issued: 07/03/2018
  • Est. Priority Date: 07/26/2016
  • Status: Active Grant
First Claim

1. A computer-implemented method for identifying information in an electronic document, comprising:

    obtaining a set of training documents for each template of a plurality of templates for the electronic document;

    extracting a first set of spatial attributes for at least a first label region and at least a first corresponding value region from the set, the first set of spatial attributes representing a position of at least the first label region and at least the first value region within the electronic document;

    training a classifier model based on the extracted first set of spatial attributes to generate a trained classifier model;

    segmenting, an image of the electronic document to obtain a second set of spatial attributes of candidate regions in the image, each of the candidate regions corresponding to a label or a value;

    identifying at least one candidate region from the candidate regions as a label to generate an identified label based on the obtained second set of spatial attributes using the trained classifier model without performing Optical Character Recognition (OCR);

    designating at least one of the candidate regions that is not identified as a label, as a designated value region; and

    performing OCR only on the designated value region to obtain at least one value corresponding to the identified label.
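
The steps recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the nearest-centroid classifier, the `Region` type, and the function names `train_centroids`, `classify`, and `extract_values` are all hypothetical choices made for illustration. It also assumes that segmentation has already produced candidate bounding boxes normalized to the page, that a single template is in use, and that the OCR engine is supplied by the caller as a plain callable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Region:
    """Hypothetical candidate region: left, top, width, height in [0, 1]."""
    x: float
    y: float
    w: float
    h: float

    def features(self) -> Tuple[float, float, float, float]:
        # Spatial attributes only; no pixel content or text is used.
        return (self.x, self.y, self.w, self.h)


def train_centroids(
    samples: List[Tuple[Region, str]],
) -> Dict[str, Tuple[float, ...]]:
    """Average the spatial attributes per class ("label" or "value").

    Assumption: a nearest-centroid model stands in for the claim's
    unspecified "classifier model".
    """
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for region, kind in samples:
        acc = sums.setdefault(kind, [0.0] * 4)
        for i, value in enumerate(region.features()):
            acc[i] += value
        counts[kind] = counts.get(kind, 0) + 1
    return {k: tuple(v / counts[k] for v in acc) for k, acc in sums.items()}


def classify(region: Region, centroids: Dict[str, Tuple[float, ...]]) -> str:
    """Return the class whose centroid is closest to the region's attributes."""
    def dist2(centroid: Tuple[float, ...]) -> float:
        return sum((a - b) ** 2 for a, b in zip(region.features(), centroid))
    return min(centroids, key=lambda k: dist2(centroids[k]))


def extract_values(
    candidates: List[Region],
    centroids: Dict[str, Tuple[float, ...]],
    ocr: Callable[[Region], str],
) -> Tuple[List[Region], Dict[Region, str]]:
    """Identify labels from position alone, then OCR only the other regions."""
    labels = [r for r in candidates if classify(r, centroids) == "label"]
    value_regions = [r for r in candidates if r not in labels]
    # OCR is invoked only for designated value regions, never for labels.
    return labels, {r: ocr(r) for r in value_regions}
```

In this sketch the cost saving comes from the final step: regions classified as labels are never passed to `ocr`, so the expensive recognition call runs only on the regions expected to hold values.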
