PERFORMING OPTICAL CHARACTER RECOGNITION USING SPATIAL INFORMATION OF REGIONS WITHIN A STRUCTURED DOCUMENT
First Claim
1. A computer-implemented method for identifying information in an electronic document, comprising:
obtaining a set of training documents for each template of a plurality of templates for the electronic document;
extracting spatial attributes for at least a first label region and at least a first corresponding value region from the set, the spatial attributes representing a position of at least the first label region and at least the first value region within the electronic document; and
training a classifier model based on the extracted spatial attributes, wherein the classifier model is used to identify the information in the electronic document.
Abstract
Techniques are disclosed for facilitating optical character recognition (OCR) by identifying one or more regions in an electronic document to perform the OCR. For example, a method for identifying information in an electronic document includes obtaining a set of training documents for each template of a plurality of templates for the electronic document, extracting spatial attributes for at least a first label region and at least a first corresponding value region from the set, and training a classifier model based on the extracted spatial attributes, wherein the classifier model is used to identify the information in the electronic document. The spatial attributes represent a position of at least the first label region and at least the first value region within the electronic document.
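The three steps named in the abstract (obtain training documents per template, extract spatial attributes of label and value regions, train a classifier) can be illustrated with a minimal Python sketch. Everything specific here is an assumption for illustration and does not come from the patent text: the `Region` schema, the choice of normalized center and size as the spatial attributes, and the use of a nearest-centroid classifier as a simple stand-in for whatever classifier model the specification actually describes.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Region:
    """A rectangular region in page pixels (hypothetical schema, not from the patent)."""
    role: str      # e.g. "label:total" or "value:total"
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def spatial_attributes(region, page_width, page_height):
    """Return center and size normalized to the page, so scans of different resolution compare."""
    return (
        (region.x + region.width / 2) / page_width,    # normalized center x
        (region.y + region.height / 2) / page_height,  # normalized center y
        region.width / page_width,
        region.height / page_height,
    )


def train(training_documents):
    """Fit a nearest-centroid model: one mean attribute vector per (template, role).

    training_documents: iterable of (template_id, page_width, page_height, regions).
    """
    sums = {}
    counts = defaultdict(int)
    for template_id, page_w, page_h, regions in training_documents:
        for region in regions:
            key = (template_id, region.role)
            attrs = spatial_attributes(region, page_w, page_h)
            prev = sums.get(key, (0.0,) * len(attrs))
            sums[key] = tuple(p + a for p, a in zip(prev, attrs))
            counts[key] += 1
    return {key: tuple(s / counts[key] for s in total) for key, total in sums.items()}


def classify(model, region_box, page_width, page_height):
    """Return the (template_id, role) whose centroid is closest to the given (x, y, w, h) box."""
    attrs = spatial_attributes(Region("?", *region_box), page_width, page_height)
    return min(model, key=lambda key: dist(model[key], attrs))


# Made-up training set: two scans of one template at different resolutions, plus a second template.
docs = [
    ("invoice_a", 1000, 1400, [Region("label:total", 100, 1200, 120, 30),
                               Region("value:total", 240, 1200, 150, 30)]),
    ("invoice_a", 2000, 2800, [Region("label:total", 210, 2390, 240, 60),
                               Region("value:total", 470, 2410, 300, 60)]),
    ("invoice_b", 1000, 1400, [Region("label:total", 600, 100, 120, 30),
                               Region("value:total", 740, 100, 150, 30)]),
]
model = train(docs)
print(classify(model, (245, 1195, 150, 30), 1000, 1400))  # ('invoice_a', 'value:total')
```

In a pipeline of this kind, the predicted role would tell the OCR stage which region to read as the value for a given label. Normalizing by page size is one design choice among several; absolute coordinates or label-to-value offsets could serve equally well as spatial attributes.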
9 Claims
1. A computer-implemented method for identifying information in an electronic document, comprising:
obtaining a set of training documents for each template of a plurality of templates for the electronic document;
extracting spatial attributes for at least a first label region and at least a first corresponding value region from the set, the spatial attributes representing a position of at least the first label region and at least the first value region within the electronic document; and
training a classifier model based on the extracted spatial attributes, wherein the classifier model is used to identify the information in the electronic document.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
Specification