PERFORMING OPTICAL CHARACTER RECOGNITION USING SPATIAL INFORMATION OF REGIONS WITHIN A STRUCTURED DOCUMENT

US 20180032842A1
Filed: 07/26/2016
Published: 02/01/2018
Est. Priority Date: 07/26/2016
Status: Active Grant

Abstract

Techniques are disclosed for facilitating optical character recognition (OCR) by identifying one or more regions of an electronic document on which to perform the OCR. For example, a method for identifying information in an electronic document includes obtaining a set of training documents for each template of a plurality of templates for the electronic document, extracting spatial attributes for at least a first label region and at least a first corresponding value region from the set, and training a classifier model based on the extracted spatial attributes, wherein the classifier model is used to identify the information in the electronic document. The spatial attributes represent a position of at least the first label region and at least the first value region within the electronic document.
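
The training step described in the abstract can be illustrated with a minimal sketch. The abstract does not name a classifier type or a feature encoding, so everything here is an assumption made for illustration: the `Region` bounding box, the page-normalized feature tuple, the class names, the sample coordinates, and the nearest-centroid classifier standing in for the "classifier model".

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Region:
    """Hypothetical bounding box in page pixels (not a structure from the patent)."""
    x: float
    y: float
    width: float
    height: float


def spatial_attributes(region, page_width, page_height):
    """Return position and dimensions normalized to the page size."""
    return (
        region.x / page_width,
        region.y / page_height,
        region.width / page_width,
        region.height / page_height,
    )


def train_centroids(examples):
    """Average the feature vectors per class name.

    `examples` is a list of (class_name, feature_tuple) pairs.
    """
    sums, counts = {}, {}
    for name, features in examples:
        acc = sums.setdefault(name, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[name] = counts.get(name, 0) + 1
    return {name: tuple(v / counts[name] for v in acc) for name, acc in sums.items()}


def classify(centroids, features):
    """Return the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda name: dist(centroids[name], features))


# Made-up annotations from two training documents of one template (1000 x 1000 px).
training = [
    ("label:wages", Region(50, 100, 200, 40)),
    ("value:wages", Region(300, 100, 150, 40)),
    ("label:wages", Region(60, 110, 200, 40)),
    ("value:wages", Region(310, 110, 150, 40)),
]
examples = [(name, spatial_attributes(r, 1000, 1000)) for name, r in training]
model = train_centroids(examples)
```

A new region can then be classified from its geometry alone, for example `classify(model, spatial_attributes(Region(55, 105, 200, 40), 1000, 1000))`, without reading any text inside it.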

9 Claims

1. A computer-implemented method for identifying information in an electronic document, comprising:
   - obtaining a set of training documents for each template of a plurality of templates for the electronic document;
   - extracting spatial attributes for at least a first label region and at least a first corresponding value region from the set, the spatial attributes representing a position of at least the first label region and at least the first value region within the electronic document; and
   - training a classifier model based on the extracted spatial attributes, wherein the classifier model is used to identify the information in the electronic document.
2. The method of claim 1, wherein the information comprises at least one label and at least one corresponding value in the electronic document.
3. The method of claim 1, further comprising:
   - segmenting an image of the electronic document to obtain spatial attributes of a plurality of regions in the electronic document, the regions including at least a first label and at least a first corresponding value; and
   - identifying at least the first label based on the trained classifier model.
4. The method of claim 3, further comprising:
   - designating one or more of the regions that are not identified as labels, as value regions; and
   - performing Optical Character Recognition in the one or more value regions to obtain at least one value corresponding to the identified at least one label.
5. The method of claim 4, further comprising identifying the designated one or more value regions as corresponding to the identified at least one label based on the classifier model.
6. The method of claim 5, wherein the designated one or more value regions are identified as corresponding to the identified at least one label based at least on a position of the designated one or more value regions relative to the at least one label.
7. The method of claim 3, further comprising obtaining the image of the electronic document by capturing the image using a camera device of a mobile device.
8. The method of claim 1, wherein the spatial attributes comprises at least one of dimensions of each of the at least one label region and the at least one corresponding value region, position of each of the at least one label region and the at least one corresponding value region within the electronic document, or position of the at least one label region and the at least one corresponding value region relative to other regions in the electronic document.
9. The method of claim 1, wherein the electronic document comprises a semi-structured document.
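
The flow in claims 3 through 6 (identify labels among segmented regions, treat the remainder as value regions, then match values to labels by relative position) could look roughly like the sketch below. It is illustrative only: the `Region` type, the `label_of` callback standing in for the trained classifier, the sample coordinates, and the nearest-center matching rule are assumptions, and the OCR call itself is left out because the claims do not name an engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical segmented region: top-left corner plus size, in pixels."""
    x: float
    y: float
    width: float
    height: float

    @property
    def center(self):
        return (self.x + self.width / 2, self.y + self.height / 2)


def split_labels_and_values(regions, label_of):
    """Regions the classifier names are labels; the rest become value regions.

    `label_of` stands in for the trained classifier and returns a label
    name, or None when the region is not recognized as a label.
    """
    labels, values = {}, []
    for region in regions:
        name = label_of(region)
        if name is None:
            values.append(region)
        else:
            labels[name] = region
    return labels, values


def pair_values_with_labels(labels, values):
    """Assign each value region to the label whose center is nearest.

    Squared Euclidean distance between centers is an assumed matching rule.
    Assumes `labels` is non-empty.
    """
    def sq_distance(a, b):
        (ax, ay), (bx, by) = a.center, b.center
        return (ax - bx) ** 2 + (ay - by) ** 2

    pairs = {}
    for value in values:
        name = min(labels, key=lambda n: sq_distance(labels[n], value))
        pairs.setdefault(name, []).append(value)
    return pairs


# Made-up segmentation output for a two-row form.
wages_label = Region(50, 100, 200, 40)
tax_label = Region(50, 300, 200, 40)
wages_value = Region(300, 100, 150, 40)
tax_value = Region(300, 300, 150, 40)

# A lookup table plays the role of the classifier in this example.
known = {wages_label: "wages", tax_label: "tax withheld"}
labels, values = split_labels_and_values(
    [wages_label, wages_value, tax_label, tax_value], known.get
)
pairs = pair_values_with_labels(labels, values)
```

Only the regions collected in `values` would then be passed to an OCR engine, which is the point of the approach: label text never needs to be recognized.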

Current Assignee
Intuit, Inc.
Original Assignee
Intuit, Inc.
Inventors
YELLAPRAGADA, Vijay; CHIANG, Peijun; MADDIKA, Sreeneel K.

Granted Patent

US 10,013,643 B2
CPC Class Codes

G06F 18/214   Generating training pattern...

G06F 18/24   Classification techniques

G06Q 40/123   Tax preparation or submission

G06T 2207/20081   Training; Learning

G06T 2207/30176   Document

G06T 7/11   Region-based segmentation

G06V 30/10   Character recognition

G06V 30/19147   Obtaining sets of training ...

G06V 30/412   Layout analysis of document...

G06V 30/414   Extracting the geometrical ...