OPTICAL CHARACTER RECOGNITION

US 20160342852A1
Filed: 01/31/2014
Published: 11/24/2016
Est. Priority Date: 01/31/2014
Status: Active Grant

Abstract

Optical character recognition is described in various implementations. In one example implementation, a method may include receiving a plurality of optical character recognition (OCR) outputs provided by a respective plurality of OCR engines, each of the plurality of OCR outputs being representative of text depicted in a portion of an electronic image. The method may also include identifying a document context associated with the electronic image, and generating an output character set by applying a character resolution model to resolve differences among the plurality of OCR outputs. The character resolution model may define a probability of character recognition accuracy for each of the plurality of OCR engines given the identified document context. The method may also include updating the character resolution model to generate an updated character resolution model such that subsequent generating of output character sets are based on the updated character resolution model.
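The resolution step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the engine names, context labels, accuracy values, and the log-odds weighted vote are all assumptions made for illustration, and the sketch further assumes the OCR outputs are already aligned character by character and that engines err independently.

```python
import math
from collections import defaultdict

# Hypothetical table: P(engine reads a character correctly | document context).
# Engine names, context labels, and numbers are illustrative assumptions.
ACCURACY = {
    ("engine_a", "typed_invoice"): 0.98,
    ("engine_b", "typed_invoice"): 0.90,
    ("engine_c", "typed_invoice"): 0.80,
    ("engine_a", "handwritten_note"): 0.60,
    ("engine_b", "handwritten_note"): 0.88,
    ("engine_c", "handwritten_note"): 0.80,
}


def resolve_character(candidates, context, accuracy=ACCURACY):
    """Pick one character from per-engine candidates by a log-odds weighted vote."""
    scores = defaultdict(float)
    for engine, char in candidates.items():
        p = accuracy[(engine, context)]
        scores[char] += math.log(p / (1.0 - p))  # more reliable engines weigh more
    return max(scores, key=scores.get)


def resolve_text(outputs, context, accuracy=ACCURACY):
    """Resolve pre-aligned, equal-length OCR outputs position by position."""
    engines = list(outputs)
    length = len(outputs[engines[0]])
    if any(len(outputs[e]) != length for e in engines):
        raise ValueError("this sketch assumes pre-aligned, equal-length outputs")
    return "".join(
        resolve_character({e: outputs[e][i] for e in engines}, context, accuracy)
        for i in range(length)
    )


outputs = {"engine_a": "TOTAL", "engine_b": "T0TAL", "engine_c": "T0TAL"}
print(resolve_text(outputs, "typed_invoice"))     # engine_a outweighs the other two
print(resolve_text(outputs, "handwritten_note"))  # engine_b and engine_c prevail
```

With these assumed numbers, the same three outputs resolve differently depending on the context, which is the role the abstract assigns to the document context.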

15 Claims

1. A method comprising:
- receiving, at a computing system, a plurality of optical character recognition (OCR) outputs provided by a respective plurality of OCR engines, each of the plurality of OCR outputs being representative of text depicted in a portion of an electronic image;
  
  identifying, using the computing system, a document context associated with the electronic image;
  
  generating, using the computing system, an output character set by applying a character resolution model to resolve differences among the plurality of OCR outputs, the character resolution model defining a probability of character recognition accuracy for each of the plurality of OCR engines given the identified document context; and
  
  updating, using the computing system, the character resolution model to generate an updated character resolution model such that subsequent generating of output character sets are based on the updated character resolution model.
- - 2. The method of claim 1, wherein the character resolution model comprises a Bayesian prior probability distribution.
  - 3. The method of claim 2, wherein the updated character resolution model comprises a Bayesian posterior probability distribution.
  - 4. The method of claim 1, wherein the document context associated with the electronic image comprises an image attribute associated with the image.
  - 5. The method of claim 1, wherein the document context associated with the electronic image comprises a textual attribute associated with the text depicted in the image.
  - 6. The method of claim 1, wherein the document context associated with the electronic image comprises a content attribute associated with content depicted in the image.
  - 7. The method of claim 1, wherein the document context associated with the electronic image comprises an image attribute associated with the image, a textual attribute associated with the text depicted in the image, and a content attribute associated with content depicted in the image.
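Claims 2 and 3 name a Bayesian prior and posterior without fixing a distribution family. One conventional choice, assumed here purely for illustration, is a Beta distribution over each engine's per-character accuracy, keyed by a context built from the three attribute kinds in claims 4 to 7. The class names, field values, and the source of the verified character counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentContext:
    """Hypothetical context key combining the attribute kinds of claims 4-7."""
    image_attribute: str    # e.g. scan resolution bucket
    textual_attribute: str  # e.g. script or font family
    content_attribute: str  # e.g. document type


@dataclass
class BetaBelief:
    """Beta(alpha, beta) belief about one engine's per-character accuracy."""
    alpha: float = 1.0  # pseudo-count of correct characters
    beta: float = 1.0   # pseudo-count of incorrect characters

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    def update(self, correct: int, incorrect: int) -> "BetaBelief":
        """Return the posterior after observing verified characters."""
        return BetaBelief(self.alpha + correct, self.beta + incorrect)


context = DocumentContext("300dpi", "latin_serif", "invoice")

# Prior belief for one engine in one context (assumed values).
model = {("engine_a", context): BetaBelief(alpha=8.0, beta=2.0)}

prior = model[("engine_a", context)]
# Assumption: 50 characters were later verified, 45 of them read correctly.
posterior = prior.update(correct=45, incorrect=5)
model[("engine_a", context)] = posterior  # later resolutions use the posterior

print(prior.mean(), posterior.mean())
```

Because the Beta distribution is conjugate to a correct/incorrect observation model, the posterior stays in the same family and can serve as the prior for the next update.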

8. A system comprising:
- a processor resource;
  
  a document analysis module, executable on the processor resource, to identify a document context associated with an electronic image;
  
  a conflict resolution module, executable on the processor resource, to receive a plurality of optical character recognition (OCR) outputs provided by a respective plurality of OCR engines, each of the plurality of OCR outputs being representative of text depicted in a portion of the electronic image, and to generate an output document based on a character resolution model and the plurality of OCR outputs, the character resolution model defining a probability of character recognition accuracy for each of the plurality of OCR engines given the identified document context; and
  
  a model updater module, executable on the processor resource, to generate an updated character resolution model for subsequent use by the conflict resolution module.
- - 9. The system of claim 8, wherein the character resolution model comprises a Bayesian prior probability distribution.
  - 10. The system of claim 9, wherein the updated character resolution model comprises a Bayesian posterior probability distribution.
  - 11. The system of claim 8, wherein the document context associated with the electronic image comprises an image attribute associated with the image.
  - 12. The system of claim 8, wherein the document context associated with the electronic image comprises a textual attribute associated with the text depicted in the image.
  - 13. The system of claim 8, wherein the document context associated with the electronic image comprises a content attribute associated with content depicted in the image.
  - 14. The system of claim 8, wherein the document context associated with the electronic image comprises an image attribute associated with the image, a textual attribute associated with the text depicted in the image, and a content attribute associated with content depicted in the image.
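The three modules recited in claim 8 could be wired together roughly as follows. This is a sketch under stated assumptions, not the claimed system: the class and method names, the count-based model, the Laplace-smoothed accuracy estimate, and the use of the resolved text as a stand-in for ground truth are all hypothetical.

```python
from collections import defaultdict


class DocumentAnalysisModule:
    def identify_context(self, image_info):
        # Hypothetical rule: the context is just the declared document type.
        return image_info.get("doc_type", "unknown")


class ConflictResolutionModule:
    def __init__(self, model):
        # model maps (engine, context) -> [agreements, characters seen]
        self.model = model

    def accuracy(self, engine, context):
        agree, seen = self.model[(engine, context)]
        return (agree + 1) / (seen + 2)  # Laplace-smoothed estimate

    def resolve(self, outputs, context):
        # Assumes the outputs are already aligned character by character.
        resolved = []
        for chars in zip(*outputs.values()):
            votes = defaultdict(float)
            for engine, char in zip(outputs, chars):
                votes[char] += self.accuracy(engine, context)
            resolved.append(max(votes, key=votes.get))
        return "".join(resolved)


class ModelUpdaterModule:
    def update(self, model, outputs, resolved, context):
        # Assumption: the resolved text stands in for ground truth.
        for engine, text in outputs.items():
            counts = model[(engine, context)]
            counts[0] += sum(a == b for a, b in zip(text, resolved))
            counts[1] += len(resolved)


model = defaultdict(lambda: [0, 0])
analysis = DocumentAnalysisModule()
resolver = ConflictResolutionModule(model)
updater = ModelUpdaterModule()

context = analysis.identify_context({"doc_type": "invoice"})
outputs = {"engine_a": "INV0ICE", "engine_b": "INVOICE", "engine_c": "INVOICE"}
resolved = resolver.resolve(outputs, context)
updater.update(model, outputs, resolved, context)
print(resolved, dict(model))
```

Treating the resolved text as ground truth keeps the sketch short, but it can reinforce a majority's mistakes; a real deployment would more plausibly feed the updater with independently verified characters.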

15. A computer-readable storage medium storing instructions that, when executed, cause a processor resource to:
- receive a plurality of optical character recognition (OCR) outputs provided by a respective plurality of OCR engines, each of the plurality of OCR outputs being representative of text depicted in a portion of an electronic image;
  
  identify a document context associated with the electronic image;
  
  generate an output character set by applying a character resolution model to resolve differences among the plurality of OCR outputs, the character resolution model defining a probability of character recognition accuracy for each of the plurality of OCR engines given the identified document context; and
  
  update the character resolution model to generate an updated character resolution model such that subsequent generating of output character sets are based on the updated character resolution model.

Current Assignee: Longsand Limited (Open Text Corporation)
Original Assignee: Longsand Limited (Open Text Corporation)
Inventors: Blanchflower, Sean
Granted Patent: US 10,176,392 B2
US Class Current: 1/1
CPC Class Codes:
G06F 18/25   Fusion techniques
G06F 18/254   of classification results, ...
G06F 18/29   Graphical models, e.g. Baye...
G06V 30/224   of printed characters havin...
