Image-based character recognition

US 9,043,349 B1
Filed: 11/29/2012
Issued: 05/26/2015
Est. Priority Date: 11/29/2012
Status: Active Grant

Abstract

Various embodiments enable a device to perform tasks such as processing an image to recognize and locate text in the image, and providing the recognized text to an application executing on the device for performing a function (e.g., calling a number, opening an internet browser, etc.) associated with the recognized text. In at least one embodiment, processing the image includes substantially simultaneously or concurrently processing the image with at least two recognition engines, such as at least two optical character recognition (OCR) engines, running in a multithreaded mode. In at least one embodiment, the recognition engines can be tuned so that their respective processing speeds are roughly the same. Utilizing multiple recognition engines enables processing latency to be close to that of using only one recognition engine.
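The concurrent, multithreaded processing described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Result` shape, `recognize_concurrently`, and the two stand-in engines `engine_a` and `engine_b` are hypothetical names introduced here. Because both engines start at the same time, total latency is roughly that of the slower engine, which is why the abstract stresses tuning the engines to similar speeds.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

# Hypothetical result shape: (recognized text, confidence in [0, 1]).
Result = Tuple[str, float]
Engine = Callable[[bytes], Result]


def recognize_concurrently(region: bytes, engines: Sequence[Engine]) -> List[Result]:
    """Run every engine on the same region at once; results keep engine order."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = [pool.submit(engine, region) for engine in engines]
        # result() blocks until that engine finishes, so the wall-clock time
        # is governed by the slowest engine rather than the sum of all engines.
        return [future.result() for future in futures]


# Stand-in engines (assumptions); a real engine would decode the region.
def engine_a(region: bytes) -> Result:
    return ("555-0100", 0.90)


def engine_b(region: bytes) -> Result:
    return ("555-O100", 0.60)


results = recognize_concurrently(b"\x00\x01", [engine_a, engine_b])
```

In CPython, threads only give a real speed-up when the engines spend their time in native code that releases the interpreter lock; that is an assumption about the engines, not something the patent states.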

21 Claims

1. A computer-implemented method, comprising:
- under control of one or more computer systems configured with executable instructions, obtaining an image captured by at least one camera of a computing device;
  
  analyzing the image to locate a region of text in the image;
  
  selecting a subset of the image associated with the region of text;
  
  binarizing the subset of the image associated with the region of text to yield a binarized region;
  
  communicating the binarized region to a first text recognition engine and at least one second text recognition engine;
  
  recognizing the text of the binarized region with the first text recognition engine to yield first recognized text;
  
  recognizing, independently, the text of the binarized region with the at least one second text recognition engine to yield second recognized text, the first text recognition engine being different relative to the at least one second text recognition engine;
  
  determining a first confidence score for the first recognized text from the first text recognition engine and at least a second confidence score for the second recognized text from the at least one second text recognition engine; and
  
  applying a linear combination function of the first confidence score and the second confidence score to generate a consensus string of text comprising at least a portion of at least one of the first recognized text from the first text recognition engine or the second recognized text from the at least one second text recognition engine.
  - 2. The computer-implemented method of claim 1, wherein the first text recognition engine and the at least one second text recognition engine have processing speeds that are equal to within an allowable deviation.
  - 3. The computer-implemented method of claim 1, further comprising:
    - based at least in part on the combination function of the first confidence score and the second confidence score being below a threshold, processing the binarized region with at least a third text recognition engine.
  - 4. The computer-implemented method of claim 1, wherein determining the first confidence score for the first recognized text from the first text recognition engine and at least the second confidence score for the second recognized text from the at least one second text recognition engine includes:
    - searching a database for matching words within the first recognized text and the second recognized text; and
      
      increasing: (a) the first confidence score for the first recognized text from the first text recognition engine based at least in part on a first string of characters in the first recognized text matching at least one first word in the database and (b) the second confidence for the second recognized text from the second text recognition engine based at least in part on a second string of characters in the second recognized text matching at least one of the first word or a second word in the database.
  - 5. The computer-implemented method of claim 1, further comprising:
    - applying a first bounding box to the binarized region;
      
      applying a second bounding box to the binarized region; and
      
      aligning the first bounding box of the first recognized text from the first text recognition engine with the second bounding box of the second recognized text from the at least a second text recognition engine.
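Claims 1 and 3 can be illustrated with a short sketch. The claims require a linear combination function of the confidence scores but do not fix its weights or the rule for picking the consensus string, so everything specific here is an assumption: the weights, the 0.7 threshold, the choice to return a whole candidate string rather than a portion, and the names `linear_combination` and `consensus`.

```python
from typing import Dict, List, Tuple

# Hypothetical result shape: (recognized text, confidence in [0, 1]).
Result = Tuple[str, float]


def linear_combination(scores: List[float], weights: List[float]) -> float:
    """Weighted sum of the per-engine confidence scores."""
    return sum(w * s for w, s in zip(weights, scores))


def consensus(results: List[Result], weights: List[float]) -> Tuple[str, float]:
    """Return the candidate string with the largest weighted confidence.

    Engines that output the same string pool their weighted scores, so
    agreement between engines beats a single confident outlier.
    """
    totals: Dict[str, float] = {}
    for (text, confidence), weight in zip(results, weights):
        totals[text] = totals.get(text, 0.0) + weight * confidence
    best = max(totals, key=lambda candidate: totals[candidate])
    return best, totals[best]


results = [("OPEN 24 HOURS", 0.8), ("OPEN 24 H0URS", 0.5)]
weights = [0.5, 0.5]  # assumed equal trust in both engines

text, score = consensus(results, weights)
combined = linear_combination([c for _, c in results], weights)

# Claim 3: a combined score below a threshold triggers a third engine.
THRESHOLD = 0.7  # assumed value
run_third_engine = combined < THRESHOLD
```

With these inputs the consensus string is the first engine's output, and the combined score of about 0.65 falls below the assumed threshold, so a third engine would be consulted.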

6. A computer-implemented method, comprising:
- under control of one or more computer systems configured with executable instructions, analyzing an image to locate a region of text in the image;
  
  selecting a subset of the image associated with the region of text in the image;
  
  binarizing the subset of the image associated with the region of text in the image to yield a binarized region;
  
  communicating the binarized region to a first recognition engine and a second recognition engine;
  
  recognizing the text in the image with the first recognition engine to yield first recognized text;
  
  recognizing the text in the image with the second recognition engine to yield second recognized text, the first recognition engine being different relative to the second recognition engine;
  
  determining a first confidence score for the first recognized text from the first recognition engine and a second confidence score for the second recognized text from the second recognition engine; and
  
  applying a linear combination function of the first confidence score and the second confidence score to generate a consensus string of text comprising at least a portion of at least one of the first recognized text from the first recognition engine or the second recognized text from the second recognition engine.
  - 7. The computer-implemented method of claim 6, further comprising:
    - based at least in part on the first confidence score and the second confidence score, recognizing second text in a second image with the first recognition engine to yield third recognized text and the second recognition engine to yield fourth recognized text; and
      
      determining a third confidence score for the third recognized text from the first recognition engine and a fourth confidence score for the fourth recognized text from the second recognition engine; and
      
      based at least in part on a combination function of the first confidence score, the second confidence score, the third confidence score, and the fourth confidence score, generating the consensus string using at least a portion of at least one of the first recognized text from the first recognition engine, the second recognized text from the at least one second recognition engine, the third recognized text from the first recognition engine, or the fourth recognized text from the at least one second recognition engine.
  - 8. The computer-implemented method of claim 6, wherein the first recognition engine and the second recognition engine have processing speeds that are substantially equal to within an allowable deviation.
  - 9. The computer-implemented method of claim 6, further comprising:
    - based at least in part on the linear combination of the first confidence score and the second confidence score being below a threshold, recognizing the text with at least a third recognition engine.
  - 10. The computer-implemented method of claim 6, wherein determining the first confidence score for the first recognized text from the first recognition engine and the second confidence score for the second recognized text from the second recognition engine includes:
    - searching a database for matching words within the first recognized text and the second recognized text; and
      
      increasing: (a) the first confidence score for the first recognized text from the first recognition engine based at least in part on a first string of characters in the first recognized text matching at least one first word in the database and (b) the second confidence for the second recognized text from the second recognition engine based at least in part on a second string of characters in the second recognized text matching at least one of the first word or a second word in the database.
  - 11. The computer-implemented method of claim 6, further comprising:
    - applying a first bounding box to the binarized region;
      
      applying a second bounding box to the binarized region; and
      
      aligning the first bounding box of the first recognized text from the first recognition engine with the second bounding box of the second recognized text from the second recognition engine.
  - 12. The computer-implemented method of claim 6, wherein the image is captured by at least one camera of a portable computing device and the image is one of a plurality of images of the text captured in a continuous mode.
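The bounding-box alignment of claim 11 (and claim 5) can be sketched as a matching problem. The claims do not say how boxes are aligned; this sketch assumes each engine reports word-level boxes and pairs them greedily by intersection-over-union (IoU). The `Box` and `Word` shapes, the 0.5 overlap cutoff, and the function names are all hypothetical.

```python
from typing import List, Optional, Set, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels
Word = Tuple[Box, str]           # a box plus the text recognized inside it


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    overlap_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    overlap_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = overlap_w * overlap_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union else 0.0


def align(first: List[Word], second: List[Word],
          min_iou: float = 0.5) -> List[Tuple[str, Optional[str]]]:
    """Pair each first-engine word with the best-overlapping unused
    second-engine word; None marks a word the second engine missed."""
    used: Set[int] = set()
    pairs: List[Tuple[str, Optional[str]]] = []
    for box_a, text_a in first:
        best_index, best_score = None, min_iou
        for index, (box_b, _) in enumerate(second):
            if index in used:
                continue
            score = iou(box_a, box_b)
            if score >= best_score:
                best_index, best_score = index, score
        if best_index is None:
            pairs.append((text_a, None))
        else:
            used.add(best_index)
            pairs.append((text_a, second[best_index][1]))
    return pairs


first_words = [((0, 0, 40, 10), "CALL"), ((50, 0, 120, 10), "555-0100")]
second_words = [((52, 1, 121, 11), "555-O100"), ((1, 0, 41, 10), "CALL")]
pairs = align(first_words, second_words)
```

Once the words are paired, the confidence scores for each pair can be compared position by position, which is what lets a consensus string mix portions of both engines' output.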

13. A computing device, comprising:
- a processor;
  
  a display screen; and
  
  memory including instructions that, when executed by the processor, cause the computing device to:
  
  analyze an image to locate a region of text in the image;
  
  select a subset of the image associated with the region of text in the image;
  
  binarize the subset of the image associated with the region of text in the image to yield a binarized region;
  
  communicate the binarized region to a first text recognition engine and a second text recognition engine;
  
  recognize the text in the image with a first text recognition engine to yield first recognized text and the second text recognition engine to yield second recognized text, the first text recognition engine being different relative to the second text recognition engine;
  
  determine a first confidence score for the first recognized text from the first text recognition engine and a second confidence score for the second recognized text from the second text recognition engine; and
  
  apply a linear combination function of the first confidence score and the second confidence score to generate a consensus string of text comprising at least a portion of at least one of the first recognized text from the first text recognition engine or the second recognized text from the second text recognition engine.
  - 14. The computing device of claim 13, wherein the instructions that, when executed by the processor, further cause the computing device to:
    - apply at least one preprocessing technique to the image based at least in part on only a single image being available for recognizing the text; and
      
      recognize the text in each of the image and a second image based at least in part on at least two images being available for recognizing the text.
  - 15. The computing device of claim 13, wherein the first text recognition engine and the second text recognition engine have processing speeds that are substantially equal to within an allowable deviation.
  - 16. The computing device of claim 13, wherein the instructions that, when executed by the processor, further cause the computing device to:
    - based at least in part on the linear combination of the first confidence and the second confidence score being below a threshold, recognize the text with at least a third text recognition engine.
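The binarizing step shared by the independent claims can be illustrated with a global threshold. The claims do not name a binarization method; Otsu's method is swapped in here purely as a familiar example, and the grayscale list-of-rows input, the polarity (bright pixels map to 1), and the function names are assumptions.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return the 0-255 threshold that maximizes between-class variance."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for level in range(256):
        weight_bg += histogram[level]
        if weight_bg == 0:
            continue  # no pixels at or below this level yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already in the background class
        sum_bg += level * histogram[level]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(region: List[List[int]]) -> List[List[int]]:
    """Map each grayscale pixel to 1 (above threshold) or 0 (at or below)."""
    threshold = otsu_threshold([p for row in region for p in row])
    return [[1 if p > threshold else 0 for p in row] for row in region]


region = [[20, 30, 200],
          [25, 210, 220]]
binarized = binarize(region)
```

For this tiny region the dark values (20, 25, 30) and bright values (200, 210, 220) separate cleanly, so the binarized region is `[[0, 0, 1], [0, 1, 1]]`.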

17. A non-transitory computer-readable storage medium storing instructions that, when executed by at least one processor of a computing device, cause the computing device to:
- analyze an image to locate a region of text in the image;
  
  select a subset of the image associated with the region of text in the image;
  
  binarize the subset of the image associated with the region of text in the image to yield a binarized region;
  
  communicate the binarized region to a first recognition engine and a second recognition engine;
  
  recognize the text in the image with a first text recognition engine to yield first recognized text and a second text recognition engine to yield second recognized text, the first text recognition engine being different relative to the second text recognition engine;
  
  determine a first confidence score for the first recognized text from the first text recognition engine and a second confidence score for the second recognized text from the second text recognition engine; and
  
  apply a linear combination function of the first confidence score and the second confidence score to generate a consensus string of text comprising at least a portion of at least one of the first recognized text from the first text recognition engine or the second recognized text from the second text recognition engine.
  - 18. The non-transitory computer-readable storage medium of claim 17, wherein the instructions that cause the computing device to determine the first confidence score for the first recognized text from the first text recognition engine and the second confidence score for the second recognized text from the second text recognition engine includes causing the computing device to:
    - search a database for matching words within the first recognized text and the second recognized text; and
      
      increasing: (a) the first confidence score for the first recognized text from the first recognition engine based at least in part on a first string of characters in the first recognized text matching at least one first word in the database and (b) the second confidence for the second recognized text from the second text recognition engine based at least in part on a second string of characters in the second recognized text matching at least one of the first word or a second word in the database.
  - 19. The non-transitory computer-readable storage medium of claim 17, wherein the instructions, when executed by the at least one processor, further cause the computing device to:
    - based at least in part on the linear combination of the first confidence score and the second confidence score being below a threshold, recognize the text with at least a third text recognition engine.
  - 20. The non-transitory computer-readable storage medium of claim 17, wherein the first text recognition engine and the second text recognition engine have processing speeds that are substantially equal to within an allowable deviation.
  - 21. The non-transitory computer-readable storage medium of claim 17, wherein the instructions, when executed by the at least one processor, further cause the computing device to:
    - based at least in part on the first confidence score and the second confidence score, recognize second text in a second image with the first text recognition engine to yield third recognized text and the second text recognition engine to yield fourth recognized text; and
      
      determine a third confidence score for the third recognized text from the first text recognition engine and a fourth confidence score for the fourth recognized text from the second text recognition engine; and
      
      based at least in part on a combination function of the first confidence score, the second confidence score, the third confidence score, and the fourth confidence score, generate the consensus string using at least a portion of at least one of the first recognized text from the first text recognition engine, the second recognized text from the at least one second text recognition engine, the third recognized text from the first text recognition engine, or the fourth recognized text from the at least one second text recognition engine.
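The dictionary-based confidence adjustment in claims 4, 10, and 18 can be sketched as below. The claims only say that a confidence score is increased when a string of characters matches a word in a database; the in-memory set standing in for the database, the fixed per-word increment of 0.05, the cap at 1.0, and the name `boost_confidence` are assumptions made for illustration.

```python
from typing import Set


def boost_confidence(text: str, confidence: float, dictionary: Set[str],
                     boost: float = 0.05) -> float:
    """Raise the confidence once per whitespace-separated token found in
    the dictionary, never exceeding 1.0."""
    hits = sum(1 for token in text.lower().split() if token in dictionary)
    return min(1.0, confidence + boost * hits)


# Stand-in for the word database (assumption).
dictionary = {"open", "hours"}

first_score = boost_confidence("OPEN 24 HOURS", 0.80, dictionary)   # two hits
second_score = boost_confidence("OPEN 24 H0URS", 0.80, dictionary)  # one hit
```

Both engines start at the same confidence here, but the first output contains two dictionary words and the second only one, so the lookup breaks the tie in favor of the reading that forms real words.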

Current Assignee: A9.com Incorporated (Amazon.com, Inc.)
Original Assignee: A9.com Incorporated (Amazon.com, Inc.)
Inventors: Lin, Xiaofan; Dhua, Arnab Sanat Kumar; Gray, Douglas Ryan; Lou, Yu
Primary Examiner(s): Ly, Anh
Application Number: US13/688,772
Time in Patent Office: 908 Days
Field of Search: 707/758, 707/728, 707/723, 707/E17.014, 345/419, 345/582, 345/619, 345/633, 345/589, 705/14.54, 705/26.61, 715/256, 715/273, 715/243, 715/762, 715/234, 704/236, 704/E15.043, 704/E13.011, 704/9, 704/235, 704/246, 704/256, 704/260, 382/229, 382/309, 382/217, 382/138, 382/107, 382/100, 382/305, 382/203, 382/182, 382/187, 382/154, 382/257, 382/276, 382/181, 382/176, 382/168
US Class Current: 707/758
CPC Class Codes:

G06F 16/5846   using extracted text

G06F 18/254   of classification results, ...

G06V 10/809   of classification results, ...

G06V 20/62   Text, e.g. of license plate...

G06V 30/10   Character recognition

G06V 30/224   of printed characters havin...
