Method and system for detecting and recognizing text in images

US 8,335,402 B1
Filed: 08/03/2011
Issued: 12/18/2012
Est. Priority Date: 01/23/2008
Status: Active Grant

Abstract

Various embodiments of the present invention relate to a method, system, and computer program product for detecting and recognizing text in images captured by cameras and scanners. First, a series of image-processing techniques is applied to detect text regions in the image. The detected text regions then pass through different processing stages that reduce blurring and the negative effects of variable lighting, producing multiple versions of the same text region. Some of these versions are sent to a character-recognition system, and the texts recognized from each version are combined into a single result: the detected text.
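
The final steps of the abstract (recognize several processed versions of one text region separately, then merge the texts into one result) can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the OCR engine and the preprocessing functions are hypothetical placeholders supplied by the caller, and the merge is a simple word-by-word majority vote (the approach named in claims 2 and 11) that assumes every version yields its words in the same order. A production system would normally align the strings first, for example by edit distance.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical types: an image is a 2-D list of gray levels, and an OCR engine
# is any callable that maps an image to a string.
Image = List[List[int]]
OCR = Callable[[Image], str]
Preprocessor = Callable[[Image], Image]


def majority_vote(texts: Sequence[str]) -> str:
    """Merge OCR strings word by word, choosing the most frequent word at each
    position. Words are compared by position only; no string alignment is done."""
    tokenized = [text.split() for text in texts]
    length = max((len(words) for words in tokenized), default=0)
    merged = []
    for i in range(length):
        # Candidate words at position i from every output long enough to have one.
        candidates = [words[i] for words in tokenized if i < len(words)]
        counts = Counter(candidates)
        top = max(counts.values())
        # On a tie, keep the candidate from the earliest version.
        merged.append(next(word for word in candidates if counts[word] == top))
    return " ".join(merged)


def recognize_with_consensus(region: Image,
                             preprocessors: Sequence[Preprocessor],
                             ocr: OCR) -> str:
    """Create one processed version of the region per preprocessor,
    recognize each version separately, and merge the results by vote."""
    versions = [preprocess(region) for preprocess in preprocessors]
    return majority_vote([ocr(version) for version in versions])
```

For example, `majority_vote(["STOP AHEAD", "ST0P AHEAD", "STOP AHEAD"])` returns `"STOP AHEAD"`: the misread `ST0P` is outvoted. Ties go to the earlier entry in `preprocessors`, an arbitrary choice for this sketch; with only two versions every disagreement is a tie, so voting pays off mainly with three or more versions.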

20 Claims

1. A computer-implemented method for detecting and recognizing text in an image, the method comprising:
- under the control of one or more computer systems configured with executable instructions, obtaining an output image that differs from an input image in at least one visual aspect, the output image comprising one or more text regions;
  
  separately processing the input image to create at least one binary chip, each binary chip corresponding to a text region of the output image;
  
  generating first output by at least recognizing the text in each binary chip from the text region corresponding to the binary chip using an optical character recognizer;
  
  generating second output by at least separately and independently recognizing the text from the one or more text regions of the output image using the optical character recognizer; and
  
  analyzing at least the generated first output and the generated second output to form consensus output.
- Dependent Claims (2, 3, 4, 5, 6, 7)
  - 2. The computer-implemented method of claim 1, wherein analyzing at least the generated first output and the generated second output comprises using a majority vote process to select portions from the generated first output and the generated second output.
  - 3. The computer-implemented method of claim 1, wherein analyzing at least the generated first output and the generated second output comprises taking a logical OR of the generated first output and generated second output.
  - 4. The computer-implemented method of claim 1, wherein processing the input image includes:
    - detecting the one or more text regions by at least filtering and segmenting the input image and intersecting the filtered and segmented input image with a mask created from a plurality of bounding boxes, each bounding box enclosing a connected component, each connected component including a plurality of pixels comprising the image and connected on the basis of a predetermined pixel intensity and predefined distance between the pixels.
  - 5. The computer-implemented method of claim 1, wherein the at least one visual aspect includes blurriness.
  - 6. The computer-implemented method of claim 1, wherein the at least one visual aspect includes at least one lighting effect.
  - 7. The computer-implemented method of claim 1, wherein the generated first output and the generated second output both comprise text and wherein the consensus output includes at least some text selected from the generated first output and other text selected from the generated second output.
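
Claim 4 detects text regions by intersecting a filtered and segmented image with a mask built from bounding boxes of connected components. The Python sketch below illustrates one reading of that step, not the patented implementation. Its assumptions: images are lists of rows; text is dark, so a pixel joins a component when its gray level is at or below `intensity_threshold` (standing in for the "predetermined pixel intensity"); the "predefined distance" is a Chebyshev distance `max_gap`; and a hypothetical height filter discards boxes unlikely to hold characters. Without some such filter the intersection would return the segmented image unchanged, because every foreground pixel lies inside its own component's box. All function and parameter names are hypothetical.

```python
from typing import List, Tuple

Gray = List[List[int]]            # gray levels 0-255, one list per row
Binary = List[List[int]]          # 0/1 values, same shape
Box = Tuple[int, int, int, int]   # (top, left, bottom, right), inclusive


def connected_components(gray: Gray, intensity_threshold: int, max_gap: int) -> List[Box]:
    """Bounding boxes of groups of dark pixels (gray <= intensity_threshold),
    where pixels within Chebyshev distance max_gap are connected."""
    rows = len(gray)
    cols = len(gray[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or gray[r][c] > intensity_threshold:
                continue
            seen[r][c] = True
            stack = [(r, c)]
            top, left, bottom, right = r, c, r, c
            while stack:  # iterative flood fill from the seed pixel
                y, x = stack.pop()
                top, left = min(top, y), min(left, x)
                bottom, right = max(bottom, y), max(right, x)
                for ny in range(max(0, y - max_gap), min(rows, y + max_gap + 1)):
                    for nx in range(max(0, x - max_gap), min(cols, x + max_gap + 1)):
                        if not seen[ny][nx] and gray[ny][nx] <= intensity_threshold:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def text_region_mask(gray: Gray, segmented: Binary, intensity_threshold: int = 128,
                     max_gap: int = 1, min_height: int = 2, max_height: int = 100) -> Binary:
    """Keep segmented pixels that fall inside a plausibly character-sized box."""
    rows = len(gray)
    cols = len(gray[0]) if rows else 0
    mask = [[0] * cols for _ in range(rows)]
    for top, left, bottom, right in connected_components(gray, intensity_threshold, max_gap):
        if not (min_height <= bottom - top + 1 <= max_height):
            continue  # hypothetical size filter: drop specks and huge blobs
        for y in range(top, bottom + 1):
            for x in range(left, right + 1):
                mask[y][x] = 1
    # Intersection: a pixel survives only if it is set in both images.
    return [[s & m for s, m in zip(seg_row, mask_row)]
            for seg_row, mask_row in zip(segmented, mask)]
```

With `max_gap=1` this is ordinary 8-connectivity; larger values merge nearby strokes, such as the pieces of a broken character, into one component.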

8. A system for detecting and recognizing text in an image, the system comprising:
- one or more processors; and
  
  memory, including instructions that, when collectively executed by the one or more processors, cause the system to at least:
  
  generate first output by using an optical character recognizer to recognize the text in at least one binary chip formed at least in part by processing an input image;
  
  generate second output by using the optical character recognizer to separately and independently recognize the text in an output image formed at least in part by processing the input image; and
  
  analyze at least the generated first output and the generated second output to form consensus output.
- Dependent Claims (9, 10, 11, 12, 13, 14)
  - 9. The system of claim 8, wherein the input image includes one or more detected text regions and wherein each of the at least one binary chip includes a corresponding detected text region.
  - 10. The system of claim 8, further comprising:
    - forming a gray-level chip from the input image; and
      
      processing the formed gray-level chip to form the binary chip.
  - 11. The system of claim 8, wherein analyzing the generated first output and the generated second output comprises using a majority vote process to select portions from the generated first output and the generated second output.
  - 12. The system of claim 8, wherein the input image differs from the output image in at least one visual aspect.
  - 13. The system of claim 8, wherein the generated first output and the generated second output both comprise text and wherein the consensus output includes at least some text selected from the generated first output and other text selected from the generated second output.
  - 14. The system of claim 8, wherein processing the input image includes reducing one or more blurring or light variation effects from the image.
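
Claim 10 forms a gray-level chip and processes it into a binary chip, but does not say how. A common choice is a single global threshold selected by Otsu's method, which picks the gray level that maximizes the between-class variance of the two resulting pixel classes. The Python sketch below uses Otsu's method as a stand-in, assuming 8-bit gray levels and dark text on a lighter background; the function names are hypothetical.

```python
from typing import List

Chip = List[List[int]]  # 8-bit gray levels (0-255), one list per row


def otsu_threshold(pixels: List[int]) -> int:
    """Gray level t maximizing between-class variance for the split
    {p <= t} versus {p > t} (Otsu's method)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    weight0 = 0  # number of pixels with value <= t
    sum0 = 0     # sum of those pixel values
    best_t, best_score = 0, -1.0
    for t in range(256):
        weight0 += hist[t]
        sum0 += t * hist[t]
        weight1 = total - weight0
        if weight0 == 0:
            continue
        if weight1 == 0:
            break
        mean0 = sum0 / weight0
        mean1 = (sum_all - sum0) / weight1
        # Proportional to the between-class variance (constant 1/total^2 dropped).
        score = weight0 * weight1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binarize_chip(chip: Chip) -> Chip:
    """Binary chip: 1 for (dark) text pixels, 0 for background."""
    t = otsu_threshold([p for row in chip for p in row])
    return [[1 if p <= t else 0 for p in row] for row in chip]
```

Because one threshold is applied to the whole chip, strong lighting gradients (the light-variation effects mentioned in claim 14) can defeat it; locally adaptive thresholds are a common alternative.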

15. A non-transitory computer program product for use with a computer, the computer program product comprising a computer usable medium having computer-readable program code embodied therein for detecting and recognizing text in an image, the computer program product performing:
- obtain an output image formed at least in part by processing an input image, the output image comprising at least one detected text region;
  
  obtain at least one binary chip formed at least in part by separately processing the input image;
  
  obtain first output by at least recognizing the text in each of the at least one binary chip;
  
  obtain second output by at least separately and independently recognizing the text from the text regions of the output image; and
  
  analyze at least the generated first output and the generated second output to form consensus output.
- Dependent Claims (16, 17, 18, 19, 20)
  - 16. The non-transitory computer program product of claim 15, wherein each of the at least one binary chip includes a corresponding detected text region.
  - 17. The non-transitory computer program product of claim 15, wherein recognizing the text in each of the at least one binary chip includes using an optical text recognizer on the binary chip.
  - 18. The non-transitory computer program product of claim 17, wherein recognizing the text from the text regions includes using the optical text recognizer on the text regions.
  - 19. The non-transitory computer program product of claim 15, wherein the generated first output and the generated second output both comprise text and wherein the combined output includes at least some text selected from the generated first output and other text selected from the generated second output.
  - 20. The non-transitory computer program product of claim 15, wherein analyzing the generated first output and the generated second output comprises taking a logical OR of the generated first output and generated second output.
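
Claims 3 and 20 combine the two outputs by "taking a logical OR" without defining that operation for text. One plausible reading, used in the hypothetical Python sketch below, treats each output as a set of recognized words and keeps every word that appears in either one (a set union), in first-seen order.

```python
from typing import List


def or_merge(first: str, second: str) -> List[str]:
    """Keep every word recognized in either output (set union), in first-seen order."""
    merged: List[str] = []
    seen = set()
    for word in first.split() + second.split():
        if word not in seen:
            seen.add(word)
            merged.append(word)
    return merged
```

Compared with the majority vote of claims 2 and 11, an OR-style merge favors recall: a word missed by one pass can still be recovered from the other, but a misread from either pass is kept too, as `0PEN` would be when merging `"OPEN 24 HOURS"` with `"0PEN 24 HOURS DAILY"`.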

Current Assignee
A9.com Incorporated (Amazon.com, Inc.)
Original Assignee
A9.com Incorporated (Amazon.com, Inc.)
Inventors
Manmatha, Raghavan; Ruzon, Mark A.
Primary Examiner(s)
Entezari, Michelle

Application Number
US13/197,591
Time in Patent Office
503 Days
Field of Search
382/176
US Class Current
382/283
CPC Class Codes
G06F 18/25   Fusion techniques
G06V 20/62   Text, e.g. of license plate...
G06V 30/10   Character recognition
G06V 30/15   Cutting or merging image el...
G06V 30/153   using recognition of charac...
G06V 30/155   Removing patterns interferi...
G06V 30/162   Quantising the image signal
G06V 30/164   Noise filtering
G06V 30/224   of printed characters havin...
