Text Image Quality Based Feedback For Improving OCR

US 20140168478A1
Filed: 03/15/2013
Published: 06/19/2014
Est. Priority Date: 12/13/2012
Status: Active Grant

Abstract

An electronic device and method capture multiple images of a scene of real world at several zoom levels, the scene of real world containing text of one or more sizes. The electronic device and method then extract one or more text regions from each of the multiple images, and analyze an attribute that is relevant to OCR in one or more versions of a first text region as extracted from one or more of the multiple images. When the attribute has a value that meets a limit of optical character recognition (OCR) in a version of the first text region, that version of the first text region is provided as input to OCR.
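The selection step described in the abstract can be illustrated with a minimal sketch. It assumes the analyzed attribute is the pixel height of the text region (the attribute named in claim 2) and that the OCR limit is a minimum height. The names `RegionVersion`, `select_version_for_ocr` and the 40-pixel threshold are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RegionVersion:
    """One version of the same text region, taken from one captured image."""
    zoom_level: float  # zoom level of the image this version was extracted from
    height_px: int     # analyzed attribute (assumed here): region height in pixels


# Hypothetical limit: smallest text height the OCR engine is assumed to handle.
MIN_OCR_HEIGHT_PX = 40


def select_version_for_ocr(
    versions: Sequence[RegionVersion],
    min_height_px: int = MIN_OCR_HEIGHT_PX,
) -> Optional[RegionVersion]:
    """Return the lowest-zoom version whose height meets the limit, else None."""
    for version in sorted(versions, key=lambda v: v.zoom_level):
        if version.height_px >= min_height_px:
            return version  # this version would be provided as input to OCR
    return None  # no captured version meets the limit


versions = [RegionVersion(1.0, 18), RegionVersion(2.0, 36), RegionVersion(4.0, 72)]
chosen = select_version_for_ocr(versions)
```

Trying the lowest zoom level first is a design choice of this sketch only; the abstract requires merely that a version meeting the limit be passed to OCR.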

28 Claims

1. A method to improve text recognition by using multiple images of identical text, the method comprising:
- capturing a plurality of images of a scene of real world at a plurality of zoom levels, said scene of real world containing text of one or more sizes;
  
  extracting from each of the plurality of images, one or more text regions;
  
  analyzing an attribute that is relevant to OCR in one or more versions of a first text region as extracted from one or more of said plurality of images; and
  
  when the attribute has a value that meets a limit of optical character recognition (OCR) in a version of the first text region, providing the version of the first text region as input to OCR.
- - 2. The method of claim 1 wherein:
    - the attribute comprises height of each region in the one or more text regions.
  - 3. The method of claim 1 wherein:
    - the extracting comprises checking for presence of a line of pixels of a common binary value in the one or more text regions.
  - 4. The method of claim 1 wherein:
    - the extracting comprises checking a variance in width of a stroke of a character in the one or more text regions.
  - 5. The method of claim 1 further comprising:
    - checking if an extreme x-coordinate of the first text region is greater than w/zoom_level, wherein w is a width of the first text region and zoom_level is a level of zoom at which an image comprising the first text region is captured by a camera.
  - 6. The method of claim 1 further comprising:
    - checking if an extreme y-coordinate of the first text region is greater than h/zoom_level, wherein h is a height of the first text region and zoom_level is a level of zoom at which an image comprising the first text region is captured by a camera.
  - 7. The method of claim 1 wherein:
    - the plurality of images are captured in a sequence successively one after another.
  - 8. The method of claim 7 wherein:
    - the plurality of images are captured prior to said extracting.
  - 9. The method of claim 7 wherein:
    - said plurality of images are automatically captured in response to a single user input.
  - 10. The method of claim 1 wherein:
    - a feature in the scene of real world not captured in an image comprising an enlarged version of the first text region is captured in another image comprising a smaller version of the first text region.
  - 11. The method of claim 1 further comprising:
    - when the attribute has a value that does not meet a limit of optical character recognition (OCR) in a version of the first text region, automatically analyzing additional versions of the first text region as extracted from one or more of said plurality of images.
  - 12. The method of claim 1 further comprising:
    - analyzing an attribute that is relevant to OCR in one or more versions of a second text region as extracted from one or more of said plurality of images; and
      
      when the attribute has a value that meets a limit of optical character recognition (OCR) in a version of the second text region, providing the version of the second text region as input to OCR.
  - 13. The method of claim 12 further comprising:
    - outputting text recognized in said first and second regions.
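The comparisons recited in claims 5 and 6 reduce to simple arithmetic, shown here as a minimal sketch. The claims do not state the coordinate origin or what action follows from the result, so the function below only evaluates the literal inequality. The name `exceeds_zoom_bound` is hypothetical.

```python
def exceeds_zoom_bound(extreme_coord: float, extent: float, zoom_level: float) -> bool:
    """Evaluate the literal comparison: extreme_coord > extent / zoom_level.

    For claim 5, pass the extreme x-coordinate and the width w.
    For claim 6, pass the extreme y-coordinate and the height h.
    """
    if zoom_level <= 0:
        raise ValueError("zoom_level must be positive")
    return extreme_coord > extent / zoom_level


# Worked example with w = 200 and zoom_level = 2, so the bound is 100.
x_check_far = exceeds_zoom_bound(150, 200, 2)   # 150 > 100
x_check_near = exceeds_zoom_bound(80, 200, 2)   # 80 > 100 is false
```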

14. At least one non-transitory computer readable storage media comprising a plurality of instructions to be executed by at least one processor to correct skew in an image of a scene of real world, the plurality of instructions comprising:
- first instructions to capture a plurality of images of a scene of real world at a plurality of zoom levels, said scene of real world containing text of one or more sizes;
  
  second instructions to extract from each of the plurality of images, one or more text regions;
  
  third instructions to analyze an attribute that is relevant to OCR in one or more versions of a first text region as extracted from one or more of said plurality of images; and
  
  fourth instructions to provide the version of the first text region as input to OCR, when the attribute has a value that meets a limit of optical character recognition (OCR) in a version of the first text region.
- - 15. The at least one non-transitory computer readable storage media of claim 14 wherein:
    - the attribute comprises height of each region in the one or more text regions.
  - 16. The at least one non-transitory computer readable storage media of claim 14 wherein:
    - the second instructions comprise instructions to check for presence of a line of pixels of a common binary value in the one or more text regions.
  - 17. The at least one non-transitory computer readable storage media of claim 14 wherein:
    - the second instructions comprise instructions to check a variance in width of a stroke of a character in the one or more text regions.
  - 18. The at least one non-transitory computer readable storage media of claim 14 further comprising:
    - fifth instructions to check if an extreme x-coordinate of the first text region is greater than w/zoom_level, wherein w is a width of the first text region and zoom_level is a level of zoom at which an image comprising the first text region is captured by a camera.
  - 19. The at least one non-transitory computer readable storage media of claim 14 further comprising:
    - fifth instructions to check if an extreme y-coordinate of the first text region is greater than h/zoom_level, wherein h is a height of the first text region and zoom_level is a level of zoom at which an image comprising the first text region is captured by a camera.
  - 20. The at least one non-transitory computer readable storage media of claim 14 wherein:
    - the plurality of images are captured in a sequence successively one after another.
  - 21. The at least one non-transitory computer readable storage media of claim 14 wherein:
    - a feature in the scene of real world not captured in an image comprising an enlarged version of the first text region is captured in another image comprising a smaller version of the first text region.
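The check in claim 16 (and claim 3) for a line of pixels of a common binary value might look like the following minimal sketch. It assumes the text region has already been binarized into rows of 0/1 values and interprets "line" as a full horizontal row; the claims do not fix either detail. The name `has_uniform_row` is hypothetical.

```python
from typing import Sequence


def has_uniform_row(region: Sequence[Sequence[int]], value: int = 1) -> bool:
    """Return True if any non-empty row consists entirely of `value`."""
    return any(len(row) > 0 and all(p == value for p in row) for row in region)


# A 4x5 binarized region whose second row is a solid line of 1s.
region_with_line = [
    [0, 1, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0],
]
region_without_line = [
    [0, 1, 0, 1, 0],
    [1, 1, 0, 1, 1],
]
```

A real image would rarely contain a perfectly uniform row, so a practical variant would likely tolerate a small fraction of mismatched pixels; that tolerance is omitted here.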

22. A mobile device to decode text in real world images, the mobile device comprising:
- a camera;
  
  a memory operatively connected to the camera to receive at least an image therefrom, the image comprising one or more text regions;
  
  at least one processor operatively connected to the memory to execute a plurality of instructions stored in the memory;
  
  wherein the plurality of instructions cause the at least one processor to:
  
  capture a plurality of images of a scene of real world at a plurality of zoom levels, said scene of real world containing text of one or more sizes;
  
  extract from each of the plurality of images, one or more text regions;
  
  analyze an attribute that is relevant to OCR in one or more versions of a first text region as extracted from one or more of said plurality of images; and
  
  when the attribute has a value that meets a limit of optical character recognition (OCR) in a version of the first text region, provide the version of the first text region as input to OCR.
- - 23. The mobile device of claim 22 wherein:
    - the attribute comprises height of each region in the one or more text regions.
  - 24. The mobile device of claim 22 wherein:
    - the second instructions comprise instructions to check for presence of a line of pixels of a common binary value in the one or more text regions.
  - 25. The mobile device of claim 22 wherein the at least one processor is further configured to:
    - check a variance in width of a stroke of a character in the one or more text regions.
  - 26. The mobile device of claim 22 wherein:
    - the plurality of images are captured in a sequence successively one after another.
  - 27. The mobile device of claim 22 wherein:
    - a feature in the scene of real world not captured in an image comprising an enlarged version of the first text region is captured in another image comprising a smaller version of the first text region.
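The stroke-width variance check in claim 25 (and claims 4 and 17) can be approximated as in the sketch below. It assumes a binarized region and estimates stroke width from horizontal runs of foreground pixels, which is a swapped-in simplification and not a method stated in the claims. The names `horizontal_run_lengths` and `stroke_width_variance` are hypothetical.

```python
from statistics import pvariance
from typing import List, Sequence


def horizontal_run_lengths(region: Sequence[Sequence[int]], foreground: int = 1) -> List[int]:
    """Collect the length of every horizontal run of foreground pixels."""
    runs: List[int] = []
    for row in region:
        run = 0
        for pixel in row:
            if pixel == foreground:
                run += 1
            elif run:
                runs.append(run)  # run ended at a background pixel
                run = 0
        if run:
            runs.append(run)  # run reached the end of the row
    return runs


def stroke_width_variance(region: Sequence[Sequence[int]], foreground: int = 1) -> float:
    """Population variance of run lengths; 0.0 when there is no foreground."""
    runs = horizontal_run_lengths(region, foreground)
    return float(pvariance(runs)) if runs else 0.0


uniform_strokes = [[0, 1, 1, 0], [0, 1, 1, 0]]  # runs of 2 and 2
uneven_strokes = [[1, 0, 0, 0], [1, 1, 1, 0]]   # runs of 1 and 3
```

A low variance suggests strokes of consistent width, which is typical of printed text; how the value is thresholded is left open here because the claims do not specify it.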

28. A mobile device comprising:
- a camera configured to capture a plurality of images of a scene of real world at a plurality of zoom levels, said scene of real world containing text of one or more sizes;
  
  a memory coupled to the camera for storing the plurality of images;
  
  means, coupled to the memory, for extracting from each of the plurality of images, one or more text regions;
  
  means for analyzing an attribute that is relevant to OCR in one or more versions of a first text region as extracted from one or more of said plurality of images; and
  
  responsive to the attribute having a value that meets a limit of optical character recognition (OCR) in a version of the first text region, means for providing the version of the first text region as input to OCR.

Current Assignee: Qualcomm, Inc.
Original Assignee: Qualcomm, Inc.
Inventors: Baheti, Pawan Kumar; Gore, Dhananjay Ashok; Bisain, Abhijeet S.; Soundararajan, Rajiv

Granted Patent: US 9,317,764 B2
Current US Class: 348/240.99
CPC Class Codes

G06V 20/63   Scene text, e.g. street names

G06V 30/10   Character recognition

G06V 30/1456   based on user interactions
