Method of Perspective Correction For Devanagari Text

US 20140161365A1
Filed: 03/15/2013
Published: 06/12/2014
Est. Priority Date: 12/12/2012
Status: Active Grant

Abstract

An electronic device and method identify regions that are likely to be text in a natural image or video frame, followed by processing as follows: lines that are nearly vertical are automatically identified in a selected text region, oriented relative to the vertical axis within a predetermined range −max_theta to +max_theta, followed by determination of an angle θ of the identified lines, followed by use of the angle θ to perform perspective correction by warping the selected text region. After perspective correction in this manner, each text region is processed further, to recognize text therein, by performing OCR on each block among a sequence of blocks obtained by slicing the potential text region. Thereafter, the result of text recognition is used to display to the user, either the recognized text or any other information obtained by use of the recognized text.
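As a rough illustration of the angle-determination step described in the abstract, the sketch below keeps only candidate line angles inside the range −max_theta to +max_theta (measured from the vertical axis) and reduces them to a single θ. The function name `estimate_theta`, the 30-degree default, and the use of the median are assumptions made for illustration; the abstract does not say how θ is derived from the identified lines.

```python
import math
from statistics import median


def estimate_theta(line_angles_deg, max_theta_deg=30.0):
    """Return one representative angle in radians, or None if no line qualifies.

    line_angles_deg: angles of candidate lines relative to the vertical axis.
    max_theta_deg:   hypothetical bound; lines outside [-max, +max] are ignored.
    """
    near_vertical = [a for a in line_angles_deg if abs(a) <= max_theta_deg]
    if not near_vertical:
        return None
    # The median is an assumed choice; it tolerates a few stray detections.
    return math.radians(median(near_vertical))


# Two of the five candidates are far from vertical and are discarded.
theta = estimate_theta([10.0, 12.0, 11.0, 80.0, -75.0])
```

A mean or a histogram peak would fit the same description equally well; the only behavior taken from the text is the ±max_theta filter.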

25 Claims

1. A method to improve automatic recognition of text, the method comprising:
- receiving a plurality of regions of text in an image of a scene of real world captured by a camera;
- wherein a plurality of pixels of a common binary value, in a word in a region in said plurality of regions of text, are arranged along a first line oriented in a predetermined direction;
- wherein a first height at a first end of said word along said predetermined direction is different from a second height at a second end of said word along said predetermined direction;
- detecting a plurality of second lines that satisfy at least a predetermined test and pass through a portion of the word having a predetermined relationship to said first line;
- determining an angle θ based on a plurality of angles of the plurality of second lines relative to a common direction;
- using the angle θ to change first coordinates of at least said plurality of pixels in said word, whereby the first height and the second height remain unchanged after the using; and
- storing in a memory, at least changed first coordinates generated by the using;
- wherein the receiving, the processing, the determining, the using and the storing are performed by one or more processors.

2. The method of claim 1 further comprising:
- rotating at least the plurality of regions of text through a common angle φ, to orient said first line to said predetermined direction;
- wherein the rotating is performed prior to the processing, the determining and the using.

3. The method of claim 2 wherein:
- the word is written in Devanagari;
- the plurality of pixels of the common binary value are comprised in a shiro-rekha of the word; and
- the portion is a strip located between the shiro-rekha and a lower maatra in said region.

4. The method of claim 3 wherein:
- the plurality of second lines are detected for having a length through the strip larger than a predetermined fraction of a height of the strip.

5. The method of claim 1 wherein:
- the common direction used to determine the angle θ is perpendicular to the predetermined direction of the first line; and
- the plurality of second lines are detected for being inclined within a predetermined range −max_theta to +max_theta relative to the common direction.

6. The method of claim 1 wherein:
- the predetermined direction is parallel to a longitudinal direction of the word; and
- the common direction is perpendicular to the longitudinal direction of the word.

7. The method of claim 1 wherein:
- the using comprises adding to each first coordinate among multiple first coordinates, a product of a corresponding second coordinate and tan(θ).

8. The method of claim 1 wherein:
- multiple second coordinates of multiple pixels in said word, are stored unchanged with multiple changed first coordinates.

9. The method of claim 1 wherein the word is hereinafter a first word and the angle θ is hereinafter a first angle θ_I and the method further comprises:
- performing the processing and the determining on a second region in the image, to obtain a second angle θ_J; and
- performing the changing by use of the second angle θ_J on first coordinates of pixels in the second region of text.

10. The method of claim 1 further comprising, after the using:
- dilating the region by adding a set of additional pixels to obtain a dilated region; and
- eroding the dilated region by removing a subset in the set of additional pixels added by the dilating.
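The coordinate change recited in claims 7 and 8 can be read as a horizontal shear: each first coordinate gains the product of its second coordinate and tan(θ), while the second coordinate is stored unchanged. The sketch below is one hedged reading of that wording, not the patented implementation. Treating x as the "first" coordinate, y as the "second" coordinate, measuring y from a reference row, and the function name `shear_word` are all assumptions.

```python
import math


def shear_word(pixels, theta_rad):
    """Shift each pixel's first coordinate by (second coordinate) * tan(theta).

    pixels: iterable of (x, y) pairs, with y measured from an assumed
            reference row (for example, the row of the headline).
    Returns a list of (new_x, y); y is passed through unchanged.
    """
    t = math.tan(theta_rad)
    return [(x + y * t, y) for x, y in pixels]


# A stroke leaning 45 degrees to the left becomes vertical after the shear:
# every new x is approximately 0.0, and every y is the same as before.
leaning = [(0.0, 0), (-1.0, 1), (-2.0, 2)]
straight = shear_word(leaning, math.radians(45))
```

Because only the first coordinate moves, the vertical extent of the word at either end is untouched, which is consistent with the "heights remain unchanged" language of claim 1.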

11. A non-transitory computer-readable storage medium comprising a plurality of instructions to at least one processor to improve automatic recognition of text, the plurality of instructions comprising:
- first instructions to receive a plurality of regions of text in an image of a scene of real world captured by a camera;
- wherein a plurality of pixels of a common binary value, in a word in a region in said plurality of regions of text, are arranged along a first line oriented in a predetermined direction;
- wherein a first height at a first end of said word along said predetermined direction is different from a second height at a second end of said word along said predetermined direction;
- second instructions to detect a plurality of second lines that satisfy at least a predetermined test and pass through a portion of the word having a predetermined relationship to said first line;
- third instructions to determine an angle θ based on a plurality of angles of the plurality of second lines relative to a common direction;
- fourth instructions to use the angle θ to change first coordinates of at least said plurality of pixels in said word, whereby the first height and the second height remain unchanged after execution of the fourth instructions; and
- fifth instructions to store in a memory, at least changed first coordinates generated by execution of the fourth instructions.

12. The non-transitory computer-readable storage medium of claim 11 further comprising:
- instructions to rotate at least the plurality of regions of text through a common angle φ, to orient said first line to said predetermined direction;
- wherein the instructions to rotate are executed prior to execution of the second instructions, the third instructions and the fourth instructions.

13. The non-transitory computer-readable storage medium of claim 12 wherein:
- the word is written in Devanagari;
- the plurality of pixels of the common binary value are comprised in a shiro-rekha of the word; and
- the portion is a strip located between the shiro-rekha and a lower maatra in said region.

14. The non-transitory computer-readable storage medium of claim 13 wherein:
- the plurality of second lines are detected during the processing, for having a length through the strip larger than a predetermined fraction of a height of the strip.

15. The non-transitory computer-readable storage medium of claim 11 wherein:
- the common direction used in the third instructions is perpendicular to the predetermined direction of the first line; and
- the plurality of second lines are detected during the processing, for being inclined within a predetermined range −max_theta to +max_theta relative to the common direction.

16. The non-transitory computer-readable storage medium of claim 11 wherein:
- the predetermined direction is parallel to a longitudinal direction of the word; and
- the common direction is perpendicular to the longitudinal direction of the word.

17. The non-transitory computer-readable storage medium of claim 11 wherein:
- the fourth instructions comprise instructions to add to each first coordinate among multiple first coordinates, a product of a corresponding second coordinate and tan(θ).

18. The non-transitory computer-readable storage medium of claim 11 wherein:
- multiple second coordinates of multiple pixels in said word, are stored unchanged with multiple changed first coordinates.

19. The non-transitory computer-readable storage medium of claim 11 further comprising, configured to be executed after the fifth instructions:
- sixth instructions to dilate the region by adding a set of additional pixels to obtain a dilated region; and
- seventh instructions to erode the dilated region by removing a subset in the set of additional pixels added by the dilating.
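The dilate-then-erode sequence of claim 19 resembles what image-processing texts call a morphological closing. The sketch below illustrates that general technique on a set of pixel coordinates; it is an assumption that the claim intends this, and the 3×3 square neighborhood, the `radius` parameter, and the function names are hypothetical. The claims do not state the purpose of the step; one plausible reading is that it fills small gaps left in strokes after the coordinate change.

```python
def _offsets(radius):
    """All (dx, dy) displacements inside a square of the given radius."""
    span = range(-radius, radius + 1)
    return [(dx, dy) for dx in span for dy in span]


def dilate(pixels, radius=1):
    """Add every pixel that lies within `radius` of an existing pixel."""
    return {(x + dx, y + dy) for x, y in pixels for dx, dy in _offsets(radius)}


def erode(pixels, radius=1):
    """Keep only pixels whose whole square neighborhood is present."""
    offs = _offsets(radius)
    return {(x, y) for x, y in pixels
            if all((x + dx, y + dy) in pixels for dx, dy in offs)}


def close_gaps(pixels, radius=1):
    """Dilate, then erode: a closing with a square structuring element."""
    return erode(dilate(pixels, radius), radius)


# A horizontal stroke with a one-pixel hole at x = 3.
stroke = {(1, 0), (2, 0), (4, 0), (5, 0)}
closed = close_gaps(stroke)  # the hole at (3, 0) is filled
```

In this sketch every original pixel survives, so the pixels removed by the erosion are a subset of those the dilation added, which matches the wording of the claim.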

20. A mobile device comprising:
- a camera;
- a memory operatively connected to the camera to receive at least an image therefrom;
- at least one processor operatively connected to the memory to execute a plurality of instructions stored in the memory;
- wherein the plurality of instructions cause the at least one processor to:
  - receive a plurality of regions of text in the image of a scene of real world captured by the camera;
  - wherein a plurality of pixels of a common binary value, in a word in a region in said plurality of regions of text, are arranged along a first line oriented in a predetermined direction;
  - wherein a first height at a first end of said word along said predetermined direction is different from a second height at a second end of said word along said predetermined direction;
  - detect a plurality of second lines that satisfy at least a predetermined test and pass through a portion of the word having a predetermined relationship to said first line;
  - determine an angle θ based on a plurality of angles of the plurality of second lines relative to a common direction;
  - use the angle θ to change first coordinates of at least said plurality of pixels in said word, whereby the first height and the second height remain unchanged after the using; and
  - store in the memory, at least changed first coordinates generated by the use.

21. The mobile device of claim 20 wherein the plurality of instructions further cause the at least one processor to:
- rotate at least the plurality of regions of text through a common angle φ, to orient said first line to said predetermined direction.

22. The mobile device of claim 20 wherein:
- the word is written in Devanagari;
- the plurality of pixels of the common binary value are comprised in a shiro-rekha of the word; and
- the portion is a strip located between the shiro-rekha and a lower maatra in said region.

23. The mobile device of claim 22 wherein:
- the plurality of second lines are detected for having a length through the strip larger than a predetermined fraction of a height of the strip.

24. The mobile device of claim 20 wherein:
- the predetermined direction is parallel to a longitudinal direction of the word; and
- the common direction is perpendicular to the longitudinal direction of the word.
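Claims 22 and 23 locate the candidate lines in a strip below the shiro-rekha and accept a line only if its length through the strip exceeds a predetermined fraction of the strip height. The sketch below shows one way such a test could look. The helper names, the 0.75 fraction, the "row with the most pixels" heuristic for finding the shiro-rekha, and the idea of measuring length as the longest unbroken run of rows are all assumptions; none of them is specified in the claims.

```python
import math
from collections import Counter


def find_headline_row(pixels):
    """Assumed heuristic: the shiro-rekha is the row holding the most pixels."""
    return Counter(y for _, y in pixels).most_common(1)[0][0]


def line_passes_test(pixels, x_top, strip_top, strip_bottom, angle_deg,
                     min_fraction=0.75):
    """Check one candidate line through the strip [strip_top, strip_bottom).

    The line starts at column x_top on row strip_top and leans angle_deg
    from the vertical. It passes if its longest unbroken run of foreground
    pixels exceeds min_fraction of the strip height.
    """
    slope = math.tan(math.radians(angle_deg))
    best = run = 0
    for y in range(strip_top, strip_bottom):
        x = x_top + round((y - strip_top) * slope)
        run = run + 1 if (x, y) in pixels else 0
        best = max(best, run)
    return best > min_fraction * (strip_bottom - strip_top)


# Toy word: a headline on row 0 and one vertical stem below it at x = 5.
word = {(x, 0) for x in range(10)} | {(5, y) for y in range(1, 9)}
headline = find_headline_row(word)
stem_ok = line_passes_test(word, 5, headline + 1, 9, 0.0)
```

In practice the bottom of the strip would come from locating the lower maatra, which this toy example simply hard-codes as row 9.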

25. An apparatus to improve automatic recognition of text, the apparatus comprising:
- means for receiving a plurality of regions of text in an image of a scene of real world captured by a camera;
- wherein a plurality of pixels of a common binary value, in a word in a region in said plurality of regions of text, are arranged along a first line oriented in a predetermined direction;
- wherein a first height at a first end of said word along said predetermined direction is different from a second height at a second end of said word along said predetermined direction;
- means for detecting a plurality of second lines that satisfy at least a predetermined test and pass through a portion of the word having a predetermined relationship to said first line;
- means for determining an angle θ based on a plurality of angles of the plurality of second lines relative to a common direction;
- means for using the angle θ to change first coordinates of at least said plurality of pixels in said word, whereby the first height and the second height remain unchanged after the using; and
- means for storing in a memory, at least changed first coordinates generated by the means for using.

Current Assignee: Qualcomm, Inc.
Original Assignee: Qualcomm, Inc.
Inventors: Acharya, Hemanth P.; Baheti, Pawan Kumar
Granted Patent: US 9,171,204 B2
US Class Current: 382/229
CPC Class Codes:
- G06V 30/10   Character recognition
- G06V 30/1478   of characters or characters...
- G06V 30/1607   Correcting image deformatio...
- G06V 30/416   Extracting the logical stru...
