Method of perspective correction for devanagari text

US 9,171,204 B2
Filed: 03/15/2013
Issued: 10/27/2015
Est. Priority Date: 12/12/2012
Status: Expired due to Fees

1 Assignment

0 Petitions

Abstract

An electronic device and method identify regions that are likely to be text in a natural image or video frame, followed by processing as follows: lines that are nearly vertical are automatically identified in a selected text region, oriented relative to the vertical axis within a predetermined range −max_theta to +max_theta, followed by determination of an angle θ of the identified lines, followed by use of the angle θ to perform perspective correction by warping the selected text region. After perspective correction in this manner, each text region is processed further, to recognize text therein, by performing OCR on each block among a sequence of blocks obtained by slicing the potential text region. Thereafter, the result of text recognition is used to display to the user, either the recognized text or any other information obtained by use of the recognized text.
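
As a rough illustration of the processing the abstract describes, the Python sketch below estimates θ from a set of already-detected line segments and then straightens the region with a horizontal shear. Several things here are assumptions rather than details taken from the patent: the segment representation `(x0, y0, x1, y1)`, the use of a plain mean to combine the angles, the 30-degree default for `max_theta`, and all function names. The shear is only a simple stand-in for the warping step; it changes x-coordinates alone, so heights are left untouched.

```python
import math


def line_angle_from_vertical(x0, y0, x1, y1):
    """Signed angle (radians) between a segment and the vertical axis."""
    dx, dy = x1 - x0, y1 - y0
    if dy < 0:  # orient the segment so that it always points "down"
        dx, dy = -dx, -dy
    return math.atan2(dx, dy)


def estimate_slant(lines, max_theta=math.radians(30)):
    """Combine the angles of nearly vertical lines into one slant angle.

    Assumption: angles are combined with a plain mean; the source only
    says the angles are "combined".
    """
    angles = [line_angle_from_vertical(*ln) for ln in lines]
    kept = [a for a in angles if -max_theta <= a <= max_theta]
    if not kept:
        return 0.0  # no usable line: leave the region unchanged
    return sum(kept) / len(kept)


def shear_points(points, theta):
    """Shift x-coordinates by -y * tan(theta); y-coordinates are kept."""
    t = math.tan(theta)
    return [(x - y * t, y) for (x, y) in points]


# Two slanted strokes and one horizontal stroke (rejected by max_theta).
lines = [(0, 0, 1, 10), (5, 0, 6, 10), (0, 3, 9, 3)]
theta = estimate_slant(lines)
straightened = shear_points([(0, 0), (1, 10)], theta)
```

After the shear, the end points of the first stroke share (to floating-point precision) the same x-coordinate, which is what "nearly vertical lines become vertical" amounts to in this simplified setting.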

19 Citations

20 Claims

1. A method to improve automatic recognition of text, the method comprising:
- receiving a plurality of regions in an image of a scene of real world captured by a camera;
  
  rotating at least the plurality of regions through a common angle φ, to obtain a set of skew-corrected regions;

  after the rotating, applying to the set of skew-corrected regions one or more tests that determine presence of text, to identify a subset of regions likely to be text;

  after the applying, determining a slant angle θ of at least a portion of a region, by combining a plurality of angles of a plurality of lines relative to a common direction, each line in the plurality of lines representing multiple line segments in the region that are at least one pixel wide, located adjacent to one another, and formed by pixels of text;

  using the slant angle θ to change first coordinates of at least pixels in the portion, whereby a first height at a first end of the portion and a second height at a second end of the portion remain unchanged after the using; and
  
  storing in a memory, at least changed first coordinates generated by the using;
  
  wherein the receiving, the rotating, the applying, the determining, the using and the storing are performed by one or more processors.
- - 2. The method of claim 1 wherein:
    - the portion is a strip extracted from a location below a y-coordinate of a peak in a histogram, of counts of pixels of a common binary value in each row among a plurality of rows in the region.
  - 3. The method of claim 2 wherein:
    - each line in the plurality of lines is detected for satisfying a test on having a length within the portion larger than a predetermined fraction of a height of the portion.
  - 4. The method of claim 1 wherein:
    - the common direction used to determine the angle θ is perpendicular to a longitudinal direction of the skew-corrected region.
  - 5. The method of claim 1 wherein a perspective corrected region is obtained by the using, and the method further comprising, after the using:
    - dilating the perspective corrected region by adding a set of additional pixels to obtain a dilated region; and
      
      eroding the dilated region by removing a subset in the set of additional pixels added by the dilating.
  - 6. The method of claim 1 further comprising:
    - clustering regions in the subset, when a test of geometry is satisfied;
      
      wherein the clustering is performed after the applying and before the determining.
  - 7. The method of claim 1 further comprising:
    - classifying regions in the subset as text/non-text, by use of a neural network or stroke width;
      
      wherein the classifying is performed after the applying and before the determining.
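
Claims 2 and 3 can be pictured with a small Python sketch: build a per-row histogram of foreground pixels, take the strip below the peak row, and keep only lines that span more than a fraction of that strip's height. The assumptions here are the binary region as a list of rows of 0/1 values, the image convention that y grows downward (so "below" means larger row indices), the `min_fraction=0.5` default, and the function names; none of these come from the patent text. In Devanagari the peak row would typically be the headline (shirorekha), though the claims do not say so.

```python
def row_histogram(region):
    """Count foreground (value 1) pixels in each row of a binary region."""
    return [sum(row) for row in region]


def strip_below_peak(region):
    """Return (peak_y, strip), where strip holds the rows below the peak row.

    Assumption: y grows downward, so rows "below" the peak have larger
    indices than peak_y.
    """
    hist = row_histogram(region)
    peak_y = hist.index(max(hist))
    return peak_y, region[peak_y + 1:]


def long_enough(line_length, strip_height, min_fraction=0.5):
    """Claim 3 style test: the line must exceed a fraction of the strip height."""
    return line_length > min_fraction * strip_height


region = [
    [0, 1, 0, 0],
    [1, 1, 1, 1],  # densest row: the histogram peak
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 1, 0, 0],
]
peak_y, strip = strip_below_peak(region)
```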

8. A non-transitory computer-readable storage medium comprising a plurality of instructions to at least one processor to improve automatic recognition of text, the plurality of instructions comprising:
- first instructions to receive a plurality of regions in an image of a scene of real world captured by a camera;
  
  second instructions to rotate at least the plurality of regions through a common angle φ, to obtain a set of skew-corrected regions;

  to execute after execution of the second instructions to rotate, third instructions to apply to the set of skew-corrected regions one or more tests that determine presence of text, to identify a subset of regions likely to be text;

  to execute after execution of the third instructions to apply, fourth instructions to determine a slant angle θ of at least a portion of a region in the subset, by combining a plurality of angles of a plurality of lines relative to a common direction, each line in the plurality of lines representing multiple line segments in the region that are at least one pixel wide, located adjacent to one another, and formed by pixels of text;

  fifth instructions to use the slant angle θ to change first coordinates of at least pixels in the portion, whereby a first height at a first end of the portion and a second height at a second end of the portion remain unchanged after execution of the fifth instructions; and
  
  sixth instructions to store in a memory, at least changed first coordinates generated by execution of the fifth instructions.
- - 9. The non-transitory computer-readable storage medium of claim 8 wherein:
    - the portion is a strip extracted from a location below a y-coordinate of a peak in a histogram, of counts of pixels of a common binary value in each row among a plurality of rows in the region.
  - 10. The non-transitory computer-readable storage medium of claim 9 wherein:
    - each line in the plurality of lines is detected for satisfying a test on having a length within the portion larger than a predetermined fraction of a height of the portion.
  - 11. The non-transitory computer-readable storage medium of claim 8 wherein:
    - the common direction used to determine the angle θ is perpendicular to a longitudinal direction of the skew-corrected region.
  - 12. The non-transitory computer-readable storage medium of claim 8 wherein a perspective corrected region is obtained by execution of the fifth instructions to use, the plurality of instructions further comprising, configured to be executed after the sixth instructions:
    - seventh instructions to dilate the perspective corrected region by adding a set of additional pixels to obtain a dilated region; and
      
      eighth instructions to erode the dilated region by removing a subset in the set of additional pixels added by the seventh instructions to dilate.
  - 13. The non-transitory computer-readable storage medium of claim 8 further comprising:
    - instructions to cluster regions in the subset, when a test of geometry is satisfied;
      
      wherein the instructions to cluster are to execute after execution of the third instructions to apply and before execution of the fourth instructions to determine.
  - 14. The non-transitory computer-readable storage medium of claim 8 further comprising:
    - instructions to classify regions in the subset as text/non-text, by use of a neural network or stroke width;
      
      wherein the instructions to classify are to execute after execution of the third instructions to apply and before execution of the fourth instructions to determine.
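
The dilate-then-erode step of claims 5 and 12 resembles what image-processing texts call morphological closing. The following Python sketch shows that operation on a small binary image. The 3x3 neighbourhood, the handling of image borders (only in-bounds neighbours are examined), and the function names are assumptions made for illustration; the claims do not specify a structuring element.

```python
def _neighbors(img, y, x):
    """Values in the 3x3 window around (y, x), clipped to the image."""
    h, w = len(img), len(img[0])
    return [img[j][i]
            for j in range(max(0, y - 1), min(h, y + 2))
            for i in range(max(0, x - 1), min(w, x + 2))]


def dilate(img):
    """Set a pixel if any pixel in its 3x3 window is set (adds pixels)."""
    return [[1 if any(_neighbors(img, y, x)) else 0
             for x in range(len(img[0]))] for y in range(len(img))]


def erode(img):
    """Keep a pixel only if every pixel in its 3x3 window is set."""
    return [[1 if all(_neighbors(img, y, x)) else 0
             for x in range(len(img[0]))] for y in range(len(img))]


def close_gaps(img):
    """Dilate, then erode: most added pixels are removed, small gaps stay filled."""
    return erode(dilate(img))


# A vertical stroke with a one-pixel gap at row 4.
img = [[0, 0, 0, 0, 0] for _ in range(9)]
for y in (2, 3, 5, 6):
    img[y][2] = 1
closed = close_gaps(img)
```

In this example the erosion removes every pixel the dilation added except the one that bridges the gap, so the stroke comes out as a single connected run.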

15. A mobile device comprising:
- a camera;
  
  a memory operatively connected to the camera to receive at least an image therefrom;
  
  at least one processor operatively connected to the memory to execute a plurality of instructions stored in the memory;
  
  wherein the plurality of instructions cause the at least one processor to:
  
  rotate at least the plurality of regions through a common angle φ, to obtain a set of skew-corrected regions;

  after rotation through the common angle φ, apply to the set of skew-corrected regions one or more tests that determine presence of text, to identify a subset of regions likely to be text;

  after application of the one or more tests, determine a slant angle θ of at least a portion of a region in the subset, by combining a plurality of angles of a plurality of lines relative to a common direction, each line in the plurality of lines representing multiple line segments in the region that are at least one pixel wide, located adjacent to one another, and formed by pixels of text;

  use the slant angle θ to change first coordinates of at least pixels in the portion, whereby a first height at a first end of the portion and a second height at a second end of the portion remain unchanged after the use; and
  
  store in the memory, at least changed first coordinates generated by the use.
- - 16. The mobile device of claim 15 wherein:
    - the portion is a strip extracted from a location below a y-coordinate of a peak in a histogram, of counts of pixels of a common binary value in each row among a plurality of rows in the region.
  - 17. The mobile device of claim 16 wherein:
    - each line in the plurality of lines is detected for satisfying a test on having a length within the portion larger than a predetermined fraction of a height of the portion.
  - 18. The mobile device of claim 15 wherein the plurality of instructions further cause the at least one processor to:
    - cluster regions in the subset, when a test of geometry is satisfied;
      
      wherein the instructions to cluster are to execute after application of the one or more tests and before determination of the slant angle θ.
  - 19. The mobile device of claim 15 wherein the plurality of instructions further cause the at least one processor to:
    - classify regions in the subset as text/non-text, by use of a neural network or stroke width;
      
      wherein the instructions to classify are to execute after application of the one or more tests and before determination of the slant angle θ.
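
The claims leave the "test of geometry" used for clustering unspecified. Purely as a hypothetical example of what such a test could look like, the Python sketch below groups bounding boxes that overlap vertically and sit within a small horizontal gap of each other. The box format `(x_min, y_min, x_max, y_max)`, the `max_gap` threshold, and both function names are invented for illustration and are not taken from the patent.

```python
def boxes_mergeable(a, b, max_gap=5):
    """Hypothetical geometry test on two boxes (x_min, y_min, x_max, y_max).

    True when the boxes overlap vertically and the horizontal gap
    between them is at most max_gap pixels.
    """
    v_overlap = min(a[3], b[3]) - max(a[1], b[1])
    h_gap = max(a[0], b[0]) - min(a[2], b[2])
    return v_overlap > 0 and h_gap <= max_gap


def cluster(boxes, max_gap=5):
    """Greedy left-to-right grouping of boxes that pass the geometry test."""
    clusters = []
    for box in sorted(boxes):
        for cl in clusters:
            if any(boxes_mergeable(box, other, max_gap) for other in cl):
                cl.append(box)
                break
        else:  # no existing cluster accepted the box
            clusters.append([box])
    return clusters


boxes = [(0, 0, 10, 10), (12, 1, 20, 9), (100, 0, 110, 10)]
groups = cluster(boxes)
```

This single pass never merges two clusters that only become connected through a later box; a union-find structure would handle that case, at the cost of a longer example.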

20. An apparatus to improve automatic recognition of text, the apparatus comprising:
- means for receiving a plurality of regions in an image of a scene of real world captured by a camera;
  
  means for rotating at least the plurality of regions through a common angle φ, to obtain a set of skew-corrected regions;

  means, operable after rotation through the common angle φ, for applying to the set of skew-corrected regions one or more tests that determine presence of text, to identify a subset of regions likely to be text;

  means, operable after application of the one or more tests, for determining a slant angle θ of at least a portion of a region in the subset, by combining a plurality of angles of a plurality of lines relative to a common direction, each line in the plurality of lines representing multiple line segments in the region that are at least one pixel wide, located adjacent to one another, and formed by pixels of text;
  
  means for using the slant angle θ to change first coordinates of at least pixels in the portion, whereby a first height at a first end of the portion and a second height at a second end of the portion remain unchanged after operation of the means for the using; and
  
  means for storing in a memory, at least changed first coordinates generated by the means for using.
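
The skew-correction step shared by all four independent claims rotates every region through the same angle φ. A minimal Python sketch of that rotation on point coordinates follows. It assumes φ is already known (the claims do not say how it is obtained), that each region is a list of `(x, y)` points, and that rotation is about a caller-supplied center; the function name is likewise an assumption.

```python
import math


def rotate_regions(regions, phi, center=(0.0, 0.0)):
    """Rotate every point of every region through the same angle phi (radians).

    Uses the standard 2D rotation about `center`; to undo a measured
    skew, pass the negative of the measured angle.
    """
    c, s = math.cos(phi), math.sin(phi)
    cx, cy = center
    return [[(cx + (x - cx) * c - (y - cy) * s,
              cy + (x - cx) * s + (y - cy) * c)
             for (x, y) in region]
            for region in regions]


# Two single-point "regions" rotated by a quarter turn about the origin.
rotated = rotate_regions([[(1.0, 0.0)], [(0.0, 2.0)]], math.pi / 2)
```

Because one angle is applied to all regions, their relative layout is preserved, which is what lets the later text tests and slant estimation treat the set as a consistently oriented whole.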

Current Assignee: Qualcomm, Inc.
Original Assignee: Qualcomm, Inc.
Inventors: Acharya, Hemanth P.; Baheti, Pawan Kumar
Primary Examiner(s): Patel, Nirav G
Application Number: US13/842,985
Publication Number: US 20140161365A1
Time in Patent Office: 956 Days
Field of Search: None
US Class Current: 1/1
CPC Class Codes:
G06V 30/10   Character recognition
G06V 30/1478   of characters or characters...
G06V 30/1607   Correcting image deformatio...
G06V 30/416   Extracting the logical stru...
