Finding text in natural scenes

US 8,837,830 B2
Filed: 06/12/2012
Issued: 09/16/2014
Est. Priority Date: 06/12/2012
Status: Active Grant


5 Assignments

0 Petitions

Abstract

As set forth herein, systems and methods provide an efficient approach, based on edge detection and closed contours, for finding text in natural scenes such as photographic, digital, and/or electronic images, and the like. Edge information (e.g., edges of structures or objects in the images) is obtained via an edge detection technique. Edges from text characters form closed contours even in the presence of reasonable levels of noise. Closed-contour linking and candidate text line formation are two additional features of the described approach. A candidate text line classifier is then applied to screen out false-positive text identifications. Candidate text regions for placement of text in the natural scene of the electronic image are highlighted and presented to a user.
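
The premise that character edges form closed contours can be illustrated with a minimal sketch. This is not the patent's own procedure (the specification is not reproduced on this page); it assumes a binary edge map is already available from some edge detector and treats a closed contour as any edge boundary that fully encloses a 4-connected region of non-edge pixels, i.e., a region that cannot reach the image border. The function name `enclosed_regions` is hypothetical.

```python
from collections import deque
from typing import List, Tuple

Pixel = Tuple[int, int]


def enclosed_regions(edges: List[List[int]]) -> List[List[Pixel]]:
    """Return 4-connected non-edge regions that never touch the image border.

    Each returned region is bounded by edge pixels on every side, so its
    boundary is (under this illustrative definition) a closed contour.
    """
    h, w = len(edges), len(edges[0])
    seen = [[False] * w for _ in range(h)]
    regions: List[List[Pixel]] = []
    for r in range(h):
        for c in range(w):
            if edges[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over 4-connected non-edge pixels.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels: List[Pixel] = []
            touches_border = False
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                if y in (0, h - 1) or x in (0, w - 1):
                    touches_border = True
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and not edges[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Regions that leak to the border are open background, not holes.
            if not touches_border:
                regions.append(pixels)
    return regions
```

Pairing a 4-connected background with 8-connected edges means a diagonal step in a character outline is not treated as a gap. Noisy outlines with real gaps would leak to the border here; claim 2 below addresses that case by joining open tips closer than T_open and eroding the rest.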

19 Claims

1. A computer-implemented method for automatically detecting text in electronic images of natural scenes, comprising:
- receiving an electronic image for analysis;
  
  performing an edge-detection algorithm on the electronic image;
  
  identifying closed contours in the electronic image as a function of detected edges;
  
  establishing links between closed components;
  
  identifying candidate text lines as a function of the identified closed contours;
  
  classifying candidate text lines as being text regions or non-text regions; and
  
  outputting, via a graphical user interface (GUI), the text regions in the electronic image to a user;
  
  wherein identifying candidate text lines further comprises:
  
  selecting a link for consideration;
  
  fitting a line that connects respective centers of first and second closed contours connected by the link;
  
  for each of the first and second closed contours, identifying all associated links other than the selected link, wherein a third closed contour attached to one of the associated links is selected;
  
  re-fitting the fitted line by including newly added third closed contour, wherein the refitted line connects the centers of the first, second, and third closed contours; and
  
  iterating the preceding steps until all closed contours having a center with a distance less than the predetermined threshold T_f have been added to the candidate text line.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
- - 2. The method according to claim 1, wherein identifying closed contours in the electronic image further comprises:
    - identifying open tips of potential closed contours;
      
      connecting any two open tips that are separated by a distance smaller than a threshold T_open to form an edge;
      
      eroding any remaining open tips until no open tips remain in the potential closed contours;
      
      outputting one or more closed contours.
  - 3. The method according to claim 2, further comprising:
    - detecting one or more falsely-connected closed contours; and
      
      disconnecting the one or more falsely connected closed contours.
  - 4. The method according to claim 3, wherein disconnecting the one or more falsely connected closed contours further comprises:
    - executing a connected component algorithm to differentiate between edge pixels that border a closed component and edge pixels that separate two background pixel regions; and
      
      removing edge pixels that separate two background pixel regions and do not border a closed component.
  - 5. The method according to claim 1, wherein establishing links between closed components further comprises:
    - determining whether a distance between centers of two closed contours is smaller than a first threshold, T_d;
      
      applying a second threshold to a ratio of the heights between the two closed contours; and
      
      applying a constraint on pixel color whereby background pixels of neighboring closed contours are similar to each other and text pixels are similar to each other.
  - 6. The method according to claim 5, further comprising:
    - dilating edge pixels of each closed contour;
      
      estimating a Gaussian mixture distribution with two modes on the chrominance channels in a luminance-chrominance color space of all pixels covered by a dilated contour;
      
      computing an average of a Kullback-Leibler divergence between background modes and between text modes for the two closed contours such that:
      
      $D_{color}(C_1, C_2) = \frac{1}{2}\left(KL(G_{1,1}, G_{2,1}) + KL(G_{1,2}, G_{2,2})\right)$,
      
      where $C_1$ and $C_2$ represent any two closed contours, while $G_{1,1}, G_{1,2}$ and $G_{2,1}, G_{2,2}$ are the background and text modes estimated for the two closed contours, respectively; and
      
      retaining a linkage between the two closed contours if the distance D_color is within a threshold T_c.
  - 7. The method according to claim 1, wherein the second closed contour has a center with a smallest distance to the fitted line relative to other closed contours, wherein if the distance to the fitted straight line is also smaller than a predetermined threshold T_f, the second closed contour is added after the first closed contour to form a sequence of closed contours.
  - 8. The method according to claim 7, further comprising:
    - after the two closed contours from the initially selected link have been extended, evaluating a total number of closed contours in the candidate text line; and
      
      verifying that the candidate text line is a candidate text line if at least a predetermined number of closed contours are present in the candidate text line.
  - 9. The method according to claim 8, wherein the verified text regions in the electronic image are output to a user via a graphical user interface (GUI).
  - 10. The method according to claim 1, wherein classifying candidate text lines as being text or non-text regions further comprises:
    - calculating median, upper, and lower quartiles of sorted aspect ratios for all closed contours in the candidate text line;
      
      calculating median, upper, and lower quartiles of text-to-background pixel ratios for all closed contours in the candidate text line;
      
      calculating a Kullback-Leibler divergence between a foreground pixel Gaussian distribution for text pixels in each candidate text line and a background pixel Gaussian distribution for background pixels in each candidate text line;
      
      classifying the candidate text lines as text regions or non-text regions as a function of the Kullback-Leibler divergence; and
      
      outputting text regions in the image to a user.
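
The candidate-text-line step of claim 1, together with the nearest-contour rule of claim 7 and the minimum-count check of claim 8, can be sketched as a greedy growing loop. This is a minimal illustration under stated assumptions: contour centers and links are already computed (for example by the tests of claim 5), the line is refit by total least squares so that near-vertical lines are also handled, candidates are drawn from the links of every contour already on the line, and the names `fit_line`, `grow_text_line`, `find_text_lines`, `make_links`, and `min_contours` are hypothetical rather than taken from the patent. The parameter `t_f` plays the role of T_f.

```python
import math
from typing import Dict, Iterable, List, Set, Tuple

Point = Tuple[float, float]
Line = Tuple[Point, Point]  # (centroid, unit normal)


def fit_line(points: List[Point]) -> Line:
    """Total-least-squares line through points: centroid plus unit normal."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # angle of principal axis
    return (mx, my), (-math.sin(theta), math.cos(theta))


def distance_to_line(p: Point, line: Line) -> float:
    (cx, cy), (nx, ny) = line
    return abs((p[0] - cx) * nx + (p[1] - cy) * ny)


def make_links(pairs: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Undirected adjacency between contour ids."""
    links: Dict[int, Set[int]] = {}
    for a, b in pairs:
        links.setdefault(a, set()).add(b)
        links.setdefault(b, set()).add(a)
    return links


def grow_text_line(seed, centers, links, t_f: float) -> List[int]:
    """Grow one candidate line from a seed link (i, j)."""
    members = list(seed)
    line = fit_line([centers[m] for m in members])
    while True:
        # Contours linked to the current line but not yet on it.
        candidates = {k for m in members for k in links.get(m, ()) if k not in members}
        if not candidates:
            break
        best = min(sorted(candidates), key=lambda k: distance_to_line(centers[k], line))
        if distance_to_line(centers[best], line) >= t_f:
            break  # nearest candidate is too far from the fitted line
        members.append(best)
        line = fit_line([centers[m] for m in members])  # re-fit with new center
    return members


def find_text_lines(centers, links, t_f: float, min_contours: int = 3):
    """Try every link as a seed; keep distinct lines with enough contours."""
    found, seen = [], set()
    for i in sorted(links):
        for j in sorted(links[i]):
            if j <= i:
                continue  # visit each undirected link once
            key = tuple(sorted(grow_text_line((i, j), centers, links, t_f)))
            if len(key) >= min_contours and key not in seen:
                seen.add(key)
                found.append(key)
    return found
```

For example, four roughly collinear centers plus one contour far above them yield a single candidate line; the seed link to the outlier grows no further and fails the count check.

```python
centers = {0: (0.0, 0.0), 1: (10.0, 0.5), 2: (20.0, -0.3), 3: (30.0, 0.2), 4: (10.0, 25.0)}
links = make_links([(0, 1), (1, 2), (2, 3), (1, 4)])
print(find_text_lines(centers, links, t_f=3.0))  # [(0, 1, 2, 3)]
```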

11. A computerized system that facilitates automatically detecting text in electronic images of natural scenes, comprising:
- a memory that stores computer-executable instructions; and
  
  a processor configured to execute the instructions, the instructions comprising:
  
  receiving an electronic image for analysis;
  
  performing an edge-detection algorithm on the electronic image;
  
  identifying closed contours in the electronic image as a function of detected edges;
  
  establishing links between closed components;
  
  identifying candidate text lines as a function of the identified closed contours;
  
  classifying candidate text lines as being text regions or non-text regions; and
  
  a graphical user interface (GUI) via which the text regions in the electronic image are displayed to a user;
  
  wherein the instructions for identifying candidate text lines further comprise instructions for:
  
  selecting a link for consideration;
  
  fitting a line that connects respective centers of first and second closed contours connected by the link;
  
  for each of the first and second closed contours, identifying all associated links other than the selected link, wherein a third closed contour attached to one of the associated links is selected;
  
  re-fitting the fitted line by including newly added third closed contour, wherein the refitted line connects the centers of the first, second, and third closed contours; and
  
  iterating the preceding steps until all closed contours having a center with a distance less than the predetermined threshold T_f have been added to the candidate text line.
- Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19)
- - 12. The system according to claim 11, wherein the instructions for identifying closed contours in the electronic image further comprise instructions for:
    - identifying open tips of potential closed contours;
      
      connecting any two open tips that are separated by a distance smaller than a threshold T_open to form an edge;
      
      eroding any remaining open tips until no open tips remain in the potential closed contours;
      
      outputting one or more closed contours.
  - 13. The system according to claim 12, wherein the memory stores and the processor is configured to execute computer-executable instructions for:
    - detecting one or more falsely-connected closed contours; and
      
      disconnecting the one or more falsely connected closed contours.
  - 14. The system according to claim 13, wherein the instructions for disconnecting the one or more falsely connected closed contours further comprise instructions for:
    - executing a connected component algorithm to differentiate between edge pixels that border a closed component and edge pixels that separate two background pixel regions; and
      
      removing edge pixels that separate two background pixel regions and do not border a closed component.
  - 15. The system according to claim 11, wherein the instructions for establishing links between closed components further comprise:
    - determining whether a distance between centers of two closed contours is smaller than a first threshold, T_d;
      
      applying a second threshold to a ratio of the heights between the two closed contours; and
      
      applying a constraint on pixel color whereby background pixels of neighboring closed contours are similar to each other and text pixels are similar to each other.
  - 16. The system according to claim 15, wherein the memory stores and the processor is configured to execute computer-executable instructions for:
    - dilating edge pixels of each closed contour;
      
      estimating a Gaussian mixture distribution with two modes on the chrominance channels in a luminance-chrominance color space of all pixels covered by a dilated contour;
      
      computing an average of a Kullback-Leibler divergence between background modes and between text modes for the two closed contours such that:
      
      $D_{color}(C_1, C_2) = \frac{1}{2}\left(KL(G_{1,1}, G_{2,1}) + KL(G_{1,2}, G_{2,2})\right)$,
      
      where $C_1$ and $C_2$ represent any two closed contours, while $G_{1,1}, G_{1,2}$ and $G_{2,1}, G_{2,2}$ are the background and text modes estimated for the two closed contours, respectively; and
      
      retaining a linkage between the two closed contours if the distance D_color is within a threshold T_c.
  - 17. The system according to claim 11, wherein the second closed contour has a center with a smallest distance to the fitted line relative to other closed contours, wherein if the distance to the fitted straight line is also smaller than a predetermined threshold T_f, the second closed contour is added after the first closed contour to form a sequence of closed contours.
  - 18. The system according to claim 17, wherein the memory stores and the processor is configured to execute computer-executable instructions for:
    - after the two closed contours from the initially selected link have been extended, evaluating a total number of closed contours in the candidate text line; and
      
      verifying that the candidate text line is a candidate text line if at least a predetermined number of closed contours are present in the candidate text line.
  - 19. The system according to claim 11, wherein the instructions for classifying candidate text lines as being text or non-text regions further comprise instructions for:
    - calculating median, upper, and lower quartiles of sorted aspect ratios for all closed contours in the candidate text line;
      
      calculating median, upper, and lower quartiles of text-to-background pixel ratios for all closed contours in the candidate text line;
      
      calculating a Kullback-Leibler divergence between a foreground pixel Gaussian distribution for text pixels in each candidate text line and a background pixel Gaussian distribution for background pixels in each candidate text line;
      
      classifying the candidate text lines as text regions or non-text regions as a function of the Kullback-Leibler divergence; and
      
      outputting text regions in the image to a user.
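
The color-consistency test of claims 6 and 16 reduces to a closed-form Kullback-Leibler divergence once each contour's two chrominance modes are known. The sketch below assumes the two-mode Gaussian mixture has already been fitted (the fitting step, e.g. expectation-maximization, is omitted), that it is already known which mode is background and which is text, that each mode is a 2-D Gaussian over the two chrominance channels given as a mean and a 2×2 covariance, and that "within a threshold" means D_color ≤ T_c. The function names `kl_gaussian_2d`, `color_distance`, and `keep_link` are hypothetical.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]
Mat2 = Tuple[Tuple[float, float], Tuple[float, float]]
Mode = Tuple[Vec2, Mat2]  # (mean, covariance) of one chrominance mode


def kl_gaussian_2d(mode0: Mode, mode1: Mode) -> float:
    """Closed-form KL(N0 || N1) for two 2-D Gaussians."""
    (mu0, cov0), (mu1, cov1) = mode0, mode1
    (a, b), (c, d) = cov1
    det1 = a * d - b * c
    inv1 = ((d / det1, -b / det1), (-c / det1, a / det1))
    (p, q), (r, s) = cov0
    det0 = p * s - q * r
    # trace(inv(cov1) @ cov0)
    trace = inv1[0][0] * p + inv1[0][1] * r + inv1[1][0] * q + inv1[1][1] * s
    # Mahalanobis term (mu1 - mu0)^T inv(cov1) (mu1 - mu0)
    dx, dy = mu1[0] - mu0[0], mu1[1] - mu0[1]
    maha = dx * (inv1[0][0] * dx + inv1[0][1] * dy) + dy * (inv1[1][0] * dx + inv1[1][1] * dy)
    k = 2  # dimensionality: two chrominance channels
    return 0.5 * (trace + maha - k + math.log(det1 / det0))


def color_distance(modes1, modes2) -> float:
    """D_color(C1, C2) = 1/2 (KL(G11, G21) + KL(G12, G22)); modes = (background, text)."""
    (bg1, text1), (bg2, text2) = modes1, modes2
    return 0.5 * (kl_gaussian_2d(bg1, bg2) + kl_gaussian_2d(text1, text2))


def keep_link(modes1, modes2, t_c: float) -> bool:
    """Retain the link only if the two contours are color-consistent."""
    return color_distance(modes1, modes2) <= t_c
```

KL divergence is asymmetric, so the argument order here follows the claim's KL(G_1,1, G_2,1) and KL(G_1,2, G_2,2). With identity covariances, identical backgrounds, and text means (3, 4) and (0, 0), the text term is 12.5 and D_color is 6.25, so the link survives T_c = 10 but not T_c = 5.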

Current Assignee
Conduent Business Services, LLC (Conduent, Inc.)
Original Assignee
Xerox Corporation (Xerox Holdings Corp.)
Inventors
Bala, Raja; Fan, Zhigang; Ding, Hengzhou; Allebach, Jan P.; Bouman, Charles A.
Primary Examiner(s)
Hung, Yubin

Application Number
US13/494,173
Publication Number
US 20130330004A1
Time in Patent Office
826 Days
Field of Search
382/164, 382/176, 382/180, 358/462
US Class Current
382/176
CPC Class Codes

G06T 2207/10004   Still image; Photographic i...

G06T 7/13   Edge detection

G06T 7/181   involving edge growing; inv...

G06V 10/44   Local feature extraction by...

G06V 20/63   Scene text, e.g. street names

G06V 30/10   Character recognition

G06V 30/18076   by analysing connectivity, ...
