System and method for detecting text in real-world color images

US 7,817,855 B2
Filed: 09/05/2006
Issued: 10/19/2010
Est. Priority Date: 09/02/2005
Status: Expired due to Fees

Abstract

A method and apparatus for detecting text in real-world images comprises calculating a cascade of classifiers, the cascade comprising a plurality of stages, each stage including one or more weak classifiers, the plurality of stages organized to start out with classifiers that are most useful for ruling out non-text regions, and removing regions classified as non-text regions from the cascade prior to completion of the cascade, to further speed up processing.
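The early-rejection idea in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the region representation (plain dictionaries), the feature names `edge_density` and `contrast`, the thresholds, and the function `run_cascade` are all hypothetical and chosen only for illustration. Each stage sums the votes of its weak classifiers and drops any region whose score falls below the stage threshold, so later and more expensive stages never see it.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Region = Dict[str, object]                     # hypothetical region record
WeakClassifier = Callable[[Region], float]     # returns a real-valued vote
Stage = Tuple[List[WeakClassifier], float]     # (weak classifiers, rejection threshold)


def run_cascade(regions: Sequence[Region], stages: Sequence[Stage]) -> List[Region]:
    """Return the regions that survive every stage of the cascade."""
    survivors = list(regions)
    for weak_classifiers, threshold in stages:
        kept = []
        for region in survivors:
            # A stage's score is the sum of its weak classifiers' votes.
            score = sum(clf(region) for clf in weak_classifiers)
            if score >= threshold:
                kept.append(region)
            # Regions below the threshold are dropped here and are never
            # passed to the remaining stages.
        survivors = kept
        if not survivors:
            break  # nothing left to evaluate
    return survivors


# Illustrative data only: feature values and thresholds are made up.
regions = [
    {"name": "sky", "edge_density": 0.02, "contrast": 0.10},
    {"name": "foliage", "edge_density": 0.60, "contrast": 0.15},
    {"name": "sign", "edge_density": 0.55, "contrast": 0.80},
]
stages = [
    # Cheap first stage that rules out flat, edge-free regions.
    ([lambda r: r["edge_density"]], 0.3),
    # Second stage combines two weak classifiers.
    ([lambda r: r["edge_density"], lambda r: r["contrast"]], 1.0),
]

text_candidates = run_cascade(regions, stages)
print([r["name"] for r in text_candidates])
```

In this toy run the "sky" region is rejected by the first stage and is never scored by the second, which is the source of the speed-up the abstract describes.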

20 Claims

1. A method of detecting text in real-world images comprising:
- dividing an image representing a real-world scene into one or more regions;
  
  calculating a cascade of classifiers, the cascade comprising a plurality of stages, each stage including one or more weak classifiers, the plurality of stages organized to start out with classifiers that are most useful for ruling out non-text regions of the image;
  
  feeding the one or more regions into the cascade; and
  
  removing regions of the image classified as the non-text regions from the cascade prior to completion of the cascade to avoid subsequent processing of the removed regions;
  
  utilizing a binarization process including classifying individual pixels as one of:
  
  non-text, light potential-text, and dark potential-text based on one or more factors including:
  
  a number of pixels in the connected component;
  
  a number of pixels on the border of the connected component;
  
  a height of the connected component;
  
  a width of the connected component;
  
  a ratio of the height of the connected component to the width of the connected component;
  
  a ratio of the pixels in the connected component to the width of the connected component multiplied by the height of the connected component;
  
  a local size of text in the connected component;
  
  outputting binarization output data.
- Dependent claims:
  - 2. The method of claim 1, wherein the cascade comprises seven AdaBoost layers.
  - 3. The method of claim 2, wherein each layer of the cascade has an equal or greater number of classifiers than each previous layer of the cascade.
  - 4. The method of claim 2, wherein the classifiers in layers are secondarily ordered based on speed of computation.
  - 5. The method of claim 1, further comprising:
    - outputting an output data comprising identified text regions separated from non-text regions.
  - 6. The method of claim 1, further comprising utilizing two neighborhood thresholds:
    - $T_{\text{Light}} = \mu + k\sigma$ and $T_{\text{Dark}} = \mu - k\sigma$, where $\mu$ and $\sigma$ are the mean and variance within the selected neighborhood respectively, and $k$ is a constant.
  - 7. The method of claim 1, further comprising:
    - grouping the pixels into connected components based on their classification and proximity to other pixels.
  - 8. The method of claim 1, further comprising grouping the connected components into lines of text based on a color distance between colors of two connected components.
  - 9. The method of claim 1, further comprising removing regions classified as text regions from the cascade prior to completion of the cascade when a confidence level exceeds a threshold, wherein the confidence level indicates the likelihood of a region being a text region.
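The two neighborhood thresholds of claim 6 can be sketched as follows. This is an illustration under several assumptions that the claim does not state: the image is a grayscale 2D list, the neighborhood is a square window of hypothetical radius `radius` clipped at the image border, `k` defaults to 0.5, a pixel above `T_Light` is labeled light potential-text, and a pixel below `T_Dark` is labeled dark potential-text. The claim describes σ as the variance; the sketch uses the population standard deviation (`statistics.pstdev`), as in common local-thresholding schemes, which is also an assumption.

```python
from statistics import mean, pstdev
from typing import List

NON_TEXT, LIGHT, DARK = "non-text", "light potential-text", "dark potential-text"


def classify_pixels(gray: List[List[float]], radius: int = 1, k: float = 0.5) -> List[List[str]]:
    """Label every pixel using T_Light = mu + k*sigma and T_Dark = mu - k*sigma."""
    height, width = len(gray), len(gray[0])
    labels = []
    for y in range(height):
        row = []
        for x in range(width):
            # Square neighborhood around (y, x), clipped at the image border.
            window = [
                gray[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            mu = mean(window)
            sigma = pstdev(window)  # assumption: standard deviation, not variance
            t_light = mu + k * sigma
            t_dark = mu - k * sigma
            if gray[y][x] > t_light:
                row.append(LIGHT)
            elif gray[y][x] < t_dark:
                row.append(DARK)
            else:
                row.append(NON_TEXT)
        labels.append(row)
    return labels


# A single dark pixel on a bright background.
image = [
    [100, 100, 100],
    [100, 0, 100],
    [100, 100, 100],
]
labels = classify_pixels(image)
print(labels[1][1])
```

To follow the claim's wording literally, `pstdev` could be replaced with `statistics.pvariance`; the thresholds then scale differently and `k` would need to be chosen accordingly.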

10. A system for detecting text in real-world images comprising:
- a processor including:
  
  a dividing logic to divide an image into one or more regions;
  
  a calculating logic to calculate a cascade of classifiers, the cascade comprising a plurality of stages, each stage including one or more weak classifiers, wherein the plurality of stages is organized to start out with classifiers that are most useful for ruling out non-text regions;
  
  a feeding logic to feed the one or more regions into the cascade;
  
  a remove non-text image regions logic to remove image regions classified as the non-text regions from the cascade prior to completion of the cascade, to avoid subsequent processing of the removed regions;
  
  binarization logic including logic to classify individual pixels as one of:
  
  non-text, light potential-text, and dark potential-text; and
  
  a training system including:
  
  a feed logic to feed training images into the cascade;
  
  a comparison logic to compare classifier results to known training image results; and
  
  an adapting logic to adapt one or more of an order of stages in the cascade of classifiers, an order of classifiers in the stages, one or more classifier confidence level thresholds, and the classifiers by selecting features for each classifier that reduce the number of false positive and false negative detections by a reduced number of tests.
- Dependent claims:
  - 11. The system of claim 10, further comprising:
    - an outputting logic to output an output data comprising identified text regions separated from non-text regions.
  - 12. The system of claim 10, further comprising binarization logic including:
    - logic to classify individual pixels as one of:
      
      non-text, light potential-text, and dark potential-text.
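One of the adaptations recited for the training system in claim 10 is adjusting classifier confidence level thresholds by comparing classifier results to known training results. A minimal sketch of that single step is shown below. The function `pick_threshold`, the exhaustive search over observed scores, and the equal weighting of false positives and false negatives are assumptions made for illustration; the claim does not specify how the adaptation is carried out.

```python
from typing import Optional, Sequence, Tuple


def pick_threshold(scores: Sequence[float], is_text: Sequence[bool]) -> Tuple[float, int]:
    """Return (threshold, errors) minimizing false positives plus false negatives.

    A region is predicted to be text when its score is >= threshold.
    """
    # Candidate thresholds: every observed score, plus "reject everything".
    candidates = sorted(set(scores)) + [float("inf")]
    best: Optional[Tuple[float, int]] = None
    for t in candidates:
        false_pos = sum(1 for s, y in zip(scores, is_text) if s >= t and not y)
        false_neg = sum(1 for s, y in zip(scores, is_text) if s < t and y)
        errors = false_pos + false_neg
        # Strict comparison keeps the lowest threshold among ties.
        if best is None or errors < best[1]:
            best = (t, errors)
    return best


# Made-up stage scores for five labeled training regions.
train_scores = [0.10, 0.40, 0.35, 0.80, 0.90]
train_labels = [False, False, True, True, True]
threshold, errors = pick_threshold(train_scores, train_labels)
print(threshold, errors)
```

In a cascade, an early stage would typically weight false negatives more heavily than false positives, since a text region rejected early cannot be recovered by later stages; that weighting is omitted here for brevity.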

13. A non-transitory computer readable medium storing instructions thereon which, when executed by a system, cause the system to perform a method comprising:
- dividing an image into one or more regions;
  
  calculating a cascade of classifiers, the cascade comprising a plurality of stages, each stage including one or more weak classifiers, wherein the plurality of stages is organized to start out with classifiers that are most useful for ruling out non-text regions of the image;
  
  receiving training images;
  
  feeding the training images into the cascade;
  
  comparing classifier results to known training image results;
  
  adapting one or more of an order of stages in the cascade, an order of classifiers in the stages, one or more classifier confidence level thresholds, and the classifiers by selecting features for each classifier that reduce a number of false positive and false negative detections by a reduced number of tests;
  
  feeding the one or more regions into the cascade; and
  
  removing regions of the image classified as the non-text regions from the cascade prior to completion of the cascade to avoid subsequent processing of the removed regions.
- Dependent claims:
  - 14. The computer readable medium of claim 13, further comprising:
    - outputting an output data comprising identified text regions separated from non-text regions.
  - 15. The computer readable medium of claim 13, further comprising utilizing a binarization process including:
    - classifying individual pixels as one of:
      
      non-text, light potential-text, and dark potential-text; and
      
      outputting binarization output data.

16. A method of detecting text in real-world images comprising:
- dividing an image representing a real-world scene into one or more regions;
  
  calculating a cascade of classifiers, the cascade comprising a plurality of stages, each stage including one or more weak classifiers, the plurality of stages organized to start out with classifiers that are most useful for ruling out non-text regions of the image;
  
  receiving training images;
  
  feeding the training images into the cascade;
  
  comparing classifier results to known training image results; and
  
  adapting one or more of an order of stages in the cascade, an order of classifiers in the stages, one or more classifier confidence level thresholds, and the classifiers by selecting features for each classifier that reduce a number of false positive and false negative detections by a reduced number of tests;
  
  feeding the one or more regions into the cascade; and
  
  removing regions of the image classified as the non-text regions from the cascade prior to completion of the cascade to avoid subsequent processing of the removed regions; and
  
  displaying a result.
- Dependent claims:
  - 17. The method of claim 16, further comprising:
    - utilizing a binarization process including classifying individual pixels as one of:
      
      non-text, light potential-text, and dark potential-text based on one or more factors including:
      
      a number of pixels in the connected component;
      
      a number of pixels on the border of the connected component;
      
      a height of the connected component;
      
      a width of the connected component;
      
      a ratio of the height of the connected component to the width of the connected component;
      
      a ratio of the pixels in the connected component to the width of the connected component multiplied by the height of the connected component;
      
      a local size of text in the connected component.
  - 18. The method of claim 16, further comprising utilizing two neighborhood thresholds:
    - $T_{\text{Light}} = \mu + k\sigma$ and $T_{\text{Dark}} = \mu - k\sigma$, where $\mu$ and $\sigma$ are the mean and variance within the selected neighborhood respectively, and $k$ is a constant.
  - 19. The method of claim 16, further comprising grouping the connected components into lines of text based on a color distance between colors of two connected components.
  - 20. The method of claim 16, further comprising removing regions classified as text regions from the cascade prior to completion of the cascade when a confidence level exceeds a threshold, wherein the confidence level indicates the likelihood of a region being a text region.
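Claims 1 and 17 list several per-component factors: pixel count, border pixel count, height, width, height-to-width ratio, and the ratio of pixels to bounding-box area. The sketch below shows one way these could be computed from a binary mask. It assumes 4-connectivity, treats a pixel as a border pixel when at least one of its four neighbors lies outside the component, and omits the "local size of text" factor because the claims do not define it. The function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]  # (row, column)


def _neighbors(y: int, x: int) -> Tuple[Pixel, ...]:
    """4-connected neighbors of a pixel (an assumption; 8-connectivity also works)."""
    return ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))


def connected_components(mask: List[List[int]]) -> List[Set[Pixel]]:
    """Group non-zero pixels of a binary mask into connected components (BFS)."""
    rows, cols = len(mask), len(mask[0])
    seen: Set[Pixel] = set()
    components: List[Set[Pixel]] = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            component: Set[Pixel] = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                component.add((y, x))
                for ny, nx in _neighbors(y, x):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny][nx] and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            components.append(component)
    return components


def component_features(component: Set[Pixel]) -> Dict[str, float]:
    """Compute the shape factors listed in the claims for one component."""
    ys = [y for y, _ in component]
    xs = [x for _, x in component]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    # A border pixel has at least one 4-neighbor that is not in the component.
    border = sum(
        1 for (y, x) in component
        if any(n not in component for n in _neighbors(y, x))
    )
    return {
        "pixels": len(component),
        "border_pixels": border,
        "height": height,
        "width": width,
        "aspect": height / width,                   # height-to-width ratio
        "fill": len(component) / (width * height),  # pixels / bounding-box area
    }


# A 3x3 block and a separate 2-pixel vertical bar.
mask = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 1],
    [1, 1, 1, 0, 1],
]
features = [component_features(c) for c in connected_components(mask)]
for f in features:
    print(f)
```

Features like these could then feed a rule or classifier that keeps character-like components and discards the rest; how the patent combines them is not stated in the claims.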

Current Assignee: The Blindsight Corporation
Original Assignee: The Blindsight Corporation
Inventors: Lagerstrom, Stellan; Yuille, Alan; Terry, Daniel; Chen, Xiangrong; Nitzberg, Mark
Primary Examiner(s): Dang, Duy M
Application Number: US11/516,147
Publication Number: US 20070110322A1
Time in Patent Office: 1,505 Days
Field of Search: 382/176, 382/199, 382/200, 382/203-204, 382/181, 382/284, 382/321, 358/461, 358/462, 358/464, 707/3, 707/6, 706/12
US Class Current: 382/176
CPC Class Codes:
G06V 20/62   Text, e.g. of license plate...
G06V 20/63   Scene text, e.g. street names
G06V 30/10   Character recognition
