GENERALIZED TEXT LOCALIZATION IN IMAGES

US 20020159636A1
Filed: 03/14/2000
Published: 10/31/2002
Est. Priority Date: 03/14/2000
Status: Active Grant



Abstract

In some embodiments, the invention includes a method for locating text in digital images. The method includes scaling a digital image into images of multiple resolutions and classifying whether pixels in the multiple resolutions are part of a text region. The method also includes integrating scales to create a scale integration saliency map and using the saliency map to create initial text bounding boxes through expanding the boxes from rectangles of pixels including at least one pixel to include groups of at least one pixel adjacent to the rectangles, wherein the groups have a particular relationship to a first threshold. The initial text bounding boxes are consolidated. In other embodiments, a method includes classifying whether pixels are part of a text region, creating initial text bounding boxes, and consolidating the initial text bounding boxes, wherein the consolidating includes creating horizontal projection profiles having adaptive thresholds and vertical projection profiles having adaptive thresholds.
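The box-expansion step in the abstract becomes concrete when read together with dependent claims 2 and 3: a box starts as a 1 x 1 pixel rectangle and absorbs an adjacent row or column whenever that group's average intensity exceeds the first threshold. The Python sketch below illustrates one reading of that step on a saliency map stored as a list of lists. The function name `grow_box`, the inclusive `(top, left, bottom, right)` box convention, the order in which the four sides are tested, and how seed pixels are chosen are all assumptions made for illustration. They are not details taken from the patent.

```python
from typing import List, Tuple

# Hypothetical box convention: (top, left, bottom, right), inclusive bounds.
Box = Tuple[int, int, int, int]


def grow_box(saliency: List[List[float]], seed_row: int, seed_col: int,
             threshold: float) -> Box:
    """Grow a rectangle from a 1x1 seed, adding any adjacent row or column
    whose average saliency exceeds `threshold` (assumed reading of claims 2-3)."""
    rows, cols = len(saliency), len(saliency[0])
    top = bottom = seed_row
    left = right = seed_col

    def row_mean(r: int, c0: int, c1: int) -> float:
        # Average of row r over the current column span [c0, c1].
        return sum(saliency[r][c0:c1 + 1]) / (c1 - c0 + 1)

    def col_mean(c: int, r0: int, r1: int) -> float:
        # Average of column c over the current row span [r0, r1].
        return sum(saliency[r][c] for r in range(r0, r1 + 1)) / (r1 - r0 + 1)

    grew = True
    while grew:  # keep expanding until no side qualifies
        grew = False
        if top > 0 and row_mean(top - 1, left, right) > threshold:
            top -= 1
            grew = True
        if bottom < rows - 1 and row_mean(bottom + 1, left, right) > threshold:
            bottom += 1
            grew = True
        if left > 0 and col_mean(left - 1, top, bottom) > threshold:
            left -= 1
            grew = True
        if right < cols - 1 and col_mean(right + 1, top, bottom) > threshold:
            right += 1
            grew = True
    return top, left, bottom, right
```

In practice one might seed from saliency pixels that exceed the threshold and are not yet covered by a box. That seeding policy is likewise an assumption.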

72 Citations

32 Claims

1. A method of locating text in digital images, comprising:
- scaling a digital image into images of multiple resolutions;
  
  classifying whether pixels in the multiple resolutions are part of a text region;
  
  integrating scales to create a scale integration saliency map;
  
  using the saliency map to create initial text bounding boxes through expanding the boxes from rectangles of pixels including at least one pixel to include groups of at least one pixel adjacent to the rectangles, wherein the groups have a particular relationship to a first threshold; and
  
  consolidating the initial text bounding boxes.
- Dependent claims (2-11, 13-22, 24-27, 29-32):
  - 2. The method of claim 1, wherein the particular relationship is that an average intensity of the group exceeds the first threshold.
  - 3. The method of claim 1, wherein the groups include a row or column adjacent to the rectangle and the rectangle starts as a 1 pixel by 1 pixel rectangle.
  - 4. The method of claim 1, wherein the saliency map is of the same resolution as the digital image before scaling to multiple resolutions.
  - 5. The method of claim 1, wherein the digital image is part of a digital video image and consolidating of the initial text bounding boxes includes creating horizontal projection profiles having adaptive thresholds and vertical projection profiles having adaptive thresholds.
  - 6. The method of claim 5, wherein the adaptive thresholds for the horizontal projection profiles are functions of minimum and maximum values of the horizontal projection profiles and the adaptive thresholds for the vertical projection profiles are functions of minimum and maximum values of the vertical projection profiles.
  - 7. The method of claim 1, wherein consolidating the initial text bounding boxes includes repeatedly performing a horizontal segmentation algorithm and a vertical segmentation algorithm.
  - 8. The method of claim 7, wherein the horizontal segmentation algorithm includes expanding a text bounding box at the top and bottom by a minimum of half the height of the original text box and half the possible maximal text height.
  - 9. The method of claim 1, further comprising calculating edge orientation to identify image features in the multiple resolutions.
  - 10. The method of claim 1, further comprising using a signature based tracking to identify frames including text in a text object in a forward and backward direction from a frame in which the text has been identified through an image based method.
  - 11. The method of claim 1, further comprising estimating color of text in the image through creating color histograms in the text and non-text portions surrounding the text.
  - 13. The apparatus of claim 12, wherein the particular relationship is that an average intensity of the group exceeds the first threshold.
  - 14. The apparatus of claim 12, wherein the groups include a row or column adjacent to the rectangle and the rectangle starts as a 1 pixel by 1 pixel rectangle.
  - 15. The apparatus of claim 12, wherein the saliency map is of the same resolution as the digital image before scaling to multiple resolutions.
  - 16. The apparatus of claim 12, wherein the digital image is part of a digital video image and consolidating of the initial text bounding boxes includes creating horizontal projection profiles having adaptive thresholds and vertical projection profiles having adaptive thresholds.
  - 17. The apparatus of claim 16, wherein the adaptive thresholds for the horizontal projection profiles are functions of minimum and maximum values of the horizontal projection profiles and the adaptive thresholds for the vertical projection profiles are functions of minimum and maximum values of the vertical projection profiles.
  - 18. The apparatus of claim 12, wherein consolidating the initial text bounding boxes includes repeatedly performing a horizontal segmentation algorithm and a vertical segmentation algorithm.
  - 19. The apparatus of claim 18, wherein the horizontal segmentation algorithm includes expanding a text bounding box at the top and bottom by a minimum of half the height of the original text box and half the possible maximal text height.
  - 20. The apparatus of claim 12, further comprising calculating edge orientation to identify image features in the multiple resolutions.
  - 21. The apparatus of claim 12, further comprising using a signature based tracking to identify frames including text in a text object in a forward and backward direction from a frame in which the text has been identified through an image based method.
  - 22. The apparatus of claim 12, further comprising estimating color of text in the image through creating color histograms in the text and non-text portions surrounding the text.
  - 24. The method of claim 23, wherein the adaptive thresholds for the horizontal projection profiles are functions of minimum and maximum values of the horizontal projection profiles and the adaptive thresholds for the vertical projection profiles are functions of minimum and maximum values of the vertical projection profiles.
  - 25. The method of claim 23, wherein consolidating the initial text bounding boxes includes repeatedly performing a horizontal segmentation algorithm and a vertical segmentation algorithm.
  - 26. The method of claim 23, wherein the horizontal segmentation algorithm includes expanding a text bounding box at the right and left by a minimum of half the height of the original text box and half the possible maximal text height.
  - 27. The method of claim 23, wherein the vertical segmentation algorithm includes expanding a text bounding box at the top and bottom by a minimum of half the height of the original text box and half the possible maximal text height.
  - 29. The apparatus of claim 28, wherein the adaptive thresholds for the horizontal projection profiles are functions of minimum and maximum values of the horizontal projection profiles and the adaptive thresholds for the vertical projection profiles are functions of minimum and maximum values of the vertical projection profiles.
  - 30. The apparatus of claim 28, wherein consolidating the initial text bounding boxes includes repeatedly performing a horizontal segmentation algorithm and a vertical segmentation algorithm.
  - 31. The apparatus of claim 28, wherein the horizontal segmentation algorithm includes expanding a text bounding box at the right and left by a minimum of half the height of the original text box and half the possible maximal text height.
  - 32. The apparatus of claim 28, wherein the vertical segmentation algorithm includes expanding a text bounding box at the top and bottom by a minimum of half the height of the original text box and half the possible maximal text height.
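Several of these dependent claims (for example claims 8, 19, 27, and 32) expand a text bounding box at the top and bottom by "a minimum of half the height of the original text box and half the possible maximal text height." The sketch below reads that phrase as `min(h / 2, max_text_height / 2)`. The function name, the inclusive box convention, the integer rounding, and the clipping to the image border are assumptions for illustration.

```python
from typing import Tuple

# Hypothetical box convention: (top, left, bottom, right), inclusive bounds.
Box = Tuple[int, int, int, int]


def expand_vertically(box: Box, max_text_height: int, image_height: int) -> Box:
    """Pad a box above and below by min(h/2, max_text_height/2).

    Integer halving and clipping to [0, image_height - 1] are assumptions."""
    top, left, bottom, right = box
    h = bottom - top + 1
    pad = min(h // 2, max_text_height // 2)
    return (max(0, top - pad), left, min(image_height - 1, bottom + pad), right)
```

For a 10-pixel-tall box and an assumed maximal text height of 8 pixels, the pad is `min(5, 4) = 4` pixels on each side.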

12. An apparatus comprising:
- a machine readable medium having instructions thereon which when executed cause a processor to perform a method including:
  
  scaling a digital image into images of multiple resolutions;
  
  classifying whether pixels in the multiple resolutions are part of a text region;
  
  integrating scales to create a scale integration saliency map;
  
  using the saliency map to create initial text bounding boxes through expanding the boxes from rectangles of pixels including at least one pixel to include groups of at least one pixel adjacent to the rectangles, wherein the groups have a particular relationship to a first threshold; and
  
  consolidating the initial text bounding boxes.
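The "integrating scales" step, combined with claims 4 and 15 (the saliency map has the same resolution as the unscaled image), can be illustrated by mapping each scale's per-pixel text-confidence map back to the original resolution and combining them. The sketch below uses nearest-neighbour resizing and a pixelwise sum. Both choices, and the names `upsample_nearest` and `integrate_scales`, are assumptions for illustration; the claims do not specify the resampling or combination rule.

```python
from typing import List

Map = List[List[float]]


def upsample_nearest(m: Map, out_rows: int, out_cols: int) -> Map:
    """Nearest-neighbour resize of a 2-D map to (out_rows, out_cols)."""
    in_rows, in_cols = len(m), len(m[0])
    return [[m[r * in_rows // out_rows][c * in_cols // out_cols]
             for c in range(out_cols)]
            for r in range(out_rows)]


def integrate_scales(per_scale: List[Map], out_rows: int, out_cols: int) -> Map:
    """Resize every scale's confidence map to the original resolution and
    sum them pixelwise (the summation rule is an assumption)."""
    saliency = [[0.0] * out_cols for _ in range(out_rows)]
    for m in per_scale:
        up = upsample_nearest(m, out_rows, out_cols)
        for r in range(out_rows):
            for c in range(out_cols):
                saliency[r][c] += up[r][c]
    return saliency
```

A pixelwise maximum or a weighted average would be equally plausible combination rules. The sum simply rewards pixels that several scales agree on.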

23. A method, comprising:
- classifying whether pixels are part of a text region;
  
  creating initial text bounding boxes; and
  
  consolidating the initial text bounding boxes, wherein the consolidating includes creating horizontal projection profiles having adaptive thresholds and vertical projection profiles having adaptive thresholds.
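Claim 23 consolidates boxes with horizontal and vertical projection profiles whose thresholds adapt to each profile. Claim 24 adds that these thresholds are functions of the profile's minimum and maximum. The sketch below takes the horizontal profile as per-row sums and the vertical profile as per-column sums over a box's region. It uses the threshold `min + alpha * (max - min)` and splits the profile into runs that exceed it. The linear form, the `alpha = 0.5` default, and all function names are assumptions for illustration.

```python
from typing import List, Optional, Tuple


def horizontal_profile(region: List[List[float]]) -> List[float]:
    """Sum each row of the region (one value per row)."""
    return [sum(row) for row in region]


def vertical_profile(region: List[List[float]]) -> List[float]:
    """Sum each column of the region (one value per column)."""
    return [sum(col) for col in zip(*region)]


def adaptive_threshold(profile: List[float], alpha: float = 0.5) -> float:
    """Threshold as a function of the profile's min and max (assumed linear)."""
    lo, hi = min(profile), max(profile)
    return lo + alpha * (hi - lo)


def split_runs(profile: List[float], alpha: float = 0.5) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) runs where the profile exceeds its
    adaptive threshold; each run is a candidate text line or column span."""
    t = adaptive_threshold(profile, alpha)
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, v in enumerate(profile):
        if v > t and start is None:
            start = i  # entering an above-threshold run
        elif v <= t and start is not None:
            runs.append((start, i - 1))  # leaving the run
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1))
    return runs
```

Claim 25 describes repeating the horizontal and vertical segmentation. One way to do this is to alternate `split_runs` on rows and columns of each resulting sub-box until no box splits further. That loop is not shown here.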

28. An apparatus comprising:
- a machine readable medium having instructions thereon which when executed cause a processor to perform a method including:
  
  classifying whether pixels are part of a text region;
  
  creating initial text bounding boxes; and
  
  consolidating the initial text bounding boxes, wherein the consolidating includes creating horizontal projection profiles having adaptive thresholds and vertical projection profiles having adaptive thresholds.


Current Assignee: Intel Corporation
Original Assignee: Intel Corporation
Inventors: Lienhart, Rainer W.; Wernicke, Axel

Granted Patent: US 6,470,094 B1
US Class (current): 382/176
CPC Class Codes

G06T 2207/10016   Video; Image sequence

G06T 2207/20008   Globally adaptive

G06T 2207/30176   Document

G06T 7/11   Region-based segmentation

G06T 7/194   involving foreground-backgr...

G06V 20/62   Text, e.g. of license plate...

G06V 30/10   Character recognition
