Method and apparatus for recognizing text in an image sequence of scene imagery

US 7,620,268 B2
Filed: 01/03/2008
Issued: 11/17/2009
Est. Priority Date: 09/22/2000
Status: Expired due to Fees

Abstract

An apparatus and a concomitant method for detecting and recognizing text information in a captured imagery. The present method transforms the image of the text to a normalized coordinate system before performing OCR, thereby yielding more robust recognition performance. The present invention also combines OCR results from multiple frames, in a manner that takes the best recognition results from each frame and forms a single result that can be more accurate than the results from any of the individual frames.
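
The multi-frame combination described in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each frame's OCR output is already aligned character by character and that the OCR engine reports a per-character confidence between 0.0 and 1.0. The `Reading` type and the `combine_readings` function are hypothetical names, not taken from the patent.

```python
from typing import List, Tuple

# One OCR reading of a frame: a list of (character, confidence) pairs.
# The 0.0-1.0 confidence scale is an assumption made for this sketch.
Reading = List[Tuple[str, float]]


def combine_readings(readings: List[Reading]) -> str:
    """Keep, at each character position, the most confident character seen in any frame."""
    if not readings:
        return ""
    length = len(readings[0])
    if any(len(reading) != length for reading in readings):
        raise ValueError("this sketch assumes the readings are already aligned")
    best_chars = []
    for position in range(length):
        # Compare the candidates for this position across all frames.
        char, _ = max((reading[position] for reading in readings),
                      key=lambda pair: pair[1])
        best_chars.append(char)
    return "".join(best_chars)


frame_1 = [("S", 0.9), ("T", 0.8), ("0", 0.4), ("P", 0.9)]
frame_2 = [("5", 0.3), ("T", 0.9), ("O", 0.7), ("P", 0.6)]
print(combine_readings([frame_1, frame_2]))  # prints STOP
```

Neither frame alone reads "STOP" here, yet the combined result does, which is the sense in which a single result can be more accurate than any individual frame.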

28 Claims

1. Method for recognizing text in a captured imagery having a plurality of frames, said method using a processor to perform steps comprising of:
- (a) detecting a text region in a first frame of the plurality of frames;
  
  (b) applying, using the processor, optical character recognition processing (OCR) to said detected text region to identify potential text for said first frame; and
  
  (c) agglomerating the potential text with potential text for at least a second frame of the plurality of frames, in a manner that takes an OCR result from each of the first frame and the at least said second frame, to produce a single recognition result for text in the detected text region.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10):
  - 2. The method of claim 1, wherein said agglomerating step (c) comprises a step of updating an agglomeration structure with said potential text of a current frame of the plurality of frames.
  - 3. The method of claim 2, wherein said updating step comprises a step of (c1) finding correspondence between a text region of said agglomeration structure and a text region of said current frame.
  - 4. The method of claim 3, wherein said updating step further comprises a step of (c2) finding character-to-character correspondence for each pair of overlapping lines between said text region of said agglomeration structure and said text region of said current frame to find one or more character group pairs.
  - 5. The method of claim 4, wherein said updating step further comprises a step of (c3) updating said one or more character group pairs.
  - 6. The method of claim 5, wherein said updating step further comprises a step of (c4) marking text in said agglomeration structure that is not in said current frame as a deletion.
  - 7. The method of claim 6, wherein said updating step further comprises a step of (c5) marking text in said current frame that is not in said agglomeration structure as an insertion.
  - 8. The method of claim 2, further comprising a step of:
    - (d) outputting said text in the detected text region after each frame of the plurality of frames is processed.
  - 9. The method of claim 2, further comprising a step of:
    - (d) outputting said text in the detected text region only when a change is detected as to said text in said captured imagery.
  - 10. The method of claim 2, further comprising a step of:
    - (d) outputting only text within said agglomeration structure when said text within said agglomeration structure is not detected in the current frame.

11. Apparatus for recognizing text in a captured imagery having a plurality of frames, said apparatus comprising:
- means for detecting a text region in a first frame of the plurality of frames;
  
  means for applying optical character recognition processing (OCR) to said detected text region to identify potential text for said first frame; and
  
  means for agglomerating the potential text with potential text for at least a second frame of the plurality of frames, in a manner that takes an OCR result from each of the first frame and the at least said second frame, to produce a single recognition result for text in the detected text region.
- Dependent claims (12, 13, 14, 15, 16, 17, 18, 19, 20):
  - 12. The apparatus of claim 11, wherein said agglomerating means updates an agglomeration structure with said potential text of a current frame of the plurality of frames.
  - 13. The apparatus of claim 12, wherein said agglomerating means finds correspondence between a text region of said agglomeration structure and a text region of said current frame.
  - 14. The apparatus of claim 13, wherein said agglomerating means further finds character-to-character correspondence for each pair of overlapping lines between said text region of said agglomeration structure and said text region of said current frame to find one or more character group pairs.
  - 15. The apparatus of claim 14, wherein said agglomerating means further updates said one or more character group pairs.
  - 16. The apparatus of claim 15, wherein said agglomerating means further marks text in said agglomeration structure that is not in said current frame as a deletion.
  - 17. The apparatus of claim 16, wherein said agglomerating means further marks text in said current frame that is not in said agglomeration structure as an insertion.
  - 18. The apparatus of claim 12, further comprising:
    - means for outputting said text in the detected text region after each frame of the plurality of frames is processed.
  - 19. The apparatus of claim 12, further comprising:
    - means for outputting said text in the detected text region only when a change is detected as to said text in said captured imagery.
  - 20. The apparatus of claim 12, further comprising:
    - means for outputting only text within said agglomeration structure when said text within said agglomeration structure is not detected in the current frame.
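
Two of the output behaviours recited in claims 18 and 19 (and in method claims 8 and 9) can be contrasted in a short sketch. It assumes the agglomerated text is available as one string per processed frame; the function names `emit_every_frame` and `emit_on_change` are hypothetical.

```python
from typing import Iterable, Iterator, Optional


def emit_every_frame(results: Iterable[str]) -> Iterator[str]:
    """Output the recognized text after each frame is processed."""
    for text in results:
        yield text


def emit_on_change(results: Iterable[str]) -> Iterator[str]:
    """Output the recognized text only when it differs from the last output."""
    previous: Optional[str] = None
    for text in results:
        if text != previous:
            yield text
            previous = text


per_frame = ["ST0P", "STOP", "STOP", "STOP AHEAD"]
print(list(emit_every_frame(per_frame)))  # four outputs, one per frame
print(list(emit_on_change(per_frame)))    # ['ST0P', 'STOP', 'STOP AHEAD']
```

Emitting only on change suits a consumer that reacts to new text, while per-frame output suits a consumer that needs a result synchronized with every frame.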

21. Method for recognizing text in a captured imagery having a plurality of frames, said method using a processor to perform steps comprising of:
- (a) detecting a text region in a frame of the captured imagery;
  
  (b) applying, using the processor, optical character recognition processing (OCR) to said detected text region to identify potential text for said frame; and
  
  (c) agglomerating the OCR identified potential text over a plurality of frames in the captured imagery to recognize the text in the detected text region, wherein said agglomerating step (c) comprises a step of;
  
  updating an agglomeration structure with said OCR identified potential text of a current frame, and wherein said updating step comprises steps of;
  
  (c1) finding correspondence between a text region of said agglomeration structure with a text region of said current frame; and
  
  (c2) finding character-to-character correspondence for each pair of overlapping lines between said text region of said agglomeration structure with said text region of said current frame to find one or more character group pairs.
- Dependent claims (22, 23, 24):
  - 22. The method of claim 21, wherein said updating step further comprises the step of (c3) updating said one or more character group pairs.
  - 23. The method of claim 22, wherein said updating step further comprises the step of (c4) marking text in said agglomeration structure that is not in said current frame as a deletion.
  - 24. The method of claim 23, wherein said updating step further comprises the step of (c5) marking text in said current frame that is not in said agglomeration structure as an insertion.
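
Step (c1) of claim 21 requires a correspondence between a text region already held in the agglomeration structure and a text region of the current frame. The claims do not say how that correspondence is computed. One common choice, used here only as an assumed stand-in, is the overlap of axis-aligned bounding boxes measured by intersection over union (IoU). The `Box` layout, the `match_region` function, and the 0.5 threshold are all hypothetical.

```python
from typing import List, Optional, Tuple

# Axis-aligned box as (left, top, right, bottom) in pixels (assumed layout).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    intersection = max(0.0, width) * max(0.0, height)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def match_region(stored: List[Box], current: Box,
                 threshold: float = 0.5) -> Optional[int]:
    """Return the index of the best-overlapping stored region, or None."""
    best_index: Optional[int] = None
    best_score = threshold
    for index, box in enumerate(stored):
        score = iou(box, current)
        if score >= best_score:
            best_index, best_score = index, score
    return best_index


stored_regions = [(0, 0, 100, 20), (0, 50, 100, 70)]
print(match_region(stored_regions, (5, 2, 105, 22)))       # prints 0
print(match_region(stored_regions, (200, 200, 300, 220)))  # prints None
```

A region that matches nothing would be treated as new text, and plain box overlap only holds up when the text moves little between frames; larger camera motion would need the regions brought into a common coordinate system first.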

25. Apparatus for recognizing text in a captured imagery having a plurality of frames, said apparatus comprising:
- means for detecting a text region in a frame of the captured imagery;
  
  means for applying optical character recognition processing (OCR) to said detected text region to identify potential text for said frame; and
  
  means for agglomerating the OCR identified potential text over a plurality of frames in the captured imagery to extract the text in the detected text region, wherein said agglomerating means updates an agglomeration structure with said OCR identified potential text of a current frame and finds correspondence between a text region of said agglomeration structure with a text region of said current frame, and wherein said agglomerating means further finds character-to-character correspondence for each pair of overlapping lines between said text region of said agglomeration structure with said text region of said current frame to find one or more character group pairs.
- Dependent claims (26, 27, 28):
  - 26. The apparatus of claim 25, wherein said agglomerating means further updates said one or more character group pairs.
  - 27. The apparatus of claim 26, wherein said agglomerating means further marks text in said agglomeration structure that is not in said current frame as a deletion.
  - 28. The apparatus of claim 27, wherein said agglomerating means further marks text in said current frame that is not in said agglomeration structure as an insertion.

Current Assignee: SRI International, Inc.
Original Assignee: SRI International, Inc.
Inventors: Herson, James A.; Myers, Gregory K.; Bolles, Robert C.; Luong, Quang-Tuan
Primary Examiner(s): COUSO, YON JUNG
Application Number: US11/969,032
Publication Number: US 20080101726A1
Time in Patent Office: 684 Days
Field of Search: 382/173, 382/174, 382/176, 382/177, 382/202, 382/229, 382/289, 382/290, 382/281, 382/295, 382/296, 382/286, 382/291, 382/292, 348/147, 348/144, 348/143, 348/42, 348/47, 348/48, 348/153, 348/159
US Class Current: 382/289
CPC Class Codes:
G06V 20/63   Scene text, e.g. street names
G06V 30/10   Character recognition
G06V 30/1478   of characters or characters...
