Multi-frame videotext recognition

US 8,290,273 B2
Filed: 03/27/2009
Issued: 10/16/2012
Est. Priority Date: 03/27/2009
Status: Active Grant

Abstract

Multi-frame persistence of videotext is exploited to mitigate the challenges posed by the varying characteristics of videotext across frame instances, and thereby to improve OCR techniques. In some examples, each frame of video is processed to form multiple binary images, and one or more text hypotheses are formed from each binary image. In some examples, one or more combined images are formed from multiple frames, and each combined image is processed to form a binary image and a corresponding text hypothesis. The text hypotheses are combined to yield an overall text recognition output.
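
As a rough illustration of forming multiple binary images from a single frame, the sketch below thresholds a grayscale frame at several intensity levels. The function names `binarize` and `multi_binarize`, the nested-list image layout, and the threshold values are all assumptions made for this example; the abstract does not specify how the binary images are produced.

```python
from typing import List, Sequence

# Hypothetical image layout: a list of rows, each a list of 0-255 intensities.
Image = List[List[int]]


def binarize(image: Image, threshold: int) -> Image:
    """Return a binary image: 1 where intensity exceeds the threshold, else 0."""
    return [[1 if px > threshold else 0 for px in row] for row in image]


def multi_binarize(image: Image, thresholds: Sequence[int]) -> List[Image]:
    """Form one binary image per threshold from the same input frame."""
    return [binarize(image, t) for t in thresholds]


# Toy 2x3 frame; the three thresholds are illustrative values only.
frame = [[10, 200, 90], [250, 30, 130]]
binary_images = multi_binarize(frame, thresholds=[64, 128, 192])
```

Each binary image could then be passed to a recognizer separately, so that a character lost at one threshold may still survive at another.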

20 Claims

1. A method for text recognition from a video signal comprising:
- forming a sequence of input images from the video signal, at least some of the input images including an image representing text;
  
  combining the sequence of input images to form one or more combined images;
  
  forming a plurality of processed images, including processing input images to form at least some of the processed images and processing the one or more combined images to form at least one of the processed images;
  
  generating one or more text hypotheses for each of the plurality of processed images; and
  
  determining a combined text output from a combination of the generated text hypotheses.
- - 2. The method of claim 1 further comprising:
    - accepting a sequence of video frames of the video signal; and
      
      accepting a specification of a bounding region and a time interval of an instance of text represented in the video signal, and forming a sequence of images from the video signal;
      
      wherein forming the sequence of images includes forming the images from the sequence of video frames according to the specification of the bounding region and the time interval.
  - 3. The method of claim 1 wherein combining the sequence of images includes registering images from the sequence of images and combining the registered images.
  - 4. The method of claim 1 wherein combining the sequence of images includes determining an extremum image.
  - 5. The method of claim 4 wherein determining the extremum image comprises determining an extremum intensity value at locations over input images.
  - 6. The method of claim 1 wherein forming the plurality of processed images includes forming binary images from the input images and the one or more combined images.
  - 7. The method of claim 6 wherein forming a binary image from an input image comprises comparing intensity values at locations in the input image with a threshold intensity.
  - 8. The method of claim 1 wherein forming the plurality of processed images includes, for each input image, forming a plurality of processed images, each of said processed images being formed using a different processing of the input image.
  - 9. The method of claim 1 wherein generating the one or more text hypotheses for each of the plurality of processed images includes applying a statistical character recognizer to each of the processed images.
  - 10. The method of claim 9 wherein forming the plurality of processed images includes, for each input image, forming a plurality of processed images, each of said processed images being formed using a different processing of the input image, and applying the statistical character recognizer to each of the processed images includes configuring the recognizer according to parameters matching the different processing applied to form the processed images.
  - 11. The method of claim 1 wherein generating the one or more text hypotheses for each of the plurality of processed images includes generating multiple best character sequences.
  - 12. The method of claim 1 wherein determining the combined text output includes aligning character sequence hypotheses of the generated text hypotheses.
  - 13. The method of claim 12 wherein aligning the character sequences includes applying a spatial limit to the alignment of characters of different sequences.
  - 14. The method of claim 1 wherein determining the combined text output from the combination of the generated text hypotheses includes forming a network representation of the text hypotheses and determining the combined text output using the network representation.
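
Claims 4 and 5 describe an extremum image built from extremum intensity values at each location over the input images. One possible reading is a per-pixel minimum or maximum across the frames, sketched below. The function name `extremum_image`, the `mode` parameter, and the nested-list image layout are assumptions for illustration, and the sketch assumes the frames are already registered (claim 3) and share the same dimensions.

```python
from typing import List, Sequence

# Hypothetical image layout: a list of rows, each a list of 0-255 intensities.
Image = List[List[int]]


def extremum_image(frames: Sequence[Image], mode: str = "min") -> Image:
    """Take the per-pixel minimum or maximum intensity across all frames."""
    if not frames:
        raise ValueError("at least one frame is required")
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    pick = min if mode == "min" else max
    rows, cols = len(frames[0]), len(frames[0][0])
    return [
        [pick(frame[r][c] for frame in frames) for c in range(cols)]
        for r in range(rows)
    ]


# Three toy 2x2 frames of the same text region.
frames = [
    [[10, 200], [90, 40]],
    [[30, 180], [60, 45]],
    [[20, 220], [70, 35]],
]
min_image = extremum_image(frames, mode="min")
max_image = extremum_image(frames, mode="max")
```

Whether the minimum or the maximum is the useful extremum would presumably depend on text polarity (dark text on a light background or the reverse); the claims leave that choice open.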

15. A method for text recognition from a video signal comprising:
- forming a sequence of input images from the video signal, at least some of the input images including an image representing text;
  
  combining the sequence of input images to form one or more combined images, wherein one of the combined images is an extremum image;
  
  forming a plurality of processed images, including processing input images to form at least some of the processed images and processing the one or more combined images to form at least one of the processed images;
  
  and including, for each input image, forming a plurality of processed images, each of said processed images being formed using a different processing of the input image, and applying a statistical character recognizer to each of the processed images includes configuring the recognizer according to parameters matching the different processing applied to form the processed images;
  
  generating one or more text hypotheses for each of the plurality of processed images; and
  
  determining a combined text output from a combination of the generated text hypotheses by forming a network representation of the text hypotheses and determining the combined text output using the network representation.
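
The combination step can be pictured with a much simpler stand-in for the claimed network representation: align every hypothesis to a pivot hypothesis and take a majority vote at each character position. The sketch below is not the patented method. The function name `combine_hypotheses`, the choice of the first hypothesis as pivot, and the decision to ignore insertions and deletions relative to the pivot are all simplifying assumptions.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import List, Sequence


def combine_hypotheses(hypotheses: Sequence[str]) -> str:
    """Vote per character slot after aligning each hypothesis to the first one."""
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")
    pivot = hypotheses[0]
    # One vote counter per pivot character, seeded with the pivot's own vote.
    slots: List[Counter] = [Counter({ch: 1}) for ch in pivot]
    for hyp in hypotheses[1:]:
        matcher = SequenceMatcher(None, pivot, hyp, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            same_length = (i2 - i1) == (j2 - j1)
            if tag == "equal" or (tag == "replace" and same_length):
                for k in range(i2 - i1):
                    slots[i1 + k][hyp[j1 + k]] += 1
            # Insertions, deletions, and unequal-length replacements are
            # ignored in this simplified sketch.
    return "".join(slot.most_common(1)[0][0] for slot in slots)


# Three noisy readings of the same caption; each contains at most one error.
combined = combine_hypotheses(["BREAK1NG NEWS", "BREAKING NEWS", "BREAKING NEW5"])
```

A fuller treatment would weight votes by recognizer confidence and keep insertion slots in the aligned structure, which is closer to what a network representation allows.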

16. A system for text recognition from a video signal, the system comprising:
- an image combiner configured to accept a sequence of input images and provide a combined image formed from the input images;
  
  an image processor block, configured to accept the sequence of input images from the video signal and to accept the combined image from the image combiner;
  
  a text recognizer coupled to the image processor block for generating one or more text hypotheses for each of a plurality of processed images; and
  
  an output estimator coupled to the text recognizer for determining a combined text output from a combination of the generated text hypotheses.
- - 17. The system of claim 16 wherein the image combiner is configured to accept a sequence of input images and provide an extremum image that has extremum intensity values at locations over the input images.
  - 18. The system of claim 16 wherein the output estimator is coupled to the text recognizer for determining a combined text output from a combination of the generated text hypotheses by forming a network representation of the text hypotheses and determining the combined text output using the network representation.
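
The coupling between the four blocks of claim 16 can be sketched as a single function that wires together an image combiner, an image processor block, a text recognizer, and an output estimator. Everything named here (`recognize_videotext`, the callable signatures, and the toy stand-ins that treat strings as "images") is hypothetical and serves only to show the data flow, not an actual implementation of any block.

```python
from collections import Counter
from typing import Callable, List, Sequence, TypeVar

Img = TypeVar("Img")


def recognize_videotext(
    frames: Sequence[Img],
    combiner: Callable[[Sequence[Img]], Img],
    processors: Sequence[Callable[[Img], Img]],
    recognizer: Callable[[Img], List[str]],
    estimator: Callable[[List[str]], str],
) -> str:
    """Wire the blocks: combine, process, recognize, then estimate the output."""
    combined = combiner(frames)
    # The processor block sees both the input images and the combined image.
    candidates = list(frames) + [combined]
    processed = [proc(img) for img in candidates for proc in processors]
    hypotheses = [hyp for img in processed for hyp in recognizer(img)]
    return estimator(hypotheses)


# Toy stand-ins: "images" are strings, so the data flow can be followed by eye.
result = recognize_videotext(
    frames=["NEW5", "NEWS", "NEWS"],
    combiner=lambda fs: fs[-1],            # placeholder for a real combiner
    processors=[str.strip, str.upper],     # placeholders for image processing
    recognizer=lambda img: [img],          # placeholder for an OCR engine
    estimator=lambda hyps: Counter(hyps).most_common(1)[0][0],
)
```

Passing the blocks in as callables keeps the sketch agnostic about how each one is realized.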

19. A non-transitory computer-readable medium comprising instructions for causing a data processing system to:
- form a sequence of input images from a video signal, at least some of the input images including an image representing text;
  
  combine the sequence of input images to form one or more combined images;
  
  form a plurality of processed images, including processing input images to form at least some of the processed images and processing the one or more combined images to form at least one of the processed images;
  
  generate one or more text hypotheses for each of a plurality of processed images; and
  
  determine a combined text output from a combination of the generated text hypotheses.
- - 20. The non-transitory computer-readable medium of claim 19, further comprising instructions for causing a data processing system to:
    - combine the sequence of input images to form one or more combined images, wherein one of the combined images is an extremum image that has extremum intensity values at locations over the input images;
      
      form a plurality of processed images, including processing input images to form at least some of the processed images and processing the one or more combined images to form at least one of the processed images;
      
      and including, for each input image, forming a plurality of processed images, each of said processed images being formed using a different processing of the input image, and applying a statistical character recognizer to each of the processed images includes configuring the recognizer according to parameters matching the different processing applied to form the processed images; and
      
      determine a combined text output from a combination of the generated text hypotheses by forming a network representation of the text hypotheses and determining the combined text output using the network representation.

Current Assignee: Raytheon BBN Technologies Corp. (Rtx Corporation)
Original Assignee: Raytheon BBN Technologies Corp. (Rtx Corporation)
Inventors: Prasad, Rohit; Natarajan, Premkumar; MacRostie, Ehry
Primary Examiner(s): Patel, Kanjibhai
Application Number: US12/413,048
Publication Number: US 20100246961A1
Time in Patent Office: 1,299 Days
Field of Search: 382/181-186, 382/228, 382/229, 382/231, 382/292, 382/305, 382/312, 704/251
US Class Current: 382/181

CPC Class Codes

G06F 18/295   Markov models or related mo...
G06V 10/85   Markov-related models; Mark...
G06V 20/63   Scene text, e.g. street names
G06V 30/10   Character recognition
G06V 30/268   Lexical context
