Multi-frame videotext recognition
Abstract
Multi-frame persistence of videotext is exploited to improve OCR by mitigating the challenges posed by the varying characteristics of videotext across frame instances. In some examples, each frame of video is processed to form multiple binary images, and one or more text hypotheses are formed from each binary image. In some examples, one or more combined images are formed from multiple frames, and each combined image is processed to form a binary image and a corresponding text hypothesis. The text hypotheses are combined to yield an overall text recognition output.
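The step of forming multiple binary images from one frame can be illustrated with a minimal Python sketch. It assumes grayscale frames stored as nested lists of 0-255 values and fixed global thresholds; the abstract does not specify the binarization method, so the function names and threshold values here are hypothetical.

```python
from typing import List, Sequence

# Assumption: a grayscale image is a list of rows of 0-255 intensities.
Image = List[List[int]]


def binarize(image: Image, threshold: int) -> Image:
    """Map each pixel to 1 if it is at or above the threshold, else 0."""
    return [[1 if px >= threshold else 0 for px in row] for row in image]


def binarize_multi(image: Image, thresholds: Sequence[int] = (64, 128, 192)) -> List[Image]:
    """Form one binary image per threshold (hypothetical threshold values)."""
    return [binarize(image, t) for t in thresholds]
```

Each binary image would then be passed to a recognizer separately, so that a threshold that fails on one frame can be compensated for by another.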
20 Claims
1. A method for text recognition from a video signal comprising:
forming a sequence of input images from the video signal, at least some of the input images including an image representing text;
combining the sequence of input images to form one or more combined images;
forming a plurality of processed images, including processing input images to form at least some of the processed images and processing the one or more combined images to form at least one of the processed images;
generating one or more text hypotheses for each of the plurality of processed images; and
determining a combined text output from a combination of the generated text hypotheses.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14

15. A method for text recognition from a video signal comprising:
forming a sequence of input images from the video signal, at least some of the input images including an image representing text;
combining the sequence of input images to form one or more combined images, wherein one of the combined images is an extremum image;
forming a plurality of processed images, including processing input images to form at least some of the processed images and processing the one or more combined images to form at least one of the processed images; and including, for each input image, forming a plurality of processed images, each of said processed images being formed using a different processing of the input image, and applying a statistical character recognizer to each of the processed image includes configuring the recognizer according to parameters matching the different processing applied to form the processed images;
generating one or more text hypotheses for each of the plurality of processed images; and
determining a combined text output from a combination of the generated text hypotheses by forming a network representation of the text hypotheses and determining the combined text output using the network representation.

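Two steps of claim 15 lend themselves to a short Python sketch: forming an extremum image and combining hypotheses. The sketch reads "extremum image" as a per-pixel minimum or maximum across frames, and it stands in for the claimed network representation with a much simpler position-wise vote over equal-length hypotheses. Both readings, and all function names, are assumptions for illustration rather than the method the specification describes.

```python
from collections import Counter
from typing import List, Sequence

# Assumption: a grayscale image is a list of rows of 0-255 intensities.
Image = List[List[int]]


def extremum_image(frames: Sequence[Image], mode: str = "min") -> Image:
    """Take the per-pixel minimum or maximum over same-sized frames."""
    if not frames:
        raise ValueError("at least one frame is required")
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    pick = min if mode == "min" else max
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [pick(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]


def combine_hypotheses(hypotheses: Sequence[str]) -> str:
    """Vote per character position (simplified stand-in for a hypothesis network).

    Assumes all hypotheses have the same length; a real system would first
    align hypotheses of different lengths.
    """
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")
    length = len(hypotheses[0])
    if any(len(h) != length for h in hypotheses):
        raise ValueError("this sketch requires equal-length hypotheses")
    # Each slot holds the competing characters for one position.
    return "".join(Counter(slot).most_common(1)[0][0] for slot in zip(*hypotheses))
```

For example, `combine_hypotheses(["NEWS", "NEW5", "HEWS"])` keeps the majority character at each position, so isolated recognition errors in individual hypotheses are outvoted.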
16. A system for text recognition from a video signal, the system comprising:
an image combiner configured to accept a sequence of input images and provide a combined image formed from the input images;
an image processor block, configured to accept the sequence of input images from the video signal and to accept the combined image from the image combiner;
a text recognizer coupled to the image processor block for generating one or more text hypotheses for each of a plurality of processed images; and
an output estimator coupled to the text recognizer for determining a combined text output from a combination of the generated text hypotheses.
Dependent claims: 17, 18

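The coupling between the four components of claim 16 can be sketched as a small Python class. The class name, field names, and callable signatures are hypothetical; the recognizer in particular is a caller-supplied placeholder, since the claim does not say how any component is implemented.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Assumption: an image is a list of rows of integer pixel values.
Image = List[List[int]]


@dataclass
class VideotextSystem:
    """Hypothetical wiring of combiner, processor block, recognizer, estimator."""

    combiner: Callable[[Sequence[Image]], Image]
    processors: Sequence[Callable[[Image], Image]]
    recognizer: Callable[[Image], List[str]]
    estimator: Callable[[Sequence[str]], str]

    def run(self, frames: Sequence[Image]) -> str:
        # The image combiner provides one combined image from the input images.
        combined = self.combiner(frames)
        # The processor block accepts both the input images and the combined image.
        sources = list(frames) + [combined]
        processed = [process(img) for img in sources for process in self.processors]
        # The recognizer yields one or more text hypotheses per processed image.
        hypotheses = [h for img in processed for h in self.recognizer(img)]
        # The output estimator reduces all hypotheses to a single text output.
        return self.estimator(hypotheses)
```

Passing the components in as callables keeps the sketch independent of any particular OCR engine or combination rule.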
19. A non-transitory computer-readable medium comprising instructions for causing a data processing system to:
form a sequence of input images from a video signal, at least some of the input images including an image representing text;
combine the sequence of input images to form one or more combined images;
form a plurality of processed images, including processing input images to form at least some of the processed images and processing the one or more combined images to form at least one of the processed images;
generate one or more text hypotheses for each of a plurality of processed images; and
determine a combined text output from a combination of the generated text hypotheses.
Dependent claims: 20

Specification