Simultaneous tracking and text recognition in video frames
Abstract
Architecture that enables optical character recognition (OCR) of text in video frames at the rate at which the frames are received. Additionally, conflation is performed on multiple text recognition results in the frame sequence. The architecture comprises an OCR text recognition engine and a tracker system; the tracker system establishes a common coordinate system in which OCR results from different frames may be compared and/or combined. From a set of sequential video frames, a keyframe is chosen from which the reference coordinate system is established. An estimated transformation from keyframe coordinates to subsequent video frames is computed using the tracker system. When text recognition is completed for any subsequent frame, the result coordinates can be related to the keyframe using the inverse transformation from the processed frame to the reference keyframe. The results can be rendered for viewing as the results are obtained.
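The mapping step in the abstract can be illustrated with a short sketch. The abstract does not name a motion model, so this example assumes the tracker's keyframe-to-frame transformation is a 3x3 planar homography. A text box recognized in a processed frame is then related back to keyframe coordinates by applying the inverse of that matrix. The function names are hypothetical.

```python
from typing import List, Tuple

Matrix3 = List[List[float]]
Point = Tuple[float, float]


def invert_3x3(m: Matrix3) -> Matrix3:
    """Invert a 3x3 matrix via its adjugate; raises if the matrix is singular."""
    a, b, c = m[0]
    d, e, f = m[1]
    g, h, i = m[2]
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("transformation is not invertible")
    # Transposed cofactor matrix, scaled by 1/det.
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def apply_homography(m: Matrix3, x: float, y: float) -> Point:
    """Map (x, y) through m using homogeneous coordinates."""
    u = m[0][0] * x + m[0][1] * y + m[0][2]
    v = m[1][0] * x + m[1][1] * y + m[1][2]
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return u / w, v / w


def to_keyframe(h_key_to_frame: Matrix3, box: List[Point]) -> List[Point]:
    """Relate the corners of an OCR box in a processed frame back to the keyframe."""
    h_frame_to_key = invert_3x3(h_key_to_frame)  # inverse: frame -> keyframe
    return [apply_homography(h_frame_to_key, x, y) for x, y in box]


if __name__ == "__main__":
    # Hypothetical tracker estimate: scale by 2, then shift by (5, -3).
    H = [[2.0, 0.0, 5.0], [0.0, 2.0, -3.0], [0.0, 0.0, 1.0]]
    print(to_keyframe(H, [(25.0, 37.0)]))  # [(10.0, 20.0)]
```

Inverting the transformation once per completed OCR result keeps the tracker free to update its forward estimate for every incoming frame.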
20 Claims
1. A system, comprising:
a text recognition component configured for recognition of text on a sequence of video frames, the text recognition component configured to receive a selected frame of the sequence of video frames and perform text recognition processing of the selected frame to output a selected frame result;
a tracker component configured to select a keyframe from the sequence of video frames based on stability criteria applied to incoming frames and to establish a reference coordinate system relative to the selected keyframe, the selected frame result mapped back to the reference coordinate system of the keyframe, the tracker component configured to apply keyframe coordinates to subsequent video frames to enable accumulation of best results for text recognition rendering and viewing; and
a microprocessor configured to execute computer-executable instructions associated with at least one of the text recognition component or the tracker component.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9)
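Claim 1 leaves the stability criteria open. The sketch below shows one hypothetical criterion for illustration; it is not the patented method. It treats the camera as stable once the mean absolute difference between consecutive grayscale frames stays under a threshold for several frames in a row. The frame representation, `select_keyframe`, `max_diff`, and `min_stable` are all illustrative assumptions.

```python
from typing import List, Optional, Sequence

Frame = List[List[int]]  # hypothetical: grayscale pixel rows


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two equally sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count if count else 0.0


def select_keyframe(frames: Sequence[Frame], max_diff: float = 2.0,
                    min_stable: int = 3) -> Optional[int]:
    """Return the index of the first frame that ends a run of `min_stable`
    consecutive low-motion frame pairs, or None if no frame qualifies."""
    run = 0
    for idx in range(1, len(frames)):
        if mean_abs_diff(frames[idx - 1], frames[idx]) <= max_diff:
            run += 1  # this frame pair looks stable
            if run >= min_stable:
                return idx
        else:
            run = 0  # motion detected, restart the stability run
    return None
```

Requiring a run of stable pairs, rather than a single quiet pair, is one way to avoid choosing a keyframe during a momentary pause in camera motion.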
10. A method performed by a computer system executing machine-readable instructions in a hardware memory, the method comprising acts of:
receiving a selected frame of a sequence of video frames for text recognition processing;
choosing a keyframe from the sequence of video frames based on an application of stability criteria to incoming images;
establishing a reference coordinate system relative to the keyframe for applying keyframe coordinates to subsequent video frames;
recognition processing the selected frame to output a selected frame result;
computing an estimated transformation between the keyframe and the selected frame result based on the reference coordinate system to create a keyframe result;
storing the keyframe result of the selected frame for presentation to enable accumulation of best results for rendering and viewing; and
configuring at least one processor to perform the acts of receiving, choosing, establishing, recognition processing, computing, and storing.
(Dependent claims: 11, 12, 13, 14, 15)
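One possible way to realize the computing and storing acts of claim 10 is to chain the tracker's frame-to-frame motion estimates into a keyframe-to-frame transformation for every frame. With that transformation kept, an OCR result that finishes later can still be mapped back using the inverse for the frame it was computed on. This sketch assumes a 2D affine motion model and treats frame 0 as the keyframe. `KeyframeResults`, `track`, and `add_ocr_result` are hypothetical names, not taken from the patent.

```python
from typing import Dict, List, Tuple

Affine = Tuple[float, float, float, float, float, float]  # (a, b, tx, c, d, ty)
Point = Tuple[float, float]

IDENTITY: Affine = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0)


def compose(second: Affine, first: Affine) -> Affine:
    """Affine map equivalent to applying `first`, then `second`."""
    a2, b2, tx2, c2, d2, ty2 = second
    a1, b1, tx1, c1, d1, ty1 = first
    return (a2 * a1 + b2 * c1, a2 * b1 + b2 * d1, a2 * tx1 + b2 * ty1 + tx2,
            c2 * a1 + d2 * c1, c2 * b1 + d2 * d1, c2 * tx1 + d2 * ty1 + ty2)


def invert(m: Affine) -> Affine:
    """Inverse affine map: p = A^-1 q - A^-1 t."""
    a, b, tx, c, d, ty = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("transformation is not invertible")
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return (ia, ib, -(ia * tx + ib * ty), ic, id_, -(ic * tx + id_ * ty))


def apply(m: Affine, p: Point) -> Point:
    a, b, tx, c, d, ty = m
    x, y = p
    return (a * x + b * y + tx, c * x + d * y + ty)


class KeyframeResults:
    """Hypothetical store of keyframe->frame transforms and keyframe-space OCR results."""

    def __init__(self) -> None:
        self.key_to_frame: Dict[int, Affine] = {0: IDENTITY}  # frame 0 is the keyframe
        self.results: List[Tuple[str, List[Point]]] = []

    def track(self, frame_idx: int, prev_to_curr: Affine) -> None:
        """Chain the tracker's frame-to-frame estimate onto the previous frame's transform."""
        prev = self.key_to_frame[frame_idx - 1]
        self.key_to_frame[frame_idx] = compose(prev_to_curr, prev)

    def add_ocr_result(self, frame_idx: int, text: str, box: List[Point]) -> None:
        """Map an OCR box from its processed frame into keyframe coordinates and store it."""
        frame_to_key = invert(self.key_to_frame[frame_idx])
        self.results.append((text, [apply(frame_to_key, p) for p in box]))
```

For example, after `track(1, (1, 0, 4, 0, 1, 0))` and `track(2, (1, 0, 4, 0, 1, -2))`, a word recognized at `(18, 8)` in frame 2 is stored at `(10.0, 10.0)` in keyframe coordinates.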
16. A method performed by a computer system executing machine-readable instructions in a hardware memory, the method comprising acts of:
selecting a keyframe based on an application of stability criteria to incoming images;
establishing a common coordinate system and a transformation based on the keyframe that relate subsequent video frames of a sequence of video frames to coordinates of the keyframe;
concurrently with the act of establishing, performing text recognition processing of the video frames to compute frame text results;
relating the frame text results back to the coordinates of the keyframe using the transformation;
conflating the frame text results to determine an optimum frame text result for presentation;
storing the keyframe result of the selected frame for presentation to enable accumulation of best results for rendering and viewing; and
configuring at least one processor to perform the acts of selecting, establishing, performing, relating, and conflating.
(Dependent claims: 17, 18, 19, 20)
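Claim 16 does not say how conflation determines the optimum frame text result. The sketch below shows one simple hypothetical policy. It groups keyframe-space results whose boxes overlap, meaning their intersection-over-union meets a threshold. For each group, it keeps the string with the highest summed recognition confidence across frames. The box format, the `conflate` function, and the 0.5 threshold are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in keyframe coordinates
Result = Tuple[Box, str, float]          # (box, recognized text, confidence)


def iou(p: Box, q: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(p[2], q[2]) - max(p[0], q[0]))
    iy = max(0.0, min(p[3], q[3]) - max(p[1], q[1]))
    inter = ix * iy
    union = (p[2] - p[0]) * (p[3] - p[1]) + (q[2] - q[0]) * (q[3] - q[1]) - inter
    return inter / union if union > 0 else 0.0


def conflate(results: List[Result], min_iou: float = 0.5) -> List[Tuple[Box, str]]:
    """Group results that overlap in keyframe space, then keep, per group,
    the text with the highest summed confidence across frames."""
    groups: List[List[Result]] = []
    for res in results:
        for group in groups:
            if iou(group[0][0], res[0]) >= min_iou:  # compare with the group's first box
                group.append(res)
                break
        else:
            groups.append([res])  # no overlapping group: start a new one
    best: List[Tuple[Box, str]] = []
    for group in groups:
        votes: Dict[str, float] = defaultdict(float)
        for _, text, conf in group:
            votes[text] += conf
        best.append((group[0][0], max(votes, key=votes.get)))
    return best
```

Summing confidences lets a reading that recurs across many frames outweigh a single misread that happened to score higher on one frame. Because all results are already in keyframe coordinates, box overlap is meaningful even though the frames were captured from different camera positions.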
Specification