Extracting textual information from a video sequence
Abstract
A method for extracting an image representing textual information from a video sequence includes the following steps. First, receiving a sequence of video frames, each including an image of textual information. Then, locating the textual information in each frame of the video sequence to form a stack of text arrays, each array containing data representing substantially only the textual information. Finally, extracting a single textual image array representing the image of the textual information from the stack of text arrays. Apparatus for extracting an image representing textual information from a video sequence includes a source of a video sequence having a plurality of frames, each containing an image of the textual information; and a processor, coupled to the video sequence source, responsive to all of the plurality of frames, for generating a single array representing an image of the textual information.
19 Claims
1. An apparatus for extracting an image representing textual information from a video sequence, comprising:
a source of a video sequence having a plurality of frames, each containing an image of the textual information; and
a processor, coupled to the video sequence source, responsive to all of the plurality of frames, for generating a single array representing an image of the textual information, wherein the processor comprises:
a circuit, coupled to the video sequence source, for locating the textual information image in each of the plurality of frames and generating a stack of text image arrays, respectively corresponding to the plurality of frames, each containing an image which is substantially only of the textual information; and
a circuit, coupled to the locating circuit, and responsive to all of the arrays in the stack of text arrays, for extracting an image of the textual information into a single array, wherein the extracting circuit comprises:
circuitry, responsive to the stack of text arrays, for generating a stack of binary arrays, respectively corresponding to the stack of text arrays, each binary array containing binary data representing the textual information image in the corresponding text array;
circuitry, responsive to the stack of binary arrays, for performing a genetic algorithm search using the stack of binary arrays as an initial population to find an optimum binary image; and
providing the optimum binary image in the single array as the textual information image.
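The step of generating a stack of binary arrays from the stack of text arrays can be illustrated with a minimal sketch. The claim does not say how each text array is binarized; the per-frame mean threshold, the function name `binarize_stack`, and the assumption that text is darker than its background are all illustrative choices, not taken from the patent.

```python
import numpy as np

def binarize_stack(text_stack):
    """Threshold each grayscale text array at its own mean (assumed rule).

    text_stack has shape (frames, rows, cols). The result has the same
    shape with dtype uint8, where 1 marks pixels darker than the frame mean.
    """
    text_stack = np.asarray(text_stack, dtype=float)
    # One threshold per frame, broadcast over that frame's pixels.
    thresholds = text_stack.mean(axis=(1, 2), keepdims=True)
    return (text_stack < thresholds).astype(np.uint8)
```

Each slice of the returned array could then serve as one individual of the initial population for the genetic algorithm search.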
2. The apparatus of claim 1, wherein the genetic algorithm circuitry comprises circuitry for:
selecting individual arrays from the stack of binary arrays, from a population which will survive to the next generation according to the relative desirability of the individual arrays;
crossing-over random pairs of selected individual arrays with a probability χ;
mutating random selected individual arrays with a probability μ1, wherein the selected, crossed-over and mutated individual arrays form a stack of binary arrays representing a new generation; and
repeating the selecting, crossing-over and mutating steps.
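The select, cross-over, mutate, repeat cycle can be sketched as follows. This is only an illustration under stated assumptions: fitness-proportional selection, a row-wise cut for crossover, a single pixel flip for mutation, and the names `genetic_search`, `chi` and `mu` are hypothetical, and `fitness` is a caller-supplied stand-in for the desirability measure.

```python
import random
import numpy as np

def genetic_search(population, fitness, chi=0.6, mu=0.05, generations=50, seed=0):
    """Evolve a list of equally shaped binary arrays; return the fittest seen."""
    rng = random.Random(seed)
    pop = [np.array(p, dtype=np.uint8) for p in population]
    best = max(pop, key=fitness).copy()
    for _ in range(generations):
        # Selection: sample survivors in proportion to their desirability.
        weights = [max(fitness(p), 1e-12) for p in pop]
        pop = [p.copy() for p in rng.choices(pop, weights=weights, k=len(pop))]
        # Crossover: with probability chi, swap the rows below a random cut.
        rng.shuffle(pop)
        for a, b in zip(pop[0::2], pop[1::2]):
            if a.shape[0] > 1 and rng.random() < chi:
                cut = rng.randrange(1, a.shape[0])
                a[cut:], b[cut:] = b[cut:].copy(), a[cut:].copy()
        # Mutation: with probability mu, flip one random pixel of an individual.
        for p in pop:
            if rng.random() < mu:
                r, c = rng.randrange(p.shape[0]), rng.randrange(p.shape[1])
                p[r, c] ^= 1
        # Remember the best individual seen in any generation.
        best = max(pop + [best], key=fitness).copy()
    return best
```

Passing the stack of binary arrays as `population` mirrors the claim's use of that stack as the initial population.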
3. The apparatus of claim 1 wherein the genetic algorithm circuitry comprises circuitry for performing, as a part of the genetic algorithm, a locally greedy mutation function.
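The claim names a locally greedy mutation function without defining it here. One plausible reading, offered purely as an assumption, is a mutation that keeps a pixel flip only when the flip does not lower the individual's desirability. The name `greedy_mutate` and the `trials` parameter are hypothetical.

```python
import random
import numpy as np

def greedy_mutate(individual, fitness, trials=10, seed=0):
    """Flip random pixels, keeping a flip only when fitness does not drop."""
    rng = random.Random(seed)
    current = np.array(individual, dtype=np.uint8)  # work on a copy
    score = fitness(current)
    for _ in range(trials):
        r, c = rng.randrange(current.shape[0]), rng.randrange(current.shape[1])
        current[r, c] ^= 1           # tentative flip
        new_score = fitness(current)
        if new_score >= score:
            score = new_score        # accept an improving or neutral flip
        else:
            current[r, c] ^= 1       # revert a harmful flip
    return current
```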
4. The apparatus of claim 1 wherein the genetic algorithm circuitry comprises circuitry for performing, as a part of the genetic algorithm, an elitist selection function.
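Elitist selection is commonly understood as carrying the fittest individuals into the next generation unchanged. The sketch below assumes that reading and fills the remaining slots by fitness-weighted sampling; `elitist_select` and `n_elite` are hypothetical names, and the patent's own selection rule may differ.

```python
import random
import numpy as np

def elitist_select(population, fitness, n_elite=1, seed=0):
    """Keep the n_elite fittest arrays, then fill by fitness-weighted sampling."""
    rng = random.Random(seed)
    ranked = sorted(population, key=fitness, reverse=True)
    elite = [p.copy() for p in ranked[:n_elite]]
    # Guard against zero or negative weights, which random.choices rejects.
    weights = [max(fitness(p), 1e-12) for p in population]
    rest = rng.choices(population, weights=weights, k=len(population) - n_elite)
    return elite + [p.copy() for p in rest]
```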
5. The apparatus of claim 1, wherein the locating circuit comprises:
circuitry, responsive to a first array in the stack of video arrays, for locating the textual information image in a first video frame;
circuitry, responsive to the located textual information image in the first one of the stack of video arrays, for extracting features of the textual information image in the first one of the plurality of frames;
circuitry, responsive to subsequent arrays in the stack of video arrays, for tracking the extracted features from frame to frame in subsequent ones of the plurality of frames to produce estimates of motion parameters; and
circuitry, responsive to the estimated motion parameters, for correcting for perspective distortion in the plurality of frames and producing the stack of arrays containing respective images which are substantially only of the textual information.
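The claim does not state which motion model is estimated from the tracked features. Assuming the text lies on a plane, one common choice is a 3x3 homography fitted to matched feature points by the direct linear transform; the sketch below illustrates that assumption, and the function names are hypothetical.

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct linear transform: 3x3 H with dst ~ H @ src for >= 4 point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]  # assumes h[2, 2] is non-zero (non-degenerate case)

def apply_homography(h, points):
    """Map (x, y) points through H and divide by the projective coordinate."""
    pts = np.c_[np.asarray(points, dtype=float), np.ones(len(points))]
    mapped = pts @ h.T
    return mapped[:, :2] / mapped[:, 2:3]
```

Applying the inverse of the estimated matrix to each frame's text region would map it to a common fronto-parallel view, which is one way to read "correcting for perspective distortion".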
6. The apparatus of claim 1, further comprising:
a frame store, coupled between the video sequence source and the locating circuit; and
an array stack memory, coupled between the locating circuit and the extracting circuit.
7. The apparatus of claim 1, further comprising an optical character recognition circuit, coupled to the extracting circuit, for generating computer readable data representing the textual information.
8. The apparatus of claim 1, further comprising a digitizer, coupled between the video signal source and the processor, for generating a stack of arrays, respectively corresponding to the plurality of frames, each array containing data representing the image of the textual information.
9. The apparatus of claim 1, wherein the processor comprises optical character recognition circuitry responsive to the textual information image array for generating computer readable data representing the textual information.
10. Apparatus for extracting an image representing textual information from a video sequence, comprising:
a source of a video sequence having a plurality of frames, each containing an image of the textual information; and
a processor, coupled to the video sequence source, responsive to all of the plurality of frames, for generating a single array representing an image of the textual information;
wherein the processor comprises:
a circuit, coupled to the video sequence source, for locating the textual information image in each of the plurality of frames and generating a stack of text image arrays, respectively corresponding to the plurality of frames, each containing an image which is substantially only of the textual information; and
a circuit, coupled to the locating circuit, and responsive to all of the arrays in the stack of text arrays, for extracting an image of the textual information into a single array;
wherein the extracting circuit comprises:
circuitry, responsive to the stack of text arrays, for generating a stack of binary arrays, respectively corresponding to the stack of text arrays, each binary array containing binary data representing the textual information image in the corresponding text array;
circuitry, responsive to the stack of binary arrays, for performing a genetic algorithm search using the stack of binary arrays as an initial population to find an optimum binary image; and
providing the optimum binary image in the single array as the textual information image;
wherein the genetic algorithm circuitry comprises circuitry for:
selecting individual arrays from the stack of binary arrays, from a population which will survive to the next generation according to the relative desirability of the individual arrays;
crossing-over random pairs of selected individual arrays with a probability χ;
mutating random selected individual arrays with a probability μ1, wherein the selected, crossed-over and mutated individual arrays form a stack of binary arrays representing a new generation; and
repeating the selecting, crossing-over and mutating steps; and
wherein the selecting circuitry comprises circuitry for calculating the desirability g(h1) of an individual array h1 according to the equation:
[equation not reproduced]
where z is an estimate of the textual information image, Vc(z) is the clique energy function, and σ is the variance.
11. Apparatus for extracting an image representing textual information from a video sequence, comprising:
a source of a video sequence having a plurality of frames, each containing an image of the textual information; and
a processor, coupled to the video sequence source, responsive to all of the plurality of frames, for generating a single array representing an image of the textual information;
wherein the processor comprises:
a circuit, coupled to the video sequence source, for locating the textual information image in each of the plurality of frames and generating a stack of text image arrays, respectively corresponding to the plurality of frames, each containing an image which is substantially only of the textual information; and
a circuit, coupled to the locating circuit, and responsive to all of the arrays in the stack of text arrays, for extracting an image of the textual information into a single array;
wherein the extracting circuit comprises:
circuitry, responsive to the stack of text arrays, for generating a stack of binary arrays, respectively corresponding to the stack of text arrays, each binary array containing binary data representing the textual information image in the corresponding text array;
circuitry, responsive to the stack of binary arrays, for performing a genetic algorithm search using the stack of binary arrays as an initial population to find an optimum binary image; and
providing the optimum binary image in the single array as the textual information image; and
wherein:
each array in the stack of binary arrays is arranged as a plurality of rows, each row having a plurality of pixels, each pixel having a binary value;
the stack of binary arrays is modeled on a Markov random field having a second order neighborhood, and a single non-zero clique consisting of four pixels arranged in a square.
12. The apparatus of claim 11, wherein the clique energy function takes:
the value 1 when all pixels in the clique have the same value;
the value 3 when two adjacent pixels have the same value, and the remaining pixels have the other value;
the value 6 when one pixel has a different value than value of the remaining pixels; and
the value 18 when diagonal pixels have the same value, and the remaining pixels have the other value.
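The four values listed above can be written as a lookup over the pixel pattern of a 2x2 binary clique. In this sketch the function names are hypothetical, and summing the value over every overlapping 2x2 block of an image is an assumed way of aggregating it, not something the claim states.

```python
import numpy as np

def clique_potential(block):
    """Value of a 2x2 binary clique, using the four cases listed in the claim."""
    (a, b), (c, d) = np.asarray(block).astype(int).tolist()
    ones = a + b + c + d
    if ones in (0, 4):
        return 1      # all four pixels agree
    if ones in (1, 3):
        return 6      # exactly one pixel differs from the other three
    if a == d:        # two of each value, and the diagonal pixels match
        return 18     # checkerboard pattern
    return 3          # two adjacent pixels share one value

def total_clique_energy(image):
    """Sum the clique value over every overlapping 2x2 block (assumed aggregation)."""
    img = np.asarray(image)
    return sum(clique_potential(img[r:r + 2, c:c + 2])
               for r in range(img.shape[0] - 1)
               for c in range(img.shape[1] - 1))
```

Under these values, smooth regions and straight edges score low while isolated pixels and checkerboard patterns score high, so a lower total indicates a cleaner binary image.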
13. A method for extracting an image representing textual information from a video sequence, comprising the steps of:
receiving a sequence of video frames, each including an image of the textual information;
locating the textual information in each frame of the video sequence to form a stack of text arrays, each array containing data representing substantially only the textual information;
extracting a single textual image array representing the image of the textual information from the stack of text arrays, wherein the extracting step comprises the steps of:
generating a stack of binary arrays, respectively corresponding to the sequence of video frames;
extracting the single textual image array from the stack of binary arrays by performing a genetic algorithm search using the stack of binary arrays as an initial population to find an optimum binary image; and
providing the optimum binary image in the single array as the textual information image.
14. The method of claim 13, wherein the locating step comprises the steps of:
locating the textual information in a first frame of the video sequence;
extracting features of the textual information in the first frame of the video sequence;
tracking the features of the textual information in subsequent frames of the video sequence;
estimating motion parameters from the tracked features; and
correcting perspective distortion in the textual information in each of the frames of the video sequence to generate the stack of text arrays.
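The tracking step listed above is not tied to a particular technique in the claim. As a hedged illustration, the sketch below follows one feature patch from a frame to the next by exhaustive search over small displacements, scored by the sum of squared differences; the function name, the patch size and the search radius are all assumptions.

```python
import numpy as np

def track_patch(prev_frame, next_frame, top_left, size=8, radius=4):
    """Find where a size x size patch from prev_frame moved in next_frame.

    Exhaustive search over displacements within +/- radius pixels, scored by
    the sum of squared differences (an assumed matching criterion).
    """
    prev_frame = np.asarray(prev_frame, dtype=float)
    next_frame = np.asarray(next_frame, dtype=float)
    r0, c0 = top_left
    patch = prev_frame[r0:r0 + size, c0:c0 + size]
    best, best_pos = np.inf, top_left
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            r, c = r0 + dr, c0 + dc
            if r < 0 or c < 0:
                continue
            cand = next_frame[r:r + size, c:c + size]
            if cand.shape != patch.shape:
                continue  # candidate window falls off the frame
            ssd = float(((cand - patch) ** 2).sum())
            if ssd < best:
                best, best_pos = ssd, (r, c)
    return best_pos
```

Tracking several such patches yields point correspondences between frames, from which motion parameters could then be estimated.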
16. The method of claim 13 wherein the genetic algorithm comprises the steps of:
using the stack of binary arrays as an old generation;
selecting individual arrays to survive to a next generation according to the relative desirability of the individual array;
crossing-over random pairs of selected arrays with probability χ;
mutating random selected arrays with probability μ1 to form a new generation;
repeating the selecting, crossing-over, and mutating steps with the new generation.
17. The method of claim 16 wherein the mutating step comprises the step of using a locally greedy mutation operation.
18. The method of claim 16 wherein the selecting step comprises the step of using an elitist selection operation.
19. A method for extracting an image representing textual information from a video sequence, comprising the steps of:
receiving a sequence of video frames, each including an image of textual information;
locating the textual information in each frame of the video sequence to form a stack of text arrays, each array containing data representing substantially only the textual information; and
extracting a single textual image array representing the image of the textual information from the stack of text arrays;
wherein the extracting step comprises the steps of:
generating a stack of binary arrays, respectively corresponding to the sequence of video frames;
extracting the single textual image array from the stack of binary arrays using a genetic algorithm;
wherein the genetic algorithm comprises the steps of:
using the stack of binary arrays as an old generation;
selecting individual arrays to survive to a next generation according to the relative desirability of the individual array;
crossing-over random pairs of selected arrays with probability χ;
mutating random selected arrays with probability μ1 to form a new generation;
repeating the selecting, crossing-over, and mutating steps with the new generation; and
wherein the selecting step comprises the step of calculating the desirability g(h1) of an individual array h1 according to the equation:
[equation not reproduced]
where z is an estimate of the textual information array, V(z) is the clique energy function, and σ is the variance.
Specification