Extracting textual information from a video sequence

US 6,587,586 B1
Filed: 06/12/1997
Issued: 07/01/2003
Est. Priority Date: 06/12/1997
Status: Expired due to Term

Abstract

A method for extracting an image representing textual information from a video sequence includes the following steps. First, a sequence of video frames is received, each frame including an image of the textual information. Then, the textual information is located in each frame of the video sequence to form a stack of text arrays, each array containing data representing substantially only the textual information. Finally, a single textual image array representing the image of the textual information is extracted from the stack of text arrays. An apparatus for extracting an image representing textual information from a video sequence includes a source of a video sequence having a plurality of frames, each containing an image of the textual information; and a processor, coupled to the video sequence source and responsive to all of the plurality of frames, for generating a single array representing an image of the textual information.

19 Claims

1. An apparatus for extracting an image representing textual information from a video sequence, comprising:
- a source of a video sequence having a plurality of frames, each containing an image of the textual information; and
  
  a processor, coupled to the video sequence source, responsive to all of the plurality of frames, for generating a single array representing an image of the textual information, wherein the processor comprises;
  
  a circuit, coupled to the video sequence source, for locating the textual information image in each of the plurality of frames and generating a stack of text image arrays, respectively corresponding to the plurality of frames, each containing an image which is substantially only of the textual information; and
  
  a circuit, coupled to the locating circuit, and responsive to all of the arrays in the stack of text arrays, for extracting an image of the textual information into a single array, wherein the extracting circuit comprises;
  
  circuitry, responsive to the stack of text arrays, for generating a stack of binary arrays, respectively corresponding to the stack of text arrays, each binary array containing binary data representing the textual information image in the corresponding text array;
  
  circuitry, responsive to the stack of binary arrays, for performing a genetic algorithm search using the stack of binary arrays as an initial population to find an optimum binary image; and
  
  providing the optimum binary image in the single array as the textual information image.
2. The apparatus of claim 1 wherein the genetic algorithm circuitry comprises circuitry for:
3. The apparatus of claim 1 wherein the genetic algorithm circuitry comprises circuitry for performing, as a part of the genetic algorithm, a locally greedy mutation function.
4. The apparatus of claim 1 wherein the genetic algorithm circuitry comprises circuitry for performing, as a part of the genetic algorithm, an elitist selection function.
5. The apparatus of claim 1, wherein the locating circuit comprises:
- circuitry, responsive to a first array in the stack of video arrays, for locating the textual information image in a first video frame;
  
  circuitry, responsive to the located textual information image in the first one of the stack of video arrays, for extracting features of the textual information image in the first one of the plurality of frames;
  
  circuitry, responsive to subsequent arrays in the stack of video arrays, for tracking the extracted features from frame to frame in subsequent ones of the plurality of frames to produce estimates of motion parameters; and
  
  circuitry, responsive to the estimated motion parameters, for correcting for perspective distortion in the plurality of frames and produce the stack of arrays containing respective images which is substantially only of the textual information.
6. The apparatus of claim 1, further comprising:
- a frame store, coupled between the video sequence source and the locating circuit; and
  
  an array stack memory, coupled between the locating circuit and the extracting circuit.
7. The apparatus of claim 1, further comprising an optical character recognition circuit, coupled to the extracting circuit, for generating computer readable data representing the textual information.
8. The apparatus of claim 1, further comprising a digitizer, coupled between the video signal source and the processor, for generating a stack of arrays, respectively corresponding to the plurality of frames, each array containing data representing the image of the textual information.
9. The apparatus of claim 1, wherein the processor comprises optical character recognition circuitry responsive to the textual information image array for generating computer readable data representing the textual information.
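
The step in claim 1 of generating a stack of binary arrays from the stack of text arrays can be illustrated with a simple fixed-threshold sketch. The claims do not say how binarization is performed, so the global threshold of 128, the name `binarize_stack`, and the dark-text-on-light-background convention are all assumptions made for illustration.

```python
from typing import List

Gray = List[List[int]]    # one grayscale text array, values 0..255
Binary = List[List[int]]  # one binary array, values 0 or 1


def binarize_stack(text_stack: List[Gray], threshold: int = 128) -> List[Binary]:
    """Turn each grayscale text array into a binary array of the same shape.

    Assumption: text pixels are darker than the background, so a pixel
    below `threshold` is marked 1 (text) and anything else 0 (background).
    """
    return [
        [[1 if pixel < threshold else 0 for pixel in row] for row in frame]
        for frame in text_stack
    ]


# Two tiny 2x3 "frames" of the same text region with slightly different noise.
stack = [
    [[10, 200, 30], [220, 15, 240]],
    [[12, 190, 140], [210, 20, 235]],
]
binary_stack = binarize_stack(stack)
```

The two binary arrays disagree at one pixel, which is the kind of frame-to-frame inconsistency the later genetic algorithm search is meant to resolve.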

10. Apparatus for extracting an image representing textual information from a video sequence, comprising:
- a source of a video sequence having a plurality of frames, each containing an image of the textual information; and
  
  a processor, coupled to the video sequence source, responsive to all of the plurality of frames, for generating a single array representing an image of the textual information;
  
  wherein the processor comprises;
  
  a circuit, coupled to the video sequence source, for locating the textual information image in each of the plurality of frames and generating a stack of text image arrays, respectively corresponding to the plurality of frames, each containing an image which is substantially only of the textual information; and
  
  a circuit, coupled to the locating circuit, and responsive to all of the arrays in the stack of text arrays, for extracting an image of the textual information into a single array;
  
  wherein the extracting circuit comprises;
  
  circuitry, responsive to the stack of text arrays, for generating a stack of binary arrays, respectively corresponding to the stack of text arrays, each binary array containing binary data representing the textual information image in the corresponding text array;
  
  circuitry, responsive to the stack of binary arrays, for performing a genetic algorithm search using the stack of binary arrays as an initial population to find an optimum binary image; and
  
  providing the optimum binary image in the single array as the textual information image wherein the genetic algorithm circuitry comprises circuitry for;
  
  selecting individual arrays from the stack of binary arrays, from a population which will survive to the next generation according to the relative desirability of the individual arrays;
  
  crossing-over random pairs of selected individual arrays with a probability χ;

  mutating random selected individual arrays with a probability μ, wherein the selected, crossed-over and mutated individual arrays form a stack of binary arrays representing a new generation; and

  repeating the selecting, crossing-over and mutating steps; and

  wherein the selecting circuitry comprises circuitry for calculating the desirability $g(h^{l})$ of an individual array $h^{l}$ according to the equation:

  $g(h^{l}) = \sum_{c \in C} V_{c}(z) + \sum_{j=1}^{n} \frac{\| z - h^{j} \|^{2}}{2\sigma^{2}}$

  where z is an estimate of the textual information image, $V_{c}(z)$ is the clique energy function, and σ is the variance.
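
The desirability equation of claim 10 can be evaluated directly once a clique energy is chosen. The following is a minimal sketch, not the patented implementation: the claim does not spell out how z relates to the individual being scored, so the sketch simply scores whatever candidate is passed as `z`, and the clique term is a caller-supplied stand-in (`total_clique_energy` is a hypothetical name).

```python
from typing import Callable, List

Binary = List[List[int]]


def squared_distance(a: Binary, b: Binary) -> int:
    """Squared Euclidean norm ||a - b||^2 over all pixels."""
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def desirability(z: Binary, stack: List[Binary], sigma: float,
                 total_clique_energy: Callable[[Binary], float]) -> float:
    """Evaluate sum_c V_c(z) + sum_j ||z - h^j||^2 / (2 sigma^2).

    `total_clique_energy` is a caller-supplied stand-in for sum_c V_c(z).
    """
    data_term = sum(squared_distance(z, h) for h in stack) / (2.0 * sigma ** 2)
    return total_clique_energy(z) + data_term


stack = [
    [[1, 0], [0, 1]],
    [[1, 0], [1, 1]],
    [[1, 1], [0, 1]],
]
# With a zero prior term, the score is just the scaled disagreement with the stack.
score = desirability(stack[0], stack, sigma=1.0, total_clique_energy=lambda z: 0.0)
```

Here the first array differs from each of the other two at one pixel, so the data term is 2 / (2 · 1²) = 1.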

11. Apparatus for extracting an image representing textual information from a video sequence, comprising:
- a source of a video sequence having a plurality of frames, each containing an image of the textual information; and
  
  a processor, coupled to the video sequence source, responsive to all of the plurality of frames, for generating a single array representing an image of the textual information;
  
  wherein the processor comprises;
  
  a circuit, coupled to the video sequence source, for locating the textual information image in each of the plurality of frames and generating a stack of text image arrays, respectively corresponding to the plurality of frames, each containing an image which is substantially only of the textual information; and
  
  a circuit, coupled to the locating circuit, and responsive to all of the arrays in the stack of text arrays, for extracting an image of the textual information into a single array;
  
  wherein the extracting circuit comprises;
  
  circuitry, responsive to the stack of text arrays, for generating a stack of binary arrays, respectively corresponding to the stack of text arrays, each binary array containing binary data representing the textual information image in the corresponding text array;
  
  circuitry, responsive to the stack of binary arrays, for performing a genetic algorithm search using the stack of binary arrays as an initial population to find an optimum binary image; and
  
  providing the optimum binary image in the single array as the textual information image; and
  
  wherein;
  
  each array in the stack of binary arrays is arranged as a plurality of rows, each row having a plurality of pixels, each pixel having a binary value;
  
  the stack of binary arrays is modeled on a Markov random field having a second order neighborhood, and a single non-zero clique consisting of four pixels arranged in a square.
12. The apparatus of claim 11 wherein the value of the clique energy function for the single non-zero clique comprises:
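
The four-pixel square clique of claim 11 corresponds to every 2×2 block of a binary array. The sketch below enumerates those blocks and sums an energy over them. The energy values recited in claim 12 are not reproduced in this text, so `placeholder_energy` is a purely hypothetical stand-in (0 for a uniform block, 1 otherwise) and should not be read as the patent's clique potential.

```python
from typing import Callable, List, Tuple

Binary = List[List[int]]
Square = Tuple[int, int, int, int]  # (top-left, top-right, bottom-left, bottom-right)


def square_cliques(image: Binary) -> List[Square]:
    """List every 2x2 block of pixels, i.e. every four-pixel square clique."""
    rows, cols = len(image), len(image[0])
    return [
        (image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
        for r in range(rows - 1)
        for c in range(cols - 1)
    ]


def placeholder_energy(clique: Square) -> float:
    """Hypothetical V_c: 0 for a uniform block, 1 otherwise (NOT the patent's values)."""
    return 0.0 if len(set(clique)) == 1 else 1.0


def total_clique_energy(image: Binary,
                        energy: Callable[[Square], float] = placeholder_energy) -> float:
    """Sum the clique energy over all square cliques of the image."""
    return sum(energy(q) for q in square_cliques(image))


image = [
    [0, 0, 1],
    [0, 0, 1],
    [1, 1, 1],
]
cliques = square_cliques(image)
energy = total_clique_energy(image)
```

A 3×3 array has four square cliques; with the placeholder potential, only the uniform top-left block contributes zero energy.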

13. A method for extracting an image representing textual information from a video sequence, comprising the steps of:
- receiving a sequence of video frames, each including an image of the textual information;
  
  locating the textual information in each frame of the video sequence to form a stack of text arrays, each array containing data representing substantially only the textual information;
  
  extracting a single textual image array representing the image of the textual information from the stack of text arrays, wherein the extracting step comprises the steps of;
  
  generating a stack of binary arrays, respectively corresponding to sequence of video frames;
  
  extracting the single textual image array from the stack of binary arrays by performing a genetic algorithm search using the stack of binary arrays as an initial population to find an optimum binary image; and
  
  providing the optimum binary image in the single array as the textual information image.
14. The method of claim 13 further comprising, after the extracting step, the step of recognizing the characters in the textual information image and generating computer readable data representing the textual information.
15. The method of claim 13 wherein the locating step comprises the steps of:
16. The method of claim 13 wherein the genetic algorithm comprises the steps of:
- using the stack of binary arrays as an old generation;
  
  selecting individual arrays to survive to a next generation according to the relative desirability of the individual array;
  
  crossing-over random pairs of selected arrays with probability χ;

  mutating random selected arrays with probability μ to form a new generation;
  
  repeating the selecting, crossing-over, and mutating steps with the new generation.
17. The method of claim 16 wherein the mutating step comprises the step of using a locally greedy mutation operation.
18. The method of claim 16 wherein the selecting step comprises the step of using an elitist selection operation.
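
The generation loop of claims 16 to 18 can be sketched as follows. The claims name the operators but do not define them here, so several choices are assumptions: a lower score is treated as more desirable, selection keeps the better half of the population, elitism copies the best individual unchanged, cross-over splices rows at a random cut, and the "locally greedy" mutation flips one random pixel and keeps the flip only if the score does not get worse. All function names are hypothetical.

```python
import random
from typing import Callable, List

Binary = List[List[int]]
Score = Callable[[Binary], float]


def crossover(a: Binary, b: Binary, rng: random.Random) -> Binary:
    """Row-wise one-point crossover (needs at least two rows)."""
    cut = rng.randint(1, len(a) - 1)
    return [row[:] for row in a[:cut]] + [row[:] for row in b[cut:]]


def greedy_mutate(ind: Binary, score: Score, rng: random.Random) -> Binary:
    """Flip one random pixel; keep the flip only if the score does not get worse."""
    r, c = rng.randrange(len(ind)), rng.randrange(len(ind[0]))
    trial = [row[:] for row in ind]
    trial[r][c] ^= 1
    return trial if score(trial) <= score(ind) else ind


def next_generation(old: List[Binary], score: Score, chi: float, mu: float,
                    rng: random.Random) -> List[Binary]:
    """One select / cross-over / mutate round (population of at least two)."""
    ranked = sorted(old, key=score)                  # assumed: lower = more desirable
    survivors = ranked[: max(2, len(ranked) // 2)]   # selection
    new = [[row[:] for row in ranked[0]]]            # elitism: best passes unchanged
    while len(new) < len(old):
        a, b = rng.sample(survivors, 2)
        child = crossover(a, b, rng) if rng.random() < chi else [row[:] for row in a]
        if rng.random() < mu:
            child = greedy_mutate(child, score, rng)
        new.append(child)
    return new


# The stack of binary arrays serves as the old (initial) generation.
stack = [
    [[1, 0, 1], [0, 1, 0]],
    [[1, 0, 0], [0, 1, 0]],
    [[1, 1, 1], [0, 1, 0]],
    [[1, 0, 1], [0, 0, 0]],
]


def disagreement(ind: Binary) -> float:
    """Toy score: number of pixels at which `ind` differs from the initial stack."""
    return float(sum(p != q for h in stack
                     for rh, ri in zip(h, ind) for p, q in zip(rh, ri)))


rng = random.Random(0)
population = stack
for _ in range(5):
    population = next_generation(population, disagreement, chi=0.8, mu=0.1, rng=rng)
best = min(population, key=disagreement)
```

Because the best individual is always carried over unchanged, the best score in the population can never get worse from one generation to the next, which is the practical point of an elitist selection.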

19. A method for extracting an image representing textual information from a video sequence, comprising the steps of:
- receiving a sequence of video frames, each including an image of textual information;
  
  locating the textual information in each frame of the video sequence to form a stack of text arrays, each array containing data representing substantially only the textual information; and
  
  extracting a single textual image array representing the image of the textual information from the stack of text arrays;
  
  wherein the extracting step comprises the steps of;
  
  generating a stack of binary arrays, respectively corresponding to sequence of video frames;
  
  extracting the single textual image array from the stack of binary arrays using a genetic algorithm;
  
  wherein the genetic algorithm comprises the steps of;
  
  using the stack of binary arrays as an old generation;
  
  selecting individual arrays to survive to a next generation according to the relative desirability of the individual array;
  
  crossing-over random pairs of selected arrays with probability χ;

  mutating random selected arrays with probability μ to form a new generation;

  repeating the selecting, crossing-over, and mutating steps with the new generation; and

  wherein the selecting step comprises the step of calculating the desirability $g(h^{l})$ of an individual array $h^{l}$ according to the equation:

  $g(h^{l}) = \sum_{c \in C} V_{c}(z) + \sum_{j=1}^{n} \frac{\| z - h^{j} \|^{2}}{2\sigma^{2}}$

  where z is an estimate of the textual information array, $V_{c}(z)$ is the clique energy function, and σ is the variance.

Current Assignee
Siemens Medical Solutions USA Incorporated (Siemens AG)
Original Assignee
Siemens Corporate Research Incorporated (Siemens AG)
Inventors
Huang, Qian; Cui, Yuntao
Primary Examiner(s)
Dastouri, Mehrdad

Application Number

US08/999,903
Time in Patent Office

2,210 Days
Field of Search

382/176, 382/190, 382/236, 382/192, 382/195, 382/205, 382/237, 706/12, 706/13, 706/14, 345/501, 345/520
US Class Current

382/176
CPC Class Codes

G06F 18/295   Markov models or related mo...

G06V 20/62   Text, e.g. of license plate...

G06V 20/625   License plates