Method and apparatus for detecting and interpreting textual captions in digital video signals

US 6,101,274 A
Filed: 06/02/1997
Issued: 08/08/2000
Est. Priority Date: 12/28/1994
Status: Expired due to Term

Abstract

A computer-implemented method for the identification and interpretation of text captions in an encoded video stream of digital video signals comprises sampling by selecting frames for video analysis, decoding by converting each of the frames selected into a digitized color image, performing edge detection for generating a grey scale image, binarizing by converting the grey scale image into a bi-level image by means of a thresholding operation, compressing groups of consecutive pixel values in the binary image, mapping the consecutive pixel values into a binary value, and separating groups of connected pixels and determining whether they are likely to be part of a text region in the image or not.
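
The edge-detection and thresholding steps named in the abstract can be illustrated with a minimal sketch. The abstract does not name an edge operator or a threshold value, so the forward-difference gradient, the threshold of 40, the gray-intensity input, and the function names `edge_magnitude` and `binarize` are all assumptions made for illustration.

```python
def edge_magnitude(gray):
    """Return a gray-scale edge image from a 2-D list of 0..255 intensities.

    Uses |dx| + |dy| forward differences clipped to 255 (an assumed operator,
    not one named in the patent text).
    """
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Differences to the right and lower neighbours; the border
            # pixels reuse themselves, which gives a difference of zero.
            dx = gray[y][min(x + 1, w - 1)] - gray[y][x]
            dy = gray[min(y + 1, h - 1)][x] - gray[y][x]
            out[y][x] = min(255, abs(dx) + abs(dy))
    return out


def binarize(edges, threshold=40):
    """Convert a gray-scale edge image into a bi-level image (1 = edge)."""
    return [[1 if v > threshold else 0 for v in row] for row in edges]


# A 4x6 frame with a bright vertical bar in columns 2-3.
frame = [[0, 0, 200, 200, 0, 0] for _ in range(4)]
binary = binarize(edge_magnitude(frame))
```

In this toy frame the two sides of the bright bar produce the only edge pixels, which is the kind of high-contrast boundary that caption characters are expected to create.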

108 Citations

24 Claims

1. A computer-implemented method for the identification and interpretation of text captions in an encoded video stream of digital video signals, said method comprising:
- sampling by selecting frames for video analysis;
  
  decoding by converting each of said frames selected into a digitized color image;
  
  performing edge detection for generating a gray scale image;
  
  binarizing by converting said gray scale image into a bi-level image by means of a thresholding operation;
  
  compressing groups of consecutive pixel values in said binary image;
  
  mapping said consecutive pixel values into a binary value; and
  
  separating groups of connected pixels and determining whether they are likely to be part of a text region in the image or not.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
- - 2. A computer-implemented method for the identification and interpretation of text captions as recited in claim 1, wherein said sampling is at a sampling rate fixed at 1 frame per N, where N is the number of consecutive frames in which the same caption is expected to appear.
  - 3. A computer-implemented method for the identification and interpretation of text captions as recited in claim 1, wherein said decoding uses one of JPEG encoding and MPEG.
  - 4. A computer-implemented method for the identification and interpretation of text captions as recited in claim 3, wherein the format of the resulting color image is a 24-bit RGB format.
  - 5. A computer-implemented method for the identification and interpretation of text captions as recited in claim 1, wherein said compressing groups is performed eight at a time.
  - 6. A computer-implemented method for the identification and interpretation of text captions as recited in claim 1, wherein said mapping of said pixel values is performed by means of a test wherein during this step a binary edge image of a frame, which size is defined to be WIDTH×HEIGHT, is converted into an image of size WIDTH/8×HEIGHT by compressing each byte in the original image (8 continuous pixels) into a binary value in accordance with predetermined criteria.
  - 7. A computer-implemented method for the identification and interpretation of text captions as recited in claim 6, wherein said criteria comprise:
    - a) 2 < # of white pixels < 6; and
      
      b) at least 2 of the 4-connected neighbors are 1.
  - 8. A computer-implemented method for the identification and interpretation of text captions as recited in claim 1, including the step of taking a binary image as input and producing a table of connected-components formatted as follows:
    - Component ID | Enclosing Rectangle | Dimensions | Density
      
      where Component ID is an integer, Enclosing rectangle are the coordinates of the smallest rectangle containing all the pixels in the component (minX, minY, maxX, maxY), dimensions are measurements of the width, height and area of the enclosing rectangle, and density is the ratio of black pixels, which are associated with edges, in the component.
  - 9. A computer-implemented method for the identification and interpretation of text captions as recited in claim 8, comprising a step of determining whether a connected component is likely to contain edges associated with text or not, said step comprising the steps of:
    - (a) a geometric test, comparing the width, height, and area of said component against predefined minima and maxima, and discarding the component if any of the conditions is violated, and
      
      (b) a content test applied to connected components that pass the geometric test, comparing the density of said component against predefined upper and lower bounds, as well as verifying that a minimum number of vertical white runs, corresponding to the gaps that occur between letters, exist.
  - 10. A computer-implemented method for the identification and interpretation of text captions as recited in claim 9, comprising the steps of:
    - projecting black pixels contained in connected components that passed said geometric and content tests into the Y-axis of the image, thereby producing a projection pattern; and
      
      testing said resulting projection pattern to determine if its vertical runs, defined as sequences of consecutive lines having counts greater than zero, exceed a minimum height of the characters being sought in the caption.
  - 11. A computer-implemented method for the identification and interpretation of text captions as recited in claim 10, including the step of defining said binary image, and its corresponding video frame, as having a caption if and only if it has at least one run satisfying said minimum text height condition.
  - 12. A computer-implemented method for the identification and interpretation of text captions as recited in claim 11, wherein, when said binary image and its corresponding video frame is defined as having a caption, confirming the results by the steps of:
    - defining N to be a frame determined to have a caption;
      
      applying said decoding, edge detection and binarization steps to frames N-D/2 and N+D/2 wherein D is the minimum number of contiguous frames a caption is expected to appear;
      
      combining resulting images with a binary image of N using an AND operation so as to result in two new binary images in which some of the edges associated with the individual frames have been removed, but those associated with text remain;
      
      applying said compression, projecting black pixels, and testing projection pattern steps to each of said two new binary images; and
      
      determining a frame to have a caption, if and only if, either one, or both of said two new images is determined to have a caption.
  - 13. A computer-implemented method for the identification and interpretation of text captions as recited in claim 1, including the step of applying optical character recognition (OCR) to a portion of located text so as to interpret said text.
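
The byte-wise compression of claims 5 to 7 can be sketched as follows. This is a minimal illustration under stated assumptions: the value 1 stands for what claim 7 calls a white pixel, criterion (b) is evaluated on the map produced by criterion (a), and the way the two criteria are combined is left to the hypothetical `combine` parameter (`all` requires both, `any` accepts either), because the claim wording alone does not settle it.

```python
def compress(binary, combine=all):
    """Map each group of 8 consecutive pixels to one binary value.

    `binary` is a list of rows of 0/1 values, 1 meaning a white pixel
    (the polarity is an assumption). Row length must be a multiple of 8.
    The result has WIDTH/8 columns and the same number of rows.
    """
    height, width = len(binary), len(binary[0])
    assert width % 8 == 0, "width must be a multiple of 8"
    cols = width // 8

    # Criterion (a): 2 < number of white pixels in the byte < 6.
    a = [[2 < sum(row[8 * c: 8 * c + 8]) < 6 for c in range(cols)]
         for row in binary]

    def set_neighbors(y, x):
        # Count 4-connected neighbours that satisfy criterion (a).
        # Evaluating neighbours on the criterion-(a) map is an assumption.
        count = 0
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < cols and a[ny][nx]:
                count += 1
        return count

    # Criterion (b): at least 2 of the 4-connected neighbours are 1.
    return [[1 if combine((a[y][x], set_neighbors(y, x) >= 2)) else 0
             for x in range(cols)]
            for y in range(height)]
```

Called as `compress(image)` the sketch keeps a cell only when both criteria hold; `compress(image, combine=any)` keeps it when either holds. Either way, a frame of WIDTH×HEIGHT pixels shrinks to WIDTH/8×HEIGHT cells, which reduces the work done by the later connected-component step.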

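The confirmation step of claim 12 can be illustrated in the same spirit. In this sketch `edges_at` and `has_caption` are hypothetical callables standing in for the earlier pipeline stages, and D/2 is computed with integer division, which is an assumption.

```python
def and_images(a, b):
    """Pixel-wise AND of two equally sized binary images (lists of 0/1 rows)."""
    return [[pa & pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


def confirm_caption(edges_at, n, d, has_caption):
    """Confirm a caption that was detected in frame n.

    edges_at(i)    -> binary edge image of frame i (hypothetical callable that
                      performs the decoding, edge detection and binarization)
    has_caption(b) -> bool (hypothetical callable standing in for the
                      compression, projection and run-testing steps)
    d is the minimum number of contiguous frames a caption is expected to last.
    """
    current = edges_at(n)
    # Edges that persist across frames survive the AND; transient ones vanish.
    before = and_images(current, edges_at(n - d // 2))
    after = and_images(current, edges_at(n + d // 2))
    # The frame keeps its caption label if either combined image shows one.
    return has_caption(before) or has_caption(after)
```

The AND operation exploits the fact that caption edges stay in place over the D-frame window while edges from moving scene content generally do not.
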
14. A computer-implemented method for the identification and interpretation of text captions in a video stream wherein the frame sequence is compressed, comprising the steps of:
- determining whether the frame number divided by a predetermined number N is an integer, discarding non-integers;
  
  decoding compressed frames so as to result in uncompressed frames;
  
  detecting edges so as to derive a corresponding gray scale image;
  
  binarizing said gray scale image so as to derive a binary image;
  
  compressing said binary image so as to derive a compressed binary image; and
  
  performing a connected component analysis.
- Dependent Claims (15, 16, 17, 18, 19, 20)
- - 15. A computer-implemented method for the identification and interpretation of text captions as recited in claim 14, wherein said connected components analysis is carried out by computing connected components using a standard 4-neighbor connectivity test.
  - 16. A computer-implemented method for the identification and interpretation of text captions as recited in claim 15, wherein each said computed connected component is subjected to two sets of tests involving its geometric properties and contents.
  - 17. A computer-implemented method for the identification and interpretation of text captions as recited in claim 16, wherein said geometric tests involve minimum and maximum boundaries on a respective connected component's width, height and area.
  - 18. A computer-implemented method for the identification and interpretation of text captions as recited in claim 16, wherein said content tests, applied to an area in said binary image following edge detection corresponding to said connected component, include upper and lower boundaries on the proportion of black pixels contained therein, and a threshold on the number of vertical zero-runs, defined as collections of one or more columns in which no black pixels occur.
  - 19. A computer-implemented method for the identification and interpretation of text captions as recited in claim 18, comprising the steps of:
    - separating out connected components that passed said tests; and
      
      projecting values of corresponding pixels in said binary image following edge detection into the vertical axis of the image so as to result in a projection pattern.
  - 20. A computer-implemented method for the identification and interpretation of text captions as recited in claim 19, comprising the steps of testing said projection pattern to determine if it contains runs that exceed a given threshold and thereby determine if a caption is present.
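
The connected-component analysis and the two sets of tests in claims 15 to 18 might look like the sketch below. It is illustrative only: the value 1 marks an edge pixel (the "black pixels" of the claims), components are found in the compressed image, the content test reads the full-resolution edge image by scaling x by 8, and the `limits` dictionary with its keys is a hypothetical structure, since the claims give no numeric bounds.

```python
from collections import deque


def connected_components(img):
    """Return a table (list of dicts) of 4-connected groups of 1-pixels."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    table = []
    for y0 in range(h):
        for x0 in range(w):
            if not img[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue = deque([(y0, x0)])
            min_x = max_x = x0
            min_y = max_y = y0
            while queue:  # breadth-first flood fill over 4-neighbours
                y, x = queue.popleft()
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and img[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            width, height = max_x - min_x + 1, max_y - min_y + 1
            table.append({"id": len(table) + 1,
                          "rect": (min_x, min_y, max_x, max_y),
                          "width": width, "height": height,
                          "area": width * height})
    return table


def passes_geometric_test(comp, limits):
    """Check width, height and area against (min, max) pairs in `limits`."""
    return all(limits[k][0] <= comp[k] <= limits[k][1]
               for k in ("width", "height", "area"))


def passes_content_test(edge_img, comp, limits, scale=8):
    """Check edge-pixel proportion and vertical zero-runs in the edge image.

    The component rectangle is in compressed coordinates, so x is multiplied
    by `scale` (8 pixels per compressed cell) to find the matching area.
    """
    min_x, min_y, max_x, max_y = comp["rect"]
    x_start, x_end = min_x * scale, (max_x + 1) * scale  # end is exclusive
    column_counts = [sum(edge_img[y][x] for y in range(min_y, max_y + 1))
                     for x in range(x_start, x_end)]
    region_area = len(column_counts) * (max_y - min_y + 1)
    proportion = sum(column_counts) / region_area
    # A zero-run is one or more adjacent columns with no edge pixels.
    zero_runs = sum(1 for i, c in enumerate(column_counts)
                    if c == 0 and (i == 0 or column_counts[i - 1] != 0))
    lo, hi = limits["proportion"]
    return lo <= proportion <= hi and zero_runs >= limits["min_zero_runs"]
```

The content test has to look at the uncompressed edge image: inside a single 4-connected component every column of its enclosing rectangle contains at least one pixel, so the gaps between letters are only visible at full resolution.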

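The projection and run test of claims 19 and 20 reduce to a few lines. In this sketch the input image is assumed to contain only the pixels of components that passed the earlier tests, and a run whose length equals the threshold is treated as sufficient, which is an assumption because the claims speak of runs that "exceed" it.

```python
def y_projection(img):
    """Count set pixels on each image line (projection onto the Y axis)."""
    return [sum(row) for row in img]


def longest_run(projection):
    """Length of the longest sequence of consecutive lines with count > 0."""
    best = current = 0
    for count in projection:
        current = current + 1 if count > 0 else 0
        best = max(best, current)
    return best


def has_caption(img, min_text_height):
    """True if some vertical run reaches the minimum character height."""
    return longest_run(y_projection(img)) >= min_text_height


# Lines 1-3 hold edge pixels, line 4 is empty, line 5 holds one pixel.
sample = [[0, 0, 0], [1, 0, 1], [0, 1, 0], [1, 1, 1], [0, 0, 0], [1, 0, 0]]
```

For `sample` the projection is `[0, 2, 1, 3, 0, 1]`, so the longest run spans three lines and a minimum text height of 3 is met while a minimum of 4 is not.
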
21. A computer-implemented method for the identification and interpretation of text captions in an encoded video stream of digital video signals, said method comprising:
- sampling by selecting frames for video analysis;
  
  decoding by converting each of frames selected into a digitized color image;
  
  separating each said digitized color image into three color images corresponding to three color planes;
  
  performing edge detection on each of said color planes for generating a respective gray scale image for each of said color planes;
  
  applying a thresholding image to each of said gray scale images so as to produce three respective binary edge images;
  
  combining said three binary edge images to obtain a single combined binary edge image;
  
  compressing groups of consecutive pixel values in said combined binary image;
  
  mapping said consecutive pixel values into a binary value; and
  
  separating groups of connected pixels and determining whether they are likely to be part of a text region in the image or not.
- Dependent Claims (22, 23)
- - 22. A computer-implemented method for the identification and interpretation of text captions as recited in claim 21, wherein said digitized color image is separated into three 8-bit color images corresponding to said three color planes.
  - 23. A computer-implemented method for the identification and interpretation of text captions as recited in claim 22, wherein said three color planes are respectively red, green, and blue (RGB).
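
The per-plane processing of claims 21 to 23 can be sketched as below. The claims say only that the three binary edge images are combined, so the pixel-wise OR used here is an assumption, as are the horizontal-difference edge operator, the threshold of 40, and all function names.

```python
def split_planes(rgb):
    """Split an image of (r, g, b) tuples into three 8-bit planes."""
    return [[[px[c] for px in row] for row in rgb] for c in range(3)]


def binary_edges(plane, threshold=40):
    """Horizontal-difference edge detection followed by thresholding."""
    w = len(plane[0])
    return [[1 if abs(row[min(x + 1, w - 1)] - row[x]) > threshold else 0
             for x in range(w)] for row in plane]


def combine_or(images):
    """Combine equally sized binary images pixel-wise with OR (assumed)."""
    return [[int(any(px)) for px in zip(*rows)] for rows in zip(*images)]


# One image line: black, red, red, magenta.
rgb = [[(0, 0, 0), (255, 0, 0), (255, 0, 0), (255, 0, 255)]]
planes = split_planes(rgb)
combined = combine_or([binary_edges(p) for p in planes])
```

Working on each plane separately lets a boundary that is visible in only one color channel, such as the red-to-magenta step in this example, still contribute an edge to the combined image.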

24. A computer-implemented method for the identification and interpretation of text captions which are embedded in one or more of a plurality of contiguous frames of encoded digital video signals, said method comprising:
- sampling by selecting one or more of said frames for video analysis, said sampling being at a sampling rate fixed at 1 frame per N, where N is the number of consecutive frames in which a given text caption is expected to appear;
  
  decoding by converting each of said frames selected into a digitized color image;
  
  performing edge detection for generating a gray scale image;
  
  binarizing by converting said gray scale image into a bi-level image by means of a thresholding operation;
  
  compressing groups of consecutive pixel values in said binary image;
  
  mapping said consecutive pixel values into a binary value; and
  
  separating groups of connected pixels and determining whether they are likely to be part of a text region embedded in the image or not.
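
The fixed-rate sampling recited in claim 24 (and expressed in claim 14 as keeping frames whose number divided by N is an integer) amounts to a modulus test. The function name and the example values are assumptions.

```python
def sample_frames(frame_numbers, n):
    """Yield only the frame numbers that are exact multiples of n."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    for number in frame_numbers:
        if number % n == 0:
            yield number


# With n = 30, frames 0, 30, 60 and 90 are selected out of the first 100.
selected = list(sample_frames(range(100), 30))
```

Because N is the number of consecutive frames in which a caption is expected to appear, any caption lasting at least N frames overlaps at least one sampled frame, so only one frame in N needs to be decoded and analyzed.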

Current Assignee
Siemens AG
Original Assignee
Siemens Corporate Research Incorporated (Siemens AG)
Inventors
Pizano, Arturo; Arman, Farshid; Benson, Daniel Conrad; Depommier, Remi
Primary Examiner(s)
Johns, Andrew W.

Application Number

US08/866,970
Time in Patent Office

1,163 Days
Field of Search

382/174, 382/176, 382/180, 382/199, 382/200, 382/256, 382/257, 382/270, 382/292
US Class Current

382/176
CPC Class Codes

G06V 20/635 Overlay text, e.g. embedded...

G06V 30/10 Character recognition
