Scheme for detecting captions in coded video data without decoding coded video data

US 6,243,419 B1
Filed: 05/27/1997
Issued: 06/05/2001
Est. Priority Date: 05/27/1996
Status: Expired due to Fees

Abstract

A video caption detection scheme capable of detecting captions from coded video data which are coded by using a combination of predictive coding and motion compensation, without requiring the decoding of the coded video data into frame images. In this video caption detection scheme, it is judged whether or not each pixel/block in the video data is coded by using inter-frame correlation without using motion compensation. Then, a region in the video data at which pixels/blocks that are judged as being coded by using inter-frame correlation without using motion compensation are concentrated time-wise and space-wise is detected as a caption region. The detection can be realized by counting a frequency of appearance of a pixel/block which is judged as being coded by using inter-frame correlation without using motion compensation, at each pixel/block position of a frame over a prescribed counting period, and then comparing the counted frequency of appearance with a prescribed threshold value.
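
As an illustration of the counting and thresholding described in the abstract, the following minimal Python sketch assumes that a grid of 0/1 flags per frame has already been read from the coded stream, where 1 marks a block coded by using inter-frame correlation without motion compensation. How those flags are obtained is not shown, and the names `count_static_blocks` and `threshold_counts` are hypothetical, not taken from the patent.

```python
from typing import List

Grid = List[List[int]]  # one value per pixel/block position


def count_static_blocks(flags_per_frame: List[Grid]) -> Grid:
    """Sum the per-frame 0/1 flags at each position over the counting period."""
    rows, cols = len(flags_per_frame[0]), len(flags_per_frame[0][0])
    counts = [[0] * cols for _ in range(rows)]
    for flags in flags_per_frame:
        for r in range(rows):
            for c in range(cols):
                counts[r][c] += flags[r][c]
    return counts


def threshold_counts(counts: Grid, threshold: int) -> Grid:
    """Mark positions whose count exceeds the threshold as caption candidates."""
    return [[1 if value > threshold else 0 for value in row] for row in counts]


# Three frames of a 2x2 block grid (assumed input).
frames = [
    [[0, 1], [0, 0]],
    [[0, 1], [1, 0]],
    [[0, 1], [0, 0]],
]
counts = count_static_blocks(frames)
candidates = threshold_counts(counts, threshold=2)
```

Only the block that carries the flag in every frame of the period survives the threshold, which is the time-wise concentration the abstract refers to.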

92 Citations

19 Claims

1. A method for detecting a caption region from video data coded by using a combination of predictive coding and motion compensation, comprising the steps of:
- judging whether each pixel/block in the video data is coded by using inter-frame correlation without using motion compensation or not; and
  
  detecting a region in the video data at which pixels/blocks judged by the judging step as being coded by using inter-frame correlation without using motion compensation are concentrated time-wise and space-wise, as a caption region;
  
  wherein the detecting step includes the steps of:
  
  counting a frequency of appearance of a pixel/block which is judged by the judging step as being coded by using inter-frame correlation without using motion compensation, at each pixel/block position of a frame over a prescribed counting period;
  
  selecting the caption region by comparing the frequency of appearance counted by the counting step with a prescribed threshold value;
  
  forming a two-dimensional counting matrix indicating the frequency of appearance at each pixel/block position as counted by the counting step; and
  
  producing a projection histogram by projecting the counting matrix into at least one direction defining the counting matrix;
  
  wherein the producing step obtains a first projection histogram by projecting the counting matrix into a first direction, determines a first section along the first direction in which the frequency of appearance as indicated by the first projection histogram is greater than a first prescribed threshold value, and obtains the projection histogram by projecting the first projection histogram into a second direction within the first section; and
  
  wherein the selecting step compares the frequency of appearance as indicated by the projection histogram with the prescribed threshold value, and determines a second section along the second direction in which the frequency of appearance as indicated by the projection histogram is greater than the prescribed threshold value, and selects those pixels/blocks which are within the first section and the second section as the caption region.
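
The two-pass projection in claim 1 can be sketched roughly as follows. This is one possible reading, not the patented implementation: it assumes the first direction yields one histogram bin per row, the second direction yields one bin per column computed only over the rows of the first section, and a "section" is simply the set of indices whose bin exceeds the threshold. The function name and both threshold parameters are hypothetical.

```python
from typing import List, Tuple


def caption_region(
    counts: List[List[int]], row_threshold: int, col_threshold: int
) -> Tuple[List[int], List[int]]:
    """Return (rows, cols) of the caption region found by two projections."""
    # First projection: one bin per row of the counting matrix.
    row_hist = [sum(row) for row in counts]
    rows = [r for r, value in enumerate(row_hist) if value > row_threshold]

    # Second projection: one bin per column, restricted to the first section.
    n_cols = len(counts[0]) if counts else 0
    col_hist = [sum(counts[r][c] for r in rows) for c in range(n_cols)]
    cols = [c for c, value in enumerate(col_hist) if value > col_threshold]
    return rows, cols


counting_matrix = [
    [0, 0, 0, 0],
    [1, 9, 8, 0],
    [0, 7, 9, 1],
    [0, 0, 1, 0],
]
rows, cols = caption_region(counting_matrix, row_threshold=10, col_threshold=10)
```

Positions lying in both `rows` and `cols` form the selected caption region. Restricting the second projection to the first section keeps counts from unrelated rows from diluting the column histogram.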

2. A method for detecting a caption region from video data coded by using a combination of predictive coding and motion compensation, comprising the steps of:
- judging whether each pixel/block in the video data is coded by using inter-frame correlation without using motion compensation or not; and
  
  detecting a region in the video data at which pixels/blocks judged by the judging step as being coded by using inter-frame correlation without using motion compensation are concentrated time-wise and space-wise, as a caption region;
  
  wherein the detecting step includes the steps of:
  
  counting a frequency of appearance of a pixel/block which is judged by the judging step as being coded by using inter-frame correlation without using motion compensation, at each pixel/block position of a frame over a prescribed counting period; and
  
  selecting the caption region by comparing the frequency of appearance counted by the counting step with a prescribed threshold value;
  
  wherein the counting step counts the frequency of appearance by incrementing the frequency of appearance by a value “1” for each appearance of a pixel/block which is judged by the judging step as being coded by using inter-frame correlation without using motion compensation, while decrementing the frequency of appearance by a value “−1” for each appearance of a pixel/block which is not judged by the judging step as being coded by using inter-frame correlation without using motion compensation.
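
A minimal sketch of the counting rule in claim 2, under the assumption that the counter at each position goes up by 1 when the block is flagged and down by 1 otherwise. The claim does not say whether the counter is clamped at zero, so no clamping is applied here; `update_counts` and the flag rows are hypothetical.

```python
from typing import List


def update_counts(counts: List[List[int]], flags: List[List[int]]) -> List[List[int]]:
    """Apply one frame of 0/1 flags to the running counters, in place."""
    for r, row in enumerate(flags):
        for c, flag in enumerate(row):
            counts[r][c] += 1 if flag else -1
    return counts


counts = [[0, 0, 0]]
for frame_flags in ([[1, 0, 1]], [[1, 0, 0]], [[1, 1, 0]]):
    update_counts(counts, frame_flags)
```

Compared with a plain sum, the decrement penalizes positions that are only occasionally flagged, so they fall further below the threshold than positions flagged in every frame.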

3. A method for detecting a caption region from video data coded by using a combination of predictive coding and motion compensation, comprising the steps of:
- judging whether each pixel/block in the video data is coded by using inter-frame correlation without using motion compensation or not;
  
  detecting a region in the video data at which pixels/blocks judged by the judging step as being coded by using inter-frame correlation without using motion compensation are concentrated time-wise and space-wise, as a caption region;
  
  storing pixels/blocks of a plurality of caption regions detected by the detecting step at different timings into a three-dimensional buffer defined by two spatial axes and one time axis, as caption candidate pixels/blocks; and
  
  merging a plurality of caption candidate pixels/blocks for different timings as stored in the three-dimensional buffer;
  
  wherein the merging step applies a dilation processing to replace a pixel/block value of each caption candidate pixel/block by a maximum value of pixel/block values among neighboring caption candidate pixels/blocks, and an erosion processing to replace a pixel/block value of each caption candidate pixel/block by a minimum value of pixel/block values among neighboring caption candidate pixels/blocks.
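
The dilation and erosion of claim 3 correspond to maximum and minimum filters over the three-dimensional buffer. The sketch below is illustrative only: the 3x3x3 neighborhood (clipped at the buffer edges), the inclusion of the element itself, and the order "dilation, then erosion" are assumptions, since the claim does not fix them.

```python
from itertools import product
from typing import Callable, List

Buffer = List[List[List[int]]]  # indexed as buffer[time][row][col]


def _filter3d(buf: Buffer, pick: Callable[[List[int]], int]) -> Buffer:
    """Replace each element by pick() over its clipped 3x3x3 neighborhood."""
    T, H, W = len(buf), len(buf[0]), len(buf[0][0])
    out = [[[0] * W for _ in range(H)] for _ in range(T)]
    for t, y, x in product(range(T), range(H), range(W)):
        neighborhood = [
            buf[tt][yy][xx]
            for tt in range(max(0, t - 1), min(T, t + 2))
            for yy in range(max(0, y - 1), min(H, y + 2))
            for xx in range(max(0, x - 1), min(W, x + 2))
        ]
        out[t][y][x] = pick(neighborhood)
    return out


def dilate(buf: Buffer) -> Buffer:
    return _filter3d(buf, max)


def erode(buf: Buffer) -> Buffer:
    return _filter3d(buf, min)


def merge(buf: Buffer) -> Buffer:
    """Dilation followed by erosion, which fills small gaps."""
    return erode(dilate(buf))


# One block position observed over seven timings, with a one-frame gap.
buffer = [[[v]] for v in [0, 0, 1, 0, 1, 0, 0]]
merged = merge(buffer)
timeline = [frame[0][0] for frame in merged]
```

In this example the single missing timing between two detections is filled, so the two detections are merged into one continuous caption candidate.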

4. A method for detecting a caption region from video data coded by using a combination of predictive coding and motion compensation, comprising the steps of:
- judging whether each pixel/block in the video data is coded by using inter-frame correlation without using motion compensation or not;
  
  detecting a region in the video data at which pixels/blocks judged by the judging step as being coded by using inter-frame correlation without using motion compensation are concentrated time-wise and space-wise, as a caption region;
  
  storing pixels/blocks of a plurality of caption regions detected by the detecting step at different timings into a three-dimensional buffer defined by two spatial axes and one time axis, as caption candidate pixels/blocks;
  
  merging a plurality of caption candidate pixels/blocks for different timings as stored in the three-dimensional buffer, and judging a frame immediately before or after a time section at which no caption candidate pixel/block exists as a representative frame of a caption which exists immediately before or after the time section.
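
The representative-frame rule of claim 4 can be illustrated with a small sketch. It assumes a per-frame boolean stating whether any caption candidate pixel/block exists at that timing (a hypothetical summary of the three-dimensional buffer) and reports every frame that directly borders an empty time section.

```python
from typing import List


def representative_frames(has_candidate: List[bool]) -> List[int]:
    """Frames with candidates that sit right before or after an empty section."""
    frames = []
    n = len(has_candidate)
    for t in range(n):
        if not has_candidate[t]:
            continue
        after_gap = t > 0 and not has_candidate[t - 1]
        before_gap = t + 1 < n and not has_candidate[t + 1]
        if after_gap or before_gap:
            frames.append(t)
    return frames


presence = [False, False, True, True, True, False, False, True, True, False]
reps = representative_frames(presence)
```

An application would typically keep only one of the two boundary frames per caption; which one to keep is left open by the claim.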

5. A method for detecting a caption region from video data coded by using a combination of predictive coding and motion compensation, comprising the steps of:
- judging whether each pixel/block in the video data is coded by using inter-frame correlation without using motion compensation or not;
  
  detecting a region in the video data at which pixels/blocks judged by the judging step as being coded by using inter-frame correlation without using motion compensation are concentrated time-wise and space-wise, as a caption region;
  
  storing pixels/blocks of a plurality of caption regions detected by the detecting step at different timings into a three-dimensional buffer defined by two spatial axes and one time axis, as caption candidate pixels/blocks;
  
  merging a plurality of caption candidate pixels/blocks for different timings as stored in the three-dimensional buffer; and
  
  labeling each connected component of the caption candidate pixels/blocks as merged by the merging step distinctively; and
  
  judging a frame containing a caption candidate pixel/block of each connected component which is labeled distinctively by the labeling step as a representative frame of a caption formed by the caption candidate pixels/blocks of each connected component.
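
The labeling step of claim 5 can be sketched as a breadth-first connected-component search over the three-dimensional buffer. The 6-connectivity (two spatial axes plus the time axis) and the choice of the earliest frame of each component as its representative frame are assumptions made for this example; the claim only requires a frame that contains a pixel/block of the component.

```python
from collections import deque
from typing import Dict, List, Tuple

Buffer = List[List[List[int]]]  # indexed as buffer[time][row][col]

STEPS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def label_components(buf: Buffer) -> Tuple[Buffer, Dict[int, int]]:
    """Return a label volume and, per label, the earliest frame it appears in."""
    T, H, W = len(buf), len(buf[0]), len(buf[0][0])
    labels = [[[0] * W for _ in range(H)] for _ in range(T)]
    first_frame: Dict[int, int] = {}
    next_label = 0
    for t in range(T):
        for y in range(H):
            for x in range(W):
                if not buf[t][y][x] or labels[t][y][x]:
                    continue
                next_label += 1
                first_frame[next_label] = t  # scan order makes t the minimum
                labels[t][y][x] = next_label
                queue = deque([(t, y, x)])
                while queue:
                    ct, cy, cx = queue.popleft()
                    for dt, dy, dx in STEPS:
                        nt, ny, nx = ct + dt, cy + dy, cx + dx
                        inside = 0 <= nt < T and 0 <= ny < H and 0 <= nx < W
                        if inside and buf[nt][ny][nx] and not labels[nt][ny][nx]:
                            labels[nt][ny][nx] = next_label
                            queue.append((nt, ny, nx))
    return labels, first_frame


# Two captions at different positions and timings (assumed data).
buffer = [
    [[1, 0, 0]],
    [[1, 0, 0]],
    [[0, 0, 1]],
    [[0, 0, 1]],
]
labels, first_frame = label_components(buffer)
```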

6. A method for detecting a caption region from video data coded by using a combination of predictive coding and motion compensation, comprising the steps of:
- judging whether each pixel/block in the video data is coded by using inter-frame correlation without using motion compensation or not;
  
  detecting a region in the video data at which pixels/blocks judged by the judging step as being coded by using inter-frame correlation without using motion compensation are concentrated time-wise and space-wise, as a caption region;
  
  producing a caption candidate image from one frame image of the video data by assigning a value “1” to each caption region detected by the detecting step while assigning a value “0” to any other regions;
  
  producing a difference image between said one frame image and another frame image of the video data;
  
  extracting difference image portions according to a value of the caption candidate image at each portion of the difference image; and
  
  judging an existence of a caption according to the caption candidate image and the difference image portions.
7. The method of claim 6, wherein the extracting step extracts portions of the difference image at which the caption candidate image has a value “1” as the difference image portions.
8. The method of claim 7, wherein the extracting step includes the steps of:
9. The method of claim 8, wherein the generating step includes the steps of:
- detecting each unchanged pixel/block for which a pixel/block value is unchanged between said one frame image and said another frame image;
  
  detecting each newly appeared caption candidate region in said one frame image; and
  
  producing the mask by assigning a value “1” to each region at which at least one of the unchanged pixel/block and the newly appeared caption candidate region exists, and a value “0” to any other region.
10. The method of claim 6, wherein the judging step includes the steps of:
- counting a first number of pixels/blocks at which the caption candidate image has a value “1”;
  
  counting a second number of pixels/blocks at which the difference image portions have a pixel/block value greater than a prescribed threshold value;
  
  determining the existence of the caption according to the first number of pixels/blocks and the second number of pixels/blocks.
11. The method of claim 10, wherein the determining step determines that the caption exists when an area of a judging region at which the caption candidate image has a value “1” is judged as sufficiently large and a change in said one frame image within the judging region is judged as sufficiently small, according to the first number of pixels/blocks and the second number of pixels/blocks.
12. The method of claim 10, wherein the determining step measures a period of time for which the caption continues to appear according to the first number of pixels/blocks and the second number of pixels/blocks, and determines that the caption exists when the measured period of time is longer than a prescribed period of time.
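
The judgment of claims 6, 7, 10 and 11 can be sketched as follows. This is a simplified illustration, not the claimed implementation: frame images are assumed to be grids of one numeric value per pixel/block, and the parameters `min_area` and `max_changed_ratio` are hypothetical stand-ins for "sufficiently large" and "sufficiently small", which the claims do not quantify.

```python
from typing import List

Grid = List[List[int]]


def caption_exists(
    candidate: Grid,
    frame_a: Grid,
    frame_b: Grid,
    diff_threshold: int,
    min_area: int,
    max_changed_ratio: float,
) -> bool:
    """Judge caption existence from the candidate image and masked difference."""
    n_candidate = 0  # first number: positions where the candidate image is 1
    n_changed = 0    # second number: masked differences above the threshold
    for r, row in enumerate(candidate):
        for c, is_caption in enumerate(row):
            if not is_caption:
                continue
            n_candidate += 1
            if abs(frame_a[r][c] - frame_b[r][c]) > diff_threshold:
                n_changed += 1
    if n_candidate == 0 or n_candidate < min_area:
        return False
    return n_changed / n_candidate <= max_changed_ratio


candidate = [[1, 1, 0], [1, 1, 0]]
frame_a = [[10, 10, 50], [10, 10, 90]]
frame_b = [[10, 12, 0], [11, 10, 0]]    # stable inside the candidate region
frame_c = [[90, 90, 50], [90, 10, 90]]  # changes inside the candidate region

stable = caption_exists(candidate, frame_a, frame_b, 5, 3, 0.25)
changing = caption_exists(candidate, frame_a, frame_c, 5, 3, 0.25)
```

Large differences outside the candidate region, as in the third column of `frame_b`, do not affect the result because only positions where the candidate image is 1 are examined.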

13. A method for detecting a caption region from video data coded by using a combination of predictive coding and motion compensation, comprising the steps of:
- judging whether each pixel/block in the video data is coded by using inter-frame correlation without using motion compensation or not;
  
  detecting a region in the video data at which pixels/blocks judged by the judging step as being coded by using inter-frame correlation without using motion compensation are concentrated time-wise and space-wise, as a caption region;
  
  entering an information on a spatial position range on an image field of a desired caption to be retrieved, as a retrieval key;
  
  selecting a part of the video data corresponding to the desired caption to be retrieved, by comparing each caption region detected by the detecting step and the retrieval key entered by the entering step;
  
  displaying said part of the video data selected by the selecting step; and
  
  recording a combination of the video data and an information on a spatial position on an image field of each caption region detected by the detecting step as an index information;
  
  wherein the selecting step selects said part of the video data by comparing the index information and the retrieval key.
14. The method of claim 13, wherein the entering step enters a figure drawn by using an input device which indicates a spatial position range on an image field of the desired caption to be retrieved as the retrieval key.
15. The method of claim 13, wherein the selecting step compares a spatial position on an image field of each caption region detected by the detecting step with a spatial position range on an image field indicated by the retrieval key, and selects each part of the video data at which the spatial position is contained within the spatial position range or the spatial position overlaps with the spatial position range.
16. The method of claim 13, wherein the displaying step displays each frame image of said part of the video data selected by the selecting step along with an indication of the desired caption to be retrieved within each frame image.
17. The method of claim 13, wherein the displaying step displays the video data played back starting from a frame corresponding to said part of the video data selected by the selecting step.
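
The position-based retrieval of claims 13 and 15 can be illustrated with a short sketch. It assumes each index entry stores a frame number and an axis-aligned bounding box for a detected caption region, and that the retrieval key is also a rectangle. The `Box` and `IndexEntry` types, the half-open coordinate convention, and the `search` function are hypothetical.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


class IndexEntry(NamedTuple):
    frame: int
    box: Box


def overlaps(a: Box, b: Box) -> bool:
    return a.left < b.right and b.left < a.right and a.top < b.bottom and b.top < a.bottom


def contains(outer: Box, inner: Box) -> bool:
    return (outer.left <= inner.left and outer.top <= inner.top
            and outer.right >= inner.right and outer.bottom >= inner.bottom)


def search(index: List[IndexEntry], key: Box, require_containment: bool = False) -> List[int]:
    """Return frame numbers whose caption region matches the retrieval key."""
    match = contains if require_containment else overlaps
    return [entry.frame for entry in index if match(key, entry.box)]


index = [
    IndexEntry(10, Box(0, 80, 100, 100)),  # full-width caption at the bottom
    IndexEntry(50, Box(0, 0, 40, 20)),     # caption in the top-left corner
    IndexEntry(90, Box(20, 85, 60, 95)),   # short caption at the bottom
]
key = Box(10, 80, 70, 100)  # rectangle drawn by the user (assumed)
overlapping = search(index, key)
contained = search(index, key, require_containment=True)
```

The returned frame numbers could then serve as playback start points, in the manner of claim 17.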

18. An apparatus for detecting a caption region from video data coded by using a combination of predictive coding and motion compensation, comprising:
- a judgment unit for judging whether each pixel/block in the video data is coded by using inter-frame correlation without using motion compensation or not;
  
  a detection unit for detecting a region in the video data at which pixels/blocks judged by the judgment unit as being coded by using inter-frame correlation without using motion compensation are concentrated time-wise and space-wise, as a caption region;
  
  a caption candidate image production unit for producing a caption candidate image from one frame image of the video data by assigning a value “1” to each caption region detected by the detection unit while assigning a value “0” to any other regions;
  
  a difference image production unit for producing a difference image between said one frame image and another frame image of the video data;
  
  an extraction unit for extracting difference image portions according to a value of the caption candidate image at each portion of the difference image; and
  
  a judgment unit for judging an existence of a caption according to the caption candidate image and the difference image portions.

19. An article of manufacture, comprising:
- a computer usable medium having computer readable program code means embodied therein for causing a computer to function as a system for detecting a caption region from video data coded by using a combination of predictive coding and motion compensation, the computer readable program code means including:
  
  first computer readable program code means for causing the computer to function as a judgement unit for judging whether each pixel/block in the video data is coded by using inter-frame correlation without using motion compensation or not;
  
  second computer readable program code means for causing the computer to function as a detection unit for detecting a region in the video data at which pixels/blocks judged by the first computer readable program code means as being coded by using inter-frame correlation without using motion compensation are concentrated time-wise and space-wise, as a caption region;
  
  third computer readable program code means for causing the computer to function as a caption candidate image production unit for producing a caption candidate image from one frame image of the video data by assigning a value “1” to each caption region detected by the second computer readable program code means while assigning a value “0” to any other regions;
  
  fourth computer readable program code means for causing the computer to function as a difference image production unit for producing a difference image between said one frame image and another frame image of the video data;
  
  fifth computer readable program code means for causing the computer to function as an extraction unit for extracting difference image portions according to a value of the caption candidate image at each portion of the difference image; and
  
  sixth computer readable program code means for causing the computer to function as a judgment unit for judging an existence of a caption according to the caption candidate image and the difference image portions.

Current Assignee: Nippon Telegraph and Telephone Corporation
Original Assignee: Nippon Telegraph and Telephone Corporation
Inventors: Akutsu, Akihito; Hamada, Hiroshi; Satou, Takashi; Tonomura, Yoshinobu; Taniguchi, Yukinobu; Niikura, Yasuhiro
Primary Examiner(s): Kelley, Chris
Assistant Examiner(s): Philippe, Gims S
Application Number: US08/863,840
Time in Patent Office: 1,470 Days
Field of Search: 348/415, 348/416, 348/699, 348/700, 348/420, 348/465, 348/468, 348/430, 348/413, 348/407, 348/401.1, 348/408.1, 348/461, 348/564, 348/384.1, 382/209, 382/170, 382/171, 382/176, 382/177, 345/328, 345/443, 375/240.13
US Class Current: 375/240.13
CPC Class Codes:
G06F 16/7335   Graphical querying, e.g. qu...
G06F 16/7844   using original textual cont...
G06V 20/62   Text, e.g. of license plate...
G06V 30/10   Character recognition
H04N 19/27   involving both synthetic an...
H04N 19/61   in combination with predict...
