Scheme for detecting captions in coded video data without decoding coded video data
Abstract
A video caption detection scheme capable of detecting captions from coded video data that is coded by using a combination of predictive coding and motion compensation, without requiring the decoding of the coded video data into frame images. In this video caption detection scheme, it is judged whether or not each pixel/block in the video data is coded by using inter-frame correlation without using motion compensation. Then, a region in the video data at which pixels/blocks that are judged as being coded by using inter-frame correlation without using motion compensation are concentrated time-wise and space-wise is detected as a caption region. The detection can be realized by counting a frequency of appearance of a pixel/block which is judged as being coded by using inter-frame correlation without using motion compensation, at each pixel/block position of a frame over a prescribed counting period, and then comparing the counted frequency of appearance with a prescribed threshold value.
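The counting-and-threshold idea in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `count_static_blocks` and `caption_blocks` are hypothetical, and the input is assumed to be a list of per-frame 2-D boolean grids in which `True` marks a block already judged as coded by inter-frame correlation without motion compensation. How those flags are read from the coded stream is outside the sketch.

```python
def count_static_blocks(frames):
    """Count, per block position, how often the block was flagged.

    frames: list of 2-D lists of bools (assumed layout, one grid per frame).
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    counts = [[0] * cols for _ in range(rows)]
    for flags in frames:
        for r in range(rows):
            for c in range(cols):
                if flags[r][c]:
                    counts[r][c] += 1
    return counts


def caption_blocks(counts, threshold):
    """Return the set of (row, col) positions whose count exceeds the threshold."""
    return {(r, c)
            for r, row in enumerate(counts)
            for c, value in enumerate(row)
            if value > threshold}


frames = [
    [[True, False], [True, True]],
    [[False, False], [True, True]],
    [[False, False], [True, True]],
]
counts = count_static_blocks(frames)
selected = caption_blocks(counts, threshold=2)
```

In this toy input the bottom row is flagged in every frame of the counting period, so only those two positions pass the threshold.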
92 Citations
19 Claims
-
1. A method for detecting a caption region from video data coded by using a combination of predictive coding and motion compensation, comprising the steps of:
-
judging whether each pixel/block in the video data is coded by using inter-frame correlation without using motion compensation or not; and
detecting a region in the video data at which pixels/blocks judged by the judging step as being coded by using inter-frame correlation without using motion compensation are concentrated time-wise and space-wise, as a caption region;
wherein the detecting step includes the steps of:
counting a frequency of appearance of a pixel/block which is judged by the judging step as being coded by using inter-frame correlation without using motion compensation, at each pixel/block position of a frame over a prescribed counting period;
selecting the caption region by comparing the frequency of appearance counted by the counting step with a prescribed threshold value;
forming a two-dimensional counting matrix indicating the frequency of appearance at each pixel/block position as counted by the counting step; and
producing a projection histogram by projecting the counting matrix into at least one direction defining the counting matrix;
wherein the producing step obtains a first projection histogram by projecting the counting matrix into a first direction, determines a first section along the first direction in which the frequency of appearance as indicated by the first projection histogram is greater than a first prescribed threshold value, and obtains the projection histogram by projecting the first projection histogram into a second direction within the first section; and
wherein the selecting step compares the frequency of appearance as indicated by the projection histogram with the prescribed threshold value, and determines a second section along the second direction in which the frequency of appearance as indicated by the projection histogram is greater than the prescribed threshold value, and selects those pixels/blocks which are within the first section and the second section as the caption region.
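One possible reading of the two-stage projection in claim 1 is sketched below. It assumes the first direction is the row axis and the second the column axis, takes the first contiguous run above each threshold as the "section", and computes the second histogram from the counting matrix restricted to the first section. All of these choices, and the names `first_run_above` and `caption_rectangle`, are assumptions made for illustration.

```python
def first_run_above(hist, threshold):
    """Return (start, end), inclusive, of the first contiguous run of
    histogram bins greater than threshold, or None if there is none."""
    start = None
    for i, value in enumerate(hist):
        if value > threshold:
            if start is None:
                start = i
        elif start is not None:
            return (start, i - 1)
    return (start, len(hist) - 1) if start is not None else None


def caption_rectangle(counts, row_threshold, col_threshold):
    """Locate a caption rectangle (row0, row1, col0, col1) in a counting matrix."""
    # First projection: sum each row of the counting matrix.
    row_hist = [sum(row) for row in counts]
    rows = first_run_above(row_hist, row_threshold)
    if rows is None:
        return None
    r0, r1 = rows
    # Second projection: sum each column, but only within the first section.
    col_hist = [sum(counts[r][c] for r in range(r0, r1 + 1))
                for c in range(len(counts[0]))]
    cols = first_run_above(col_hist, col_threshold)
    if cols is None:
        return None
    return (r0, r1, cols[0], cols[1])


counts = [
    [0, 0, 0, 0],
    [0, 5, 6, 0],
    [0, 4, 5, 1],
    [0, 0, 0, 0],
]
rect = caption_rectangle(counts, row_threshold=5, col_threshold=3)
```

Restricting the second projection to the first section keeps activity elsewhere in the frame from leaking into the column histogram.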
-
-
2. A method for detecting a caption region from video data coded by using a combination of predictive coding and motion compensation, comprising the steps of:
-
judging whether each pixel/block in the video data is coded by using inter-frame correlation without using motion compensation or not; and
detecting a region in the video data at which pixels/blocks judged by the judging step as being coded by using inter-frame correlation without using motion compensation are concentrated time-wise and space-wise, as a caption region;
wherein the detecting step includes the steps of:
counting a frequency of appearance of a pixel/block which is judged by the judging step as being coded by using inter-frame correlation without using motion compensation, at each pixel/block position of a frame over a prescribed counting period; and
selecting the caption region by comparing the frequency of appearance counted by the counting step with a prescribed threshold value;
wherein the counting step counts the frequency of appearance by incrementing the frequency of appearance by a value “1” for each appearance of a pixel/block which is judged by the judging step as being coded by using inter-frame correlation without using motion compensation, while decrementing the frequency of appearance by a value “−1” for each appearance of a pixel/block which is not judged by the judging step as being coded by using inter-frame correlation without using motion compensation.
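The up/down counting of claim 2 might look like the sketch below. It reads the claim as adding +1 for a flagged block and adding −1 otherwise; whether the counter is clamped at zero is not stated in the claim, so no clamping is applied here. The function name `update_counts` and the boolean-grid input are hypothetical.

```python
def update_counts(counts, flags):
    """Update the counting matrix in place for one frame.

    flags[r][c] is True when the block at (r, c) was judged as coded by
    inter-frame correlation without motion compensation (assumed input).
    """
    for r, row in enumerate(flags):
        for c, is_static in enumerate(row):
            counts[r][c] += 1 if is_static else -1
    return counts


counts = [[0, 0]]
for flags in ([[True, False]], [[True, True]], [[True, False]]):
    update_counts(counts, flags)
```

Compared with a pure increment, the decrement lets positions that are only occasionally static fall back below the threshold during the counting period.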
-
-
3. A method for detecting a caption region from video data coded by using a combination of predictive coding and motion compensation, comprising the steps of:
-
judging whether each pixel/block in the video data is coded by using inter-frame correlation without using motion compensation or not;
detecting a region in the video data at which pixels/blocks judged by the judging step as being coded by using inter-frame correlation without using motion compensation are concentrated time-wise and space-wise, as a caption region;
storing pixels/blocks of a plurality of caption regions detected by the detecting step at different timings into a three-dimensional buffer defined by two spatial axes and one time axis, as caption candidate pixels/blocks; and
merging a plurality of caption candidate pixels/blocks for different timings as stored in the three-dimensional buffer;
wherein the merging step applies a dilation processing to replace a pixel/block value of each caption candidate pixel/block by a maximum value of pixel/block values among neighboring caption candidate pixels/blocks, and an erosion processing to replace a pixel/block value of each caption candidate pixel/block by a minimum value of pixel/block values among neighboring caption candidate pixels/blocks.
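The dilation and erosion of claim 3 can be illustrated on a small three-dimensional buffer indexed as `buf[t][y][x]`. The sketch assumes a 3×3×3 neighborhood clipped at the buffer borders and applies dilation followed by erosion; the neighborhood shape, the order of the two operations, and the function names are assumptions, not details taken from the claim.

```python
from itertools import product


def _filter3d(buf, pick):
    """Replace each value by pick() over its clipped 3x3x3 neighborhood."""
    T, H, W = len(buf), len(buf[0]), len(buf[0][0])
    out = [[[0] * W for _ in range(H)] for _ in range(T)]
    for t, y, x in product(range(T), range(H), range(W)):
        neighbors = [buf[tt][yy][xx]
                     for tt in range(max(t - 1, 0), min(t + 2, T))
                     for yy in range(max(y - 1, 0), min(y + 2, H))
                     for xx in range(max(x - 1, 0), min(x + 2, W))]
        out[t][y][x] = pick(neighbors)
    return out


def dilate(buf):
    return _filter3d(buf, max)   # maximum among neighbors


def erode(buf):
    return _filter3d(buf, min)   # minimum among neighbors


def merge(buf):
    """Dilation then erosion: bridges short gaps without growing regions."""
    return erode(dilate(buf))


gap = [[[v]] for v in [1, 1, 0, 1, 1]]        # one-frame dropout in time
isolated = [[[v]] for v in [0, 0, 1, 0, 0]]   # single detection
merged_gap = merge(gap)
merged_isolated = merge(isolated)
```

With this ordering, a one-frame dropout inside a caption is filled, while a single isolated detection keeps its original extent.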
-
-
4. A method for detecting a caption region from video data coded by using a combination of predictive coding and motion compensation, comprising the steps of:
-
judging whether each pixel/block in the video data is coded by using inter-frame correlation without using motion compensation or not;
detecting a region in the video data at which pixels/blocks judged by the judging step as being coded by using inter-frame correlation without using motion compensation are concentrated time-wise and space-wise, as a caption region;
storing pixels/blocks of a plurality of caption regions detected by the detecting step at different timings into a three-dimensional buffer defined by two spatial axes and one time axis, as caption candidate pixels/blocks;
merging a plurality of caption candidate pixels/blocks for different timings as stored in the three-dimensional buffer, and judging a frame immediately before or after a time section at which no caption candidate pixel/block exists as a representative frame of a caption which exists immediately before or after the time section.
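A rough sketch of the representative-frame rule in claim 4: a frame that contains caption candidates and sits immediately before or after a stretch of frames with none is reported. The buffer layout `buf[t][y][x]` and the name `representative_frames` are hypothetical, and the ends of the buffer are not treated as empty time sections here, which is a simplifying assumption.

```python
def representative_frames(buf):
    """Return frame indices adjacent to a time section with no candidates."""
    present = [any(any(row) for row in frame) for frame in buf]
    reps = []
    for t, has_caption in enumerate(present):
        if not has_caption:
            continue
        gap_before = t > 0 and not present[t - 1]
        gap_after = t + 1 < len(present) and not present[t + 1]
        if gap_before or gap_after:
            reps.append(t)
    return reps


# One block per frame, for brevity: 1 = caption candidate present.
buf = [[[v]] for v in [0, 1, 1, 1, 0, 0, 1, 1]]
reps = representative_frames(buf)
```

Frames 1 and 6 start a caption right after an empty section, and frame 3 ends one right before an empty section.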
-
-
5. A method for detecting a caption region from video data coded by using a combination of predictive coding and motion compensation, comprising the steps of:
-
judging whether each pixel/block in the video data is coded by using inter-frame correlation without using motion compensation or not;
detecting a region in the video data at which pixels/blocks judged by the judging step as being coded by using inter-frame correlation without using motion compensation are concentrated time-wise and space-wise, as a caption region;
storing pixels/blocks of a plurality of caption regions detected by the detecting step at different timings into a three-dimensional buffer defined by two spatial axes and one time axis, as caption candidate pixels/blocks;
merging a plurality of caption candidate pixels/blocks for different timings as stored in the three-dimensional buffer; and
labeling each connected component of the caption candidate pixels/blocks as merged by the merging step distinctively; and
judging a frame containing a caption candidate pixel/block of each connected component which is labeled distinctively by the labeling step as a representative frame of a caption formed by the caption candidate pixels/blocks of each connected component.
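The labeling step of claim 5 can be sketched with a breadth-first flood fill over the three-dimensional buffer. The sketch assumes 6-connectivity (two spatial axes plus time) and picks the earliest frame of each component as its representative frame; the claim only requires a frame containing a candidate of the component, so that choice and the name `label_components` are assumptions.

```python
from collections import deque

NEIGHBORS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def label_components(buf):
    """Label connected candidates in buf[t][y][x].

    Returns (labels, rep), where rep maps each label to the earliest
    frame index that contains the component.
    """
    T, H, W = len(buf), len(buf[0]), len(buf[0][0])
    labels = [[[0] * W for _ in range(H)] for _ in range(T)]
    rep = {}
    next_label = 0
    for t in range(T):
        for y in range(H):
            for x in range(W):
                if not buf[t][y][x] or labels[t][y][x]:
                    continue
                next_label += 1
                labels[t][y][x] = next_label
                rep[next_label] = t  # scan order makes this the earliest frame
                queue = deque([(t, y, x)])
                while queue:
                    ct, cy, cx = queue.popleft()
                    for dt, dy, dx in NEIGHBORS:
                        nt, ny, nx = ct + dt, cy + dy, cx + dx
                        if (0 <= nt < T and 0 <= ny < H and 0 <= nx < W
                                and buf[nt][ny][nx] and not labels[nt][ny][nx]):
                            labels[nt][ny][nx] = next_label
                            queue.append((nt, ny, nx))
    return labels, rep


buf = [
    [[1, 0, 0]],
    [[1, 0, 1]],
    [[0, 0, 1]],
]
labels, rep = label_components(buf)
```

Two captions that overlap in time but not in space receive different labels, so each gets its own representative frame.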
-
-
6. A method for detecting a caption region from video data coded by using a combination of predictive coding and motion compensation, comprising the steps of:
-
judging whether each pixel/block in the video data is coded by using inter-frame correlation without using motion compensation or not;
detecting a region in the video data at which pixels/blocks judged by the judging step as being coded by using inter-frame correlation without using motion compensation are concentrated time-wise and space-wise, as a caption region;
producing a caption candidate image from one frame image of the video data by assigning a value “1” to each caption region detected by the detecting step while assigning a value “0” to any other regions;
producing a difference image between said one frame image and another frame image of the video data;
extracting difference image portions according to a value of the caption candidate image at each portion of the difference image; and
judging an existence of a caption according to the caption candidate image and the difference image portions.
generating a mask which has a value “1” at each region for which the caption candidate image and the difference image portions are to be evaluated in order to judge the existence of the caption, and a value “0” at any other region, from the caption candidate image and the difference image; and
extracting portions of the difference image at which the mask has a value “1” as the difference image portions.
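The candidate image, difference image, and masked extraction described above can be sketched as follows. For brevity the candidate image itself is used as the mask, caption regions are given as inclusive rectangles `(row0, row1, col0, col1)`, and the difference is an absolute per-position difference; these choices and all function names are assumptions for illustration.

```python
def candidate_image(shape, caption_regions):
    """Binary image: 1 inside any detected caption rectangle, 0 elsewhere."""
    height, width = shape
    image = [[0] * width for _ in range(height)]
    for r0, r1, c0, c1 in caption_regions:
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                image[r][c] = 1
    return image


def difference_image(frame_a, frame_b):
    """Absolute difference between two frames of equal size."""
    return [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]


def masked_differences(diff, mask):
    """Collect difference values at positions where the mask is 1."""
    return [diff[r][c]
            for r in range(len(diff))
            for c in range(len(diff[0]))
            if mask[r][c] == 1]


cand = candidate_image((2, 3), [(0, 0, 1, 2)])
diff = difference_image([[10, 20, 30], [40, 50, 60]],
                        [[10, 25, 30], [0, 50, 60]])
portions = masked_differences(diff, cand)
```

The large change at the lower-left position is ignored because it lies outside the candidate region.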
-
-
9. The method of claim 8, wherein the generating step includes the steps of:
-
detecting each unchanged pixel/block for which a pixel/block value is unchanged between said one frame image and said another frame image;
detecting each newly appeared caption candidate region in said one frame image; and
producing the mask by assigning a value “1” to each region at which at least one of the unchanged pixel/block and the newly appeared caption candidate region exists, and a value “0” to any other region.
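A minimal sketch of the mask generation in claim 9 follows. It assumes that "unchanged" means a difference value no greater than a small tolerance, and that a "newly appeared" candidate is a position set in the current candidate image but not in the previous one. Both interpretations, the `tolerance` parameter, and the name `build_mask` are assumptions.

```python
def build_mask(diff, cand_now, cand_prev, tolerance=0):
    """Mask is 1 where the block is unchanged OR is a newly appeared candidate."""
    mask = []
    for r in range(len(diff)):
        row = []
        for c in range(len(diff[0])):
            unchanged = diff[r][c] <= tolerance
            newly_appeared = cand_now[r][c] == 1 and cand_prev[r][c] == 0
            row.append(1 if unchanged or newly_appeared else 0)
        mask.append(row)
    return mask


mask = build_mask(diff=[[0, 5, 9]],
                  cand_now=[[1, 1, 0]],
                  cand_prev=[[1, 0, 0]])
```

The middle position changed between frames but is kept in the mask because a candidate region has just appeared there.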
-
-
10. The method of claim 6, wherein the judging step includes the steps of:
-
counting a first number of pixels/blocks at which the caption candidate image has a value “1”;
counting a second number of pixels/blocks at which the difference image portions have a pixel/block value greater than a prescribed threshold value; and
determining the existence of the caption according to the first number of pixels/blocks and the second number of pixels/blocks.
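The two counts of claim 10 are simple tallies. In the sketch below, the candidate image is a 2-D list of 0/1 values and the difference image portions are a flat list of values; that data layout and the name `count_evidence` are hypothetical.

```python
def count_evidence(candidate, diff_portions, diff_threshold):
    """Return (first_number, second_number).

    first_number:  positions where the candidate image equals 1.
    second_number: extracted difference values above the threshold.
    """
    first_number = sum(1 for row in candidate for v in row if v == 1)
    second_number = sum(1 for d in diff_portions if d > diff_threshold)
    return first_number, second_number


first_number, second_number = count_evidence(
    candidate=[[0, 1, 1], [0, 0, 0]],
    diff_portions=[5, 0],
    diff_threshold=3,
)
```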
-
-
11. The method of claim 10, wherein the determining step determines that the caption exists when an area of a judging region at which the caption candidate image has a value “1” is judged as sufficiently large and a change in said one frame image within the judging region is judged as sufficiently small, according to the first number of pixels/blocks and the second number of pixels/blocks.
-
12. The method of claim 10, wherein the determining step measures a period of time for which the caption continues to appear according to the first number of pixels/blocks and the second number of pixels/blocks, and determines that the caption exists when the measured period of time is longer than a prescribed period of time.
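The decision rules of claims 11 and 12 can be sketched together. A frame counts as showing a caption when the candidate area is large enough and few positions changed; the caption is then accepted only if that condition holds for longer than a prescribed number of consecutive frames. The parameter names `min_area`, `max_changed`, and `min_frames`, the use of frame counts as the time unit, and the function names are assumptions.

```python
def caption_present(first_number, second_number, min_area, max_changed):
    """Claim 11 style test: large candidate area and small change."""
    return first_number >= min_area and second_number <= max_changed


def longest_run(per_frame_counts, min_area, max_changed):
    """Longest number of consecutive frames that pass caption_present()."""
    best = run = 0
    for first_number, second_number in per_frame_counts:
        if caption_present(first_number, second_number, min_area, max_changed):
            run += 1
        else:
            run = 0
        best = max(best, run)
    return best


def caption_exists(per_frame_counts, min_area, max_changed, min_frames):
    """Claim 12 style test: the caption persists longer than min_frames."""
    return longest_run(per_frame_counts, min_area, max_changed) > min_frames


history = [(10, 0), (12, 1), (11, 0), (2, 0), (10, 9)]
run = longest_run(history, min_area=8, max_changed=2)
```

The duration test filters out static regions that only briefly resemble a caption.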
-
13. A method for detecting a caption region from video data coded by using a combination of predictive coding and motion compensation, comprising the steps of:
-
judging whether each pixel/block in the video data is coded by using inter-frame correlation without using motion compensation or not;
detecting a region in the video data at which pixels/blocks judged by the judging step as being coded by using inter-frame correlation without using motion compensation are concentrated time-wise and space-wise, as a caption region;
entering an information on a spatial position range on an image field of a desired caption to be retrieved, as a retrieval key;
selecting a part of the video data corresponding to the desired caption to be retrieved, by comparing each caption region detected by the detecting step and the retrieval key entered by the entering step;
displaying said part of the video data selected by the selecting step; and
recording a combination of the video data and an information on a spatial position on an image field of each caption region detected by the detecting step as an index information;
wherein the selecting step selects said part of the video data by comparing the index information and the retrieval key.
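Retrieval by spatial position, as in claim 13, might be sketched like this. The index is assumed to be a list of entries pairing a frame range with a caption rectangle `(row0, row1, col0, col1)` in block units, and the comparison is assumed to be a rectangle-overlap test; the claim does not specify either, and the names `overlaps` and `retrieve` are hypothetical.

```python
def overlaps(a, b):
    """True when two inclusive rectangles (row0, row1, col0, col1) intersect."""
    ar0, ar1, ac0, ac1 = a
    br0, br1, bc0, bc1 = b
    return ar0 <= br1 and br0 <= ar1 and ac0 <= bc1 and bc0 <= ac1


def retrieve(index, key):
    """Return the frame ranges whose caption region overlaps the retrieval key."""
    return [entry["frames"] for entry in index if overlaps(entry["region"], key)]


index = [
    {"frames": (0, 90), "region": (20, 23, 2, 18)},    # caption near the bottom
    {"frames": (120, 200), "region": (1, 2, 14, 19)},  # caption near the top right
]
bottom_band = (18, 24, 0, 19)
hits = retrieve(index, bottom_band)
```

Only the index is consulted at query time, so the video does not need to be analyzed again for each retrieval key.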
-
-
18. An apparatus for detecting a caption region from video data coded by using a combination of predictive coding and motion compensation, comprising:
-
a judgment unit for judging whether each pixel/block in the video data is coded by using inter-frame correlation without using motion compensation or not;
a detection unit for detecting a region in the video data at which pixels/blocks judged by the judgment unit as being coded by using inter-frame correlation without using motion compensation are concentrated time-wise and space-wise, as a caption region;
a caption candidate image production unit for producing a caption candidate image from one frame image of the video data by assigning a value “1” to each caption region detected by the detection unit while assigning a value “0” to any other regions;
a difference image production unit for producing a difference image between said one frame image and another frame image of the video data;
an extraction unit for extracting difference image portions according to a value of the caption candidate image at each portion of the difference image; and
a judgment unit for judging an existence of a caption according to the caption candidate image and the difference image portions.
-
-
19. An article of manufacture, comprising:
-
a computer usable medium having computer readable program code means embodied therein for causing a computer to function as a system for detecting a caption region from video data coded by using a combination of predictive coding and motion compensation, the computer readable program means including:
first computer readable program code means for causing the computer to function as a judgment unit for judging whether each pixel/block in the video data is coded by using inter-frame correlation without using motion compensation or not;
second computer readable program code means for causing the computer to function as a detection unit for detecting a region in the video data at which pixels/blocks judged by the first computer readable program code means as being coded by using inter-frame correlation without using motion compensation are concentrated time-wise and space-wise, as a caption region;
third computer readable program code means for causing the computer to function as a caption candidate image production unit for producing a caption candidate image from one frame image of the video data by assigning a value “1” to each caption region detected by the second computer readable program code means while assigning a value “0” to any other regions;
fourth computer readable program code means for causing the computer to function as a difference image production unit for producing a difference image between said one frame image and another frame image of the video data;
fifth computer readable program code means for causing the computer to function as an extraction unit for extracting difference image portions according to a value of the caption candidate image at each portion of the difference image; and
sixth computer readable program code means for causing the computer to function as a judgment unit for judging an existence of a caption according to the caption candidate image and the difference image portions.
-
Specification