Methods and architecture for indexing and editing compressed video over the world wide web
Abstract
Techniques for detecting moving video objects in a compressed digital bitstream (111) and tools for editing compressed video are disclosed. Video objects (117) are detected and indexed by analyzing a compressed bitstream to locate scene cuts (112), estimating operating parameters for a camera which initially viewed the video (114), and detecting one or more moving video objects represented in the compressed bitstream by applying global motion compensation which accounts for the estimated operating parameters. Tools are provided for applying dissolve, masking, freeze frame, slow and variable speed playback, and strobe motion special effects to compressed video. The tools may be implemented in a system for editing (130) compressed video information over a distributed network.
15 Claims
-
1. A method for detecting moving video objects in a compressed digital bitstream which represents a sequence of fields or frames of video information for one or more previously captured scenes of video, comprising the steps of:
-
a. analyzing said compressed bitstream to locate scene cuts therein, thereby determining at least one sequence of fields or frames of video information which represents a single video scene;
b. estimating one or more operating parameters for a camera which initially captured said video scene by analyzing a portion of said compressed bitstream which corresponds to said video scene; and
c. detecting one or more moving video objects represented in said compressed bitstream by applying global motion compensation with said estimated operating parameters.
a. parsing said compressed bitstream into blocks of video information and associated motion vector information for each field or frame of video information which comprises the determined sequence of fields or frames of video information representative of said single scene;
b. performing inverse motion compensation on each of said parsed blocks of video information to derive discrete cosine transform coefficients for each of said parsed blocks of video information;
c. counting said motion vector information associated with each of said parsed blocks of video information; and
d. determining from said counted motion vector information and said discrete cosine transform coefficient information whether one of said scene cuts has occurred.
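The determining step above can be pictured with a minimal sketch. The claim does not state the decision rule, so everything here is an assumption made for illustration: the function name `is_scene_cut`, the two thresholds, and the rule itself (a cut is flagged when few blocks carry motion vectors and the DC coefficients change sharply between consecutive frames).

```python
def is_scene_cut(mv_count, block_count, dc_prev, dc_curr,
                 mv_ratio_max=0.3, dc_diff_min=20.0):
    """Hypothetical scene-cut test for one frame (not the patented rule).

    mv_count     -- number of blocks in the frame that carry a motion vector
    block_count  -- total number of blocks in the frame
    dc_prev/curr -- DC coefficients of co-located blocks in the previous
                    and current frame (equal-length sequences)
    The thresholds are illustrative defaults, not values from the source.
    """
    if block_count <= 0:
        raise ValueError("block_count must be positive")
    if not dc_curr or len(dc_prev) != len(dc_curr):
        raise ValueError("DC sequences must be non-empty and equal length")

    # Few motion-compensated blocks suggests the encoder found no good
    # prediction from the reference frame.
    mv_ratio = mv_count / block_count

    # Mean absolute DC difference approximates how much the picture changed.
    dc_diff = sum(abs(a - b) for a, b in zip(dc_prev, dc_curr)) / len(dc_curr)

    return mv_ratio < mv_ratio_max and dc_diff > dc_diff_min
```

Combining both cues is a design choice of this sketch: a low motion vector count alone can also occur in intra-coded frames, and a large DC change alone can also occur under fast motion.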
-
-
5. The method of claim 1, wherein said analyzing step comprises parsing said compressed bitstream into blocks of video information and associated motion vector information for each field or frame of video information which comprises the determined sequence of fields or frames of video information representative of said single scene, and wherein said estimating step comprises the step of estimating any zoom and any pan of said camera by determining a multi-parameter transform model applied to said parsed motion vector information.
-
6. The method of claim 5, wherein said estimating step comprises the steps of:
-
a. computing each parameter for a multi-parameter affine transform which represents a transformation from a current frame of video information to a previous frame of video; and
b. computing said multi-parameter affine transform to thereby determine global motion information representative of said zoom and pan of said camera.
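As an illustration of the estimating step in claims 5 and 6, the sketch below fits a six-parameter affine model to block motion vectors by ordinary least squares. The claims do not fix the number of parameters, the fitting method, or the sign convention, so the six-parameter form, the least-squares fit, and the names `solve3`, `fit_affine`, and `global_motion` are all assumptions.

```python
def solve3(m, b):
    """Solve the 3x3 system m * x = b by Gauss-Jordan elimination."""
    a = [row[:] + [rhs] for row, rhs in zip(m, b)]
    for col in range(3):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("degenerate block layout: cannot fit affine model")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def fit_affine(blocks):
    """Least-squares fit of u = a1*x + a2*y + a3 and v = a4*x + a5*y + a6.

    blocks -- iterable of (x, y, u, v): block centre and its motion vector.
    Returns [a1, a2, a3, a4, a5, a6].
    """
    m = [[0.0] * 3 for _ in range(3)]
    bu = [0.0] * 3
    bv = [0.0] * 3
    for x, y, u, v in blocks:
        basis = (x, y, 1.0)
        # Accumulate the normal equations for both vector components.
        for i in range(3):
            bu[i] += basis[i] * u
            bv[i] += basis[i] * v
            for j in range(3):
                m[i][j] += basis[i] * basis[j]
    return solve3(m, bu) + solve3(m, bv)


def global_motion(params, x, y):
    """Evaluate the fitted model at block centre (x, y)."""
    a1, a2, a3, a4, a5, a6 = params
    return (a1 * x + a2 * y + a3, a4 * x + a5 * y + a6)
```

Under this model a pure pan appears in the constant terms a3 and a6, and a uniform zoom appears as equal values of a1 and a5. A plain least-squares fit is sensitive to blocks that belong to moving objects; a practical system would likely iterate and discard outliers, which this sketch does not attempt.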
-
-
7. The method of claim 6, wherein said detecting step comprises computing local object motion for said one or more moving video objects based on said global motion information and on one or more of said motion vectors which correspond to said one or more moving video objects.
-
8. The method of claim 7, further comprising the steps of:
-
a. determining whether said local object motion is greater than a predetermined threshold;
b. applying morphological operations to said determined local object motion values to eliminate any erroneously sensed moving objects; and
c. determining border points of said detected moving objects to thereby locate a bounding box for said detected moving object.
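The steps of claims 7 and 8 can be sketched as follows. This is a minimal illustration under stated assumptions: local motion is taken as the magnitude of the block motion vector minus the global motion vector, the morphological operation is a 3x3 opening (erosion followed by dilation), and a single bounding box is returned for all surviving blocks. None of these specifics, nor the function names, come from the claims.

```python
import math


def residual_magnitude(mv, gm):
    """Local motion: length of (block motion vector - global motion vector)."""
    return math.hypot(mv[0] - gm[0], mv[1] - gm[1])


def threshold_mask(local_motion, thresh):
    """Mark blocks whose local motion magnitude exceeds the threshold."""
    return [[1 if mag > thresh else 0 for mag in row] for row in local_motion]


def _neighbourhood(mask, r, c):
    """3x3 neighbourhood of (r, c); cells outside the grid count as 0."""
    h, w = len(mask), len(mask[0])
    return [mask[r + dr][c + dc] if 0 <= r + dr < h and 0 <= c + dc < w else 0
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def erode(mask):
    """Keep a block only if its whole 3x3 neighbourhood is set."""
    return [[int(all(_neighbourhood(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def dilate(mask):
    """Set a block if any block in its 3x3 neighbourhood is set."""
    return [[int(any(_neighbourhood(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]


def bounding_box(mask):
    """Return (top, left, bottom, right) in block units, or None if empty."""
    cells = [(r, c) for r, row in enumerate(mask)
             for c, v in enumerate(row) if v]
    if not cells:
        return None
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return (min(rows), min(cols), max(rows), max(cols))


def detect_object(local_motion, thresh):
    """Threshold, open (erode then dilate), then locate the bounding box."""
    opened = dilate(erode(threshold_mask(local_motion, thresh)))
    return bounding_box(opened)
```

Because out-of-grid cells count as 0, the opening also removes regions thinner than three blocks and regions hugging the frame border. Whether that is acceptable depends on block size and the expected object scale.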
-
-
9. An apparatus for detecting moving video objects in a compressed digital bitstream which represents a sequence of fields or frames of video information for one or more previously captured scenes of video, comprising:
-
a. means for analyzing said compressed bitstream to locate scene cuts therein and to determine at least one sequence of fields or frames of video information which represents a single video scene;
b. means, coupled to said analyzing means, for estimating one or more operating parameters for a camera which initially viewed said video scene by analyzing a portion of said compressed bitstream which corresponds to said video scene; and
c. means, coupled to said estimating means, for detecting one or more moving video objects represented in said compressed bitstream by applying global motion compensation with said estimated operating parameters.
a. parsing means for receiving and parsing said compressed bitstream into blocks of video information and associated motion vector information for each field or frame of video information which comprises the determined sequence of fields or frames of video information representative of said single scene;
b. means, coupled to said parsing means, for performing inverse motion compensation on each of said parsed blocks of video information to derive discrete cosine transform coefficients for each of said parsed blocks of video information;
c. counting means, coupled to said inverse motion compensation means, for counting said motion vector information associated with each of said parsed blocks of video information; and
d. determining means, coupled to said counting means, for determining from said counted motion vector information and said discrete cosine transform coefficient information whether one of said scene cuts has occurred.
-
-
12. The apparatus of claim 9, wherein said analyzing means further comprises means for parsing said compressed bitstream into blocks of video information and associated motion vector information for each field or frame of video information which comprises the determined sequence of fields or frames of video information representative of said single scene, and wherein said estimating means further comprises means for estimating any zoom and any pan of said camera by determining a multi-parameter transform model applied to said parsed motion vector information.
-
13. The apparatus of claim 12, wherein said estimating means further comprises:
-
a. means for computing each parameter for a multi-parameter affine transform which represents a transformation from a current frame of video information to a previous frame of video; and
b. means, coupled to said transform parameter computing means, for computing said multi-parameter affine transform to thereby determine global motion information representative of said zoom and pan of said camera.
-
-
14. The apparatus of claim 12, wherein said detecting means further comprises means for computing local object motion for said one or more moving video objects based on said global motion information and on one or more of said motion vectors which correspond to said one or more moving video objects.
-
15. The apparatus of claim 14, further comprising:
-
a. comparison means, coupled to said local object motion computing means, for determining whether said local object motion is greater than a predetermined threshold;
b. morphological operation means, coupled to said comparison means, for applying morphological operations to said determined local object motion values to eliminate any erroneously sensed moving objects; and
c. border point determination means, coupled to said morphological operation means, for determining border points of said detected moving objects to thereby locate a bounding box for said detected moving object.
-
Specification