Object-based video compression process employing arbitrarily-shaped features
First Claim
1. A method for encoding a sequence of video image frames, each frame including at least one arbitrarily shaped video object, the method comprising:
- encoding video objects in each frame separately, where at least one of the objects is segmented from the frames in the video sequence and includes a mask for each of the frames defining the shape of the object in each frame, a composite bitmap formed from a combination of pixels of the object in the frames such that the composite bitmap includes portions of the object that are not visible in some of the frames, and trajectories for each frame describing a motion transform of the object for each frame used to transform the composite bitmap to a position in corresponding frames of the video sequence;
computing error signals for the object, including;
a) dividing the object into blocks of pixel locations, where at least some of the blocks overlap a boundary of the object;
b) for each block, computing motion parameters that estimate the motion between a current frame in the sequence and a previously reconstructed object from a previous frame, where the motion parameters are computed separately from the trajectories,
c) computing a predicted object for the current frame by applying the motion parameters for each block to the previously reconstructed object;
d) transforming the mask associated with the object for the previous frame to the current frame using the trajectories associated with the current frame;
e) intersecting the transformed mask with the mask for the current frame to identify at least a first portion of the current mask that is outside the transformed mask, the pixels in the first portion being represented by the composite bitmap;
f) computing a difference between an original object for the current frame and the predicted object to compute error signals for the object;
g) compressing the error signals for the object for the current frame; and
h) repeating steps a-g to compute error signals associated with the object for frames in the video sequence;
wherein a compressed version of the object for the video sequence includes a single composite bitmap for the sequence, trajectories for the frames in the sequence, error signals for the frames in the sequence, and motion parameters for each block of the object for the frames in the sequence.
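Steps a and d through f of the claim can be illustrated with a minimal sketch. It assumes that a mask is a set of (x, y) pixel locations, that the trajectory is a pure integer translation (the claim only requires a "motion transform", which may be more general), and that pixels with no prediction are predicted as zero. All function names are hypothetical and are not taken from the patent.

```python
from typing import Dict, Set, Tuple

Pixel = Tuple[int, int]  # (x, y) pixel location


def blocks_for(mask: Set[Pixel], size: int) -> Set[Pixel]:
    """Step a: origins of the size-by-size blocks that contain any object pixel."""
    return {((x // size) * size, (y // size) * size) for (x, y) in mask}


def overlaps_boundary(origin: Pixel, mask: Set[Pixel], size: int) -> bool:
    """True when a block holds some, but not all, of its pixels inside the mask."""
    ox, oy = origin
    inside = sum((ox + i, oy + j) in mask for i in range(size) for j in range(size))
    return 0 < inside < size * size


def transform_mask(prev_mask: Set[Pixel], dx: int, dy: int) -> Set[Pixel]:
    """Step d: move the previous frame's mask to the current frame.

    Assumption: the trajectory is a translation by (dx, dy).
    """
    return {(x + dx, y + dy) for (x, y) in prev_mask}


def outside_transformed(cur_mask: Set[Pixel], transformed: Set[Pixel]) -> Set[Pixel]:
    """Step e: part of the current mask that the transformed mask does not cover."""
    return cur_mask - transformed


def error_signal(
    original: Dict[Pixel, int],
    predicted: Dict[Pixel, int],
    cur_mask: Set[Pixel],
) -> Dict[Pixel, int]:
    """Step f: per-pixel difference between the original and predicted object.

    Assumption: a pixel with no prediction is treated as predicted value 0.
    """
    return {p: original[p] - predicted.get(p, 0) for p in cur_mask}
```

In this sketch, the pixels returned by `outside_transformed` correspond to the "first portion" of step e, that is, the locations the claim says are represented by the composite bitmap.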
Abstract
Video encoding and decoding processes provide compression and decompression of digitized video signals representing display motion in video sequences of multiple image frames. The encoder process utilizes object- or feature-based video compression to improve the accuracy and versatility of encoding interframe motion and intraframe image features. Video information is compressed relative to objects or features of arbitrary configurations, rather than fixed, regular arrays of pixels as in conventional video compression methods. This reduces the error components and thereby improves the compression efficiency and accuracy. The decoder process decompresses the encoded video information to reconstruct the objects or features of arbitrary configurations.
8 Claims
1. A method for encoding a sequence of video image frames, each frame including at least one arbitrarily shaped video object (set out in full under First Claim above). Dependent claims: 2, 3, 4, 5.
6. A method for decoding a sequence of video image frames, each frame including at least one arbitrarily shaped video object, the method comprising:
- decoding video objects in each frame separately, where at least one of the objects is segmented from each of the frames in the video sequence and includes a mask for each of the frames defining the shape of the object in each frame, a composite bitmap formed from a combination of pixels of the object in the frames such that the composite bit map includes portions of the object that are not visible in some of the frames, and trajectories for each frame describing a motion transform of the object for each frame used to transform the composite bitmap to a position in corresponding frames of the video sequence;
decoding error signals for the object for a current frame, including;
a) for each block, decoding motion parameters that estimate the motion between a current frame in the sequence and a previously reconstructed object from a previous frame, where the motion parameters are computed separately from the trajectories,
b) computing a predicted object for the current frame by applying the motion parameters for each block to the previously reconstructed object;
c) transforming the mask associated with the object for the previous frame to the current frame using the trajectories associated with the current frame;
e) intersecting the transformed mask with the mask for the current frame to identify at least a first portion of the current mask that is outside the transformed mask, the pixels in the first portion being represented by the composite bitmap;
f) decompressing the error signals for the object for the current frame;
g) adding the decompressed error signals for the object for the current frame to the predicted object to compute a reconstructed object for the current frame; and
h) repeating steps a-g to reconstruct the object for frames in the video sequence
wherein a compressed version of the object for the video sequence includes a single composite bitmap for the sequence, trajectories for the frames in the sequence, error signals for the frames in the sequence, and motion parameters for each block of the object for the frames in the sequence.
Dependent claims: 7.
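The prediction and reconstruction steps of the decoding claim (steps b and g) can be sketched as follows. The sketch assumes that each block carries a single integer motion vector pointing from a current-frame pixel to its source pixel in the previously reconstructed object, and that the error signals have already been decompressed. The names `predict_object` and `reconstruct` are hypothetical, not terms from the patent.

```python
from typing import Dict, Set, Tuple

Pixel = Tuple[int, int]   # (x, y) pixel location
Vector = Tuple[int, int]  # per-block motion vector (assumed integer translation)


def predict_object(
    prev_recon: Dict[Pixel, int],
    block_motion: Dict[Pixel, Vector],
    block_size: int,
    cur_mask: Set[Pixel],
) -> Dict[Pixel, int]:
    """Step b: apply each block's motion parameters to the previous reconstruction."""
    predicted: Dict[Pixel, int] = {}
    for (x, y) in cur_mask:
        # Find the block that contains this pixel, keyed by its top-left origin.
        origin = ((x // block_size) * block_size, (y // block_size) * block_size)
        mvx, mvy = block_motion.get(origin, (0, 0))
        src = (x + mvx, y + mvy)
        # Pixels with no source in the previous object are left unpredicted here.
        if src in prev_recon:
            predicted[(x, y)] = prev_recon[src]
    return predicted


def reconstruct(predicted: Dict[Pixel, int], error: Dict[Pixel, int]) -> Dict[Pixel, int]:
    """Step g: add the decompressed error signal to the predicted object.

    Assumption: an unpredicted pixel contributes a predicted value of 0.
    """
    return {p: predicted.get(p, 0) + e for p, e in error.items()}
```

Because the encoder forms the error as original minus predicted, adding it back in `reconstruct` recovers the original values exactly when the error is stored losslessly; with lossy compression the result would only approximate them.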
8. A computer readable medium having a data structure representing a compressed sequence of video frames comprising:
- separately encoded video objects, where at least one of the objects is segmented from each of the frames in the video sequence and includes a mask for each of the frames defining the shape of the object in each frame, a composite bitmap formed from a combination of pixels of the object in each frame such that the composite bitmap includes portions of the object that are not visible in some of the frames, and trajectories for each frame describing a motion transform of the object for each frame used to transform the composite bitmap to a position in corresponding frames of the video sequence;
encoded error signals for the object in each of the frames, where the error signals are arranged in an array of blocks of pixel locations that overlap the object in the corresponding frame, the encoded error signals including;
for each block, motion parameters that estimate the motion between a current frame in the sequence and a previously reconstructed object from a previous frame, where the motion parameters are computed separately from the trajectories;
for each block, error signals determined by;
computing a predicted object for a frame by applying the motion parameters for each block to the previously reconstructed object;
computing a difference between an original object for the current frame and the predicted object to compute error signals for the object;
compressing the error signals for each block by using a lossy, transform coding method;
wherein a compressed version of the object for the video sequence includes a single composite bitmap for the sequence, trajectories for each of the frames in the sequence, masks for each of the frames, compressed blocks of error signals for the frames in the sequence, and motion parameters for each block of the object for the frames in the sequence; and
wherein the masks and corresponding trajectories are used to indicate which portion of the object is to be reconstructed from the composite bitmap for a selected frame by transforming a mask of a previously reconstructed frame and intersecting the transformed mask with a mask for the selected frame to identify whether a portion of the mask for the selected frame is outside the transformed mask, the pixels in the portion outside the transformed mask being represented by the composite bitmap.
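One possible in-memory layout for the data structure recited in claim 8 is sketched below. The class and field names are hypothetical, the trajectory is stored as an opaque tuple of transform parameters, and compressed error blocks are stored as raw bytes; the claim does not prescribe any of these representations.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]  # (x, y) pixel location


@dataclass
class FrameRecord:
    """Per-frame data for one object (hypothetical layout)."""
    mask: Set[Pixel]                              # shape of the object in this frame
    trajectory: Tuple[float, ...]                 # motion transform parameters
    block_motion: Dict[Pixel, Tuple[int, int]]    # motion parameters per block origin
    error_blocks: Dict[Pixel, bytes]              # compressed error signal per block origin


@dataclass
class CompressedObject:
    """One separately encoded object: a single composite bitmap plus per-frame records."""
    composite_bitmap: Dict[Pixel, int]
    frames: List[FrameRecord] = field(default_factory=list)

    def frame_count(self) -> int:
        return len(self.frames)
```

Keeping the composite bitmap at the object level and everything else per frame mirrors the claim's wording: one bitmap for the sequence, with masks, trajectories, motion parameters, and error blocks for each frame.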
Specification