SYSTEM AND METHOD FOR SCALABLE VIDEO CODING USING TELESCOPIC MODE FLAGS
Abstract
Systems and methods for scalable video coding using special inter-layer prediction modes (called telescopic modes) are provided. These modes facilitate accelerated operation of encoders with improved coding efficiency.
21 Claims
1. A system for decoding of scalable digital video, the system comprising:
an input configured to receive a scalable digital video bitstream comprising groups of coded units of a quality or spatial enhancement target layer and at least one additional layer, the digital video bitstream containing control data (including prediction control data) associated with a group of coded units and control (including prediction), texture, or motion data associated with individual coded units;
a decoder coupled to the input, wherein the decoder is configured to decode the control data associated with a group of coded units of the target layer and the at least one additional layer, and control, texture, or motion data associated with individual coded units of the target layer and the at least one additional layer;
a predictor coupled to the decoder, wherein the predictor is configured to generate prediction references for the control, texture, or motion data of a plurality of coded units of the target layer from signaled prediction control data associated with a group of coded units of the target layer or the at least one additional layer, or from prediction control data associated with individual coded units of the target layer or the at least one additional layer; and
a combiner coupled to the predictor, wherein the combiner is configured to combine the generated prediction references with the corresponding decoded control, texture, or motion data associated with the plurality of coded units of the target layer to produce portions of a decoded picture corresponding to the plurality of coded units of the target layer, wherein the prediction control data associated with the groups of coded units of the target layer or the at least one additional layer and the prediction control data associated with individual coded units of the target layer or the at least one additional layer include inter-layer prediction control data, and wherein the predictor is configured to use values indicated by the inter-layer prediction control data associated with a group of coded units of the target layer when the corresponding inter-layer prediction control data associated with individual coded units of the group of coded units of the target layer are not present in the digital video bitstream.
Dependent claims: 2, 3, 4, 9, 10, 20
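The fallback rule in the final clause of claim 1 can be illustrated with a minimal sketch. It assumes two inter-layer prediction flags per coded unit; the names `GroupControl`, `UnitControl`, `base_mode`, and `residual_prediction` are hypothetical and are not taken from the claims or from any codec specification. A value of `None` stands for control data that is not present in the bitstream.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class GroupControl:
    """Hypothetical group-level inter-layer prediction control data."""
    base_mode: bool
    residual_prediction: bool


@dataclass
class UnitControl:
    """Hypothetical per-coded-unit control data; None means 'not present'."""
    base_mode: Optional[bool] = None
    residual_prediction: Optional[bool] = None


def resolve_inter_layer_flags(group: GroupControl, unit: UnitControl) -> Tuple[bool, bool]:
    """Return the (base_mode, residual_prediction) values the predictor would use.

    A per-unit value wins when it is present; otherwise the value signaled
    for the whole group of coded units is used.
    """
    base_mode = group.base_mode if unit.base_mode is None else unit.base_mode
    residual = (
        group.residual_prediction
        if unit.residual_prediction is None
        else unit.residual_prediction
    )
    return base_mode, residual
```

In this sketch, a coded unit that carries no flags of its own inherits both values from its group, while a unit that carries only one flag overrides just that flag.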
5. A system for scalable coding of digital video, the system comprising:
an input configured to receive digital video input pictures;
an optionally operated downsampler coupled to the input, wherein the downsampler is configured to generate a downsampled picture of an input picture at a lower resolution;
a first prediction estimator coupled to either the optional downsampler or the input and a first combiner that provides a plurality of previously decoded base layer pictures to be used as reference pictures, wherein the first prediction estimator is configured to generate a first set of control (including prediction) and motion data prediction references associated with a plurality of the coded units of the (optionally downsampled) picture, or control data (including prediction control data) associated with a group of coded units of the (optionally downsampled) picture, and to generate the portions of a first prediction reference picture that correspond to the coded units;
a first comparer coupled to the first prediction estimator and the optional downsampler or input, wherein the first comparer is configured to compute the difference between the (optionally downsampled) input picture and the portions of a first prediction reference picture, and generate a second set of control (including prediction) and texture data associated with a plurality of the coded units of the (optionally downsampled) input picture, and control (including prediction) data associated with a group of coded units of the (optionally downsampled) input picture;
a first combiner coupled to the first comparer and the first prediction estimator, wherein the first combiner is configured to combine the second set of generated control (including prediction) and texture data with their corresponding portions of the first prediction reference picture to generate the corresponding portions of a new base layer decoded picture;
a second prediction estimator coupled to the input and a second combiner that provides a plurality of previously decoded enhancement layer pictures to be used as reference pictures, wherein the second prediction estimator is configured to generate a third set of control (including prediction) and motion data prediction references associated with a plurality of the coded units of the input picture, or control data (including prediction control data) associated with a group of coded units of the input picture, and to generate the portions of a second prediction reference picture that correspond to the coded units;
a second comparer coupled to the second prediction estimator and the input, wherein the second comparer is configured to compute the difference between the input picture and the second prediction reference picture, and generate a fourth set of control (including prediction) and texture data associated with a plurality of the coded units of the input picture, as well as control (including prediction) data associated with a group of coded units of the input picture;
a second combiner coupled to the second comparer and the second prediction estimator, wherein the second combiner is configured to combine the fourth set of generated control (including prediction) and texture data with their corresponding portions of the second prediction reference picture to generate the corresponding portions of a new enhancement layer decoded picture; and
an encoder configured to encode the first set of control (including prediction) and motion data and the second set of control (including prediction) and texture data to produce a base layer bit stream, the third set of control (including prediction) and motion data and the fourth set of control (including prediction) and texture data to produce an enhancement layer bit stream, and multiplex the data into a single output bit stream, wherein the third and fourth sets of control data include inter-layer prediction control data, and wherein the second prediction estimator and the second comparer are further configured to set inter-layer prediction control data values in one or more groups of coded units of the input picture such that corresponding inter-layer prediction control data values in the coded units of the input picture associated with the one or more groups of coded units of the input picture are not transmitted.
Dependent claims: 6, 7, 8
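The encoder-side counterpart in the final clause of claim 5 can be sketched as follows. This is one possible policy, assumed here for illustration only: when every coded unit in a group would carry the same value for an inter-layer prediction flag, the value is signaled once at the group level and the per-unit copies are not transmitted. The function name `hoist_flag` and the use of `None` to mean "not transmitted" are hypothetical; the claim does not prescribe how the encoder chooses the group-level value.

```python
from typing import List, Optional, Tuple


def hoist_flag(unit_flags: List[bool]) -> Tuple[Optional[bool], List[Optional[bool]]]:
    """Decide what to transmit for one inter-layer prediction flag in one group.

    Returns (group_value, per_unit_values), where None means 'not transmitted'.
    Assumed policy: hoist the flag to the group level only when all coded
    units in the group agree on its value.
    """
    if unit_flags and all(flag == unit_flags[0] for flag in unit_flags):
        # One group-level value replaces every per-unit flag.
        return unit_flags[0], [None] * len(unit_flags)
    # Units disagree (or the group is empty): send each per-unit flag as-is.
    return None, list(unit_flags)
```

An encoder aiming for faster operation could instead fix the group-level value up front and skip the per-unit decision entirely; the sketch above only shows the bit-saving side of the idea.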
11. A method for decoding of scalable digital video, the method comprising:
at an input, receiving a scalable digital video bitstream comprising groups of coded units of a quality or spatial enhancement target layer and at least one additional layer, the digital video bitstream containing control data (including prediction control data) associated with a group of coded units and control (including prediction), texture, or motion data associated with individual coded units;
at a decoder, decoding the received input by decoding the control data associated with a group of coded units of the target layer and the at least one additional layer, and control, texture, or motion data associated with individual coded units of the target layer and the at least one additional layer;
using a predictor coupled to the decoder, to generate prediction references for the control, texture, or motion data of a plurality of coded units of the target layer from signaled prediction control data associated with a group of coded units of the target layer or the at least one additional layer, or from prediction control data associated with individual coded units of the target layer or the at least one additional layer; and
at a combiner coupled to the predictor, combining the generated prediction references with the corresponding decoded control, texture, or motion data associated with the plurality of coded units of the target layer to produce portions of a decoded picture corresponding to the plurality of coded units of the target layer, wherein the prediction control data associated with the groups of coded units of the target layer or the at least one additional layer and the prediction control data associated with individual coded units of the target layer or the at least one additional layer include inter-layer prediction control data, and wherein the predictor is configured to use values indicated by the inter-layer prediction control data associated with a group of coded units of the target layer when the corresponding inter-layer prediction control data associated with individual coded units of the group of coded units of the target layer are not present in the digital video bitstream.
Dependent claims: 12, 13, 14, 19, 21
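For texture data, the combining step recited above is commonly realized as adding a decoded residual to the prediction reference sample by sample. The sketch below assumes that interpretation; the function name `combine`, the flat list representation of a coded unit, and the clipping to an assumed 8-bit sample range are illustrative choices that the claim does not state.

```python
from typing import List


def combine(prediction: List[int], residual: List[int], bit_depth: int = 8) -> List[int]:
    """Add residual samples to prediction samples and clip to the valid range.

    Both inputs are flat lists holding the samples of one coded unit.
    The bit depth (and therefore the clipping range) is an assumption.
    """
    if len(prediction) != len(residual):
        raise ValueError("prediction and residual must have the same length")
    max_value = (1 << bit_depth) - 1
    return [min(max(p + r, 0), max_value) for p, r in zip(prediction, residual)]
```

For example, `combine([100, 250, 3], [5, 10, -7])` yields `[105, 255, 0]`: the second sample is clipped at the top of the range and the third at zero.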
15. A method for scalable coding of digital video, the method comprising:
at an input receiving digital video input pictures;
optionally operating a downsampler coupled to the input to generate a downsampled picture of an input picture at a lower resolution;
at a first prediction estimator coupled to either the optionally operated downsampler or the input and a first combiner that provides a plurality of previously decoded base layer pictures to be used as reference pictures, generating a first set of control (including prediction) and motion data prediction references associated with a plurality of the coded units of the (optionally downsampled) picture, or control data (including prediction control data) associated with a group of coded units of the (optionally downsampled) picture, and generating the portions of a first prediction reference picture that correspond to the coded units;
at a first comparer coupled to the first prediction estimator and the optional downsampler or input, computing the difference between the (optionally downsampled) input picture and the portions of a first prediction reference picture, and generating a second set of control (including prediction) and texture data associated with a plurality of the coded units of the (optionally downsampled) input picture, and control (including prediction) data associated with a group of coded units of the (optionally downsampled) input picture;
at a first combiner coupled to the first comparer and the first prediction estimator, combining the second set of generated control (including prediction) and texture data with their corresponding portions of the first prediction reference picture to generate the corresponding portions of a new base layer decoded picture;
at a second prediction estimator coupled to the input and a second combiner that provides a plurality of previously decoded enhancement layer pictures to be used as reference pictures, generating a third set of control (including prediction) and motion data prediction references associated with a plurality of the coded units of the input picture, or control data (including prediction control data) associated with a group of coded units of the input picture, and generating the portions of a second prediction reference picture that correspond to the coded units;
at a second comparer coupled to the second prediction estimator and the input, computing the difference between the input picture and the second prediction reference picture, and generating a fourth set of control (including prediction) and texture data associated with a plurality of the coded units of the input picture, and control (including prediction) data associated with a group of coded units of the input picture;
at a second combiner coupled to the second comparer and the second prediction estimator, combining the fourth set of generated control (including prediction) and texture data with their corresponding portions of the second prediction reference picture to generate the corresponding portions of a new enhancement layer decoded picture; and
at an encoder, encoding the first set of control (including prediction) and motion data and the second set of control (including prediction) and texture data to produce a base layer bit stream, the third set of control (including prediction) and motion data and the fourth set of control (including prediction) and texture data to produce an enhancement layer bit stream, and multiplexing the data into a single output bit stream, wherein the third and fourth sets of control data include inter-layer prediction control data, and at the second prediction estimator and the second comparer setting inter-layer prediction control data values in one or more groups of coded units of the input picture such that corresponding inter-layer prediction control data values in the coded units of the input picture associated with the one or more groups of coded units of the input picture are not transmitted.
Dependent claims: 16, 17, 18
Specification