System and method for scalable video coding using telescopic mode flags
Abstract
Systems and methods for scalable video coding using special inter-layer prediction modes (called telescopic modes) are provided. These modes facilitate accelerated operation of encoders with improved coding efficiency.
8 Claims
1. A system for decoding of scalable digital video, the system comprising:
- an input configured to receive a scalable digital video bitstream comprising slices of a quality or spatial enhancement target layer and at least one additional layer in accordance with the SVC JD8 specification, the digital video bitstream containing control data (including prediction control data) associated with slices and control (including prediction), texture, or motion data associated with macroblocks or macroblock partitions;
- a decoder coupled to the input, wherein the decoder is configured to decode the control data associated with a slice of the target layer and the at least one additional layer, and control, texture, or motion data associated with individual macroblocks or macroblock partitions of the target layer and the at least one additional layer;
- a predictor coupled to the decoder, wherein the predictor is configured to generate prediction references for the control, texture, or motion data of a plurality of macroblocks or macroblock partitions of the target layer from signaled prediction control data associated with a slice of the target layer or the at least one additional layer, or from prediction control data associated with individual macroblocks or macroblock partitions of the target layer or the at least one additional layer; and
- a combiner coupled to the predictor, wherein the combiner is configured to combine the generated prediction references with the corresponding decoded control, texture, or motion data associated with the plurality of macroblocks or macroblock partitions of the target layer to produce portions of a decoded picture corresponding to the plurality of macroblocks or macroblock partitions of the target layer,

wherein the prediction control data associated with the slices of the target layer or the at least one additional layer include an adaptive_residual_prediction_flag parameter and, if the adaptive_residual_prediction_flag parameter is not set, a default_residual_prediction_flag parameter, and the prediction control data associated with individual macroblocks or macroblock partitions of the target layer or the at least one additional layer include a residual_prediction_flag parameter, wherein the decoder is configured not to decode the residual_prediction_flag parameter in macroblocks or macroblock partitions of a slice for which the adaptive_residual_prediction_flag parameter is not set but to decode instead the default_residual_prediction_flag of the slice, and wherein the predictor is further configured to use the value of default_residual_prediction_flag as the value of the residual_prediction_flag parameter for all macroblocks or macroblock partitions of the slice.
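The slice-level and macroblock-level flag logic recited above can be sketched as follows. This is an illustrative model only, not the SVC JD8 parser: the `BitReader` class and all function names are hypothetical stand-ins, and the bitstream is modeled as a plain sequence of 0/1 flags.

```python
# Hypothetical sketch of the residual-prediction flag decoding described
# in claim 1. BitReader and all names are illustrative, not SVC JD8 APIs.

class BitReader:
    """Minimal flag-by-flag bitstream reader (assumed helper)."""
    def __init__(self, flags, num_macroblocks):
        self._flags = iter(flags)
        self.num_macroblocks = num_macroblocks

    def read_flag(self):
        return next(self._flags)

def decode_slice_flags(bits):
    """Return the residual_prediction_flag value for each macroblock."""
    adaptive = bits.read_flag()  # adaptive_residual_prediction_flag
    # default_residual_prediction_flag is only present when the
    # adaptive flag is not set (the telescopic case)
    default = None if adaptive else bits.read_flag()
    mb_flags = []
    for _ in range(bits.num_macroblocks):
        if adaptive:
            # per-macroblock residual_prediction_flag is coded
            mb_flags.append(bits.read_flag())
        else:
            # flag is not coded; the slice-level default applies to
            # every macroblock of the slice
            mb_flags.append(default)
    return mb_flags
```

In this toy model, `decode_slice_flags(BitReader([0, 1], 3))` returns `[1, 1, 1]`: a single slice-level bit stands in for three per-macroblock bits.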
2. A system for scalable coding of digital video, the system comprising:
- an input configured to receive digital video input pictures;
- an optionally operated downsampler coupled to the input, wherein the downsampler is configured to generate a downsampled picture of an input picture at a lower resolution;
- a first prediction estimator coupled to either the optional downsampler or the input and to a first combiner that provides a plurality of previously decoded base layer pictures to be used as reference pictures, wherein the first prediction estimator is configured to generate a first set of control (including prediction) and motion data prediction references associated with a plurality of the coded units of the (optionally downsampled) picture, or control data (including prediction control data) associated with a group of coded units of the (optionally downsampled) picture, and to generate the portions of a first prediction reference picture that correspond to the coded units;
- a first comparer coupled to the first prediction estimator and the optional downsampler or input, wherein the first comparer is configured to compute the difference between the (optionally downsampled) input picture and the portions of the first prediction reference picture, and to generate a second set of control (including prediction) and texture data associated with a plurality of the coded units of the (optionally downsampled) input picture, and control (including prediction) data associated with a group of coded units of the (optionally downsampled) input picture;
- a first combiner coupled to the first comparer and the first prediction estimator, wherein the first combiner is configured to combine the second set of generated control (including prediction) and texture data with their corresponding portions of the first prediction reference picture to generate the corresponding portions of a new base layer decoded picture;
- a second prediction estimator coupled to the input and to a second combiner that provides a plurality of previously decoded enhancement layer pictures to be used as reference pictures, wherein the second prediction estimator is configured to generate a third set of control (including prediction) and motion data prediction references associated with a plurality of the coded units of the input picture, or control data (including prediction control data) associated with a group of coded units of the input picture, and to generate the portions of a second prediction reference picture that correspond to the coded units;
- a second comparer coupled to the second prediction estimator and the input, wherein the second comparer is configured to compute the difference between the input picture and the second prediction reference picture, and to generate a fourth set of control (including prediction) and texture data associated with a plurality of the coded units of the input picture, as well as control (including prediction) data associated with a group of coded units of the input picture;
- a second combiner coupled to the second comparer and the second prediction estimator, wherein the second combiner is configured to combine the fourth set of generated control (including prediction) and texture data with their corresponding portions of the second prediction reference picture to generate the corresponding portions of a new enhancement layer decoded picture; and
- an encoder configured to encode the first set of control (including prediction) and motion data and the second set of control (including prediction) and texture data to produce a base layer bitstream, to encode the third set of control (including prediction) and motion data and the fourth set of control (including prediction) and texture data to produce an enhancement layer bitstream, and to multiplex the data into a single output bitstream,

wherein the third and fourth sets of control data include inter-layer prediction control data, and wherein the second prediction estimator and the second comparer are further configured to set inter-layer prediction control data values in one or more groups of coded units of the input picture such that corresponding inter-layer prediction control data values in the coded units of the input picture associated with the one or more groups of coded units of the input picture are not transmitted but instead a single value is transmitted for the entire group of coded units and is used by the decoder.
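The estimator/comparer/combiner data flow of the encoder claim above can be illustrated with a toy model in which a "picture" is a list of integer samples. Every function here is a deliberately trivial, hypothetical stand-in for the corresponding claimed component; a real SVC encoder performs motion-compensated prediction, transform coding, and entropy coding in place of these arithmetic placeholders.

```python
# Toy model of the two-layer (base + enhancement) encoder structure in
# claim 2. Pictures are lists of ints; every component is a stand-in.

def downsample(picture):
    return picture[::2]  # optional lower-resolution copy of the input

def predict(reference_picture):
    return list(reference_picture)  # prediction estimator: reuse reference

def compare(picture, prediction):
    # comparer: residual ("texture" data) between input and prediction
    return [a - b for a, b in zip(picture, prediction)]

def combine(prediction, residual):
    # combiner: reconstruct the decoded picture from prediction + residual
    return [a + b for a, b in zip(prediction, residual)]

def encode_picture(picture, prev_base_decoded, prev_enh_decoded):
    base_input = downsample(picture)
    base_pred = predict(prev_base_decoded)       # first prediction estimator
    base_res = compare(base_input, base_pred)    # first comparer
    base_decoded = combine(base_pred, base_res)  # first combiner
    enh_pred = predict(prev_enh_decoded)         # second prediction estimator
    enh_res = compare(picture, enh_pred)         # second comparer
    enh_decoded = combine(enh_pred, enh_res)     # second combiner
    # encoder: the two layer streams are multiplexed into one output
    bitstream = {"base": base_res, "enh": enh_res}
    return bitstream, base_decoded, enh_decoded
```

The two decoded pictures returned alongside the bitstream model the reference pictures that the first and second combiners feed back to the prediction estimators for the next input picture.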
3. The system of claim 2, wherein the second prediction estimator and the second comparer are configured to set the adaptive_prediction_flag parameter to false in one or more slices, the encoder is further configured not to include the residual_prediction_flag in its encoding of the macroblocks or macroblock partitions associated with the one or more slices, and the second prediction estimator and second comparer are further configured to use the default_residual_prediction_flag parameter value for the residual_prediction_flag parameter for all macroblocks or macroblock partitions of the one or more slices.
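The telescopic signaling just described, replacing per-macroblock residual_prediction_flag bits with a single slice-level default, can be sketched at the encoder side as below. This is a simplified illustration under the assumption that flag bits are emitted as a plain list of 0/1 values; the function name is hypothetical.

```python
# Simplified sketch of telescopic flag signaling at the encoder
# (claim 3). Bits are modeled as a list of 0/1 values; names are
# illustrative, not SVC JD8 syntax elements.

def encode_slice_flags(mb_flags, telescopic):
    """Emit the residual-prediction flag bits for one slice."""
    bits = []
    if telescopic:
        # adaptive flag = false: per-macroblock flags are omitted and a
        # single default_residual_prediction_flag is sent for the slice
        bits.append(0)
        bits.append(mb_flags[0])  # encoder-chosen uniform default value
    else:
        bits.append(1)            # adaptive mode: one flag per macroblock
        bits.extend(mb_flags)
    return bits
```

For a slice of four macroblocks that all share the same flag value, telescopic mode emits 2 bits instead of 5, which is the coding-efficiency gain the abstract refers to.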
4. A method for decoding of scalable digital video, the method comprising:
- at an input, receiving a scalable digital video bitstream comprising slices of a quality or spatial enhancement target layer and at least one additional layer in accordance with the SVC JD8 specification, the digital video bitstream containing control data (including prediction control data) associated with slices and control (including prediction), texture, or motion data associated with macroblocks or macroblock partitions;
- at a decoder, decoding the received input by decoding the control data associated with a slice of the target layer and the at least one additional layer, and control, texture, or motion data associated with individual macroblocks or macroblock partitions of the target layer and the at least one additional layer;
- using a predictor coupled to the decoder, generating prediction references for the control, texture, or motion data of a plurality of macroblocks or macroblock partitions of the target layer from signaled prediction control data associated with a slice of the target layer or the at least one additional layer, or from prediction control data associated with individual macroblocks or macroblock partitions of the target layer or the at least one additional layer; and
- at a combiner coupled to the predictor, combining, using a processor, the generated prediction references with the corresponding decoded control, texture, or motion data associated with the plurality of macroblocks or macroblock partitions of the target layer to produce portions of a decoded picture corresponding to the plurality of macroblocks or macroblock partitions of the target layer,

wherein the prediction control data associated with the slices of the target layer or the at least one additional layer include an adaptive_residual_prediction_flag parameter and, if the adaptive_residual_prediction_flag parameter is not set, a default_residual_prediction_flag parameter, and the prediction control data associated with individual macroblocks or macroblock partitions of the target layer or the at least one additional layer include a residual_prediction_flag parameter, wherein the decoder is configured not to decode the residual_prediction_flag parameter in macroblocks or macroblock partitions of a slice for which the adaptive_residual_prediction_flag parameter is not set but to decode instead the default_residual_prediction_flag of the slice, and wherein the predictor is further configured to use the value of default_residual_prediction_flag as the value of the residual_prediction_flag parameter for all macroblocks or macroblock partitions of the slice.
5. A method for scalable coding of digital video, the method comprising:
- at an input, receiving digital video input pictures;
- optionally operating a downsampler coupled to the input to generate a downsampled picture of an input picture at a lower resolution;
- at a first prediction estimator coupled to either the optionally operated downsampler or the input and to a first combiner that provides a plurality of previously decoded base layer pictures to be used as reference pictures, generating a first set of control (including prediction) and motion data prediction references associated with a plurality of the coded units of the (optionally downsampled) picture, or control data (including prediction control data) associated with a group of coded units of the (optionally downsampled) picture, and generating the portions of a first prediction reference picture that correspond to the coded units;
- at a first comparer coupled to the first prediction estimator and the optional downsampler or input, computing the difference between the (optionally downsampled) input picture and the portions of the first prediction reference picture, and generating a second set of control (including prediction) and texture data associated with a plurality of the coded units of the (optionally downsampled) input picture, and control (including prediction) data associated with a group of coded units of the (optionally downsampled) input picture;
- at a first combiner coupled to the first comparer and the first prediction estimator, combining, using a processor, the second set of generated control (including prediction) and texture data with their corresponding portions of the first prediction reference picture to generate the corresponding portions of a new base layer decoded picture;
- at a second prediction estimator coupled to the input and to a second combiner that provides a plurality of previously decoded enhancement layer pictures to be used as reference pictures, generating a third set of control (including prediction) and motion data prediction references associated with a plurality of the coded units of the input picture, or control data (including prediction control data) associated with a group of coded units of the input picture, and generating the portions of a second prediction reference picture that correspond to the coded units;
- at a second comparer coupled to the second prediction estimator and the input, computing the difference between the input picture and the second prediction reference picture, and generating a fourth set of control (including prediction) and texture data associated with a plurality of the coded units of the input picture, and control (including prediction) data associated with a group of coded units of the input picture;
- at a second combiner coupled to the second comparer and the second prediction estimator, combining the fourth set of generated control (including prediction) and texture data with their corresponding portions of the second prediction reference picture to generate the corresponding portions of a new enhancement layer decoded picture;
- at an encoder, encoding the first set of control (including prediction) and motion data and the second set of control (including prediction) and texture data to produce a base layer bitstream, encoding the third set of control (including prediction) and motion data and the fourth set of control (including prediction) and texture data to produce an enhancement layer bitstream, and multiplexing the data into a single output bitstream, wherein the third and fourth sets of control data include inter-layer prediction control data; and
- at the second prediction estimator and the second comparer, setting inter-layer prediction control data values in one or more groups of coded units of the input picture such that corresponding inter-layer prediction control data values in the coded units of the input picture associated with the one or more groups of coded units of the input picture are not transmitted but instead a single value is transmitted for the entire group of coded units and is used by the decoder.
6. The method of claim 5, further comprising: at the second prediction estimator and the second comparer, setting the adaptive_prediction_flag parameter to false in one or more slices; at the encoder, omitting the residual_prediction_flag from its encoding of the macroblocks or macroblock partitions associated with the one or more slices; and at the second prediction estimator and second comparer, using the default_residual_prediction_flag parameter value for the residual_prediction_flag parameter for all macroblocks or macroblock partitions of the one or more slices.
7. A non-transitory computer-readable medium for scalable coding of digital video, the computer-readable medium encoded with a computer program comprising a set of instructions operable to direct a processing system to:
- at an input, receive digital video input pictures;
- optionally operate a downsampler coupled to the input to generate a downsampled picture of an input picture at a lower resolution;
- at a first prediction estimator coupled to either the optionally operated downsampler or the input and to a first combiner that provides a plurality of previously decoded base layer pictures to be used as reference pictures, generate a first set of control (including prediction) and motion data prediction references associated with a plurality of the coded units of the (optionally downsampled) picture, or control data (including prediction control data) associated with a group of coded units of the (optionally downsampled) picture, and generate the portions of a first prediction reference picture that correspond to the coded units;
- at a first comparer coupled to the first prediction estimator and the optional downsampler or input, compute the difference between the (optionally downsampled) input picture and the portions of the first prediction reference picture, and generate a second set of control (including prediction) and texture data associated with a plurality of the coded units of the (optionally downsampled) input picture, and control (including prediction) data associated with a group of coded units of the (optionally downsampled) input picture;
- at a first combiner coupled to the first comparer and the first prediction estimator, combine, using a processor, the second set of generated control (including prediction) and texture data with their corresponding portions of the first prediction reference picture to generate the corresponding portions of a new base layer decoded picture;
- at a second prediction estimator coupled to the input and to a second combiner that provides a plurality of previously decoded enhancement layer pictures to be used as reference pictures, generate a third set of control (including prediction) and motion data prediction references associated with a plurality of the coded units of the input picture, or control data (including prediction control data) associated with a group of coded units of the input picture, and generate the portions of a second prediction reference picture that correspond to the coded units;
- at a second comparer coupled to the second prediction estimator and the input, compute the difference between the input picture and the second prediction reference picture, and generate a fourth set of control (including prediction) and texture data associated with a plurality of the coded units of the input picture, and control (including prediction) data associated with a group of coded units of the input picture;
- at a second combiner coupled to the second comparer and the second prediction estimator, combine the fourth set of generated control (including prediction) and texture data with their corresponding portions of the second prediction reference picture to generate the corresponding portions of a new enhancement layer decoded picture;
- at an encoder, encode the first set of control (including prediction) and motion data and the second set of control (including prediction) and texture data to produce a base layer bitstream, encode the third set of control (including prediction) and motion data and the fourth set of control (including prediction) and texture data to produce an enhancement layer bitstream, and multiplex the data into a single output bitstream, wherein the third and fourth sets of control data include inter-layer prediction control data; and
- at the second prediction estimator and the second comparer, set inter-layer prediction control data values in one or more groups of coded units of the input picture such that corresponding inter-layer prediction control data values in the coded units of the input picture associated with the one or more groups of coded units of the input picture are not transmitted but instead a single value is transmitted for the entire group of coded units and is used by the decoder.
8. The non-transitory computer-readable medium of claim 7, wherein the set of instructions is further operable to direct the processing system to: at the second prediction estimator and the second comparer, set the adaptive_prediction_flag parameter to false in one or more slices; at the encoder, omit the residual_prediction_flag from its encoding of the macroblocks or macroblock partitions associated with the one or more slices; and at the second prediction estimator and second comparer, use the default_residual_prediction_flag parameter value for the residual_prediction_flag parameter for all macroblocks or macroblock partitions of the one or more slices.
Specification