System and method for scalable video coding using telescopic mode flags
Abstract
Systems and methods for scalable video coding using special inter-layer prediction modes (called telescopic modes) are provided. These modes facilitate accelerated operation of encoders with improved coding efficiency.
72 Citations
21 Claims
1. A system for decoding of scalable digital video, the system comprising:
an input configured to receive a scalable digital video bitstream comprising groups of coded units of a quality or spatial enhancement target layer and at least one additional layer, the digital video bitstream containing control data (including prediction control data) associated with a group of coded units and control (including prediction), texture, or motion data associated with individual coded units;
a decoder coupled to the input, wherein the decoder is configured to decode the control data associated with a group of coded units of the target layer and the at least one additional layer, and control, texture, or motion data associated with individual coded units of the target layer and the at least one additional layer;
a predictor coupled to the decoder, wherein the predictor is configured to generate prediction references for the control, texture, or motion data of a plurality of coded units of the target layer from signaled prediction control data associated with a group of coded units of the target layer or the at least one additional layer, or from prediction control data associated with individual coded units of the target layer or the at least one additional layer; and
a combiner coupled to the predictor, wherein the combiner is configured to combine the generated prediction references with the corresponding decoded control, texture, or motion data associated with the plurality of coded units of the target layer to produce portions of a decoded picture corresponding to the plurality of coded units of the target layer,
wherein the prediction control data associated with the groups of coded units of the target layer or the at least one additional layer and the prediction control data associated with individual coded units of the target layer or the at least one additional layer include inter-layer prediction control data, and wherein the predictor is configured to use values indicated by the inter-layer prediction control data associated with a group of coded units of the target layer when the corresponding inter-layer prediction control data associated with individual coded units of the group of coded units of the target layer are not present in the digital video bitstream.
View Dependent Claims (2, 3, 4)
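The telescopic fallback in claim 1 — a per-coded-unit inter-layer flag that is absent from the bitstream takes the value signaled for the whole group — can be sketched as follows. This is an illustrative sketch only, not the patented implementation; the helper name and dictionary keys are hypothetical.

```python
# Hypothetical sketch of the claim-1 predictor behavior: if a per-coded-unit
# inter-layer prediction flag was not transmitted, fall back to the value
# signaled in the group-level (e.g., slice) prediction control data.

def resolve_flag(unit_flags, group_flags, name, group_default):
    """Return the per-unit flag when present; otherwise the group-level value."""
    if name in unit_flags:                        # flag transmitted for this coded unit
        return unit_flags[name]
    return group_flags.get(group_default, False)  # telescopic fallback

# A unit with an explicit flag keeps it; a unit without one inherits the slice value.
explicit = resolve_flag({"base_mode_flag": False},
                        {"default_base_mode_flag": True},
                        "base_mode_flag", "default_base_mode_flag")
inherited = resolve_flag({},
                         {"default_base_mode_flag": True},
                         "base_mode_flag", "default_base_mode_flag")
```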
2. The system of claim 1, wherein a received digital video bitstream conforms to the SVC JD8 specification, wherein coded units correspond to macroblocks or macroblock partitions and groups of coded units correspond to slices, extended such that:
the inter-layer prediction control data associated with a macroblock or macroblock partition comprise a residual_prediction_flag parameter; and the inter-layer prediction control data associated with a slice comprise an adaptive_residual_prediction_flag parameter, wherein the decoder is further configured not to decode the residual_prediction_flag parameter in macroblock or macroblock partitions of a slice for which the adaptive_residual_prediction_flag parameter is not set, and wherein the predictor is further configured to assume a default value for the residual_prediction_flag parameter for all macroblocks or macroblock partitions of the slice.
3. The system of claim 1, wherein a received digital video bitstream conforms to the SVC JD8 specification, wherein coded units correspond to macroblocks or macroblock partitions and groups of coded units correspond to slices, extended such that:
the inter-layer prediction control data associated with a macroblock or macroblock partition comprise a base_mode_flag parameter; the inter-layer prediction control data associated with a slice comprise an adaptive_prediction_flag parameter and a default_base_mode_flag parameter, wherein the decoder is further configured not to decode the base_mode_flag parameter in macroblock or macroblock partitions of a slice for which the adaptive_prediction_flag parameter is not set, and wherein the predictor is further configured to assume a value indicated by the default_base_mode_flag parameter for the base_mode_flag parameter for all macroblocks or macroblock partitions of the slice.
4. The system of claim 1, wherein a received digital video bitstream conforms to the SVC JD8 specification, wherein coded units correspond to macroblocks or macroblock partitions and groups of coded units correspond to slices, extended such that:
the inter-layer prediction control data associated with a macroblock or macroblock partition comprise a motion_prediction_flag_l0 parameter and a motion_prediction_flag_l1 parameter; and the inter-layer prediction control data associated with a slice comprise an adaptive_motion_prediction_flag parameter and a default_motion_prediction_flag parameter, wherein the decoder is further configured not to decode the motion_prediction_flag_l0 or motion_prediction_flag_l1 parameters in macroblock or macroblock partitions of a slice for which the adaptive_motion_prediction_flag parameter is not set, and wherein the predictor is further configured to assume a value indicated by the default_motion_prediction_flag parameter for the motion_prediction_flag_l0 and motion_prediction_flag_l1 parameters for all macroblocks or macroblock partitions of the slice.
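Claims 2–4 condition the presence of each per-macroblock flag on a slice-level adaptive flag. A hedged sketch of the corresponding parsing logic follows; it does not reproduce the JD8 syntax tables, `read_bit` stands in for the entropy decoder, and the assumed default of `True` for residual prediction is illustrative only (the claim leaves the default value open).

```python
def parse_mb_flags(slice_hdr, read_bit):
    """Decode a per-macroblock inter-layer flag only when the corresponding
    slice-level adaptive flag is set; otherwise assume the slice default
    (the behavior recited in claims 2-4)."""
    mb = {}
    if slice_hdr.get("adaptive_prediction_flag"):
        mb["base_mode_flag"] = bool(read_bit())
    else:
        mb["base_mode_flag"] = slice_hdr["default_base_mode_flag"]
    if slice_hdr.get("adaptive_residual_prediction_flag"):
        mb["residual_prediction_flag"] = bool(read_bit())
    else:
        mb["residual_prediction_flag"] = True  # assumed default, for illustration
    if slice_hdr.get("adaptive_motion_prediction_flag"):
        mb["motion_prediction_flag_l0"] = bool(read_bit())
        mb["motion_prediction_flag_l1"] = bool(read_bit())
    else:
        d = slice_hdr["default_motion_prediction_flag"]
        mb["motion_prediction_flag_l0"] = d
        mb["motion_prediction_flag_l1"] = d
    return mb

# With every slice-level adaptive flag cleared, no per-MB flag bits are read:
hdr = {"adaptive_prediction_flag": False, "default_base_mode_flag": True,
       "adaptive_residual_prediction_flag": False,
       "adaptive_motion_prediction_flag": False,
       "default_motion_prediction_flag": False}
mb = parse_mb_flags(hdr, read_bit=lambda: 1)  # the lambda is never invoked here
```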
5. A system for scalable coding of digital video, the system comprising:
an input configured to receive digital video input pictures;
an optionally operated downsampler coupled to the input, wherein the downsampler is configured to generate a downsampled picture of an input picture at a lower resolution;
a first prediction estimator coupled to either the optional downsampler or the input and to a first combiner that provides a plurality of previously decoded base layer pictures to be used as reference pictures, wherein the first prediction estimator is configured to generate a first set of control (including prediction) and motion data prediction references associated with a plurality of the coded units of the (optionally downsampled) picture, or control data (including prediction control data) associated with a group of coded units of the (optionally downsampled) picture, and to generate the portions of a first prediction reference picture that correspond to the coded units;
a first comparer coupled to the first prediction estimator and the optional downsampler or input, wherein the first comparer is configured to compute the difference between the (optionally downsampled) input picture and the portions of the first prediction reference picture, and to generate a second set of control (including prediction) and texture data associated with a plurality of the coded units of the (optionally downsampled) input picture, and control (including prediction) data associated with a group of coded units of the (optionally downsampled) input picture;
a first combiner coupled to the first comparer and the first prediction estimator, wherein the first combiner is configured to combine the second set of generated control (including prediction) and texture data with their corresponding portions of the first prediction reference picture to generate the corresponding portions of a new base layer decoded picture;
a second prediction estimator coupled to the input and to a second combiner that provides a plurality of previously decoded enhancement layer pictures to be used as reference pictures, wherein the second prediction estimator is configured to generate a third set of control (including prediction) and motion data prediction references associated with a plurality of the coded units of the input picture, or control data (including prediction control data) associated with a group of coded units of the input picture, and to generate the portions of a second prediction reference picture that correspond to the coded units;
a second comparer coupled to the second prediction estimator and the input, wherein the second comparer is configured to compute the difference between the input picture and the second prediction reference picture, and to generate a fourth set of control (including prediction) and texture data associated with a plurality of the coded units of the input picture, as well as control (including prediction) data associated with a group of coded units of the input picture;
a second combiner coupled to the second comparer and the second prediction estimator, wherein the second combiner is configured to combine the fourth set of generated control (including prediction) and texture data with their corresponding portions of the second prediction reference picture to generate the corresponding portions of a new enhancement layer decoded picture; and
an encoder configured to encode the first set of control (including prediction) and motion data and the second set of control (including prediction) and texture data to produce a base layer bit stream, to encode the third set of control (including prediction) and motion data and the fourth set of control (including prediction) and texture data to produce an enhancement layer bit stream, and to multiplex the data into a single output bit stream,
wherein the third and fourth sets of control data include inter-layer prediction control data, and wherein the second prediction estimator and the second comparer are further configured to set inter-layer prediction control data values in one or more groups of coded units of the input picture such that corresponding inter-layer prediction control data values in the coded units of the input picture associated with the one or more groups of coded units of the input picture are not transmitted.
View Dependent Claims (6, 7, 8)
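The encoder-side effect recited in claim 5 — clearing a slice-level adaptive flag so that the per-macroblock flags are never written — can be illustrated with a toy bit counter. The names and bit layout are hypothetical and are not the JD8 bitstream syntax.

```python
def encode_slice(mbs, adaptive_prediction, default_base_mode):
    """Toy bit writer: when adaptive_prediction is False, the per-macroblock
    base_mode_flag bits are omitted and the decoder infers the slice default."""
    bits = [int(adaptive_prediction), int(default_base_mode)]  # slice header
    for mb in mbs:
        if adaptive_prediction:
            bits.append(int(mb["base_mode_flag"]))  # transmitted per macroblock
        # else: flag not transmitted -- one bit saved per macroblock
    return bits

mbs = [{"base_mode_flag": True}] * 4
per_mb = encode_slice(mbs, adaptive_prediction=True, default_base_mode=False)
telescopic = encode_slice(mbs, adaptive_prediction=False, default_base_mode=True)
```

For four macroblocks the telescopic slice spends two header bits instead of six bits in total, which is the rate saving the claim exploits.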
6. The system of claim 5, wherein the second prediction estimator and the second comparer are configured to set the adaptive_residual_prediction_flag parameter to false in one or more slices, the encoder is further configured to not include the residual_prediction_flag in its encoding of the macroblocks or macroblock partitions associated with the one or more slices, and the second predictor and second comparer are further configured to assume a default value for the residual_prediction_flag parameter for all macroblocks or macroblock partitions of the one or more slices.
7. The system of claim 5, wherein the first and second prediction estimators, the first and second comparers, and the encoder are configured to produce an output bit stream conforming to the SVC JD8 specification, wherein coded units correspond to macroblocks or macroblock partitions and groups of coded units correspond to slices, extended such that:
the inter-layer prediction control data associated with a macroblock or macroblock partition comprise a base_mode_flag parameter; the inter-layer prediction control data associated with a slice comprise an adaptive_prediction_flag parameter and a default_base_mode_flag parameter, wherein the second prediction estimator and the second comparer are configured to set the adaptive_prediction_flag parameter to false in one or more slices, the encoder is further configured to not include the base_mode_flag in its encoding of the macroblocks or macroblock partitions associated with the one or more slices, and wherein the second predictor and the second comparer are further configured to assume a value indicated by the default_base_mode_flag for the base_mode_flag parameter of all macroblocks or macroblock partitions of the one or more slices.
8. The system of claim 5, wherein the first and second prediction estimators, the first and second comparers, and the encoder are configured to produce an output bit stream conforming to the SVC JD8 specification, wherein coded units correspond to macroblocks or macroblock partitions and groups of coded units correspond to slices, extended such that:
the inter-layer prediction control data associated with a macroblock or macroblock partition comprise a motion_prediction_flag_l0 parameter and a motion_prediction_flag_l1 parameter; the inter-layer prediction control data associated with a slice comprise an adaptive_motion_prediction_flag parameter and a default_motion_prediction_flag parameter, wherein the second predictor and the second comparer are configured to set the adaptive_motion_prediction_flag parameter to false in one or more slices, the encoder is further configured to not include the motion_prediction_flag_l0 or motion_prediction_flag_l1 parameters in its encoding of the macroblocks or macroblock partitions associated with the one or more slices, and wherein the second predictor and the second comparer are further configured to assume a value indicated by the default_motion_prediction_flag for the motion_prediction_flag_l0 and motion_prediction_flag_l1 parameters of all macroblocks or macroblock partitions of the one or more slices.
9. A scalable video communication system comprising:
a decoding system for decoding of scalable digital video, the decoding system comprising:
an input configured to receive a scalable digital video bitstream comprising groups of coded units of a quality or spatial enhancement target layer and at least one additional layer, the digital video bitstream containing control data (including prediction control data) associated with a group of coded units and control (including prediction), texture, or motion data associated with individual coded units;
a decoder coupled to the input, wherein the decoder is configured to decode the control data associated with a group of coded units of the target layer and the at least one additional layer, and control, texture, or motion data associated with individual coded units of the target layer and the at least one additional layer;
a predictor coupled to the decoder, wherein the predictor is configured to generate prediction references for the control, texture, or motion data of a plurality of coded units of the target layer from signaled prediction control data associated with a group of coded units of the target layer or the at least one additional layer, or from prediction control data associated with individual coded units of the target layer or the at least one additional layer; and
a combiner coupled to the predictor, wherein the combiner is configured to combine the generated prediction references with the corresponding decoded control, texture, or motion data associated with the plurality of coded units of the target layer to produce portions of a decoded picture corresponding to the plurality of coded units of the target layer,
wherein the prediction control data associated with the groups of coded units of the target layer or the at least one additional layer and the prediction control data associated with individual coded units of the target layer or the at least one additional layer include inter-layer prediction control data, and wherein the predictor is configured to use values indicated by the inter-layer prediction control data associated with a group of coded units of the target layer when the corresponding inter-layer prediction control data associated with individual coded units of the group of coded units of the target layer are not present in the digital video bitstream;
an encoding system for scalable coding of digital video, the encoding system comprising:
an input configured to receive digital video input pictures;
an optionally operated downsampler coupled to the input, wherein the downsampler is configured to generate a downsampled picture of an input picture at a lower resolution;
a first prediction estimator coupled to either the optional downsampler or the input and to a first combiner that provides a plurality of previously decoded base layer pictures to be used as reference pictures, wherein the first prediction estimator is configured to generate a first set of control (including prediction) and motion data prediction references associated with a plurality of the coded units of the (optionally downsampled) picture, or control data (including prediction control data) associated with a group of coded units of the (optionally downsampled) picture, and to generate the portions of a first prediction reference picture that correspond to the coded units;
a first comparer coupled to the first prediction estimator and the optional downsampler or input, wherein the first comparer is configured to compute the difference between the (optionally downsampled) input picture and the portions of the first prediction reference picture, and to generate a second set of control (including prediction) and texture data associated with a plurality of the coded units of the (optionally downsampled) input picture, and control (including prediction) data associated with a group of coded units of the (optionally downsampled) input picture;
a first combiner coupled to the first comparer and the first prediction estimator, wherein the first combiner is configured to combine the second set of generated control (including prediction) and texture data with their corresponding portions of the first prediction reference picture to generate the corresponding portions of a new base layer decoded picture;
a second prediction estimator coupled to the input and to a second combiner that provides a plurality of previously decoded enhancement layer pictures to be used as reference pictures, wherein the second prediction estimator is configured to generate a third set of control (including prediction) and motion data prediction references associated with a plurality of the coded units of the input picture, or control data (including prediction control data) associated with a group of coded units of the input picture, and to generate the portions of a second prediction reference picture that correspond to the coded units;
a second comparer coupled to the second prediction estimator and the input, wherein the second comparer is configured to compute the difference between the input picture and the second prediction reference picture, and to generate a fourth set of control (including prediction) and texture data associated with a plurality of the coded units of the input picture, as well as control (including prediction) data associated with a group of coded units of the input picture;
a second combiner coupled to the second comparer and the second prediction estimator, wherein the second combiner is configured to combine the fourth set of generated control (including prediction) and texture data with their corresponding portions of the second prediction reference picture to generate the corresponding portions of a new enhancement layer decoded picture; and
an encoder configured to encode the first set of control (including prediction) and motion data and the second set of control (including prediction) and texture data to produce a base layer bit stream, to encode the third set of control (including prediction) and motion data and the fourth set of control (including prediction) and texture data to produce an enhancement layer bit stream, and to multiplex the data into a single output bit stream,
wherein the third and fourth sets of control data include inter-layer prediction control data, and wherein the second prediction estimator and the second comparer are further configured to set inter-layer prediction control data values in one or more groups of coded units of the input picture such that corresponding inter-layer prediction control data values in the coded units of the input picture associated with the one or more groups of coded units of the input picture are not transmitted; and
a communication network connecting the output of the encoding system to the input of the decoding system,
wherein the second prediction estimator and the second comparer of the encoding system are further configured to use telescopic inter-layer prediction control data values in more or fewer groups of coded units of the input picture, depending on the bit rate available in the communication network.
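Claim 9's network-adaptive use of telescopic modes — more groups of coded units are coded telescopically as the available bit rate drops — might look like the following toy rate control. The linear mapping and the threshold are invented purely for illustration.

```python
def choose_telescopic_slices(num_slices, available_bps, threshold_bps):
    """Toy rate control in the spirit of claim 9: as the network bit rate
    falls below a threshold, a growing fraction of slices is coded with
    telescopic (group-level) inter-layer flags, since omitting per-MB
    flags reduces the rate. Returns True for each telescopic slice."""
    if available_bps >= threshold_bps:
        frac = 0.0                                  # ample rate: per-MB flags everywhere
    else:
        frac = min(1.0, 1.0 - available_bps / threshold_bps)
    n = round(frac * num_slices)
    return [i < n for i in range(num_slices)]

high = choose_telescopic_slices(8, 2_000_000, 1_000_000)  # ample rate: none telescopic
low = choose_telescopic_slices(8, 250_000, 1_000_000)     # constrained: most telescopic
```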
10. A scalable video communication system comprising:
a communication network;
a decoding system for decoding of scalable digital video, the decoding system comprising:
an input configured to receive a scalable digital video bitstream comprising groups of coded units of a quality or spatial enhancement target layer and at least one additional layer, the digital video bitstream containing control data (including prediction control data) associated with a group of coded units and control (including prediction), texture, or motion data associated with individual coded units;
a decoder coupled to the input, wherein the decoder is configured to decode the control data associated with a group of coded units of the target layer and the at least one additional layer, and control, texture, or motion data associated with individual coded units of the target layer and the at least one additional layer;
a predictor coupled to the decoder, wherein the predictor is configured to generate prediction references for the control, texture, or motion data of a plurality of coded units of the target layer from signaled prediction control data associated with a group of coded units of the target layer or the at least one additional layer, or from prediction control data associated with individual coded units of the target layer or the at least one additional layer; and
a combiner coupled to the predictor, wherein the combiner is configured to combine the generated prediction references with the corresponding decoded control, texture, or motion data associated with the plurality of coded units of the target layer to produce portions of a decoded picture corresponding to the plurality of coded units of the target layer,
wherein the prediction control data associated with the groups of coded units of the target layer or the at least one additional layer and the prediction control data associated with individual coded units of the target layer or the at least one additional layer include inter-layer prediction control data, and wherein the predictor is configured to use values indicated by the inter-layer prediction control data associated with a group of coded units of the target layer when the corresponding inter-layer prediction control data associated with individual coded units of the group of coded units of the target layer are not present in the digital video bitstream;
an encoding system for scalable coding of digital video, the encoding system comprising:
an input configured to receive digital video input pictures;
an optionally operated downsampler coupled to the input, wherein the downsampler is configured to generate a downsampled picture of an input picture at a lower resolution;
a first prediction estimator coupled to either the optional downsampler or the input and to a first combiner that provides a plurality of previously decoded base layer pictures to be used as reference pictures, wherein the first prediction estimator is configured to generate a first set of control (including prediction) and motion data prediction references associated with a plurality of the coded units of the (optionally downsampled) picture, or control data (including prediction control data) associated with a group of coded units of the (optionally downsampled) picture, and to generate the portions of a first prediction reference picture that correspond to the coded units;
a first comparer coupled to the first prediction estimator and the optional downsampler or input, wherein the first comparer is configured to compute the difference between the (optionally downsampled) input picture and the portions of the first prediction reference picture, and to generate a second set of control (including prediction) and texture data associated with a plurality of the coded units of the (optionally downsampled) input picture, and control (including prediction) data associated with a group of coded units of the (optionally downsampled) input picture;
a first combiner coupled to the first comparer and the first prediction estimator, wherein the first combiner is configured to combine the second set of generated control (including prediction) and texture data with their corresponding portions of the first prediction reference picture to generate the corresponding portions of a new base layer decoded picture;
a second prediction estimator coupled to the input and to a second combiner that provides a plurality of previously decoded enhancement layer pictures to be used as reference pictures, wherein the second prediction estimator is configured to generate a third set of control (including prediction) and motion data prediction references associated with a plurality of the coded units of the input picture, or control data (including prediction control data) associated with a group of coded units of the input picture, and to generate the portions of a second prediction reference picture that correspond to the coded units;
a second comparer coupled to the second prediction estimator and the input, wherein the second comparer is configured to compute the difference between the input picture and the second prediction reference picture, and to generate a fourth set of control (including prediction) and texture data associated with a plurality of the coded units of the input picture, as well as control (including prediction) data associated with a group of coded units of the input picture;
a second combiner coupled to the second comparer and the second prediction estimator, wherein the second combiner is configured to combine the fourth set of generated control (including prediction) and texture data with their corresponding portions of the second prediction reference picture to generate the corresponding portions of a new enhancement layer decoded picture; and
an encoder configured to encode the first set of control (including prediction) and motion data and the second set of control (including prediction) and texture data to produce a base layer bit stream, to encode the third set of control (including prediction) and motion data and the fourth set of control (including prediction) and texture data to produce an enhancement layer bit stream, and to multiplex the data into a single output bit stream,
wherein the third and fourth sets of control data include inter-layer prediction control data, and wherein the second prediction estimator and the second comparer are further configured to set inter-layer prediction control data values in one or more groups of coded units of the input picture such that corresponding inter-layer prediction control data values in the coded units of the input picture associated with the one or more groups of coded units of the input picture are not transmitted; and
a Scalable Video Communication Server ("SVCS") connected to the encoding system and the decoding system over the communication network,
wherein the SVCS is configured to replace one or more enhancement layer slices received from the encoding system with slices that only signal telescopic inter-layer prediction and do not contain macroblock texture or motion data, prior to forwarding them to the decoding system.
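The SVCS behavior in claim 10 — replacing an enhancement-layer slice in transit with one that signals only telescopic inter-layer prediction and carries no macroblock texture or motion data — can be sketched as below. The dictionary layout is hypothetical; a real SVCS would rewrite NAL units.

```python
def svcs_forward(slice_):
    """Sketch of the claim-10 SVCS: enhancement-layer slices are swapped for
    telescopic-only slices before forwarding; base-layer slices pass through."""
    if slice_.get("layer") != "enhancement":
        return slice_                          # base layer forwarded unchanged
    return {
        "layer": "enhancement",
        "adaptive_prediction_flag": False,     # per-MB flags not present
        "default_base_mode_flag": True,        # predict everything from the base layer
        "macroblocks": [],                     # texture and motion data dropped
    }

original = {"layer": "enhancement",
            "macroblocks": [{"texture": b"...", "motion": (1, 2)}]}
forwarded = svcs_forward(original)
```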
11. A method for decoding of scalable digital video, the method comprising:
at an input, receiving a scalable digital video bitstream comprising groups of coded units of a quality or spatial enhancement target layer and at least one additional layer, the digital video bitstream containing control data (including prediction control data) associated with a group of coded units and control (including prediction), texture, or motion data associated with individual coded units;
at a decoder, decoding the received input by decoding the control data associated with a group of coded units of the target layer and the at least one additional layer, and control, texture, or motion data associated with individual coded units of the target layer and the at least one additional layer;
using a predictor coupled to the decoder, generating prediction references for the control, texture, or motion data of a plurality of coded units of the target layer from signaled prediction control data associated with a group of coded units of the target layer or the at least one additional layer, or from prediction control data associated with individual coded units of the target layer or the at least one additional layer; and
at a combiner coupled to the predictor, combining, using a processor, the generated prediction references with the corresponding decoded control, texture, or motion data associated with the plurality of coded units of the target layer to produce portions of a decoded picture corresponding to the plurality of coded units of the target layer,
wherein the prediction control data associated with the groups of coded units of the target layer or the at least one additional layer and the prediction control data associated with individual coded units of the target layer or the at least one additional layer include inter-layer prediction control data, and wherein the predictor is configured to use values indicated by the inter-layer prediction control data associated with a group of coded units of the target layer when the corresponding inter-layer prediction control data associated with individual coded units of the group of coded units of the target layer are not present in the digital video bitstream.
View Dependent Claims (12, 13, 14, 21)
12. The method of claim 11, wherein a received digital video bitstream conforms to the SVC JD8 specification, wherein coded units correspond to macroblocks or macroblock partitions and groups of coded units correspond to slices, extended such that:
the inter-layer prediction control data associated with a macroblock or macroblock partition comprise a residual_prediction_flag parameter; and the inter-layer prediction control data associated with a slice comprise an adaptive_residual_prediction_flag parameter, the method further comprising, at the decoder, omitting decoding of the residual_prediction_flag parameter in macroblock or macroblock partitions of a slice for which the adaptive_residual_prediction_flag parameter is not set, and, at the predictor, assuming a default value for the residual_prediction_flag parameter for all macroblocks or macroblock partitions of the slice.
13. The method of claim 11, wherein
a received digital video bitstream conforms to the SVC JD8 specification, wherein coded units correspond to macroblocks or macroblock partitions and groups of coded units correspond to slices, extended such that:
the inter-layer prediction control data associated with a macroblock or macroblock partition comprise a base_mode_flag parameter; the inter-layer prediction control data associated with a slice comprise an adaptive_prediction_flag parameter and a default_base_mode_flag parameter, the method further comprising at the decoder omitting decoding of the base_mode_flag parameter in macroblock or macroblock partitions of a slice for which the adaptive_prediction_flag parameter is not set, and at the predictor assuming a value indicated by the default_base_mode_flag parameter for the base_mode_flag parameter for all macroblocks or macroblock partitions of the slice.
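Claim 13 adds an explicit slice-level default: instead of a fixed inferred value, the slice header carries default_base_mode_flag. A minimal sketch, assuming a hypothetical dict-based slice header rather than actual SVC JD8 syntax:

```python
# Illustrative sketch of claim 13: base_mode_flag is present per macroblock
# only when the slice signals adaptive_prediction_flag; otherwise every
# macroblock inherits the slice-level default_base_mode_flag.

def effective_base_mode_flag(slice_header, mb_flag=None):
    """Resolve base_mode_flag for one macroblock or macroblock partition.

    mb_flag is the decoded per-macroblock bit, or None when the flag was
    omitted from the bitstream.
    """
    if slice_header["adaptive_prediction_flag"]:
        if mb_flag is None:
            raise ValueError("base_mode_flag expected in the bitstream")
        return mb_flag
    # Telescopic fallback: per-MB flags were omitted by the encoder.
    return slice_header["default_base_mode_flag"]
```

The design point is that the fallback value travels once per slice instead of once per macroblock.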
14. The method of claim 11, wherein
a received digital video bitstream conforms to the SVC JD8 specification, wherein coded units correspond to macroblocks or macroblock partitions and groups of coded units correspond to slices, extended such that:
the inter-layer prediction control data associated with a macroblock or macroblock partition comprise a motion_prediction_flag_l0 parameter and a motion_prediction_flag_l1 parameter; and the inter-layer prediction control data associated with a slice comprise an adaptive_motion_prediction_flag parameter and a default_motion_prediction_flag parameter, the method further comprising at the decoder omitting decoding of the motion_prediction_flag_l0 or motion_prediction_flag_l1 parameters in macroblock or macroblock partitions of a slice for which the adaptive_motion_prediction_flag parameter is not set, and at the predictor assuming a value indicated by the default_motion_prediction_flag parameter for the motion_prediction_flag_l0 and motion_prediction_flag_l1 parameters for all macroblocks or macroblock partitions of the slice.
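Claim 14 applies the same pattern to the two motion prediction flags, with a single slice-level default covering both reference lists. A hedged sketch with a hypothetical data layout:

```python
# Illustrative sketch of claim 14: when the slice-level
# adaptive_motion_prediction_flag is not set, the per-macroblock
# motion_prediction_flag_l0/_l1 bits are omitted from the bitstream and
# both take the value of the slice-level default_motion_prediction_flag.

def effective_motion_prediction_flags(slice_header, mb_flags=None):
    """Return (motion_prediction_flag_l0, motion_prediction_flag_l1)."""
    if slice_header["adaptive_motion_prediction_flag"]:
        # Flags were present per macroblock: use them as decoded.
        return mb_flags
    # Telescopic fallback: one slice-level default covers both lists.
    default = slice_header["default_motion_prediction_flag"]
    return (default, default)
```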
21. A computer program product, comprising a computer usable medium having a computer readable program code embodied therein, said computer readable program code adapted to be executed to implement at least one of the method claims 11-20.
15. A method for scalable coding of digital video, the method comprising:
at an input, receiving digital video input pictures;

optionally operating a downsampler coupled to the input to generate a downsampled picture of an input picture at a lower resolution;

at a first prediction estimator coupled to either the optionally operated downsampler or the input and a first combiner that provides a plurality of previously decoded base layer pictures to be used as reference pictures, generating a first set of control (including prediction) and motion data prediction references associated with a plurality of the coded units of the (optionally downsampled) picture, or control data (including prediction control data) associated with a group of coded units of the (optionally downsampled) picture, and generating the portions of a first prediction reference picture that correspond to the coded units;

at a first comparer coupled to the first prediction estimator and the optional downsampler or input, computing the difference between the (optionally downsampled) input picture and the portions of a first prediction reference picture, and generating a second set of control (including prediction) and texture data associated with a plurality of the coded units of the (optionally downsampled) input picture, and control (including prediction) data associated with a group of coded units of the (optionally downsampled) input picture;

at a first combiner coupled to the first comparer and the first prediction estimator, combining using a processor the second set of generated control (including prediction) and texture data with their corresponding portions of the first prediction reference picture to generate the corresponding portions of a new base layer decoded picture;

at a second prediction estimator coupled to the input and a second combiner that provides a plurality of previously decoded enhancement layer pictures to be used as reference pictures, generating a third set of control (including prediction) and motion data prediction references associated with a plurality of the coded units of the input picture, or control data (including prediction control data) associated with a group of coded units of the input picture, and generating the portions of a second prediction reference picture that correspond to the coded units;

at a second comparer coupled to the second prediction estimator and the input, computing the difference between the input picture and the second prediction reference picture, and generating a fourth set of control (including prediction) and texture data associated with a plurality of the coded units of the input picture, and control (including prediction) data associated with a group of coded units of the input picture;

at a second combiner coupled to the second comparer and the second prediction estimator, combining the fourth set of generated control (including prediction) and texture data with their corresponding portions of the second prediction reference picture to generate the corresponding portions of a new enhancement layer decoded picture; and

at an encoder, encoding the first set of control (including prediction) and motion data and the second set of control (including prediction) and texture data to produce a base layer bit stream, encoding the third set of control (including prediction) and motion data and the fourth set of control (including prediction) and texture data to produce an enhancement layer bit stream, and multiplexing the data into a single output bit stream,

wherein the third and fourth sets of control data include inter-layer prediction control data, and at the second prediction estimator and the second comparer setting inter-layer prediction control data values in one or more groups of coded units of the input picture such that corresponding inter-layer prediction control data values in the coded units of the input picture associated with the one or more groups of coded units of the input picture are not transmitted.
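The encoder-side half of the telescopic scheme is a simple decision per slice: if all macroblocks of a slice would carry the same inter-layer flag value, signal it once at the slice level and transmit nothing per macroblock. The sketch below is illustrative; the field names and slice representation are hypothetical, not SVC JD8 syntax.

```python
# Illustrative encoder-side sketch of claim 15's telescopic signaling:
# uniform per-macroblock flags are hoisted into the slice header so that
# the per-macroblock flags need not be transmitted at all.

def telescope_slice(mb_flags):
    """Decide slice-level telescopic signaling for one inter-layer flag.

    mb_flags: list of per-macroblock flag values the encoder would transmit.
    Returns (slice_header_fields, transmitted_mb_flags).
    """
    if len(set(mb_flags)) <= 1:
        # Uniform flags: signal the value once at the slice level and
        # transmit no per-macroblock flags.
        default = mb_flags[0] if mb_flags else 0
        return {"adaptive_flag": 0, "default_flag": default}, []
    # Mixed flags within the slice: fall back to per-macroblock signaling.
    return {"adaptive_flag": 1, "default_flag": 0}, list(mb_flags)
```

For a slice of N macroblocks with uniform flags, this trades N per-macroblock bits for a couple of slice-header bits.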
16. The method of claim 15, wherein the first and second prediction estimators, the first and second comparers, and the encoder are configured to produce an output bit stream conforming to the SVC JD8 specification, wherein coded units correspond to macroblocks or macroblock partitions and groups of coded units correspond to slices, extended such that:

the inter-layer prediction control data associated with a macroblock or macroblock partition comprise a residual_prediction_flag parameter; and the inter-layer prediction control data associated with a slice comprise an adaptive_residual_prediction_flag parameter, the method further comprising: at the second prediction estimator and the second comparer setting the adaptive_residual_prediction_flag parameter to false in one or more slices; at the encoder, omitting the residual_prediction_flag in its encoding of the macroblocks or macroblock partitions associated with the one or more slices; and at the second predictor and second comparer assuming a default value for the residual_prediction_flag parameter for all macroblocks or macroblock partitions of the one or more slices.
17. The method of claim 15, wherein the first and second prediction estimators, the first and second comparers, and the encoder are configured to produce an output bit stream conforming to the SVC JD8 specification, wherein coded units correspond to macroblocks or macroblock partitions and groups of coded units correspond to slices, extended such that:
the inter-layer prediction control data associated with a macroblock or macroblock partition comprise a base_mode_flag parameter; the inter-layer prediction control data associated with a slice comprise an adaptive_prediction_flag parameter and a default_base_mode_flag parameter, the method further comprising: at the second prediction estimator and the second comparer setting the adaptive_prediction_flag parameter to false in one or more slices; at the encoder omitting the base_mode_flag in its encoding of the macroblocks or macroblock partitions associated with the one or more slices; and
at the second predictor and the second comparer assuming a value indicated by the default_base_mode_flag for the base_mode_flag parameter of all macroblocks or macroblock partitions of the one or more slices.
18. The method of claim 15, wherein the first and second prediction estimators, the first and second comparers, and the encoder are configured to produce an output bit stream conforming to the SVC JD8 specification, wherein coded units correspond to macroblocks or macroblock partitions and groups of coded units correspond to slices, extended such that:
the inter-layer prediction control data associated with a macroblock or macroblock partition comprise a motion_prediction_flag_l0 parameter and a motion_prediction_flag_l1 parameter; the inter-layer prediction control data associated with a slice comprise an adaptive_motion_prediction_flag parameter and a default_motion_prediction_flag parameter, the method further comprising: at the second predictor and the second comparer setting the adaptive_motion_prediction_flag parameter to false in one or more slices; at the encoder omitting the motion_prediction_flag_l0 or motion_prediction_flag_l1 parameters in its encoding of the macroblocks or macroblock partitions associated with the one or more slices; and at the second predictor and the second comparer assuming a value indicated by the default_motion_prediction_flag for the motion_prediction_flag_l0 and motion_prediction_flag_l1 parameters of all macroblocks or macroblock partitions of the one or more slices.
19. A scalable video communication method comprising:
a decoding method comprising:

at an input, receiving a scalable digital video bitstream comprising groups of coded units of a quality or spatial enhancement target layer and at least one additional layer, the digital video bitstream containing control data (including prediction control data) associated with a group of coded units and control (including prediction), texture, or motion data associated with individual coded units;

at a decoder, decoding the received input by decoding the control data associated with a group of coded units of the target layer and the at least one additional layer, and control, texture, or motion data associated with individual coded units of the target layer and the at least one additional layer;

at a predictor coupled to the decoder, generating prediction references for the control, texture, or motion data of a plurality of coded units of the target layer from signaled prediction control data associated with a group of coded units of the target layer or the at least one additional layer, or from prediction control data associated with individual coded units of the target layer or the at least one additional layer; and

at a combiner coupled to the predictor, combining using a processor the generated prediction references with the corresponding decoded control, texture, or motion data associated with the plurality of coded units of the target layer to produce portions of a decoded picture corresponding to the plurality of coded units of the target layer,

wherein the prediction control data associated with the groups of coded units of the target layer or the at least one additional layer and the prediction control data associated with individual coded units of the target layer or the at least one additional layer include inter-layer prediction control data, and wherein the predictor is configured to use values indicated by the inter-layer prediction control data associated with a group of coded units of the target layer when the corresponding inter-layer prediction control data associated with individual coded units of the group of coded units of the target layer are not present in the digital video bitstream;

an encoding method comprising:

at an input, receiving digital video input pictures;

optionally operating a downsampler coupled to the input to generate a downsampled picture of an input picture at a lower resolution;

at a first prediction estimator coupled to either the optionally operated downsampler or the input and a first combiner that provides a plurality of previously decoded base layer pictures to be used as reference pictures, generating a first set of control (including prediction) and motion data prediction references associated with a plurality of the coded units of the (optionally downsampled) picture, or control data (including prediction control data) associated with a group of coded units of the (optionally downsampled) picture, and generating the portions of a first prediction reference picture that correspond to the coded units;

at a first comparer coupled to the first prediction estimator and the optional downsampler or input, computing the difference between the (optionally downsampled) input picture and the portions of a first prediction reference picture, and generating a second set of control (including prediction) and texture data associated with a plurality of the coded units of the (optionally downsampled) input picture, and control (including prediction) data associated with a group of coded units of the (optionally downsampled) input picture;

at a first combiner coupled to the first comparer and the first prediction estimator, combining the second set of generated control (including prediction) and texture data with their corresponding portions of the first prediction reference picture to generate the corresponding portions of a new base layer decoded picture;

at a second prediction estimator coupled to the input and a second combiner that provides a plurality of previously decoded enhancement layer pictures to be used as reference pictures, generating a third set of control (including prediction) and motion data prediction references associated with a plurality of the coded units of the input picture, or control data (including prediction control data) associated with a group of coded units of the input picture, and generating the portions of a second prediction reference picture that correspond to the coded units;

at a second comparer coupled to the second prediction estimator and the input, computing the difference between the input picture and the second prediction reference picture, and generating a fourth set of control (including prediction) and texture data associated with a plurality of the coded units of the input picture, and control (including prediction) data associated with a group of coded units of the input picture;

at a second combiner coupled to the second comparer and the second prediction estimator, combining the fourth set of generated control (including prediction) and texture data with their corresponding portions of the second prediction reference picture to generate the corresponding portions of a new enhancement layer decoded picture; and

at an encoder, encoding the first set of control (including prediction) and motion data and the second set of control (including prediction) and texture data to produce a base layer bit stream, encoding the third set of control (including prediction) and motion data and the fourth set of control (including prediction) and texture data to produce an enhancement layer bit stream, and multiplexing the data into a single output bit stream,

wherein the third and fourth sets of control data include inter-layer prediction control data, and at the second prediction estimator and the second comparer setting inter-layer prediction control data values in one or more groups of coded units of the input picture such that corresponding inter-layer prediction control data values in the coded units of the input picture associated with the one or more groups of coded units of the input picture are not transmitted;

connecting the output of the encoding method to the input of the decoding method; and

at the second prediction estimator and the second comparer of the encoding method, using telescopic inter-layer prediction control data values in more or fewer groups of coded units of the input picture, depending on the bit rate available in the communication network.
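The rate-adaptive step at the end of claim 19 (telescoping more slices when bandwidth is scarce, fewer when it is plentiful) can be illustrated with a toy policy. The thresholds and proportional rule below are assumptions invented for the example; the claim does not prescribe a particular policy.

```python
# Illustrative sketch of claim 19's rate adaptation: apply telescopic
# slice-level signaling to more slices when the available network bit rate
# falls further below the rate needed for full per-macroblock signaling.

def choose_telescopic_slices(num_slices, available_kbps, budget_kbps):
    """Return how many slices should use telescopic (slice-level) signaling.

    available_kbps: measured network rate.
    budget_kbps: rate at which full per-macroblock signaling fits.
    Below the budget, telescoping is applied in proportion to the shortfall.
    """
    if available_kbps >= budget_kbps:
        return 0  # enough bandwidth: per-macroblock flags everywhere
    shortfall = 1.0 - available_kbps / budget_kbps
    return min(num_slices, round(shortfall * num_slices))
```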
20. A method for scalable video communication over a system comprising:
a communication network;

a decoding system for decoding of scalable digital video, the system comprising:

an input configured to receive a scalable digital video bitstream comprising groups of coded units of a quality or spatial enhancement target layer and at least one additional layer, the digital video bitstream containing control data (including prediction control data) associated with a group of coded units and control (including prediction), texture, or motion data associated with individual coded units;

a decoder coupled to the input, wherein the decoder is configured to decode the control data associated with a group of coded units of the target layer and the at least one additional layer, and control, texture, or motion data associated with individual coded units of the target layer and the at least one additional layer;

a predictor coupled to the decoder, wherein the predictor is configured to generate prediction references for the control, texture, or motion data of a plurality of coded units of the target layer from signaled prediction control data associated with a group of coded units of the target layer or the at least one additional layer, or from prediction control data associated with individual coded units of the target layer or the at least one additional layer; and

a combiner coupled to the predictor, wherein the combiner is configured to combine the generated prediction references with the corresponding decoded control, texture, or motion data associated with the plurality of coded units of the target layer to produce portions of a decoded picture corresponding to the plurality of coded units of the target layer,

wherein the prediction control data associated with the groups of coded units of the target layer or the at least one additional layer and the prediction control data associated with individual coded units of the target layer or the at least one additional layer include inter-layer prediction control data, and wherein the predictor is configured to use values indicated by the inter-layer prediction control data associated with a group of coded units of the target layer when the corresponding inter-layer prediction control data associated with individual coded units of the group of coded units of the target layer are not present in the digital video bitstream;

an encoding system for scalable coding of digital video, the system comprising:

an input configured to receive digital video input pictures;

an optionally operated downsampler coupled to the input, wherein the downsampler is configured to generate a downsampled picture of an input picture at a lower resolution;

a first prediction estimator coupled to either the optional downsampler or the input and a first combiner that provides a plurality of previously decoded base layer pictures to be used as reference pictures, wherein the first prediction estimator is configured to generate a first set of control (including prediction) and motion data prediction references associated with a plurality of the coded units of the (optionally downsampled) picture, or control data (including prediction control data) associated with a group of coded units of the (optionally downsampled) picture, and to generate the portions of a first prediction reference picture that correspond to the coded units;

a first comparer coupled to the first prediction estimator and the optional downsampler or input, wherein the first comparer is configured to compute the difference between the (optionally downsampled) input picture and the portions of a first prediction reference picture, and generate a second set of control (including prediction) and texture data associated with a plurality of the coded units of the (optionally downsampled) input picture, and control (including prediction) data associated with a group of coded units of the (optionally downsampled) input picture;

a first combiner coupled to the first comparer and the first prediction estimator, wherein the first combiner is configured to combine the second set of generated control (including prediction) and texture data with their corresponding portions of the first prediction reference picture to generate the corresponding portions of a new base layer decoded picture;

a second prediction estimator coupled to the input and a second combiner that provides a plurality of previously decoded enhancement layer pictures to be used as reference pictures, wherein the second prediction estimator is configured to generate a third set of control (including prediction) and motion data prediction references associated with a plurality of the coded units of the input picture, or control data (including prediction control data) associated with a group of coded units of the input picture, and to generate the portions of a second prediction reference picture that correspond to the coded units;

a second comparer coupled to the second prediction estimator and the input, wherein the second comparer is configured to compute the difference between the input picture and the second prediction reference picture, and generate a fourth set of control (including prediction) and texture data associated with a plurality of the coded units of the input picture, as well as control (including prediction) data associated with a group of coded units of the input picture;

a second combiner coupled to the second comparer and the second prediction estimator, wherein the second combiner is configured to combine the fourth set of generated control (including prediction) and texture data with their corresponding portions of the second prediction reference picture to generate the corresponding portions of a new enhancement layer decoded picture; and

an encoder configured to encode the first set of control (including prediction) and motion data and the second set of control (including prediction) and texture data to produce a base layer bit stream, the third set of control (including prediction) and motion data and the fourth set of control (including prediction) and texture data to produce an enhancement layer bit stream, and multiplex the data into a single output bit stream,

wherein the third and fourth sets of control data include inter-layer prediction control data, and wherein the second prediction estimator and the second comparer are further configured to set inter-layer prediction control data values in one or more groups of coded units of the input picture such that corresponding inter-layer prediction control data values in the coded units of the input picture associated with the one or more groups of coded units of the input picture are not transmitted; and

a Scalable Video Communication Server (“SVCS”) connected to the encoding system and the decoding system over the communication network,

the method comprising:
at the SVCS, replacing using a processor one or more enhancement layer slices received from the encoding system with slices that only signal telescopic inter-layer prediction and do not contain macroblock texture or motion data, prior to forwarding them to the decoding system.
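The server-side operation in claim 20 amounts to swapping a full slice for a stub that signals only slice-level (telescopic) prediction. The sketch below uses a hypothetical dict representation of a slice, not actual SVC bitstream syntax, to show the shape of the transformation:

```python
# Illustrative sketch of the SVCS operation: selected enhancement-layer
# slices are replaced with lightweight slices that carry only telescopic
# inter-layer prediction signaling and no macroblock texture or motion data.

def replace_with_telescopic_slice(slice_data, default_flag=1):
    """Return a replacement slice that signals only telescopic prediction."""
    return {
        "header": {
            # Per-macroblock flags are declared absent...
            "adaptive_prediction_flag": 0,
            # ...and every macroblock inherits this slice-level value.
            "default_base_mode_flag": default_flag,
        },
        "macroblocks": [],  # no texture or motion data is forwarded
    }

def svcs_forward(slices, drop_set):
    """Forward slices, replacing those whose index is in drop_set."""
    return [replace_with_telescopic_slice(s) if i in drop_set else s
            for i, s in enumerate(slices)]
```

Because the replacement slice carries no macroblock payload, the server can shed enhancement-layer bit rate without re-encoding anything.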
Specification