System and method for scalable video coding using telescopic mode flags
Abstract
Systems and methods for scalable video coding using special inter-layer prediction modes (called telescopic modes) are provided. These modes facilitate accelerated operation of encoders with improved coding efficiency.
8 Claims
1. A system for decoding of scalable digital video, the system comprising:
- an input configured to receive a scalable digital video bitstream comprising slices of a quality or spatial enhancement target layer and at least one additional layer in accordance with the SVC JD8 specification, the digital video bitstream containing control data (including prediction control data) associated with slices and control (including prediction), texture, or motion data associated with macroblocks or macroblock partitions;
- a decoder coupled to the input, wherein the decoder is configured to decode the control data associated with a slice of the target layer and the at least one additional layer, and control, texture, or motion data associated with individual macroblocks or macroblock partitions of the target layer and the at least one additional layer;
- a predictor coupled to the decoder, wherein the predictor is configured to generate prediction references for the control, texture, or motion data of a plurality of macroblocks or macroblock partitions of the target layer from signaled prediction control data associated with a slice of the target layer or the at least one additional layer, or from prediction control data associated with individual macroblocks or macroblock partitions of the target layer or the at least one additional layer; and
- a combiner coupled to the predictor, wherein the combiner is configured to combine the generated prediction references with the corresponding decoded control, texture, or motion data associated with the plurality of macroblocks or macroblock partitions of the target layer to produce portions of a decoded picture corresponding to the plurality of macroblocks or macroblock partitions of the target layer,

wherein the prediction control data associated with the slices of the target layer or the at least one additional layer include an adaptive_residual_prediction_flag parameter and, if the adaptive_residual_prediction_flag parameter is not set, a default_residual_prediction_flag parameter, and the prediction control data associated with individual macroblocks or macroblock partitions of the target layer or the at least one additional layer include a residual_prediction_flag parameter, wherein the decoder is configured not to decode the residual_prediction_flag parameter in macroblocks or macroblock partitions of a slice for which the adaptive_residual_prediction_flag parameter is not set but to decode instead the default_residual_prediction_flag of the slice, and wherein the predictor is further configured to use the value of default_residual_prediction_flag as the value of the residual_prediction_flag parameter for all macroblocks or macroblock partitions of the slice.
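The slice-level and macroblock-level flag logic recited above can be sketched as follows. This is an illustrative model only, not the SVC JD8 parser: the `BitReader` class and all function names are hypothetical stand-ins, and the bitstream is modeled as a plain sequence of 0/1 flags.

```python
# Hypothetical sketch of the residual-prediction flag decoding described
# in claim 1. BitReader and all names are illustrative, not SVC JD8 APIs.

class BitReader:
    """Minimal flag-by-flag bitstream reader (assumed helper)."""
    def __init__(self, flags, num_macroblocks):
        self._flags = iter(flags)
        self.num_macroblocks = num_macroblocks

    def read_flag(self):
        return next(self._flags)

def decode_slice_flags(bits):
    """Return the residual_prediction_flag value for each macroblock."""
    adaptive = bits.read_flag()  # adaptive_residual_prediction_flag
    # default_residual_prediction_flag is only present when the
    # adaptive flag is not set (the telescopic case)
    default = None if adaptive else bits.read_flag()
    mb_flags = []
    for _ in range(bits.num_macroblocks):
        if adaptive:
            # per-macroblock residual_prediction_flag is coded
            mb_flags.append(bits.read_flag())
        else:
            # flag is not coded; the slice-level default applies to
            # every macroblock of the slice
            mb_flags.append(default)
    return mb_flags
```

In this toy model, `decode_slice_flags(BitReader([0, 1], 3))` returns `[1, 1, 1]`: a single slice-level bit stands in for three per-macroblock bits.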
2. A system for scalable coding of digital video, the system comprising:
- an input configured to receive digital video input pictures;
- an optionally operated downsampler coupled to the input, wherein the downsampler is configured to generate a downsampled picture of an input picture at a lower resolution;
- a first prediction estimator coupled to either the optional downsampler or the input and to a first combiner that provides a plurality of previously decoded base layer pictures to be used as reference pictures, wherein the first prediction estimator is configured to generate a first set of control (including prediction) and motion data prediction references associated with a plurality of the coded units of the (optionally downsampled) picture, or control data (including prediction control data) associated with a group of coded units of the (optionally downsampled) picture, and to generate the portions of a first prediction reference picture that correspond to the coded units;
- a first comparer coupled to the first prediction estimator and the optional downsampler or input, wherein the first comparer is configured to compute the difference between the (optionally downsampled) input picture and the portions of the first prediction reference picture, and to generate a second set of control (including prediction) and texture data associated with a plurality of the coded units of the (optionally downsampled) input picture, and control (including prediction) data associated with a group of coded units of the (optionally downsampled) input picture;
- a first combiner coupled to the first comparer and the first prediction estimator, wherein the first combiner is configured to combine the second set of generated control (including prediction) and texture data with their corresponding portions of the first prediction reference picture to generate the corresponding portions of a new base layer decoded picture;
- a second prediction estimator coupled to the input and to a second combiner that provides a plurality of previously decoded enhancement layer pictures to be used as reference pictures, wherein the second prediction estimator is configured to generate a third set of control (including prediction) and motion data prediction references associated with a plurality of the coded units of the input picture, or control data (including prediction control data) associated with a group of coded units of the input picture, and to generate the portions of a second prediction reference picture that correspond to the coded units;
- a second comparer coupled to the second prediction estimator and the input, wherein the second comparer is configured to compute the difference between the input picture and the second prediction reference picture, and to generate a fourth set of control (including prediction) and texture data associated with a plurality of the coded units of the input picture, as well as control (including prediction) data associated with a group of coded units of the input picture;
- a second combiner coupled to the second comparer and the second prediction estimator, wherein the second combiner is configured to combine the fourth set of generated control (including prediction) and texture data with their corresponding portions of the second prediction reference picture to generate the corresponding portions of a new enhancement layer decoded picture; and
- an encoder configured to encode the first set of control (including prediction) and motion data and the second set of control (including prediction) and texture data to produce a base layer bitstream, to encode the third set of control (including prediction) and motion data and the fourth set of control (including prediction) and texture data to produce an enhancement layer bitstream, and to multiplex the data into a single output bitstream,

wherein the third and fourth sets of control data include inter-layer prediction control data, and wherein the second prediction estimator and the second comparer are further configured to set inter-layer prediction control data values in one or more groups of coded units of the input picture such that corresponding inter-layer prediction control data values in the coded units of the input picture associated with the one or more groups of coded units of the input picture are not transmitted but instead a single value is transmitted for the entire group of coded units and is used by the decoder.
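The estimator/comparer/combiner data flow of the encoder claim above can be illustrated with a toy model in which a "picture" is a list of integer samples. Every function here is a deliberately trivial, hypothetical stand-in for the corresponding claimed component; a real SVC encoder performs motion-compensated prediction, transform coding, and entropy coding in place of these arithmetic placeholders.

```python
# Toy model of the two-layer (base + enhancement) encoder structure in
# claim 2. Pictures are lists of ints; every component is a stand-in.

def downsample(picture):
    return picture[::2]  # optional lower-resolution copy of the input

def predict(reference_picture):
    return list(reference_picture)  # prediction estimator: reuse reference

def compare(picture, prediction):
    # comparer: residual ("texture" data) between input and prediction
    return [a - b for a, b in zip(picture, prediction)]

def combine(prediction, residual):
    # combiner: reconstruct the decoded picture from prediction + residual
    return [a + b for a, b in zip(prediction, residual)]

def encode_picture(picture, prev_base_decoded, prev_enh_decoded):
    base_input = downsample(picture)
    base_pred = predict(prev_base_decoded)       # first prediction estimator
    base_res = compare(base_input, base_pred)    # first comparer
    base_decoded = combine(base_pred, base_res)  # first combiner
    enh_pred = predict(prev_enh_decoded)         # second prediction estimator
    enh_res = compare(picture, enh_pred)         # second comparer
    enh_decoded = combine(enh_pred, enh_res)     # second combiner
    # encoder: the two layer streams are multiplexed into one output
    bitstream = {"base": base_res, "enh": enh_res}
    return bitstream, base_decoded, enh_decoded
```

The two decoded pictures returned alongside the bitstream model the reference pictures that the first and second combiners feed back to the prediction estimators for the next input picture.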
3. The system of claim 2, wherein the second prediction estimator and the second comparer are configured to set the adaptive_prediction_flag parameter to false in one or more slices, the encoder is further configured not to include the residual_prediction_flag in its encoding of the macroblocks or macroblock partitions associated with the one or more slices, and the second prediction estimator and second comparer are further configured to use the default_residual_prediction_flag parameter value for the residual_prediction_flag parameter for all macroblocks or macroblock partitions of the one or more slices.
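The telescopic signaling just described, replacing per-macroblock residual_prediction_flag bits with a single slice-level default, can be sketched at the encoder side as below. This is a simplified illustration under the assumption that flag bits are emitted as a plain list of 0/1 values; the function name is hypothetical.

```python
# Simplified sketch of telescopic flag signaling at the encoder
# (claim 3). Bits are modeled as a list of 0/1 values; names are
# illustrative, not SVC JD8 syntax elements.

def encode_slice_flags(mb_flags, telescopic):
    """Emit the residual-prediction flag bits for one slice."""
    bits = []
    if telescopic:
        # adaptive flag = false: per-macroblock flags are omitted and a
        # single default_residual_prediction_flag is sent for the slice
        bits.append(0)
        bits.append(mb_flags[0])  # encoder-chosen uniform default value
    else:
        bits.append(1)            # adaptive mode: one flag per macroblock
        bits.extend(mb_flags)
    return bits
```

For a slice of four macroblocks that all share the same flag value, telescopic mode emits 2 bits instead of 5, which is the coding-efficiency gain the abstract refers to.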
4. A method for decoding of scalable digital video, the method comprising:
- at an input, receiving a scalable digital video bitstream comprising slices of a quality or spatial enhancement target layer and at least one additional layer in accordance with the SVC JD8 specification, the digital video bitstream containing control data (including prediction control data) associated with slices and control (including prediction), texture, or motion data associated with macroblocks or macroblock partitions;
- at a decoder, decoding the received input by decoding the control data associated with a slice of the target layer and the at least one additional layer, and control, texture, or motion data associated with individual macroblocks or macroblock partitions of the target layer and the at least one additional layer;
- using a predictor coupled to the decoder, generating prediction references for the control, texture, or motion data of a plurality of macroblocks or macroblock partitions of the target layer from signaled prediction control data associated with a slice of the target layer or the at least one additional layer, or from prediction control data associated with individual macroblocks or macroblock partitions of the target layer or the at least one additional layer; and
- at a combiner coupled to the predictor, combining, using a processor, the generated prediction references with the corresponding decoded control, texture, or motion data associated with the plurality of macroblocks or macroblock partitions of the target layer to produce portions of a decoded picture corresponding to the plurality of macroblocks or macroblock partitions of the target layer,

wherein the prediction control data associated with the slices of the target layer or the at least one additional layer include an adaptive_residual_prediction_flag parameter and, if the adaptive_residual_prediction_flag parameter is not set, a default_residual_prediction_flag parameter, and the prediction control data associated with individual macroblocks or macroblock partitions of the target layer or the at least one additional layer include a residual_prediction_flag parameter, wherein the decoder is configured not to decode the residual_prediction_flag parameter in macroblocks or macroblock partitions of a slice for which the adaptive_residual_prediction_flag parameter is not set but to decode instead the default_residual_prediction_flag of the slice, and wherein the predictor is further configured to use the value of default_residual_prediction_flag as the value of the residual_prediction_flag parameter for all macroblocks or macroblock partitions of the slice.
5. A method for scalable coding of digital video, the method comprising:
- at an input, receiving digital video input pictures;
- optionally operating a downsampler coupled to the input to generate a downsampled picture of an input picture at a lower resolution;
- at a first prediction estimator coupled to either the optionally operated downsampler or the input and to a first combiner that provides a plurality of previously decoded base layer pictures to be used as reference pictures, generating a first set of control (including prediction) and motion data prediction references associated with a plurality of the coded units of the (optionally downsampled) picture, or control data (including prediction control data) associated with a group of coded units of the (optionally downsampled) picture, and generating the portions of a first prediction reference picture that correspond to the coded units;
- at a first comparer coupled to the first prediction estimator and the optional downsampler or input, computing the difference between the (optionally downsampled) input picture and the portions of the first prediction reference picture, and generating a second set of control (including prediction) and texture data associated with a plurality of the coded units of the (optionally downsampled) input picture, and control (including prediction) data associated with a group of coded units of the (optionally downsampled) input picture;
- at a first combiner coupled to the first comparer and the first prediction estimator, combining, using a processor, the second set of generated control (including prediction) and texture data with their corresponding portions of the first prediction reference picture to generate the corresponding portions of a new base layer decoded picture;
- at a second prediction estimator coupled to the input and to a second combiner that provides a plurality of previously decoded enhancement layer pictures to be used as reference pictures, generating a third set of control (including prediction) and motion data prediction references associated with a plurality of the coded units of the input picture, or control data (including prediction control data) associated with a group of coded units of the input picture, and generating the portions of a second prediction reference picture that correspond to the coded units;
- at a second comparer coupled to the second prediction estimator and the input, computing the difference between the input picture and the second prediction reference picture, and generating a fourth set of control (including prediction) and texture data associated with a plurality of the coded units of the input picture, and control (including prediction) data associated with a group of coded units of the input picture;
- at a second combiner coupled to the second comparer and the second prediction estimator, combining the fourth set of generated control (including prediction) and texture data with their corresponding portions of the second prediction reference picture to generate the corresponding portions of a new enhancement layer decoded picture;
- at an encoder, encoding the first set of control (including prediction) and motion data and the second set of control (including prediction) and texture data to produce a base layer bitstream, encoding the third set of control (including prediction) and motion data and the fourth set of control (including prediction) and texture data to produce an enhancement layer bitstream, and multiplexing the data into a single output bitstream, wherein the third and fourth sets of control data include inter-layer prediction control data; and
- at the second prediction estimator and the second comparer, setting inter-layer prediction control data values in one or more groups of coded units of the input picture such that corresponding inter-layer prediction control data values in the coded units of the input picture associated with the one or more groups of coded units of the input picture are not transmitted but instead a single value is transmitted for the entire group of coded units and is used by the decoder.
6. The method of claim 5, further comprising: at the second prediction estimator and the second comparer, setting the adaptive_prediction_flag parameter to false in one or more slices; at the encoder, omitting the residual_prediction_flag from its encoding of the macroblocks or macroblock partitions associated with the one or more slices; and at the second prediction estimator and second comparer, using the default_residual_prediction_flag parameter value for the residual_prediction_flag parameter for all macroblocks or macroblock partitions of the one or more slices.
7. A non-transitory computer-readable medium for scalable coding of digital video, the computer-readable medium encoded with a computer program comprising a set of instructions operable to direct a processing system to:
- at an input, receive digital video input pictures;
- optionally operate a downsampler coupled to the input to generate a downsampled picture of an input picture at a lower resolution;
- at a first prediction estimator coupled to either the optionally operated downsampler or the input and to a first combiner that provides a plurality of previously decoded base layer pictures to be used as reference pictures, generate a first set of control (including prediction) and motion data prediction references associated with a plurality of the coded units of the (optionally downsampled) picture, or control data (including prediction control data) associated with a group of coded units of the (optionally downsampled) picture, and generate the portions of a first prediction reference picture that correspond to the coded units;
- at a first comparer coupled to the first prediction estimator and the optional downsampler or input, compute the difference between the (optionally downsampled) input picture and the portions of the first prediction reference picture, and generate a second set of control (including prediction) and texture data associated with a plurality of the coded units of the (optionally downsampled) input picture, and control (including prediction) data associated with a group of coded units of the (optionally downsampled) input picture;
- at a first combiner coupled to the first comparer and the first prediction estimator, combine, using a processor, the second set of generated control (including prediction) and texture data with their corresponding portions of the first prediction reference picture to generate the corresponding portions of a new base layer decoded picture;
- at a second prediction estimator coupled to the input and to a second combiner that provides a plurality of previously decoded enhancement layer pictures to be used as reference pictures, generate a third set of control (including prediction) and motion data prediction references associated with a plurality of the coded units of the input picture, or control data (including prediction control data) associated with a group of coded units of the input picture, and generate the portions of a second prediction reference picture that correspond to the coded units;
- at a second comparer coupled to the second prediction estimator and the input, compute the difference between the input picture and the second prediction reference picture, and generate a fourth set of control (including prediction) and texture data associated with a plurality of the coded units of the input picture, and control (including prediction) data associated with a group of coded units of the input picture;
- at a second combiner coupled to the second comparer and the second prediction estimator, combine the fourth set of generated control (including prediction) and texture data with their corresponding portions of the second prediction reference picture to generate the corresponding portions of a new enhancement layer decoded picture;
- at an encoder, encode the first set of control (including prediction) and motion data and the second set of control (including prediction) and texture data to produce a base layer bitstream, encode the third set of control (including prediction) and motion data and the fourth set of control (including prediction) and texture data to produce an enhancement layer bitstream, and multiplex the data into a single output bitstream, wherein the third and fourth sets of control data include inter-layer prediction control data; and
- at the second prediction estimator and the second comparer, set inter-layer prediction control data values in one or more groups of coded units of the input picture such that corresponding inter-layer prediction control data values in the coded units of the input picture associated with the one or more groups of coded units of the input picture are not transmitted but instead a single value is transmitted for the entire group of coded units and is used by the decoder.
8. The non-transitory computer-readable medium of claim 7, wherein the set of instructions is further operable to direct the processing system to: at the second prediction estimator and the second comparer, set the adaptive_prediction_flag parameter to false in one or more slices; at the encoder, omit the residual_prediction_flag from its encoding of the macroblocks or macroblock partitions associated with the one or more slices; and at the second prediction estimator and second comparer, use the default_residual_prediction_flag parameter value for the residual_prediction_flag parameter for all macroblocks or macroblock partitions of the one or more slices.
Specification