Three-dimensional video with asymmetric spatial resolution

US 9,288,505 B2
Filed: 07/06/2012
Issued: 03/15/2016
Est. Priority Date: 08/11/2011
Status: Expired due to Fees

Abstract

A video coding device may be configured to code a bitstream including multiple views plus depth information. Two of the views may have reduced resolutions, while a third view may have a full resolution. The third view may be predicted relative to upsampled versions of the two reduced-resolution views. Each view may include texture data and depth data, such that a view component may include a texture component and a depth component. Moreover, the texture and depth components may be arranged within an access unit according to a particular order, which may simplify component extraction from the access unit.
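
The decoding flow summarized above can be illustrated with a minimal sketch. It is not the patented implementation: the nearest-neighbor horizontal upsampling, the factor of 2, the rounded average of the two upsampled pictures, and the additive residual are all assumptions chosen for brevity, and every function name is hypothetical. A real codec would typically use block-based, disparity-compensated prediction from portions of the upsampled pictures.

```python
from typing import List

Picture = List[List[int]]  # rows of 8-bit samples (texture or depth)


def upsample_horizontal(picture: Picture, factor: int = 2) -> Picture:
    """Repeat each sample `factor` times along its row (assumed filter)."""
    return [[s for s in row for _ in range(factor)] for row in picture]


def average_prediction(first_up: Picture, second_up: Picture) -> Picture:
    """Average two same-size pictures, rounding half up (assumed combination)."""
    return [[(a + b + 1) // 2 for a, b in zip(r1, r2)]
            for r1, r2 in zip(first_up, second_up)]


def reconstruct_third_view(first: Picture, second: Picture,
                           residual: Picture, factor: int = 2) -> Picture:
    """Rebuild a full-resolution picture from two reduced-resolution pictures.

    `first` and `second` stand for the decoded reduced-resolution pictures;
    `residual` stands for the decoded prediction error of the third view.
    """
    first_up = upsample_horizontal(first, factor)
    second_up = upsample_horizontal(second, factor)
    prediction = average_prediction(first_up, second_up)
    # Add the residual and clip to the 8-bit sample range.
    return [[max(0, min(255, p + r)) for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(prediction, residual)]


if __name__ == "__main__":
    first = [[10, 20]]
    second = [[30, 40]]
    residual = [[1, -1, 0, 2]]
    print(reconstruct_third_view(first, second, residual))
```

The same routine applies to either component type, since the texture and depth components of a reduced-resolution view share one resolution.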

51 Claims

1. A method of coding video data, the method comprising:
- decoding video data of a first coded view that comprises a first view component comprising a first texture component having the first resolution and a first depth component having the first resolution to produce a first picture having the first resolution at least in part by:
  
  predicting a first block of the first texture component using a first reference block indicated by a motion vector for the first block; and
  
  predicting a second block of the first depth component using a second reference block indicated by the motion vector for the first block, wherein the second block is spatially collocated, within the first depth component, with the first block of the first texture component;
  
  decoding video data of a second coded view that comprises a second view component comprising a second texture component having the first resolution and a second depth component having the first resolution to produce a second picture having the first resolution;
  
  upsampling the first picture to form a first upsampled picture having a second resolution, wherein the second resolution is greater than the first resolution;
  
  upsampling the second picture to form a second upsampled picture having the second resolution; and
  
  decoding video data of a third coded view that comprises a third view component comprising a third texture component having the second resolution and a third depth component having the second resolution relative to the first upsampled picture and the second upsampled picture to produce a third picture having the second resolution.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 48, 49)
  - 2. The method of claim 1, further comprising coding a modified subset sequence parameter set with a profile compliant with a three-dimensional video (3DV) profile of a video coding standard, wherein the modified subset sequence parameter set extends a subset sequence parameter set design of multiview video coding (MVC) and provides further extensibility using data coded at the end of the modified subset sequence parameter set.
  - 3. The method of claim 1, wherein the first picture comprises the first texture component, wherein the second picture comprises the second texture component, and wherein decoding the video data of the third coded view comprises:
    - forming first prediction data for the third texture component from one or more portions of the first upsampled picture;
      
      forming second prediction for the third texture component data from one or more portions of the second upsampled picture; and
      
      decoding the third texture component using the first prediction data and the second prediction data.
  - 4. The method of claim 1, wherein the first picture comprises the first depth component, wherein the second picture comprises the second depth component, and wherein decoding the video data of the third coded view comprises:
    - forming first prediction data for the third depth component from one or more portions of the first upsampled picture;
      
      forming second prediction for the third depth component data from one or more portions of the second upsampled picture; and
      
      decoding the third depth component using the first prediction data and the second prediction data.
  - 5. The method of claim 1, wherein decoding the video data of the third coded view comprises predicting a third block of the third depth component using the second reference block indicated by the motion vector for the first block.
  - 6. The method of claim 1, further comprising receiving information indicative of profiles to which the first coded view, the second coded view, and the third coded view conform.
  - 7. The method of claim 6, wherein the information indicates a number of blocks in the first picture, a number of blocks in the second picture, and a number of blocks in the third picture.
  - 8. The method of claim 6, wherein the information indicates that the third view is predicted relative to the first view and the second view.
  - 9. The method of claim 8, wherein receiving the information comprises receiving sequence parameter sets for the first coded view, the second coded view, and the third coded view.
  - 10. The method of claim 1, wherein the first picture comprises a first downsampled picture, and wherein the second picture comprises a second downsampled picture, the method further comprising:
    - downsampling a first received picture to produce the first downsampled picture, wherein the first received picture has the second resolution;
      
      downsampling a second received picture to produce the second downsampled picture, wherein the second received picture has the second resolution;
      
      encoding the first downsampled picture to produce the video data of the first coded view;
      
      encoding the second downsampled picture to produce the video data of the second coded view; and
      
      encoding a third received picture relative to the first upsampled picture and the second upsampled picture to produce the video data of the third coded view, wherein the third received picture has the second resolution.
  - 11. The method of claim 10, further comprising:
    - encoding a first depth map associated with the first received picture, wherein the first depth map has the first resolution;
      
      forming a first view component comprising a first texture component comprising the encoded first downsampled picture and a first depth component comprising the encoded first depth map;
      
      encoding a second depth map associated with the second received picture, wherein the second depth map has the first resolution;
      
      forming a second view component comprising a second texture component comprising the encoded second downsampled picture and a second depth component comprising the encoded second depth map;
      
      encoding a third depth map associated with the third received picture, wherein the third depth map has the second resolution; and
      
      forming a fourth view component comprising a fourth texture component comprising the encoded third picture and a fourth depth component comprising the encoded third depth map.
  - 12. The method of claim 11, wherein encoding the third received picture comprises:
    - forming first prediction data for the third picture from one or more portions of the first upsampled picture;
      
      forming second prediction for the third picture from one or more portions of the second upsampled picture; and
      
      encoding the third picture using the first prediction data and the second prediction data.
  - 13. The method of claim 11, wherein encoding the third depth map comprises:
    - decoding the encoded first depth map;
      
      upsampling the decoded first depth map to form a first upsampled depth map;
      
      forming first prediction data for the third depth map from one or more portions of the first upsampled depth map;
      
      decoding the encoded second depth map;
      
      upsampling the decoded second depth map to form a second upsampled depth map;
      
      forming second prediction for the third depth map from one or more portions of the second upsampled depth map; and
      
      encoding the third depth map using the first prediction data and the second prediction data.
  - 14. The method of claim 11, wherein encoding the first downsampled picture comprises calculating a motion vector for a first block of the first downsampled picture, and wherein encoding the first depth map comprises predicting a second block of the first depth map using a second reference block indicated by the motion vector relative to the second block, wherein the second block is spatially collocated, within the first depth map, with the first block of the first downsampled picture.
  - 15. The method of claim 10, further comprising producing information indicative of profiles to which the first coded view, the second coded view, and the third coded view conform.
  - 16. The method of claim 15, wherein the information indicates a number of blocks in the first downsampled picture, a number of blocks in the second downsampled picture, and a number of blocks in the third picture.
  - 17. The method of claim 15, wherein the information indicates that the third view is predicted relative to the first view and the second view.
  - 18. The method of claim 15, wherein producing the information comprises producing sequence parameter sets for the first coded view, the second coded view, and the third coded view.
  - 48. The method of claim 1, the method being executable on a wireless communication device, wherein the device comprises:
    - a memory configured to store the video data;
      
      a processor configured to execute instructions to process the video data stored in the memory; and
      
      a receiver configured to receive the video data of the first, second coded view, and third coded views.
  - 49. The method of claim 48, wherein the wireless communication device is a cellular telephone and the video data of the first, second, and third coded views are received by the receiver and modulated according to a cellular communication standard.
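
The motion vector sharing recited in claim 1, where the depth block collocated with a texture block is predicted with the motion vector of that texture block, can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: integer-sample motion vectors, square blocks, and the helper names `fetch_block` and `predict_texture_and_depth` are all hypothetical.

```python
from typing import List, Tuple

Picture = List[List[int]]


def fetch_block(reference: Picture, top: int, left: int, size: int,
                motion_vector: Tuple[int, int]) -> Picture:
    """Return the size x size reference block displaced by (dy, dx)."""
    dy, dx = motion_vector
    y0, x0 = top + dy, left + dx
    if (y0 < 0 or x0 < 0 or y0 + size > len(reference)
            or x0 + size > len(reference[0])):
        raise ValueError("reference block falls outside the reference picture")
    return [row[x0:x0 + size] for row in reference[y0:y0 + size]]


def predict_texture_and_depth(texture_ref: Picture, depth_ref: Picture,
                              top: int, left: int, size: int,
                              motion_vector: Tuple[int, int]):
    """Predict a texture block and its collocated depth block.

    Only one motion vector is supplied: the depth block at the same
    (top, left) position reuses the vector of the texture block.
    """
    texture_pred = fetch_block(texture_ref, top, left, size, motion_vector)
    depth_pred = fetch_block(depth_ref, top, left, size, motion_vector)
    return texture_pred, depth_pred


if __name__ == "__main__":
    texture_ref = [[r * 10 + c for c in range(4)] for r in range(4)]
    depth_ref = [[100 + r * 10 + c for c in range(4)] for r in range(4)]
    print(predict_texture_and_depth(texture_ref, depth_ref, 0, 0, 2, (1, 2)))
```

Because the texture and depth components of a view share one resolution, the vector can be reused without scaling in this sketch.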

19. A device for coding video data, the device comprising:
- a memory configured to store the video data; and
  
  a video coder configured to:
  
  decode a first coded view of the video data, the first coded view comprising a first view component comprising a first texture component having the first resolution and a first depth component having the first resolution, to produce a first picture having a first resolution, wherein, to decode the first coded view, the one or more processors are configured to:
  
  predict a first block of the first texture component using a first reference block indicated by a motion vector for the first block; and
  
  predict a second block of the first depth component using a second reference block indicated by the motion vector for the first block, wherein the second block is spatially collocated, within the first depth component, with the first block of the first texture component,
  
  decode a second coded view of the video data, the second coded view comprising a second view component comprising a second texture component having the first resolution and a second depth component having the first resolution to produce a second picture having the first resolution,
  
  upsample the first picture to form a first upsampled picture having a second resolution, wherein the second resolution is greater than the first resolution,
  
  upsample the second picture to form a second upsampled picture having the second resolution, and
  
  decode a third coded view of the video data, the third coded view comprising a third view component comprising a third texture component having the second resolution and a third depth component having the second resolution relative to the first upsampled picture and the second upsampled picture to produce a third picture having the second resolution.
- View Dependent Claims (20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 50, 51)
  - 20. The device of claim 19, wherein the first picture comprises the first texture component, wherein the second picture comprises the second texture component, and wherein to decode the third coded view of the video data, the video coder is configured to form first prediction data for the third texture component from one or more portions of the first upsampled picture, form second prediction for the third texture component data from one or more portions of the second upsampled picture, and decode the third texture component using the first prediction data and the second prediction data.
  - 21. The device of claim 19, wherein the first picture comprises the first depth component, wherein the second picture comprises the second depth component, and wherein to decode the third coded view of the video data, the video coder is configured to form first prediction data for the third depth component from one or more portions of the first upsampled picture, form second prediction for the third depth component data from one or more portions of the second upsampled picture, and decode the third depth component using the first prediction data and the second prediction data.
  - 22. The device of claim 19, wherein the video coder is configured to receive information indicative of profiles to which the first coded view, the second coded view, and the third coded view conform, information indicative of a number of blocks in the first picture, a number of blocks in the second picture, and a number of blocks in the third picture, and information indicating that the third view is predicted relative to the first view and the second view.
  - 23. The device of claim 19, wherein the video coder comprises a video decoder.
  - 24. The device of claim 19, wherein the video coder comprises a video encoder, wherein the first picture comprises a first downsampled picture, wherein the second picture comprises a second downsampled picture, and wherein the video encoder is further configured to downsample a first received picture to produce the first downsampled picture, wherein the first received picture has the second resolution, downsample a second received picture to produce the second downsampled picture, wherein the second received picture has the second resolution, encode the first downsampled picture to produce the first coded view of the video data, encode the second downsampled picture to produce the second coded view of the video data, and encode a third received picture relative to the first upsampled picture and the second upsampled picture to produce the third coded view of the video data, wherein the third received picture has the second resolution.
  - 25. The device of claim 24, wherein the video encoder is further configured to encode a first depth map associated with the first received picture, wherein the first depth map has the first resolution, form a first view component comprising a first texture component comprising the encoded first downsampled picture and a first depth component comprising the encoded first depth map, encode a second depth map associated with the second received picture, wherein the second depth map has the first resolution, form a second view component comprising a second texture component comprising the encoded second downsampled picture and a second depth component comprising the encoded second depth map, encode a third depth map associated with the third received picture, wherein the third depth map has the second resolution, and form a fourth view component comprising a fourth texture component comprising the encoded third picture and a fourth depth component comprising the encoded third depth map.
  - 26. The device of claim 25, wherein to encode the third received picture, the video encoder is configured to form first prediction data for the third picture from one or more portions of the first upsampled picture, form second prediction for the third picture from one or more portions of the second upsampled picture, and encode the third picture using the first prediction data and the second prediction data.
  - 27. The device of claim 25, wherein to encode the third depth map, the video encoder is configured to decode the encoded first depth map, upsample the decoded first depth map to form a first upsampled depth map, form first prediction data for the third depth map from one or more portions of the first upsampled depth map, decode the encoded second depth map, upsample the decoded second depth map to form a second upsampled depth map, form second prediction for the third depth map from one or more portions of the second upsampled depth map, and encode the third depth map using the first prediction data and the second prediction data.
  - 28. The device of claim 24, wherein the video encoder is further configured to produce information indicative of profiles to which the first coded view, the second coded view, and the third coded view conform, information indicating a number of blocks in the first downsampled picture, a number of blocks in the second downsampled picture, and a number of blocks in the third picture, and information indicating that the third view is predicted relative to the first view and the second view.
  - 29. The device of claim 19, wherein the device comprises at least one of:
    - an integrated circuit;
      
      a microprocessor; or
      
      a wireless communication device that includes the video coder.
  - 50. The device of claim 19, wherein the device is a wireless communication device, the device further comprising a receiver configured to receive the first, second, and third coded views of the video data.
  - 51. The device of claim 50, wherein the wireless communication device is a cellular telephone and the first, second, and third coded views of the video data are received by the receiver and modulated according to a cellular communication standard.
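
The encoder-side behavior of claim 24 (downsample two received pictures, then encode a third received picture relative to the upsampled versions) can be sketched in the same spirit. The sketch below is hypothetical throughout: it assumes horizontal-only resampling by a factor of 2, averaging as the downsampling filter, sample repetition as the upsampling filter, and lossless coding of the reduced-resolution pictures so that the decoded pictures equal the downsampled ones. Entropy coding and block partitioning are omitted.

```python
from typing import List, Tuple

Picture = List[List[int]]


def downsample_horizontal(picture: Picture, factor: int = 2) -> Picture:
    """Average each run of `factor` samples in a row (assumed filter)."""
    return [[(sum(row[i:i + factor]) + factor // 2) // factor
             for i in range(0, len(row) - factor + 1, factor)]
            for row in picture]


def upsample_horizontal(picture: Picture, factor: int = 2) -> Picture:
    """Repeat each sample `factor` times along its row (assumed filter)."""
    return [[s for s in row for _ in range(factor)] for row in picture]


def encode_three_views(first_received: Picture, second_received: Picture,
                       third_received: Picture, factor: int = 2
                       ) -> Tuple[Picture, Picture, Picture]:
    """Return the two downsampled pictures and the third-view residual."""
    first_down = downsample_horizontal(first_received, factor)
    second_down = downsample_horizontal(second_received, factor)
    # Assumption: lossless coding, so decoded pictures equal the downsampled ones.
    first_up = upsample_horizontal(first_down, factor)
    second_up = upsample_horizontal(second_down, factor)
    prediction = [[(a + b + 1) // 2 for a, b in zip(r1, r2)]
                  for r1, r2 in zip(first_up, second_up)]
    residual = [[t - p for t, p in zip(t_row, p_row)]
                for t_row, p_row in zip(third_received, prediction)]
    return first_down, second_down, residual


if __name__ == "__main__":
    print(encode_three_views([[10, 12, 20, 22]],
                             [[30, 32, 40, 42]],
                             [[22, 20, 31, 35]]))
```

In a lossy codec the encoder would upsample its own reconstruction of each reduced-resolution picture rather than the downsampled input, so that encoder and decoder predictions stay aligned.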

30. A device for coding video data, the device comprising:
- means for decoding video data of a first coded view that comprises a first view component comprising a first texture component having the first resolution and a first depth component having the first resolution to produce a first picture having first resolution, the means for decoding comprising:
  
  means for predicting a first block of the first texture component using a first reference block indicated by a motion vector for the first block; and
  
  means for predicting a second block of the first depth component using a second reference block indicated by the motion vector for the first block, wherein the second block is spatially collocated, within the first depth component, with the first block of the first texture component;
  
  means for decoding video data of a second coded view that comprises a second view component comprising a second texture component having the first resolution and a second depth component having the first resolution to produce a second picture having the first resolution;
  
  means for upsampling the first picture to form a first upsampled picture having a second resolution, wherein the second resolution is greater than the first resolution;
  
  means for upsampling the second picture to form a second upsampled picture having the second resolution; and
  
  means for decoding video data of a third coded view relative to the first upsampled picture and the second upsampled picture to produce a third picture having the second resolution.
- View Dependent Claims (31, 32, 33, 34, 35, 36, 37, 38)
  - 31. The device of claim 30, wherein the first picture comprises the first texture component, wherein the second picture comprises the second texture component, and wherein the means for decoding the video data of the third coded view comprises:
    - means for forming first prediction data for the third texture component from one or more portions of the first upsampled picture;
      
      means for forming second prediction for the third texture component data from one or more portions of the second upsampled picture; and
      
      means for decoding the third texture component using the first prediction data and the second prediction data.
  - 32. The device of claim 30, wherein the first picture comprises the first depth component, wherein the second picture comprises the second depth component, and wherein the means for decoding the video data of the third coded view comprises:
    - means for forming first prediction data for the third depth component from one or more portions of the first upsampled picture;
      
      means for forming second prediction for the third depth component data from one or more portions of the second upsampled picture; and
      
      means for decoding the third depth component using the first prediction data and the second prediction data.
  - 33. The device of claim 30, further comprising means for receiving information indicative of profiles to which the first coded view, the second coded view, and the third coded view conform, information indicating a number of blocks in the first picture, a number of blocks in the second picture, and a number of blocks in the third picture, and information indicating that the third view is predicted relative to the first view and the second view.
  - 34. The device of claim 30, wherein the first picture comprises a first downsampled picture, and wherein the second picture comprises a second downsampled picture, further comprising:
    - means for downsampling a first received picture to produce the first downsampled picture, wherein the first received picture has the second resolution;
      
      means for downsampling a second received picture to produce the second downsampled picture, wherein the second received picture has the second resolution;
      
      means for encoding the first downsampled picture to produce the video data of the first coded view;
      
      means for encoding the second downsampled picture to produce the video data of the second coded view; and
      
      means for encoding a third received picture relative to the first upsampled picture and the second upsampled picture to produce the video data of the third coded view, wherein the third received picture has the second resolution.
  - 35. The device of claim 34, further comprising:
    - means for encoding a first depth map associated with the first received picture, wherein the first depth map has the first resolution;
      
      means for forming a first view component comprising a first texture component comprising the encoded first downsampled picture and a first depth component comprising the encoded first depth map;
      
      means for encoding a second depth map associated with the second received picture, wherein the second depth map has the first resolution;
      
      means for forming a second view component comprising a second texture component comprising the encoded second downsampled picture and a second depth component comprising the encoded second depth map;
      
      means for encoding a third depth map associated with the third received picture, wherein the third depth map has the second resolution; and
      
      means for forming a fourth view component comprising a fourth texture component comprising the encoded third picture and a fourth depth component comprising the encoded third depth map.
  - 36. The device of claim 35, wherein the means for encoding the third received picture comprises:
    - means for forming first prediction data for the third picture from one or more portions of the first upsampled picture;
      
      means for forming second prediction for the third picture from one or more portions of the second upsampled picture; and
      
      means for encoding the third picture using the first prediction data and the second prediction data.
  - 37. The device of claim 35, wherein the means for encoding the third depth map comprises:
    - means for decoding the encoded first depth map;
      
      means for upsampling the decoded first depth map to form a first upsampled depth map;
      
      means for forming first prediction data for the third depth map from one or more portions of the first upsampled depth map;
      
      means for decoding the encoded second depth map;
      
      means for upsampling the decoded second depth map to form a second upsampled depth map;
      
      means for forming second prediction for the third depth map from one or more portions of the second upsampled depth map; and
      
      means for encoding the third depth map using the first prediction data and the second prediction data.
  - 38. The device of claim 34, further comprising producing information indicative of profiles to which the first coded view, the second coded view, and the third coded view conform, information indicating a number of blocks in the first downsampled picture, a number of blocks in the second downsampled picture, and a number of blocks in the third picture, and information indicating that the third view is predicted relative to the first view and the second view.

39. A non-transitory computer-readable storage medium having stored thereon instructions that, when executed, cause one or more processors of a video coding device to:
- decode video data of a first coded view that comprises a first view component comprising a first texture component having the first resolution and a first depth component having the first resolution to produce a first picture having the first resolution, the instructions to decode comprising instructions that, when executed, cause the one or more processors to:
  
  predict a first block of the first texture component using a first reference block indicated by a motion vector for the first block; and
  
  predict a second block of the first depth component using a second reference block indicated by the motion vector for the first block, wherein the second block is spatially collocated, within the first depth component, with the first block of the first texture component;
  
  decode video data of a second coded view that comprises a second view component comprising a second texture component having the first resolution and a second depth component having the first resolution to produce a second picture having the first resolution;
  
  upsample the first picture to form a first upsampled picture having a second resolution, wherein the second resolution is greater than the first resolution;
  
  upsample the second picture to form a second upsampled picture having the second resolution; and
  
  decode video data of a third coded view that comprises a third view component comprising a third texture component having the second resolution and a third depth component having the second resolution relative to the first upsampled picture and the second upsampled picture to produce a third picture having the second resolution.
- View Dependent Claims (40, 41, 42, 43, 44, 45, 46, 47)
  - 40. The non-transitory computer-readable storage medium of claim 39, wherein the first picture comprises the first texture component, wherein the second picture comprises the second texture component, and wherein the instructions that cause the one or more processors to decode the video data of the third coded view comprise instructions that cause the one or more processors to:
    - form first prediction data for the third texture component from one or more portions of the first upsampled picture;
      
      form second prediction for the third texture component data from one or more portions of the second upsampled picture; and
      
      decode the third texture component using the first prediction data and the second prediction data.
  - 41. The non-transitory computer-readable storage medium of claim 39, wherein the first picture comprises the first depth component, wherein the second picture comprises the second depth component, and wherein the instructions that cause the one or more processors to decode the video data of the third coded view comprise instructions that cause the one or more processors to:
    - form first prediction data for the third depth component from one or more portions of the first upsampled picture;
      
      form second prediction for the third depth component data from one or more portions of the second upsampled picture; and
      
      decode the third depth component using the first prediction data and the second prediction data.
  - 42. The non-transitory computer-readable storage medium of claim 39, further having stored thereon instructions that cause the one or more processors to receive information indicative of profiles to which the first coded view, the second coded view, and the third coded view conform, information indicating a number of blocks in the first picture, a number of blocks in the second picture, and a number of blocks in the third picture, and information indicating that the third view is predicted relative to the first view and the second view.
  - 43. The non-transitory computer-readable storage medium of claim 39, wherein the first picture comprises a first downsampled picture, and wherein the second picture comprises a second downsampled picture, further comprising instructions that cause the one or more processors to:
    - downsample a first received picture to produce the first downsampled picture, wherein the first received picture has the second resolution;
      
      downsample a second received picture to produce the second downsampled picture, wherein the second received picture has the second resolution;
      
      encode the first downsampled picture to produce the video data of the first coded view;
      
      encode the second downsampled picture to produce the video data of the second coded view; and
      
      encode a third received picture relative to the first upsampled picture and the second upsampled picture to produce the video data of the third coded view, wherein the third received picture has the second resolution.
  - 44. The non-transitory computer-readable storage medium of claim 43, further having stored thereon instructions that cause the one or more processors to:
    - encode a first depth map associated with the first received picture, wherein the first depth map has the first resolution;
      
      form a first view component comprising a first texture component comprising the encoded first downsampled picture and a first depth component comprising the encoded first depth map;
      
      encode a second depth map associated with the second received picture, wherein the second depth map has the first resolution;
      
      form a second view component comprising a second texture component comprising the encoded second downsampled picture and a second depth component comprising the encoded second depth map;
      
      encode a third depth map associated with the third received picture, wherein the third depth map has the second resolution; and
      
      form a fourth view component comprising a fourth texture component comprising the encoded third picture and a fourth depth component comprising the encoded third depth map.
  - 45. The non-transitory computer-readable storage medium of claim 44, wherein the instructions that cause the one or more processors to encode the third received picture comprise instructions that cause the one or more processors to:
    - form first prediction data for the third picture from one or more portions of the first upsampled picture;
      
      form second prediction data for the third picture from one or more portions of the second upsampled picture; and
      
      encode the third picture using the first prediction data and the second prediction data.
  - 46. The non-transitory computer-readable storage medium of claim 44, wherein the instructions that cause the one or more processors to encode the third depth map comprise instructions that cause the one or more processors to:
    - decode the encoded first depth map;
      
      upsample the decoded first depth map to form a first upsampled depth map;
      
      form first prediction data for the third depth map from one or more portions of the first upsampled depth map;
      
      decode the encoded second depth map;
      
      upsample the decoded second depth map to form a second upsampled depth map;
      
      form second prediction data for the third depth map from one or more portions of the second upsampled depth map; and
      
      encode the third depth map using the first prediction data and the second prediction data.
  - 47. The non-transitory computer-readable storage medium of claim 43, further having stored thereon instructions that cause the one or more processors to produce information indicative of profiles to which the first coded view, the second coded view, and the third coded view conform, information indicating a number of blocks in the first downsampled picture, a number of blocks in the second downsampled picture, and a number of blocks in the third picture, and information indicating that the third view is predicted relative to the first view and the second view.
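
The encoder-side flow recited in claims 43 to 46 (downsample two received pictures, upsample their reconstructions, and code the third picture relative to both upsampled pictures) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the function names are hypothetical, downsampling is plain decimation by a factor of 2, upsampling is nearest-neighbour, the two predictions are co-located whole pictures rather than selected "portions", they are combined by a rounded average, and the residual is kept lossless with no transform or quantization.

```python
def downsample(picture, factor=2):
    """Keep every `factor`-th sample in each dimension (no anti-alias filter)."""
    return [row[::factor] for row in picture[::factor]]


def upsample(picture, factor=2):
    """Nearest-neighbour upsampling: repeat each sample `factor` times per axis."""
    out = []
    for row in picture:
        wide = [sample for sample in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def bi_predict(first_pred, second_pred):
    """Combine two prediction pictures by a rounded per-sample average."""
    return [[(a + b + 1) // 2 for a, b in zip(r1, r2)]
            for r1, r2 in zip(first_pred, second_pred)]


def encode_third(third, prediction):
    """Residual of the full-resolution picture against the prediction."""
    return [[t - p for t, p in zip(rt, rp)] for rt, rp in zip(third, prediction)]


def decode_third(residual, prediction):
    """Reconstruct the full-resolution picture from residual plus prediction."""
    return [[r + p for r, p in zip(rr, rp)] for rr, rp in zip(residual, prediction)]


# Hypothetical 4x4 pictures at the second (full) resolution.
first_received = [[10, 10, 20, 20], [10, 10, 20, 20],
                  [30, 30, 40, 40], [30, 30, 40, 40]]
second_received = [[12, 12, 22, 22], [12, 12, 22, 22],
                   [32, 32, 42, 42], [32, 32, 42, 42]]
third_received = [[13, 11, 21, 21], [11, 11, 21, 21],
                  [31, 31, 41, 41], [31, 31, 41, 41]]

# Reduced-resolution views (2x2), then their upsampled versions (4x4).
first_up = upsample(downsample(first_received))
second_up = upsample(downsample(second_received))

# Code the third picture relative to both upsampled pictures.
prediction = bi_predict(first_up, second_up)
residual = encode_third(third_received, prediction)
reconstructed = decode_third(residual, prediction)
```

The same steps would apply to the depth maps of claim 46, with the decoded first and second depth maps taking the place of the texture pictures. A practical codec would select displaced blocks from each upsampled picture instead of averaging co-located samples.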

Current Assignee: Qualcomm, Inc.
Original Assignee: Qualcomm, Inc.
Inventors: Chen, Ying; Zhang, Li; Zhang, Rong; Zheng, Yunfei; Karczewicz, Marta
Primary Examiner(s): Perungavoor, Sath V
Assistant Examiner(s): Brown, Jr., Howard D
Application Number: US13/542,931
Publication Number: US 20130038686A1
Time in Patent Office: 1,348 Days
Field of Search: 348/43, 375/240.21
US Class Current: 1/1
CPC Class Codes:
H04N 19/105   Selection of the reference ...
H04N 19/174   the region being a slice, e...
H04N 19/176   the region being a block, e...
H04N 19/30   using hierarchical techniqu...
H04N 19/46   Embedding additional inform...
H04N 19/513   Processing of motion vectors
H04N 19/53   Multi-resolution motion est...
H04N 19/573   Motion compensation with mu...
H04N 19/597   specially adapted for multi...
H04N 19/61   in combination with predict...

51 Claims
