Temporal and spatial scaleable coding for video object planes

US 6,057,884 A
Filed: 06/05/1997
Issued: 05/02/2000
Est. Priority Date: 06/05/1997
Status: Expired due to Term

Abstract

Temporal and spatial scaling of video images including video object planes (VOPs) in an input digital video sequence is provided. Coding efficiency is improved by adaptively compressing scaled field mode video. Upsampled VOPs in the enhancement layer are reordered to provide a greater correlation with the input video sequence based on a linear criteria. The resulting residue is coded using a spatial transformation such as the DCT. A motion compensation scheme is used for coding enhancement layer VOPs by scaling motion vectors which have already been determined for the base layer VOPs. A reduced search area whose center is defined by the scaled motion vectors is provided. The motion compensation scheme is suitable for use with scaled frame mode or field mode video. Various processor configurations achieve particular scaleable coding results. Applications of scaleable coding include stereoscopic video, picture-in-picture, preview access channels, and ATM communications.

36 Claims

1. A method for scaling an input video sequence comprising video object planes (VOPs) for communication in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, comprising the steps of:
- downsampling pixel data of a first particular one of said VOPs of said input video sequence to provide a first base layer VOP having a reduced spatial resolution;
  
  upsampling pixel data of at least a portion of said first base layer VOP to provide a first upsampled VOP in said enhancement layer;
  
  differentially encoding said first upsampled VOP using said first particular one of said VOPs of said input video sequence for communication in said enhancement layer at a temporal position corresponding to said first base layer VOP;
  
  downsampling pixel data of a second particular one of said VOPs of said input video sequence to provide a second base layer VOP having a reduced spatial resolution;
  
  upsampling pixel data of at least a portion of said second base layer VOP to provide a second upsampled VOP in said enhancement layer which corresponds to said first upsampled VOP;
  
  using at least one of said first and second base layer VOPs to predict an intermediate VOP corresponding to said first and second upsampled VOPs; and
  
  encoding said intermediate VOP for communication in said enhancement layer at a temporal position which is intermediate to that of said first and second upsampled VOPs.
- - 2. The method of claim 1, wherein:
    - said enhancement layer has a higher temporal resolution than said base layer; and
      
      said base and enhancement layer are adapted to provide at least one of:
      
      (a) a picture-in-picture (PIP) capability wherein a PIP image is carried in said base layer, and (b) a preview access channel capability wherein a preview access image is carried in said base layer.
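
As a rough illustration of the layering recited in claim 1, the following Python sketch downsamples an input VOP into a base layer VOP, upsamples it again, and keeps the difference from the input as the enhancement layer residue. The function names, the block-averaging downsampler, and the pixel-replication upsampler are assumptions made for this example; the claim does not specify particular filters.

```python
def downsample(vop, factor=2):
    """Average each factor x factor block (assumed filter).

    Assumes the VOP height and width are multiples of `factor`.
    """
    h, w = len(vop), len(vop[0])
    return [
        [
            sum(vop[y + dy][x + dx] for dy in range(factor) for dx in range(factor))
            // (factor * factor)
            for x in range(0, w, factor)
        ]
        for y in range(0, h, factor)
    ]


def upsample(vop, factor=2):
    """Replicate each pixel factor x factor times (assumed filter)."""
    return [
        [vop[y // factor][x // factor] for x in range(len(vop[0]) * factor)]
        for y in range(len(vop) * factor)
    ]


def encode_spatial_layers(input_vop, factor=2):
    """Return (base layer VOP, enhancement layer residue) for one input VOP."""
    base = downsample(input_vop, factor)
    predicted = upsample(base, factor)
    # Differential encoding: keep only what the upsampled base layer misses.
    residue = [
        [o - p for o, p in zip(row_o, row_p)]
        for row_o, row_p in zip(input_vop, predicted)
    ]
    return base, residue
```

In this lossless sketch, a decoder that upsamples the base layer the same way and adds the residue recovers the input VOP exactly. A real coder would transform and quantize the residue.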

3. A method for scaling an input video sequence comprising video object planes (VOPs) for communication in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, comprising the steps of:
- providing a first particular one of said VOPs of said input video sequence for communication in said base layer as a first base layer VOP;
  
  downsampling pixel data of at least a portion of said first base layer VOP for communication in said enhancement layer as a first downsampled VOP at a temporal position corresponding to said first base layer VOP;
  
  downsampling corresponding pixel data of said first particular one of said VOPs to provide a comparison VOP;
  
  differentially encoding said first downsampled VOP using said comparison VOP;
  
  differentially encoding said first base layer VOP using said first particular one of said VOPs by:
  
  determining a residue according to a difference between pixel data of said first base layer VOP and pixel data of said first particular one of said VOPs; and
  
  spatially transforming said residue to provide transform coefficients;
  
  wherein said VOPs in said input video sequence are field mode VOPs, and said first base layer VOP is differentially encoded by reordering lines of said pixel data of said first base layer VOP in a field mode prior to said determining step if said lines of pixel data meet a reordering criteria.
- - 4. The method of claim 3, wherein:
    - said lines of pixel data of said first base layer VOP meet said reordering criteria when a sum of differences of luminance values of opposite-field lines is greater than a sum of differences of luminance data of same-field lines and a bias term.
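
The reordering test in claim 4 can be sketched as follows. This is an illustration only: the use of absolute differences, the pairing of vertically adjacent lines, the function names `should_reorder` and `reorder_to_field_mode`, and the bias value used below are assumptions. The claim states only that the opposite-field sum must exceed the same-field sum plus a bias term.

```python
def should_reorder(lines, bias=0):
    """Decide whether interleaved luminance lines should be regrouped by field.

    `lines` is a list of rows; even rows are assumed to belong to one field
    and odd rows to the other.
    """
    n = len(lines)
    # Adjacent rows come from opposite fields.
    opposite = sum(
        abs(a - b)
        for i in range(n - 1)
        for a, b in zip(lines[i], lines[i + 1])
    )
    # Rows two apart come from the same field.
    same = sum(
        abs(a - b)
        for i in range(n - 2)
        for a, b in zip(lines[i], lines[i + 2])
    )
    return opposite > same + bias


def reorder_to_field_mode(lines):
    """Group all even (top-field) rows first, then all odd (bottom-field) rows."""
    return lines[0::2] + lines[1::2]


moving = [[10, 10], [200, 200], [10, 10], [200, 200]]
if should_reorder(moving, bias=64):
    moving = reorder_to_field_mode(moving)
```

The intuition is that when the two fields differ strongly, for example because of motion between them, grouping lines by field gives smoother blocks for the later transform.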

5. A method for coding a bi-directionally predicted video object plane (B-VOP), comprising the steps of:
- scaling an input video sequence comprising video object planes (VOPs) for communication in a corresponding base layer and enhancement layer;
  
  providing first and second base layer VOPs in said base layer which correspond to said input video sequence VOPs;
  
  said second base layer VOP being predicted from said first base layer VOP according to a motion vector MV_p;
  
  providing said B-VOP in said enhancement layer at a temporal position which is intermediate to that of said first and second base layer VOPs; and
  
  encoding said B-VOP using at least one of:
  
  (a) a forward motion vector MV_f and (b) a backward motion vector MV_B, obtained by scaling said motion vector MV_p.
- - 6. The method of claim 5, wherein:
    - a temporal distance TR_p separates said first and second base layer VOPs;
      
      a temporal distance TR_B separates said first base layer VOP and said B-VOP;
      
      m/n is a ratio of the spatial resolution of the first and second base layer VOPs to the spatial resolution of the B-VOP; and
      
      at least one of:
      
      (a) said forward motion vector MV_f is determined according to the relationship MV_f = (m/n)·TR_B·MV_p/TR_p; and
      
      (b) said backward motion vector MV_B is determined according to the relationship MV_B = (m/n)·(TR_B - TR_p)·MV_p/TR_p.
  - 7. The method of claim 5, comprising the further step of:
    - encoding said B-VOP using at least one of:
      
      (a) a search region of said first base layer VOP whose center is determined according to said forward motion vector MV_f; and
      
      (b) a search region of said second base layer VOP whose center is determined according to said backward motion vector MV_B.
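
The scaling relationships in claim 6 and the search-region centering in claim 7 can be worked through with a short Python sketch. The function names, the tuple representation of a motion vector, the `radius` parameter, and the numeric values (including m/n = 2) are hypothetical choices for illustration, not taken from the patent.

```python
from fractions import Fraction


def scale_motion_vectors(mv_p, tr_p, tr_b, m=1, n=1):
    """Derive forward and backward vectors for a B-VOP from a base layer vector.

    MV_f = (m/n) * TR_B * MV_p / TR_p
    MV_B = (m/n) * (TR_B - TR_p) * MV_p / TR_p
    """
    ratio = Fraction(m, n)
    mv_f = tuple(ratio * tr_b * c / tr_p for c in mv_p)
    mv_b = tuple(ratio * (tr_b - tr_p) * c / tr_p for c in mv_p)
    return mv_f, mv_b


def search_window(block_xy, mv, radius):
    """Return (left, top, right, bottom) of a reduced search area.

    The window is centred on the block position displaced by the scaled
    vector; rounding to whole pixels is an assumption of this sketch.
    """
    cx = block_xy[0] + round(mv[0])
    cy = block_xy[1] + round(mv[1])
    return (cx - radius, cy - radius, cx + radius, cy + radius)


# B-VOP halfway between two base layer VOPs that are 2 time units apart.
mv_f, mv_b = scale_motion_vectors(mv_p=(8, -4), tr_p=2, tr_b=1, m=2, n=1)
window = search_window(block_xy=(16, 16), mv=mv_f, radius=2)
```

Because TR_B is smaller than TR_p for an intermediate B-VOP, the backward vector comes out with the opposite sign to the forward vector, as expected for a reference that lies later in time.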

8. A method for recovering an input video sequence comprising video object planes (VOPs) which were scaled and communicated in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, wherein:
- pixel data of a first particular one of said VOPs of said input video sequence is downsampled and carried as a first base layer VOP having a reduced spatial resolution;
  
  pixel data of at least a portion of said first base layer VOP is upsampled and carried as a first upsampled VOP in said enhancement layer at a temporal position corresponding to said first base layer VOP; and
  
  said first upsampled VOP is differentially encoded using said first particular one of said VOPs of said input video sequence;
  
  said method comprising the steps of:
  
  upsampling said pixel data of said first base layer VOP to restore said associated spatial resolution; and
  
  processing said first upsampled VOP and said first base layer VOP with said restored associated spatial resolution to provide an output video signal with said associated spatial resolution;
  
  wherein:
  
  a second particular one of said VOPs of said input video sequence is downsampled to provide a second base layer VOP having a reduced spatial resolution;
  
  pixel data of at least a portion of said second base layer VOP is upsampled to provide a second upsampled VOP in said enhancement layer which corresponds to said first upsampled VOP;
  
  at least one of said first and second base layer VOPs is used to predict an intermediate VOP corresponding to said first and second upsampled VOPs; and
  
  said intermediate VOP is encoded for communication in said enhancement layer at a temporal position which is intermediate to that of said first and second upsampled VOPs.
- - 9. The method of claim 8, wherein:
    - said enhancement layer has a higher temporal resolution than said base layer; and
      
      said base and enhancement layer are adapted to provide at least one of:
      
      (a) a picture-in-picture (PIP) capability wherein a PIP image is carried in said base layer, and (b) a preview access channel capability wherein a preview access image is carried in said base layer.
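
On the decoder side, the recovery steps of claim 8 amount to upsampling the base layer VOP and combining it with the enhancement layer data. The sketch below assumes the enhancement layer carries a plain pixel-domain residue and that pixel replication is the upsampling filter; both are simplifications, and the function names are hypothetical.

```python
def upsample(vop, factor=2):
    """Replicate each pixel factor x factor times (assumed filter)."""
    return [
        [vop[y // factor][x // factor] for x in range(len(vop[0]) * factor)]
        for y in range(len(vop) * factor)
    ]


def decode_spatial_layers(base_vop, enhancement_residue, factor=2):
    """Restore full spatial resolution from a base VOP plus its residue."""
    predicted = upsample(base_vop, factor)
    return [
        [p + r for p, r in zip(row_p, row_r)]
        for row_p, row_r in zip(predicted, enhancement_residue)
    ]


base = [[13, 23], [33, 43]]
residue = [
    [-3, -1, -3, -1],
    [1, 3, 1, 3],
    [-3, -1, -3, -1],
    [1, 3, 1, 3],
]
output = decode_spatial_layers(base, residue)
```

The base layer VOP on its own is already a reduced-resolution picture, which is what the picture-in-picture and preview access uses in claim 9 rely on.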

10. A method for recovering an input video sequence comprising video object planes (VOPs) which were scaled and communicated in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, wherein:
- a first particular one of said VOPs of said input video sequence is provided in said base layer as a first base layer VOP;
  
  pixel data of at least a portion of said first base layer VOP is downsampled and carried in said enhancement layer as a first downsampled VOP at a temporal position corresponding to said first base layer VOP;
  
  corresponding pixel data of said first particular one of said VOPs is downsampled to provide a comparison VOP; and
  
  said first downsampled VOP is differentially encoded using said comparison VOP;
  
  said method comprising the steps of:
  
  upsampling said pixel data of said first downsampled VOP to restore said associated spatial resolution; and
  
  processing said first enhancement layer VOP with said restored associated spatial resolution and said first base layer VOP to provide an output video signal with said associated spatial resolution;
  
  wherein:
  
  said first base layer VOP is differentially encoded using said first particular one of said VOPs by determining a residue according to a difference between pixel data of said first base layer VOP and pixel data of said first particular one of said VOPs, and spatially transforming said residue to provide transform coefficients; and
  
  said VOPs in said input video sequence are field mode VOPs, and said first base layer VOP is differentially encoded by reordering lines of said pixel data of said first base layer VOP in a field mode prior to determining said residue if said lines of pixel data meet a reordering criteria.
- - 11. The method of claim 10, wherein:
    - said lines of pixel data of said first base layer VOP meet said reordering criteria when a sum of differences of luminance values of opposite-field lines is greater than a sum of differences of luminance data of same-field lines and a bias term.

12. A method for recovering an input video sequence comprising video object planes (VOPs) which was scaled and communicated in a corresponding base layer and enhancement layer in a data stream, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, wherein:
- first and second base layer VOPs are provided in said base layer which correspond to said input video sequence VOPs;
  
  said second base layer VOP is predicted from said first base layer VOP according to a motion vector MV_p;
  
  a bi-directionally predicted video object plane (B-VOP) is provided in said enhancement layer at a temporal position which is intermediate to that of said first and second base layer VOPs; and
  
  said B-VOP is encoded using a forward motion vector MV_f and a backward motion vector MV_B which are obtained by scaling said motion vector MV_p;
  
  said method comprising the steps of:
  
  recovering said forward motion vector MV_f and said backward motion vector MV_B from said data stream; and
  
  decoding said B-VOP using said forward motion vector MV_f and said backward motion vector MV_B.
- - 13. The method of claim 12, wherein:
    - a temporal distance TR_p separates said first and second base layer VOPs;
      
      a temporal distance TR_B separates said first base layer VOP and said B-VOP;
      
      m/n is a ratio of the spatial resolution of the first and second base layer VOPs to the spatial resolution of the B-VOP; and
      
      at least one of:
      
      (a) said forward motion vector MV_f is determined according to the relationship MV_f = (m/n)·TR_B·MV_p/TR_p; and
      
      (b) said backward motion vector MV_B is determined according to the relationship MV_B = (m/n)·(TR_B - TR_p)·MV_p/TR_p.
  - 14. The method of claim 12, wherein:
    - said B-VOP is encoded using at least one of:
      
      (a) a search region of said first base layer VOP whose center is determined according to said forward motion vector MV_f; and
      
      (b) a search region of said second base layer VOP whose center is determined according to said backward motion vector MV_B.

15. A decoder apparatus for recovering an input video sequence comprising video object planes (VOPs) which were scaled and communicated in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, wherein:
- pixel data of a first particular one of said VOPs of said input video sequence is downsampled and carried as a first base layer VOP having a reduced spatial resolution;
  
  pixel data of at least a portion of said first base layer VOP is upsampled and carried as a first upsampled VOP in said enhancement layer at a temporal position corresponding to said first base layer VOP; and
  
  said first upsampled VOP is differentially encoded using said first particular one of said VOPs of said input video sequence;
  
  said apparatus comprising:
  
  means for upsampling said pixel data of said first base layer VOP to restore said associated spatial resolution; and
  
  means for processing said first upsampled VOP and said first base layer VOP with said restored associated spatial resolution to provide an output video signal with said associated spatial resolution;
  
  wherein:
  
  said VOPs in said input video sequence are field mode VOPs; and
  
  said first upsampled VOP is differentially encoded by reordering lines of said pixel data of said first upsampled VOP in a field mode if said lines of pixel data meet a reordering criteria, then determining a residue according to a difference between pixel data of said first upsampled VOP and pixel data of said first particular one of said VOPs of said input video sequence, and spatially transforming said residue to provide transform coefficients.
- - 16. The apparatus of claim 15, wherein:
    - said lines of pixel data of said first upsampled VOP meet said reordering criteria when a sum of differences of luminance values of opposite-field lines is greater than a sum of differences of luminance data of same-field lines and a bias term.

17. A decoder apparatus for recovering an input video sequence comprising video object planes (VOPs) which were scaled and communicated in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, wherein:
- a first particular one of said VOPs of said input video sequence is provided in said base layer as a first base layer VOP;
  
  pixel data of at least a portion of said first base layer VOP is downsampled and carried in said enhancement layer as a first downsampled VOP at a temporal position corresponding to said first base layer VOP;
  
  corresponding pixel data of said first particular one of said VOPs is downsampled to provide a comparison VOP; and
  
  said first downsampled VOP is differentially encoded using said comparison VOP;
  
  said apparatus comprising:
  
  means for upsampling said pixel data of said first downsampled VOP to restore said associated spatial resolution; and
  
  means for processing said first enhancement layer VOP with said restored spatial resolution and said first base layer VOP to provide an output video signal with said associated spatial resolution;
  
  wherein:
  
  said first downsampled VOP is differentially encoded by determining a residue according to a difference between pixel data of said first downsampled VOP and pixel data of said first particular one of said VOPs of said input video sequence, and spatially transforming said residue to provide transform coefficients; and
  
  said VOPs in said input video sequence are field mode VOPs, and said first base layer VOP is differentially encoded by reordering lines of said pixel data of said first base layer VOP in a field mode prior to determining said residue if said lines of pixel data meet a reordering criteria.
- - 18. The apparatus of claim 17, wherein:
    - said lines of pixel data of said first base layer VOP meet said reordering criteria when a sum of differences of luminance values of opposite-field lines is greater than a sum of differences of luminance data of same-field lines and a bias term.

19. A decoder apparatus for recovering an input video sequence comprising video object planes (VOPs) which was scaled and communicated in a corresponding base layer and enhancement layer in a data stream, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, wherein:
- first and second base layer VOPs which correspond to said input video sequence VOPs are provided in said base layer;
  
  said second base layer VOP is predicted from said first base layer VOP according to a motion vector MV_p;
  
  a bi-directionally predicted video object plane (B-VOP) is provided in said enhancement layer at a temporal position which is intermediate to that of said first and second base layer VOPs; and
  
  said B-VOP is encoded using a forward motion vector MV_f and a backward motion vector MV_B which are obtained by scaling said motion vector MV_p;
  
  said apparatus comprising:
  
  means for recovering said forward motion vector MV_f and said backward motion vector MV_B from said data stream; and
  
  means for decoding said B-VOP using said forward motion vector MV_f and said backward motion vector MV_B.
- - 20. The apparatus of claim 19, wherein:
    - a temporal distance TR_p separates said first and second base layer VOPs;
      
      a temporal distance TR_B separates said first base layer VOP and said B-VOP;
      
      m/n is a ratio of the spatial resolution of the first and second base layer VOPs to the spatial resolution of the B-VOP; and
      
      at least one of:
      
      (a) said forward motion vector MV_f is determined according to the relationship MV_f = (m/n)·TR_B·MV_p/TR_p; and
      
      (b) said backward motion vector MV_B is determined according to the relationship MV_B = (m/n)·(TR_B - TR_p)·MV_p/TR_p.
  - 21. The apparatus of claim 19, wherein:
    - said B-VOP is encoded using at least one of:
      
      (a) a search region of said first base layer VOP whose center is determined according to said forward motion vector MV_f; and
      
      (b) a search region of said second base layer VOP whose center is determined according to said backward motion vector MV_B.

22. A method for scaling an input video sequence comprising video object planes (VOPs) for communication in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, comprising the steps of:
- downsampling pixel data of a first particular one of said VOPs of said input video sequence to provide a first base layer VOP having a reduced spatial resolution;
  
  upsampling pixel data of at least a portion of said first base layer VOP to provide a first upsampled VOP in said enhancement layer;
  
  differentially encoding said first upsampled VOP using said first particular one of said VOPs of said input video sequence for communication in said enhancement layer at a temporal position corresponding to said first base layer VOP;
  
  wherein said VOPs in said input video sequence are field mode VOPs, and said differentially encoding step comprises the further steps of:
  
  reordering lines of said pixel data of said first upsampled VOP in a field mode if said lines of pixel data meet a reordering criteria;
  
  then determining a residue according to a difference between pixel data of said first upsampled VOP and pixel data of said first particular one of said VOPs of said input video sequence; and
  
  spatially transforming said residue to provide transform coefficients.
- - 23. The method of claim 22, wherein:
    - said lines of pixel data of said first upsampled VOP meet said reordering criteria when a sum of differences of luminance values of opposite-field lines is greater than a sum of differences of luminance data of same-field lines and a bias term.
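
The sequence in claim 22 (optional line reordering, then residue, then spatial transform) might look like the Python sketch below. It uses an orthonormal DCT-II as the spatial transform, in line with the abstract's mention of the DCT. Applying the same line permutation to the input VOP block so that the difference stays pixel-aligned is an assumption of this sketch, as are all function names.

```python
import math


def dct_1d(v):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out


def dct_2d(block):
    """Separable 2-D DCT: transform the rows, then the columns."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def encode_block(upsampled, original, reorder):
    """Reorder lines by field if requested, form the residue, and transform it."""
    if reorder:
        # Even lines first, then odd lines (assumed field grouping).
        upsampled = upsampled[0::2] + upsampled[1::2]
        original = original[0::2] + original[1::2]
    residue = [
        [o - u for o, u in zip(row_o, row_u)]
        for row_o, row_u in zip(original, upsampled)
    ]
    return dct_2d(residue)


coeffs = encode_block([[10] * 4] * 4, [[15] * 4] * 4, reorder=True)
```

For this flat residue of 5 everywhere, all the energy lands in the DC coefficient, which is the kind of compaction the transform is meant to provide.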

24. A method for scaling an input video sequence comprising video object planes (VOPs) for communication in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, comprising the steps of:
- downsampling pixel data of a first particular one of said VOPs of said input video sequence to provide a first base layer VOP having a reduced spatial resolution;
  
  upsampling pixel data of at least a portion of said first base layer VOP to provide a first upsampled VOP in said enhancement layer; and
  
  differentially encoding said first upsampled VOP using said first particular one of said VOPs of said input video sequence for communication in said enhancement layer at a temporal position corresponding to said first base layer VOP;
  
  wherein:
  
  said base layer is adapted to carry higher priority, lower bit rate data, and said enhancement layer is adapted to carry lower priority, higher bit rate data.

25. A method for scaling an input video sequence comprising video object planes (VOPs) for communication in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, comprising the steps of:
- providing a first particular one of said VOPs of said input video sequence for communication in said base layer as a first base layer VOP;
  
  downsampling pixel data of at least a portion of said first base layer VOP for communication in said enhancement layer as a first downsampled VOP at a temporal position corresponding to said first base layer VOP;
  
  downsampling corresponding pixel data of said first particular one of said VOPs to provide a comparison VOP;
  
  differentially encoding said first downsampled VOP using said comparison VOP;
  
  providing a second particular one of said VOPs of said input video sequence for communication in said base layer as a second base layer VOP;
  
  downsampling pixel data of at least a portion of said second base layer VOP for communication in said enhancement layer as a second downsampled VOP at a temporal position corresponding to said second base layer VOP;
  
  downsampling corresponding pixel data of said second particular one of said VOPs to provide a comparison VOP;
  
  differentially encoding said second downsampled VOP using said comparison VOP;
  
  using at least one of said first and second base layer VOPs to predict an intermediate VOP corresponding to said first and second downsampled VOPs; and
  
  encoding said intermediate VOP for communication in said enhancement layer at a temporal position which is intermediate to that of said first and second downsampled VOPs.
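
To make the temporal arrangement in claim 25 concrete, the sketch below assigns temporal positions to the two layers: every `step`-th input VOP goes to the base layer, the enhancement layer carries a VOP at each of those positions, and the positions in between hold intermediate VOPs. The labels `"coincident"` and `"intermediate"`, the function name, and the fixed `step` are hypothetical; the claim does not prescribe a particular spacing.

```python
def layer_schedule(num_vops, step=2):
    """Map input VOP indices to base and enhancement layer temporal positions."""
    base_positions = list(range(0, num_vops, step))
    enhancement = [
        (t, "coincident" if t % step == 0 else "intermediate")
        for t in range(num_vops)
    ]
    return base_positions, enhancement


base_positions, enhancement = layer_schedule(5)
```

With this layout the enhancement layer has a higher temporal resolution than the base layer, and each intermediate VOP sits between two base layer VOPs that can serve as prediction references.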

26. A method for scaling an input video sequence comprising video object planes (VOPs) for communication in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, comprising the steps of:
- providing a first particular one of said VOPs of said input video sequence for communication in said base layer as a first base layer VOP;
  
  downsampling pixel data of at least a portion of said first base layer VOP for communication in said enhancement layer as a first downsampled VOP at a temporal position corresponding to said first base layer VOP;
  
  downsampling corresponding pixel data of said first particular one of said VOPs to provide a comparison VOP; and
  
  differentially encoding said first downsampled VOP using said comparison VOP;
  
  wherein:
  
  the base and enhancement layers are adapted to provide a stereoscopic video capability in which image data in the enhancement layer has a lower spatial resolution than image data in the base layer.

27. A method for recovering an input video sequence comprising video object planes (VOPs) which were scaled and communicated in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, wherein:
- pixel data of a first particular one of said VOPs of said input video sequence is downsampled and carried as a first base layer VOP having a reduced spatial resolution;
  
  pixel data of at least a portion of said first base layer VOP is upsampled and carried as a first upsampled VOP in said enhancement layer at a temporal position corresponding to said first base layer VOP; and
  
  said first upsampled VOP is differentially encoded using said first particular one of said VOPs of said input video sequence;
  
  said method comprising the steps of:
  
  upsampling said pixel data of said first base layer VOP to restore said associated spatial resolution; and
  
  processing said first upsampled VOP and said first base layer VOP with said restored associated spatial resolution to provide an output video signal with said associated spatial resolution;
  
  wherein:
  
  said VOPs in said input video sequence are field mode VOPs; and
  
  said first upsampled VOP is differentially encoded by reordering lines of said pixel data of said first upsampled VOP in a field mode if said lines of pixel data meet a reordering criteria, then determining a residue according to a difference between pixel data of said first upsampled VOP and pixel data of said first particular one of said VOPs of said input video sequence, and spatially transforming said residue to provide transform coefficients.
- - 28. The method of claim 27, wherein:
    - said lines of pixel data of said first upsampled VOP meet said reordering criteria when a sum of differences of luminance values of opposite-field lines is greater than a sum of differences of luminance data of same-field lines and a bias term.

29. A method for recovering an input video sequence comprising video object planes (VOPs) which were scaled and communicated in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, wherein:
- pixel data of a first particular one of said VOPs of said input video sequence is downsampled and carried as a first base layer VOP having a reduced spatial resolution;
  
  pixel data of at least a portion of said first base layer VOP is upsampled and carried as a first upsampled VOP in said enhancement layer at a temporal position corresponding to said first base layer VOP; and
  
  said first upsampled VOP is differentially encoded using said first particular one of said VOPs of said input video sequence;
  
  said method comprising the steps of:
  
  upsampling said pixel data of said first base layer VOP to restore said associated spatial resolution; and
  
  processing said first upsampled VOP and said first base layer VOP with said restored associated spatial resolution to provide an output video signal with said associated spatial resolution;
  
  wherein:
  
  said base layer is adapted to carry higher priority, lower bit rate data, and said enhancement layer is adapted to carry lower priority, higher bit rate data.

30. A method for recovering an input video sequence comprising video object planes (VOPs) which were scaled and communicated in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, wherein:
- a first particular one of said VOPs of said input video sequence is provided in said base layer as a first base layer VOP;
  
  pixel data of at least a portion of said first base layer VOP is downsampled and carried in said enhancement layer as a first downsampled VOP at a temporal position corresponding to said first base layer VOP;
  
  corresponding pixel data of said first particular one of said VOPs is downsampled to provide a comparison VOP; and
  
  said first downsampled VOP is differentially encoded using said comparison VOP;
  
  said method comprising the steps of:
  
  upsampling said pixel data of said first downsampled VOP to restore said associated spatial resolution; and
  
  processing said first enhancement layer VOP with said restored associated spatial resolution and said first base layer VOP to provide an output video signal with said associated spatial resolution;
  
  wherein:
  
  a second particular one of said VOPs of said input video sequence is provided in said base layer as a second base layer VOP;
  
  pixel data of at least a portion of said second base layer VOP is downsampled and carried in said enhancement layer as a second downsampled VOP at a temporal position corresponding to said second base layer VOP;
  
  corresponding pixel data of said second particular one of said VOPs is downsampled to provide a comparison VOP;
  
  said second downsampled VOP is differentially encoded using said comparison VOP;
  
  at least one of said first and second base layer VOPs is used to predict an intermediate VOP corresponding to said first and second downsampled VOPs; and
  
  said intermediate VOP is encoded for communication in said enhancement layer at a temporal position which is intermediate to that of said first and second downsampled VOPs.

31. A method for recovering an input video sequence comprising video object planes (VOPs) which were scaled and communicated in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, wherein:
- a first particular one of said VOPs of said input video sequence is provided in said base layer as a first base layer VOP;
  
  pixel data of at least a portion of said first base layer VOP is downsampled and carried in said enhancement layer as a first downsampled VOP at a temporal position corresponding to said first base layer VOP;
  
  corresponding pixel data of said first particular one of said VOPs is downsampled to provide a comparison VOP; and
  
  said first downsampled VOP is differentially encoded using said comparison VOP;
  
  said method comprising the steps of:
  
  upsampling said pixel data of said first downsampled VOP to restore said associated spatial resolution; and
  
  processing said first enhancement layer VOP with said restored associated spatial resolution and said first base layer VOP to provide an output video signal with said associated spatial resolution;
  
  wherein:
  
  said base and enhancement layer are adapted to provide a stereoscopic video capability in which image data in said enhancement layer has a lower spatial resolution than image data in said base layer.
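
The arrangement in claims 30 and 31, where the enhancement layer carries lower-resolution data than the base layer, can be illustrated with a short Python sketch. This is one possible reading only, not the patented implementation: the base layer VOP is downsampled to form a prediction, the source for the enhancement layer is downsampled to form the comparison VOP, and only their difference is carried. The 2x factor, both filters, every function name, and the use of a second view as the enhancement source (as in a stereoscopic pair) are assumptions; quantization and transform coding are omitted.

```python
def downsample2x(vop):
    """Average each 2x2 block (assumed filter; dimensions must be even)."""
    return [
        [(vop[r][c] + vop[r][c + 1] + vop[r + 1][c] + vop[r + 1][c + 1]) // 4
         for c in range(0, len(vop[0]), 2)]
        for r in range(0, len(vop), 2)
    ]


def upsample2x(vop):
    """Pixel-replication upsampling by 2 in each dimension (assumed filter)."""
    out = []
    for row in vop:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def encode_low_res_enhancement(base_vop, enh_source_vop):
    """Return the residual carried in the enhancement layer (hypothetical helper)."""
    prediction = downsample2x(base_vop)        # downsampled base layer VOP
    comparison = downsample2x(enh_source_vop)  # comparison VOP
    return [[c - p for c, p in zip(rc, rp)]
            for rc, rp in zip(comparison, prediction)]


def decode_low_res_enhancement(base_vop, residual):
    """Rebuild the low-resolution VOP, then upsample to the original resolution."""
    prediction = downsample2x(base_vop)
    low_res = [[p + d for p, d in zip(rp, rd)]
               for rp, rd in zip(prediction, residual)]
    return low_res, upsample2x(low_res)


base = [[10, 12, 20, 22], [14, 16, 24, 26], [30, 32, 40, 42], [34, 36, 44, 46]]
other_view = [[p + 2 for p in row] for row in base]  # toy second view
residual = encode_low_res_enhancement(base, other_view)
low_res, restored = decode_low_res_enhancement(base, residual)
```

In this toy case the residual is small because the two sources are highly correlated, which is the motivation for predicting the enhancement layer from the base layer.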

32. A decoder apparatus for recovering an input video sequence comprising video object planes (VOPs) which were scaled and communicated in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, wherein:
- pixel data of a first particular one of said VOPs of said input video sequence is downsampled and carried as a first base layer VOP having a reduced spatial resolution;
  
  pixel data of at least a portion of said first base layer VOP is upsampled and carried as a first upsampled VOP in said enhancement layer at a temporal position corresponding to said first base layer VOP; and
  
  said first upsampled VOP is differentially encoded using said first particular one of said VOPs of said input video sequence;
  
  said apparatus comprising:
  
  means for upsampling said pixel data of said first base layer VOP to restore said associated spatial resolution; and
  
  means for processing said first upsampled VOP and said first base layer VOP with said restored associated spatial resolution to provide an output video signal with said associated spatial resolution;
  
  wherein:
  
  a second particular one of said VOPs of said input video sequence is downsampled to provide a second base layer VOP having a reduced spatial resolution;
  
  pixel data of at least a portion of said second base layer VOP is upsampled to provide a second upsampled VOP in said enhancement layer which corresponds to said first upsampled VOP;
  
  at least one of said first and second base layer VOPs is used to predict an intermediate VOP corresponding to said first and second upsampled VOPs; and
  
  said intermediate VOP is encoded for communication in said enhancement layer at a temporal position which is intermediate to that of said first and second upsampled VOPs.

33. The apparatus of claim 32, wherein:
- said enhancement layer has a higher temporal resolution than said base layer; and
  
  said base and enhancement layers are adapted to provide at least one of:
  
  (a) a picture-in-picture (PIP) capability wherein a PIP image is carried in said base layer, and
  
  (b) a preview access channel capability wherein a preview access image is carried in said base layer.
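
The prediction of an intermediate VOP recited in claim 32 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of the patent: the two reference VOPs are assumed to be already at the enhancement layer resolution, no motion compensation is applied, and the encoder picks among forward, backward and averaged prediction by the smallest sum of absolute differences. The mode names and all function names are hypothetical.

```python
MODES = ("forward", "backward", "bidirectional")


def predict(prev_vop, next_vop, mode):
    """Form a prediction from one or both reference VOPs (no motion compensation)."""
    if mode == "forward":
        return [row[:] for row in prev_vop]
    if mode == "backward":
        return [row[:] for row in next_vop]
    if mode == "bidirectional":
        # Rounded average of the two references.
        return [[(a + b + 1) // 2 for a, b in zip(ra, rb)]
                for ra, rb in zip(prev_vop, next_vop)]
    raise ValueError(f"unknown mode: {mode}")


def sad(a, b):
    """Sum of absolute differences between two equally sized VOPs."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def encode_intermediate(actual, prev_vop, next_vop):
    """Pick the best mode and return it with the prediction residual."""
    mode = min(MODES, key=lambda m: sad(actual, predict(prev_vop, next_vop, m)))
    prediction = predict(prev_vop, next_vop, mode)
    residual = [[a - p for a, p in zip(ra, rp)]
                for ra, rp in zip(actual, prediction)]
    return mode, residual


def decode_intermediate(mode, residual, prev_vop, next_vop):
    """Rebuild the intermediate VOP from the references and the residual."""
    prediction = predict(prev_vop, next_vop, mode)
    return [[p + d for p, d in zip(rp, rd)]
            for rp, rd in zip(prediction, residual)]


prev_vop = [[10, 20], [30, 40]]
next_vop = [[20, 30], [40, 50]]
actual = [[15, 25], [35, 45]]  # toy VOP temporally between the two references
mode, residual = encode_intermediate(actual, prev_vop, next_vop)
rebuilt = decode_intermediate(mode, residual, prev_vop, next_vop)
```

Because the intermediate VOP is predicted from VOPs that the decoder already holds, the enhancement layer can raise the temporal resolution while carrying only the residual for the added VOP.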

34. A decoder apparatus for recovering an input video sequence comprising video object planes (VOPs) which were scaled and communicated in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, wherein:
- pixel data of a first particular one of said VOPs of said input video sequence is downsampled and carried as a first base layer VOP having a reduced spatial resolution;
  
  pixel data of at least a portion of said first base layer VOP is upsampled and carried as a first upsampled VOP in said enhancement layer at a temporal position corresponding to said first base layer VOP; and
  
  said first upsampled VOP is differentially encoded using said first particular one of said VOPs of said input video sequence;
  
  said apparatus comprising:
  
  means for upsampling said pixel data of said first base layer VOP to restore said associated spatial resolution; and
  
  means for processing said first upsampled VOP and said first base layer VOP with said restored associated spatial resolution to provide an output video signal with said associated spatial resolution;
  
  wherein:
  
  said base layer is adapted to carry higher priority, lower bit rate data, and said enhancement layer is adapted to carry lower priority, higher bit rate data.
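
The decoding recited in claim 34 can be illustrated with a short Python sketch: the base layer VOP is upsampled to the original spatial resolution and combined with the differentially encoded data from the enhancement layer. The sketch rests on several assumptions that the claims do not specify: a 2x scaling factor, block-averaging downsampling, pixel-replication upsampling, and hypothetical function names. Quantization, DCT coding and motion compensation are omitted, so the round trip here is lossless.

```python
def downsample2x(vop):
    """Average each 2x2 block (assumed filter; dimensions must be even)."""
    return [
        [(vop[r][c] + vop[r][c + 1] + vop[r + 1][c] + vop[r + 1][c + 1]) // 4
         for c in range(0, len(vop[0]), 2)]
        for r in range(0, len(vop), 2)
    ]


def upsample2x(vop):
    """Pixel-replication upsampling by 2 in each dimension (assumed filter)."""
    out = []
    for row in vop:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def encode_spatial(input_vop):
    """Return (base layer VOP, enhancement layer residual)."""
    base = downsample2x(input_vop)
    upsampled = upsample2x(base)
    # Differential encoding: difference between the input VOP and the upsampled VOP.
    residual = [[i - u for i, u in zip(ri, ru)]
                for ri, ru in zip(input_vop, upsampled)]
    return base, residual


def decode_spatial(base, residual):
    """Upsample the base layer VOP and add the enhancement layer residual."""
    upsampled = upsample2x(base)
    return [[u + d for u, d in zip(ru, rd)]
            for ru, rd in zip(upsampled, residual)]


input_vop = [[10, 12, 20, 22], [14, 16, 24, 26],
             [30, 32, 40, 42], [34, 36, 44, 46]]
base, residual = encode_spatial(input_vop)
output_vop = decode_spatial(base, residual)
```

A receiver that handles only the higher priority base layer can still display `base` at the reduced resolution; the residual is needed only to restore the full resolution.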

35. A decoder apparatus for recovering an input video sequence comprising video object planes (VOPs) which were scaled and communicated in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, wherein:
- a first particular one of said VOPs of said input video sequence is provided in said base layer as a first base layer VOP;
  
  pixel data of at least a portion of said first base layer VOP is downsampled and carried in said enhancement layer as a first downsampled VOP at a temporal position corresponding to said first base layer VOP;
  
  corresponding pixel data of said first particular one of said VOPs is downsampled to provide a comparison VOP; and
  
  said first downsampled VOP is differentially encoded using said comparison VOP;
  
  said apparatus comprising:
  
  means for upsampling said pixel data of said first downsampled VOP to restore said associated spatial resolution; and
  
  means for processing said first enhancement layer VOP with said restored spatial resolution and said first base layer VOP to provide an output video signal with said associated spatial resolution;
  
  wherein:
  
  a second particular one of said VOPs of said input video sequence is provided for communication in said base layer as a second base layer VOP;
  
  pixel data of at least a portion of said second base layer VOP is downsampled to provide a second downsampled VOP in said enhancement layer which corresponds to said first upsampled VOP;
  
  at least one of said first and second base layer VOPs is used to predict an intermediate VOP corresponding to said first and second downsampled VOPs; and
  
  said intermediate VOP is encoded for communication in said enhancement layer at a temporal position which is intermediate to that of said first and second downsampled VOPs.

36. A decoder apparatus for recovering an input video sequence comprising video object planes (VOPs) which were scaled and communicated in a corresponding base layer and enhancement layer, said VOPs in said input video sequence having an associated spatial resolution and temporal resolution, wherein:
- a first particular one of said VOPs of said input video sequence is provided in said base layer as a first base layer VOP;
  
  pixel data of at least a portion of said first base layer VOP is downsampled and carried in said enhancement layer as a first downsampled VOP at a temporal position corresponding to said first base layer VOP;
  
  corresponding pixel data of said first particular one of said VOPs is downsampled to provide a comparison VOP; and
  
  said first downsampled VOP is differentially encoded using said comparison VOP;
  
  said apparatus comprising:
  
  means for upsampling said pixel data of said first downsampled VOP to restore said associated spatial resolution; and
  
  means for processing said first enhancement layer VOP with said restored spatial resolution and said first base layer VOP to provide an output video signal with said associated spatial resolution;
  
  wherein:
  
  said base and enhancement layer are adapted to provide a stereoscopic video capability in which image data in said enhancement layer has a lower spatial resolution than image data in said base layer.

Current Assignee: Google Technology Holdings LLC (Alphabet Inc.)
Original Assignee: General Instrument Corporation (CommScope Holding Company, Inc.)
Inventors: Rajan, Ganesh; Narasimhan, Mandayam; Chen, Xuemin; Luthra, Ajay
Primary Examiner(s): Le, Vu
Application Number: US08/869,493
Time in Patent Office: 1,062 Days
Field of Search: 348/397-400, 348/402, 348/407-408, 348/412-413, 348/415-416, 348/438, 348/699-700, 382/242, 382/243, 382/236, 382/238, 382/244
US Class Current: 375/240.16
CPC Class Codes:
G06T 3/40   Scaling of whole images or ...
H04N 13/10   Processing, recording or tr...
H04N 13/15   for colour aspects of image...
H04N 13/161   Encoding, multiplexing or d...
H04N 13/167   Synchronising or controllin...
H04N 13/189   Recording image signals; Re...
H04N 13/194   Transmission of image signals
H04N 13/286   having separate monoscopic ...
H04N 19/29   involving scalability at th...
H04N 19/30   using hierarchical techniqu...
H04N 19/31   in the temporal domain
H04N 19/33   in the spatial domain
H04N 19/577   Motion compensation with bi...
H04N 19/597   specially adapted for multi...
H04N 2013/0085   Motion estimation from ster...
H04N 2013/0092   Image segmentation from ste...
H04N 7/52   Systems for transmission of...
