Stitching of video for continuous presence multipoint video conferencing
Abstract
A drift-free hybrid method of performing video stitching is provided. The method includes decoding a plurality of video bitstreams and storing prediction information. The decoded bitstreams form video images, spatially composed into a combined image. The image comprises frames of ideal stitched video sequence. The method uses prediction information in conjunction with previously generated frames to predict pixel blocks in the next frame. A stitched predicted block in the next frame is subtracted from a corresponding block in a corresponding frame to create a stitched raw residual block. The raw residual block is forward transformed, quantized, entropy encoded and added to the stitched video bitstream along with the prediction information. Also, the stitched raw residual block is inverse transformed and dequantized to create a stitched decoded residual block. The residual block is added to the predicted block to generate the stitched reconstructed block in the next frame of the sequence.
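The hybrid loop the abstract describes can be sketched numerically. The fragment below is a minimal illustration, not the patented method: the forward transform and entropy coding are omitted, and `QSTEP` is a hypothetical scalar quantization step. It shows why reconstructing on the encoder side from the same dequantized residual the decoder will see keeps both sides in lockstep, i.e. drift-free.

```python
import numpy as np

QSTEP = 8  # hypothetical scalar quantization step (illustrative only)

def quantize(residual):
    # A plain scalar quantizer; the forward transform is omitted for brevity.
    return np.round(residual / QSTEP).astype(int)

def dequantize(levels):
    return levels * QSTEP

def stitch_block(ideal_block, predictor):
    """One block of the hybrid loop: residual -> quantize -> dequantize -> reconstruct."""
    raw_residual = ideal_block - predictor        # stitched raw residual
    levels = quantize(raw_residual)               # what would be entropy coded
    decoded_residual = dequantize(levels)         # exactly what the decoder recovers
    reconstructed = predictor + decoded_residual  # stitched reconstructed block
    return levels, reconstructed
```

Because the encoder's reference frames are built from `reconstructed` (not from the ideal stitched frame), encoder and decoder predict from identical data, so quantization error never accumulates across frames.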
281 Citations
56 Claims
1. A method of generating a stitched video frame in a sequence of stitched video frames, the method comprising:
decoding a plurality of video bitstreams to produce a plurality of pixel-domain pictures;
spatially composing said plurality of pixel-domain pictures to create a single ideal stitched video frame;
storing prediction information from said plurality of decoded video bitstreams;
forming a stitched predictor by performing temporal prediction for inter-coded portions of the stitched video frame based on the stored prediction information and a retained reference frame in said sequence of stitched video frames and performing spatial prediction using retained intra-prediction information on the stitched video frame;
forming a stitched raw residual by subtracting the stitched predictor for a portion of the stitched video frame from a corresponding portion of the ideal stitched video frame;
forward transforming and quantizing the stitched raw residual; and
entropy encoding the forward transformed and quantized stitched raw residual.

(Dependent claims 2–9 not shown.)
10. A hybrid drift-free method of performing video stitching, comprising:
spatially composing an ideal stitched video sequence in the pixel domain;
predicting elements of a current frame in a stitched video sequence; and
generating the current frame in the stitched video sequence based on the predicted elements of the current frame and the differences between the predicted elements of the current frame and corresponding elements of a corresponding frame of the ideal stitched video sequence.

(Dependent claims 11–19 not shown.)
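The pixel-domain composition step in claim 10 can be illustrated directly. A minimal sketch, assuming four equal-size decoded pictures arranged in a 2x2 continuous-presence layout (the 2x2 geometry is an assumption for illustration; the claims do not fix the layout):

```python
import numpy as np

def compose_2x2(tl, tr, bl, br):
    """Spatially compose four equal-size pixel-domain pictures into one
    ideal stitched frame with twice the width and twice the height."""
    top = np.hstack([tl, tr])
    bottom = np.hstack([bl, br])
    return np.vstack([top, bottom])
```

Applied per frame across the four decoded sequences, this yields the ideal stitched video sequence that the prediction loop targets.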
20. A method of generating a stitched video sequence, comprising:
composing a first stitched video sequence in the pixel domain;
generating a stitched predictor for predicting the pixel data comprising an array of pixels in a current frame of a second stitched video sequence;
subtracting the stitched predictor from a corresponding array of pixels in a corresponding frame of the first stitched video sequence to form a stitched raw residual array of pixels;
encoding the stitched raw residual array of pixels;
decoding the encoded stitched raw residual array of pixels to form a stitched decoded residual array of pixels; and
adding the stitched decoded residual array of pixels to the stitched predictor.

(Dependent claims 21–25 not shown.)
26. A method of decoding a pixel block in a frame of a stitched video sequence, the method comprising:
retaining a previous frame in said stitched video sequence;
generating a stitched residual block by entropy decoding, inverse transforming and dequantizing a bitstream containing an entropy coded, forward transformed and quantized stitched raw residual block formed by subtracting a first stitched predictor from a frame in an ideal stitched video sequence;
generating a second stitched predictor for the pixel block in the frame to be decoded in the video sequence in substantially the same manner that said first stitched predictor was generated; and
adding the decoded stitched residual block to the stitched predictor.

(Dependent claim 27 not shown.)
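Claim 26's decoder mirrors the encoder: it regenerates the stitched predictor in the same manner and adds the decoded residual. A minimal numeric sketch, with entropy decoding and the inverse transform omitted and a hypothetical scalar step `QSTEP` standing in for dequantization (it must match the encoder's step):

```python
import numpy as np

QSTEP = 8  # hypothetical quantization step, assumed to match the encoder

def decode_block(levels, predictor):
    """Decoder side of the stitched loop: dequantize the received residual
    levels and add them to an independently regenerated stitched predictor."""
    decoded_residual = levels * QSTEP    # dequantize (inverse transform omitted)
    return predictor + decoded_residual  # stitched reconstructed block
```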
28. A method of stitching a plurality of input video bitstreams conforming to the ITU-T H.264 video coding standard, the method comprising:
decoding said plurality of input video bitstreams to produce a plurality of pixel-domain pictures;
spatially composing said plurality of pixel-domain pictures to create an ideal stitched video frame;
storing at least one of prediction information and a quantization parameter for at least a portion of the pixel-domain pictures produced from said plurality of decoded video bitstreams;
forming a stitched predictor by performing temporal prediction for inter-coded portions of the stitched video frame based on the stored information and a retained reference frame in said sequence of stitched video frames, and performing spatial prediction using stored information on the stitched video frame;
forming a stitched raw residual by subtracting the stitched predictor for a portion of the stitched video frame from a corresponding portion of the ideal stitched video frame;
forward transforming and quantizing the stitched raw residual; and
entropy encoding the forward transformed and quantized stitched raw residual.

(Dependent claims 29–37 not shown.)
38. A method of stitching a plurality of input video bitstreams conforming to the ITU-T H.263 video coding standard, the method comprising:
decoding said plurality of video bitstreams to produce a plurality of pixel-domain pictures;
spatially composing said plurality of pixel-domain pictures to create a single ideal stitched video frame;
storing at least one of prediction information and a quantization parameter for at least one macroblock in said plurality of decoded video bitstreams;
forming a stitched predictor by performing temporal prediction for inter-coded portions of the stitched video frame based on the stored information and a retained reference frame in said sequence of stitched video frames and performing spatial prediction using stored information on the stitched video frame;
forming a stitched raw residual by subtracting the stitched predictor for a portion of the stitched video frame from a corresponding portion of the ideal stitched video frame;
forward transforming and quantizing the stitched raw residual; and
entropy encoding the forward transformed and quantized stitched raw residual to form a stitched bitstream.

(Dependent claims 39–43 not shown.)
44. A partially drift-free method for performing nearly compressed domain video stitching for H.263 video bitstreams, the method comprising:
parsing a plurality of individual video bitstreams;
decoding picture, GOB (group of blocks), and MB (macroblock) layer headers in said individual video bitstreams;
modifying a differential motion vector for at least one macroblock associated with one of said individual video bitstreams;
modifying a COD value from 1 to 0 for at least one macroblock in one of said individual video bitstreams;
modifying a DQUANT value for at least one macroblock in one of said individual video bitstreams;
modifying a QUANT value for at least one macroblock in one of said video bitstreams;
requantizing and VLC encoding the macroblock for which the QUANT value was modified; and
constructing the stitched bitstream including the modified DQUANT value and the requantized VLC encoded macroblock.

(Dependent claims 45–47 not shown.)
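The header-level modifications in claim 44 hinge on H.263's DQUANT field, which can only step the quantizer by a small amount; claim 48 implies a range of ±2. When the required change exceeds that range, the macroblock must be requantized and VLC re-encoded instead. A hypothetical helper sketching that decision (the function and its return convention are illustrative, not from the patent):

```python
DQUANT_RANGE = 2  # per claim 48, a DQUANT modification whose magnitude exceeds 2 is not possible

def plan_quant_fix(prev_quant, target_quant):
    """Decide whether a header-only DQUANT tweak suffices for a macroblock,
    or whether the macroblock must be requantized (illustrative sketch)."""
    delta = target_quant - prev_quant
    if abs(delta) <= DQUANT_RANGE:
        return ("modify_dquant", delta)   # cheap: edit the MB header in place
    return ("requantize", target_quant)   # expensive: requantize and VLC encode
```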
48. A lossless method for performing compressed domain video stitching of a plurality of H.263 video bitstreams encapsulated as RTP packets, the method comprising:
extracting a plurality of individual video bitstreams from a current incoming RTP packet from among a plurality of incoming RTP packets;
parsing the individual video bitstreams;
decoding picture, GOB (group of blocks), and MB (macroblock) layer headers in the individual video bitstreams;
modifying a differential motion vector for at least one macroblock in one of said individual video bitstreams;
modifying a DQUANT value for at least one macroblock in one of said individual video bitstreams;
terminating the current incoming RTP packet and starting a next RTP packet of said plurality of incoming RTP packets if the absolute value of the DQUANT modification exceeds 2, or if a motion vector points to a location in another quadrant for a macroblock in one of said video bitstreams; and incorporating an actual MV and QUANT value in the RTP header fields of every RTP packet of the stitched video bitstream.
49. A lossless method of performing video stitching on first, second, third, and fourth individual video sequences encoded according to ITU-T H.263 Annex K, where each video frame of said first, second, third, and fourth video sequences comprises a plurality of rectangular slices, the method comprising:
modifying OPPTYPE bits 1-3 in a picture header of a frame in said first video sequence;
modifying an MBA parameter for each slice in a frame from each of said first, second, third, and fourth video sequences such that the modified MBA parameters represent locations in a stitched video frame having four times higher resolution than a frame in said first, second, third and fourth video sequences, such that slices from said first video sequence occupy a first quadrant of said stitched video frame, slices from said second video sequence occupy a second quadrant of said stitched video frame, slices from said third video sequence occupy a third quadrant of said stitched video frame, and slices from said fourth video sequence occupy a fourth quadrant of said stitched video frame; and
arranging the slices from the first, second, third, and fourth video sequences into a stitched bitstream such that the slices from said first video sequence alternate with the slices from said second video sequence, and the slices from the third video sequence alternate with the slices from the fourth video sequence, following the slices from the first and second video sequences in a similar alternating manner.
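The MBA remapping in claim 49 is raster-address arithmetic: each source macroblock address is offset into the quadrant it occupies in a stitched frame twice as wide and twice as tall. A sketch, assuming MBA is a simple raster-scan macroblock index (an illustrative simplification of Annex K's MBA semantics):

```python
def remap_mba(mba, src_mb_width, src_mb_height, quadrant):
    """Remap a raster-scan macroblock address from a source frame of
    src_mb_width x src_mb_height macroblocks into a stitched frame with
    double the width and height. quadrant: 0=top-left, 1=top-right,
    2=bottom-left, 3=bottom-right. Illustrative only."""
    row, col = divmod(mba, src_mb_width)
    dst_width = 2 * src_mb_width
    dst_row = row + (src_mb_height if quadrant >= 2 else 0)
    dst_col = col + (src_mb_width if quadrant % 2 == 1 else 0)
    return dst_row * dst_width + dst_col
```

For a QCIF source (11x9 macroblocks) stitched to CIF, the first macroblock of the top-right quadrant lands at address 11, and of the bottom-left quadrant at 9 * 22 = 198.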
50. A method of stitching frames from a plurality of video sequences, comprising:
defining a nominal frame rate f_nom;
defining a maximum frame rate f_max;
decoding received frames in said plurality of video sequences;
stitching together a set of decoded frames, one from each of said plurality of video sequences, to form a composite stitched video frame;
determining when bitstream data corresponding to two complete frames belonging to one of said plurality of video sequences are available for decoding;
defining a time t_tau as the time elapsed between the time a previous composite frame was stitched and the time that bitstream data corresponding to two complete frames belonging to one of said plurality of video sequences become available for decoding; and
invoking the stitching operation at a time t_s, where t_s is equal to the greater of 1/f_max and the smaller of 1/f_nom and t_tau.
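Claim 50's timing rule is a simple clamp: stitch when two complete frames from one sequence are ready (t_tau), but never later than the nominal frame interval 1/f_nom and never sooner than the maximum-rate interval 1/f_max. As a one-line sketch:

```python
def stitch_time(f_nom, f_max, t_tau):
    """Time at which to invoke the stitching operation per claim 50:
    t_s = max(1/f_max, min(1/f_nom, t_tau))."""
    return max(1.0 / f_max, min(1.0 / f_nom, t_tau))
```

For example, with f_nom = 10 fps and f_max = 30 fps, t_s is clamped to the interval [1/30 s, 1/10 s] around the data-availability time t_tau.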
51. A method of concealing a macroblock lost in the transmission of an H.264 encoded video stream, the method comprising:
determining whether the macroblock was in an inter-coded slice;
if the slice was an inter-coded slice, estimating the motion vector and corresponding reference picture of the lost macroblock from received macroblocks neighboring the lost macroblock;
performing motion compensation using the estimated motion vector and corresponding reference picture to obtain pixel information for the lost macroblock.
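Claim 51 does not mandate a specific estimator for the lost macroblock's motion vector. One common heuristic is a component-wise median over the motion vectors of the received neighboring macroblocks, sketched below (the median choice and zero-vector fallback are illustrative assumptions, not the patent's prescription):

```python
def estimate_lost_mv(neighbor_mvs):
    """Estimate the motion vector of a lost macroblock from its received
    neighbors via a component-wise median (one common heuristic)."""
    if not neighbor_mvs:
        return (0, 0)  # no neighbor information: fall back to the zero vector
    xs = sorted(mv[0] for mv in neighbor_mvs)
    ys = sorted(mv[1] for mv in neighbor_mvs)
    mid = len(neighbor_mvs) // 2
    return (xs[mid], ys[mid])
```

The estimated vector then drives ordinary motion compensation against the estimated reference picture to fill in the lost pixels.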
52. A method of concealing a macroblock lost in transmission of an H.264 encoded video stream, the method comprising:
determining whether the macroblock was in an intra-coded slice or an IDR slice;
if the slice was an intra-coded slice or an IDR slice, initiating a videoFastUpdatePicture command through an H.241 signaling mechanism.
53. A method of concealing the loss of bitstream data corresponding to one or more frames in the transmission of an H.264 encoded video stream comprising:
determining a number of frames lost in transmission;
copying pixel information from a temporally previous frame to re-create a lost frame;
marking said lost frame as a short-term reference picture through a sliding window process specified in the H.264 standard.
54. A method of decoding an ITU-T H.264 bitstream comprising:
initiating a videoFastUpdatePicture command via an H.241 signalling method when any one of the following conditions is detected:
a loss of sequence parameter set is detected in the bitstream;
a loss of picture parameter set is detected in the bitstream;
a loss of an IDR-slice is detected in the bitstream;
a loss of an I-slice is detected in the bitstream;
or gaps in frame_num are allowed in the bitstream and packet loss is detected in the bitstream.
55. A method of concealing a macroblock lost in the transmission of an H.263 encoded video stream comprising:
determining whether the macroblock was a P-macroblock;
if the macroblock was a P-macroblock, estimating the motion vector of the lost macroblock from received macroblocks neighboring the lost macroblock;
performing motion compensation using the estimated motion vector to obtain pixel information for the lost macroblock.
56. A method of concealing a macroblock lost in the transmission of an H.263 encoded video stream comprising:
determining whether the macroblock was an I-macroblock in an I-frame;
if the macroblock was an I-macroblock in an I-frame, initiating a videoFastUpdatePicture command through an H.245 signaling mechanism.
Specification