Advanced bi-directional predictive coding of video frames

US 20050013365A1
Filed: 07/18/2003
Published: 01/20/2005
Est. Priority Date: 07/18/2003
Status: Active Grant

Abstract

Techniques and tools for coding/decoding of video images, and in particular, B-frames, are described. In one aspect, a video encoder/decoder determines a fraction for a current image in a sequence. The fraction represents an estimated temporal distance position for the current image relative to an interval between reference images for the current image. The video encoder/decoder processes the fraction along with a motion vector for a first reference image, resulting in a representation of motion (e.g., constant or variable velocity motion) in the current image. Other aspects are also described, including intra B-frames, forward and backward buffers for motion vector prediction, bitplane encoding of direct mode prediction information, multiple motion vector resolutions/interpolation filters for B-frames, proactive dropping of B-frames, and signaling of dropped predicted frames.

73 Claims

1. In a computer system, a method of processing images in a sequence of video images, the method comprising:
- determining a fraction for a current image in the sequence, wherein the fraction represents an estimated temporal distance position for the current image relative to an interval between a first reference image for the current image and a second reference image for the current image; and
  
  processing the fraction along with a motion vector for the first reference image, wherein the motion vector represents motion in the first reference image relative to a second reference image for the current image, and wherein the processing the fraction along with the motion vector results in a representation of motion in the current image relative to the first reference image.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12):
  - 2. The method of claim 1 wherein the fraction is represented by a variable length code in a bit stream.
  - 3. The method of claim 1 wherein the fraction is selected from a set of discrete values, wherein the values are greater than zero and less than one.
  - 4. The method of claim 1 wherein the fraction is selected from the group consisting of:
    - ½, ⅓, ⅔, ¼, ¾, ⅕, ⅖, ⅗, ⅘, ⅙, ⅚, 1/7, 2/7, and 3/7.
  - 5. The method of claim 1 wherein the estimated temporal position for the current image relative to the interval between the first reference image for the current image and the second reference image for the current image is not the true temporal position of the current image.
  - 6. The method of claim 1 wherein the fraction is based on motion information for the sequence of video images.
  - 7. The method of claim 1 wherein the fraction is based on a proximity of the current image to an end of the sequence of video images.
  - 8. The method of claim 1 wherein the representation of motion in the current image comprises representation of variable velocity motion.
  - 9. The method of claim 1 wherein the determining the fraction comprises:
    - evaluating a set of plural fractions to determine bit costs for encoding the current image using the plural fractions; and
      
      selecting a fraction from the set of plural fractions based on the evaluating.
  - 10. The method of claim 1 further comprising repeating the acts of claim 1 for plural images in the sequence of video images.
  - 11. A computer readable medium storing computer executable instructions for causing the computer system to perform the method of claim 1 during video encoding.
  - 12. A computer readable medium storing computer executable instructions for causing the computer system to perform the method of claim 1 during video decoding.
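
As a rough illustration of claims 3, 4 and 9, the sketch below picks one fraction from the discrete set by comparing encoding costs. It is only a sketch under stated assumptions: the names `CANDIDATE_FRACTIONS`, `select_fraction` and `toy_cost` are hypothetical, and the cost model is a stand-in, since the claims do not say how bit costs are measured.

```python
from fractions import Fraction

# The discrete candidate set recited in claim 4 (all strictly between 0 and 1).
CANDIDATE_FRACTIONS = [
    Fraction(1, 2), Fraction(1, 3), Fraction(2, 3), Fraction(1, 4),
    Fraction(3, 4), Fraction(1, 5), Fraction(2, 5), Fraction(3, 5),
    Fraction(4, 5), Fraction(1, 6), Fraction(5, 6), Fraction(1, 7),
    Fraction(2, 7), Fraction(3, 7),
]


def select_fraction(bit_cost, candidates=CANDIDATE_FRACTIONS):
    """Return the candidate fraction with the lowest estimated bit cost.

    bit_cost is a hypothetical callable that maps a Fraction to the cost of
    encoding the current image with that fraction (claim 9).
    """
    return min(candidates, key=bit_cost)


def toy_cost(fraction):
    """Stand-in cost model (an assumption): cheapest near one third."""
    return abs(fraction - Fraction(1, 3))


best = select_fraction(toy_cost)
```

In a real encoder the cost callable would presumably run a trial encode of the image; the exhaustive search over 14 candidates is one simple reading of "evaluating a set of plural fractions".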

13. In a computer system, a method of processing images in a sequence of video images, the method comprising:
- determining a fraction for a region of a current image in the sequence, wherein the fraction represents an estimated temporal distance position for the current image relative to an interval between a first reference image for the current image and a second reference image for the current image; and
  
  processing the fraction along with a motion vector for the first reference image, wherein the motion vector represents motion in the first reference image relative to the second reference image, and wherein the processing the fraction along with the motion vector results in a representation of motion in the region of the current image.
- Dependent claims (14, 15, 16, 17):
  - 14. The method of claim 13 wherein the region is a slice of the current image.
  - 15. The method of claim 13 wherein the region is a macroblock of the current image.
  - 16. The method of claim 13 further comprising repeating the acts of claim 13 for plural regions in the current image.
  - 17. The method of claim 16 wherein a fraction for a first region in the current image differs from a fraction for a second region in the current image.

18. In a computer system, a method of encoding images in a sequence of video images, the method comprising:
- determining a fraction for a current image in the sequence, wherein the current image has a previous reference image and a future reference image, and wherein the fraction represents a temporal position for the current image relative to its reference images;
  
  selecting direct mode prediction for a current macroblock in the current image;
  
  finding a motion vector for a co-located macroblock in the future reference image;
  
  scaling the motion vector for the co-located macroblock using the fraction.
- Dependent claims (19, 20, 21, 22, 23):
  - 19. The method of claim 18 wherein the fraction facilitates representation of variable velocity motion in the direct mode prediction.
  - 20. The method of claim 18 wherein the scaling the motion vector for the co-located macroblock comprises scaling the vertical component and horizontal components of the motion vector for the co-located macroblock.
  - 21. The method of claim 18 wherein the scaling the motion vector for the co-located macroblock comprises:
    - scaling the motion vector for the co-located macroblock by a factor of the fraction, to obtain an implied forward motion vector for the current macroblock; and
      
      scaling the motion vector for the co-located macroblock by a factor of the fraction minus one, to obtain an implied backward motion vector for the current macroblock.
  - 22. The method of claim 21 further comprising:
    - addressing a macroblock in the future reference frame using the implied forward motion vector;
      
      addressing a macroblock in the previous reference frame using the implied backward motion vector; and
      
      predicting the current macroblock using an average of the macroblock in the future reference frame and the macroblock in the previous reference frame.
  - 23. A computer readable medium storing computer executable instructions for causing the computer system to perform the method of claim 18 during video encoding.
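
The scaling in claims 20 to 22 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, motion vectors are plain `(x, y)` integer tuples, and the rounding rules (Python's `round` for the vectors, round-half-up for the sample average) are assumptions that the claims do not specify.

```python
from fractions import Fraction


def scale_direct_mode_mv(colocated_mv, fraction):
    """Derive implied forward and backward vectors from a co-located vector.

    Claim 21: forward = fraction * MV, backward = (fraction - 1) * MV.
    Both components are scaled, as in claim 20.
    """
    mv_x, mv_y = colocated_mv
    forward = (round(fraction * mv_x), round(fraction * mv_y))
    backward = (round((fraction - 1) * mv_x), round((fraction - 1) * mv_y))
    return forward, backward


def average_prediction(block_a, block_b):
    """Average two equally sized sample blocks (claim 22), rounding half up."""
    return [
        [(a + b + 1) // 2 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(block_a, block_b)
    ]


# A co-located vector of (12, -6) with the current image one third of the
# way through the interval between its two reference images.
forward_mv, backward_mv = scale_direct_mode_mv((12, -6), Fraction(1, 3))
predicted = average_prediction([[10, 20]], [[20, 21]])
```

With a fraction of 1/3 the forward vector is a third of the co-located vector and the backward vector is minus two thirds of it, so the two implied vectors always differ by exactly the co-located vector when no rounding occurs.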

24. In a computer system, a method of processing images in a sequence of video images, the method comprising:
- determining a temporal position of a current image in the sequence, wherein the current image has plural references, wherein the temporal position is between a first reference image based on at least one reference for the current image and a second reference image for the current image, and wherein the temporal position is determined independent of time stamps; and
  
  processing the current image based on the temporal position of the current image and a motion vector for the first at least one reference image, wherein the motion vector represents motion in the first reference image relative to the second reference image, and wherein the processing results in a representation of motion in the current image.
- Dependent claims (25, 26, 27):
  - 25. The method of claim 24 wherein the determining is based on a fixed inter-frame distance and a fixed number of images with plural references within the sequence of video images.
  - 26. A computer readable medium storing computer executable instructions for causing the computer system to perform the method of claim 24 during video encoding.
  - 27. A computer readable medium storing computer executable instructions for causing the computer system to perform the method of claim 24 during video decoding.
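
One way to read claim 25 is that, when a fixed number of bi-directionally predicted images sits between each pair of reference images at a fixed spacing, the temporal position follows from counting alone, with no time stamps. The sketch below assumes evenly spaced images; `temporal_position` and its parameters are hypothetical names.

```python
from fractions import Fraction


def temporal_position(b_index, num_b_frames):
    """Position of the b_index-th image (1-based) among num_b_frames
    evenly spaced images between two reference images.

    The even spacing is an assumption made for this sketch.
    """
    if not 1 <= b_index <= num_b_frames:
        raise ValueError("b_index must be between 1 and num_b_frames")
    return Fraction(b_index, num_b_frames + 1)


# Two images between the references land at 1/3 and 2/3 of the interval.
positions = [temporal_position(i, 2) for i in (1, 2)]
```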

28. In a computer system, a method of encoding a current image in a sequence of video images, the current image having at least two reference images in the sequence, the method comprising:
- analyzing the at least two reference images along with the current image to determine whether the current image is to be predictively encoded based on the at least two reference images;
  
  based on the analyzing, encoding the current image independently from the at least two reference images; and
  
  assigning an image type to the current image, wherein the image type indicates that the current image is encoded independently from the at least two reference images.
- Dependent claims (29, 30, 31, 32, 33):
  - 29. The method of claim 28 wherein the analyzing comprises analyzing motion in the current image.
  - 30. The method of claim 28 wherein the image type is encoded as a variable length code from a variable length code table.
  - 31. The method of claim 30 wherein other variable length codes in the variable length code table represent fractions for representing the temporal position of images in the sequence of video images.
  - 32. The method of claim 30 wherein other variable length codes in the variable length code table represent other image types.
  - 33. The method of claim 28 further comprising designating the current image as a non-reference image, wherein the image type further indicates that the current image is a non-reference image.

34. In a video decoder, a method of decoding a current image in an encoded video image sequence, the current image having at least two reference images in the video image sequence, wherein the decoding yields a decoded video stream, the method comprising:
- receiving an image type for the current image, wherein the image type indicates that the current image is encoded independently from the at least two reference images, and analyzing bit rate constraints for the decoding; and
  
  determining whether to omit the current image from the decoded video stream based on the analyzing and the image type for the current image.
- Dependent claims (35):
  - 35. The method of claim 34 wherein the video decoder maintains a bias in favor of omitting the current image relative to encoded images having different frame types.

36. In a computer system, a computer-implemented method of processing video images in a video image sequence, the method comprising:
- processing a bit plane for a bi-directionally predicted video image, wherein the bit plane comprises binary information signifying whether macroblocks in the bi-directionally predicted video image are encoded using direct mode prediction or non-direct mode prediction.
- Dependent claims (37, 38, 39, 40, 41, 42, 43):
  - 37. The method of claim 36 wherein the bit plane is a frame level bit plane.
  - 38. The method of claim 36 further comprising, when the binary information indicates a non-direct prediction mode for a macroblock, processing a variable length code in a variable length coding table, wherein the variable length code represents the non-direct prediction mode.
  - 39. The method of claim 36 further comprising, when the binary information indicates a non-direct prediction mode for a macroblock, processing a variable length code in a variable length coding table, wherein the non-direct prediction mode is intra mode and is indicated by a combination of the variable length code and a motion vector code, and wherein the variable length code matches a variable length code used to represent a different non-direct prediction mode.
  - 40. The method of claim 36 further comprising, when the binary information indicates a non-direct prediction mode for a macroblock, representing the non-direct prediction mode with a variable length code from a variable length coding table comprising plural variable length codes, the plural variable length codes representing plural non-direct prediction modes, wherein a determination of which of the plural variable length codes represents which of the plural non-direct prediction modes is based at least in part on a fraction representing a temporal position of the bi-directionally predicted video image.
  - 41. The method of claim 40 wherein the variable length coding table includes a code length preference for backward mode.
  - 42. A computer readable medium storing computer executable instructions for causing the computer system to perform the method of claim 36 during video encoding.
  - 43. A computer readable medium storing computer executable instructions for causing the computer system to perform the method of claim 36 during video decoding.
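
To make the bit plane of claims 36 and 37 concrete, the sketch below stores one bit per macroblock (1 for direct mode, 0 for non-direct) in raster order. The uncompressed, most-significant-bit-first packing and the function names are assumptions for illustration; the claims do not specify how the bit plane is laid out or compressed.

```python
def pack_bitplane(flags):
    """Pack one flag per macroblock into bytes, most significant bit first."""
    packed = bytearray((len(flags) + 7) // 8)
    for index, is_direct in enumerate(flags):
        if is_direct:
            packed[index // 8] |= 0x80 >> (index % 8)
    return bytes(packed)


def unpack_bitplane(data, count):
    """Recover the first `count` per-macroblock flags from packed bytes."""
    return [bool(data[i // 8] & (0x80 >> (i % 8))) for i in range(count)]


# Ten macroblocks: True marks direct mode prediction.
direct_flags = [True, False, False, True, True,
                False, False, False, True, False]
plane = pack_bitplane(direct_flags)
restored = unpack_bitplane(plane, len(direct_flags))
```

Macroblocks whose flag is 0 would then carry a further code naming the non-direct mode, as claims 38 to 41 describe.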

44. In a computer system, a method of processing images in a sequence of video images, the method comprising:
- determining a value representing a forward motion vector component for a macroblock in the current image;
  
  determining a value representing a backward motion vector component for the macroblock in the current image;
  
  adding the value representing the forward motion vector to a forward buffer;
  
  adding the value representing the backward motion vector to a backward buffer; and
  
  predicting motion vectors for other macroblocks in the current image using values in the forward buffer and values in the backward buffer.
- Dependent claims (45, 46, 47):
  - 45. The method of claim 44 wherein the predicting comprises finding a median of previously decoded neighboring motion vectors.
  - 46. The method of claim 44 wherein the values in the forward buffer are used to predict motion vectors for forward-predicted macroblocks.
  - 47. The method of claim 44 wherein the values in the backward buffer are used to predict motion vectors for backward-predicted macroblocks.
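
The separate forward and backward buffers of claims 44 to 47 might be organised as in the sketch below. The class name, the choice of left, top and top-right neighbours, and the use of `(0, 0)` for missing neighbours are all assumptions; the claims only require that each buffer feeds prediction for its own direction and, in claim 45, that a median of neighbouring vectors is used.

```python
class MotionVectorBuffers:
    """Hypothetical per-image store of forward and backward motion vectors."""

    def __init__(self):
        self.forward = {}   # (row, col) -> (x, y)
        self.backward = {}  # (row, col) -> (x, y)

    def store(self, row, col, forward_mv, backward_mv):
        """Add both vectors for one macroblock to their buffers."""
        self.forward[(row, col)] = forward_mv
        self.backward[(row, col)] = backward_mv

    def predict(self, row, col, direction):
        """Component-wise median of three neighbours in one buffer."""
        buffer = self.forward if direction == "forward" else self.backward
        positions = ((row, col - 1), (row - 1, col), (row - 1, col + 1))
        neighbours = [buffer.get(pos, (0, 0)) for pos in positions]
        xs = sorted(mv[0] for mv in neighbours)
        ys = sorted(mv[1] for mv in neighbours)
        return (xs[1], ys[1])


buffers = MotionVectorBuffers()
buffers.store(0, 0, (4, 2), (-4, -2))
buffers.store(0, 1, (6, 0), (-2, 0))
buffers.store(0, 2, (8, 2), (0, -2))
buffers.store(1, 0, (2, 2), (-6, 0))
forward_pred = buffers.predict(1, 1, "forward")
backward_pred = buffers.predict(1, 1, "backward")
```

Keeping the two directions apart means a forward-predicted macroblock is never predicted from a backward vector, which points the opposite way in time.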

48. In a computer system, a method of processing an image in a sequence of video images, the method comprising:
- for a direct mode predicted macroblock in the image;
  
  determining a non-zero value representing a forward motion vector component for the direct mode predicted macroblock;
  
  determining a non-zero value representing a backward motion vector component for the direct mode predicted macroblock;
  
  adding the non-zero values to one or more buffers;
  
  wherein values in the one or more buffers are used to predict motion vectors for other macroblocks in the image.

49. In a computer system, a method of estimating motion for a bi-directionally predicted image in a sequence of video images, wherein the bi-directionally predicted image comprises macroblocks, and wherein the bi-directionally predicted image has a first reference image and a second reference image, the method comprising:
- selecting a motion vector resolution for the bi-directionally predicted image from among plural motion vector resolutions;
  
  selecting an interpolation filter for the bi-directionally predicted image from among plural interpolation filters; and
  
  encoding the bi-directionally predicted image using the selected motion vector resolution and the selected interpolation filter.
- Dependent claims (50, 51, 52, 53, 54):
  - 50. The method of claim 49 wherein the plural motion vector resolutions include a half-pixel resolution and a quarter pixel resolution.
  - 51. The method of claim 49 wherein the plural interpolation filters include a bicubic interpolation filter and a bilinear interpolation filter.
  - 52. The method of claim 49 wherein the bi-directionally predicted image is predicted using only one motion vector per macroblock in the bi-directionally predicted image.
  - 53. The method of claim 52, the method further comprising applying a four-motion-vector to one-motion-vector conversion to a macroblock in the first reference image, the macroblock in the first reference image having four motion vectors.
  - 54. The method of claim 53 wherein the conversion comprises:
    - determining a median vertical motion vector component among four vertical motion vector components of the four motion vectors of the macroblock in the first reference image;
      
      determining a median horizontal motion vector component among four horizontal motion vector components of the four motion vectors of the macroblock in the first reference image; and
      
      determining a median motion vector based on the median vertical motion vector component and the median horizontal motion vector component.
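
The four-motion-vector to one-motion-vector conversion of claim 54 can be sketched as a component-wise median. The claim does not define the median of four values; taking the floor of the mean of the two middle values is an assumption made here, and the function names are hypothetical.

```python
def median_of_four(values):
    """Median of four integers, taken as the floored mean of the middle two.

    This definition is an assumption; the claim does not specify one.
    """
    ordered = sorted(values)
    return (ordered[1] + ordered[2]) // 2


def four_mv_to_one_mv(motion_vectors):
    """Collapse the four (x, y) vectors of a macroblock into one vector."""
    if len(motion_vectors) != 4:
        raise ValueError("expected exactly four motion vectors")
    median_x = median_of_four([mv[0] for mv in motion_vectors])
    median_y = median_of_four([mv[1] for mv in motion_vectors])
    return (median_x, median_y)


# The outlier (20, 3) has little influence on the result.
single_mv = four_mv_to_one_mv([(4, -2), (6, 0), (5, -1), (20, 3)])
```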

55. In a computer system, a method of predicting motion for a bi-directionally predicted image in a sequence of video images, wherein the bi-directionally predicted image comprises macroblocks, and wherein the bi-directionally predicted image has a first reference image and a second reference image, the method comprising:
- selecting a motion vector resolution for the bi-directionally predicted image from among plural motion vector resolutions, wherein the plural motion vector resolutions include a half-pixel resolution and a quarter pixel resolution;
  
  selecting an interpolation filter for the bi-directionally predicted image from among plural interpolation filters, wherein the plural interpolation filters include a bicubic interpolation filter and a bilinear interpolation filter; and
  
  encoding the bi-directionally predicted image using the selected motion vector resolution and the selected interpolation filter.

56. In a computer system, a method of predicting motion for a bi-directionally predicted image in a sequence of video images, wherein the bi-directionally predicted image comprises macroblocks, and wherein the bi-directionally predicted image has a first reference image and a second reference image, the method comprising:
- selecting a motion vector mode for the bi-directionally predicted image from a set of plural motion vector modes, wherein the set of plural motion vector modes includes;
  
  a one motion vector, quarter-pixel resolution, bicubic interpolation filter mode;
  
  a one motion vector, half-pixel resolution, bicubic interpolation filter mode; and
  
  a one motion vector, half-pixel resolution, bilinear interpolation filter mode; and
  
  encoding the bi-directionally predicted image using the selected motion vector mode.
- Dependent claims (57):
  - 57. The method of claim 56 wherein the selecting is based on an efficiency evaluation of encoding the bi-directionally predicted image using one or more of the plural motion vector modes.

58. In a computer system, a method of processing images in a video image sequence to yield a processed video image sequence, wherein the processing is performed at a constrained bit rate, the method comprising:
- monitoring bits used during the processing;
  
  based on the monitoring, determining whether to omit a current image having two reference images from the processed video image sequence, wherein the current image has a number of bits required to process the current image;
  
  wherein, at the time of the determining, a number of bits available for use in the processing is greater than or equal to the number of bits required to process the current image.
- Dependent claims (59, 60, 61, 62, 63, 64, 65):
  - 59. The method of claim 58 wherein the determining is further based on a comparison of the number of bits required to process the current image with a number of bits available for use prior to processing the current image.
  - 60. The method of claim 58 wherein the monitoring comprises comparing the bits used during the processing with a threshold number of bits.
  - 61. The method of claim 60 wherein the threshold is adaptively adjusted during the processing.
  - 62. The method of claim 58 wherein the current image is a non-reference image.
  - 63. The method of claim 58 wherein a bit stream syntax for the video image sequence includes plural levels, and wherein the determining is performed at frame level.
  - 64. A computer readable medium storing computer executable instructions for causing the computer system to perform the method of claim 58 during video encoding.
  - 65. A computer readable medium storing computer executable instructions for causing the computer system to perform the method of claim 58 during video decoding.

66. In a computer system, a method of processing images in a video image sequence to yield a processed video image sequence, wherein the images in the video image sequence comprise images having two reference images, wherein the processing is performed at a constrained bit rate, and wherein images in the video image sequence are operable to be omitted from the processed video image sequence based on the constrained bit rate, the method comprising:
- determining whether to omit a current image having two reference images from the processed video image sequence, wherein the current image has a number of bits required to process the current image, and wherein the determining comprises;
  
  if more than half of n images processed prior to the current image were omitted from the processed video image sequence, then omitting the current image from the processed video sequence if the number of bits required to process the current image is greater than the average bits per image used to process the n images processed prior to the current image; and
  
  if half or less than half of the n images processed prior to the current image were omitted from the processed video image sequence, then omitting the current image from the processed video sequence if the number of bits required to process the current image is greater than twice the average bits per image used to process the n images processed prior to the current image.
- Dependent claims (67):
  - 67. The method of claim 66 wherein the determining is based in part on image type statistics for images in the video image sequence.
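
The two-branch omission test of claim 66 can be written out as below. This is a sketch under assumptions: `should_omit` and the `(bits_used, was_omitted)` history format are hypothetical, and the claim does not say how many bits an omitted image contributes to the average, so the history simply records whatever was spent on each image.

```python
def should_omit(bits_required, previous_images):
    """Apply the claim 66 rule to the n previously processed images.

    previous_images is a list of (bits_used, was_omitted) tuples.
    """
    n = len(previous_images)
    if n == 0:
        return False  # no history to compare against (an assumption)
    omitted = sum(1 for _, was_omitted in previous_images if was_omitted)
    average_bits = sum(bits for bits, _ in previous_images) / n
    if omitted > n / 2:
        # More than half were omitted: compare against the average.
        return bits_required > average_bits
    # Half or fewer were omitted: compare against twice the average.
    return bits_required > 2 * average_bits


mostly_kept = [(1000, False), (1200, False), (800, True), (1000, False)]
mostly_omitted = [(1000, True), (1000, True), (1000, True), (1000, False)]
decisions = (
    should_omit(1500, mostly_kept),
    should_omit(2500, mostly_kept),
    should_omit(1500, mostly_omitted),
)
```

Both histories average 1000 bits per image, so the same 1500-bit image is kept in the first case (threshold 2000) and omitted in the second (threshold 1000).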

68. A method of processing a video image sequence, wherein the processing yields an encoded video image sequence in a bit stream having plural bit stream levels, wherein the plural bit stream levels include a frame level, and wherein the video image sequence comprises predicted images, the method comprising:
- omitting a predicted image in the video image sequence from the encoded video image sequence; and
  
  representing the omitted predicted image with a frame-level indicator in the bit stream;
  
  wherein the frame-level indicator is operable to indicate the omitted predicted image to a video decoder.
- Dependent claims (69, 70, 71, 72, 73):
  - 69. The method of claim 68 wherein the frame-level indicator is data having an indicator size, and wherein the indicator size indicates the omitted predicted image.
  - 70. The method of claim 69 wherein encoded images in the encoded video image sequence have a minimum encoded image size, and wherein the indicator size is smaller than the minimum encoded image size.
  - 71. The method of claim 69 wherein the indicator size is one byte.
  - 72. The method of claim 68 wherein the frame-level indicator causes the video decoder to choose one or more reference images for a predicted image.
  - 73. The method of claim 72 wherein the predicted image is a bi-directionally predicted image.
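
Claims 69 to 71 describe signaling an omitted image purely through the size of its frame-level data. A minimal sketch of that idea follows; the one-byte indicator size comes from claim 71, while the minimum encoded image size of two bytes, the zero-valued indicator byte, and the function names are assumptions.

```python
INDICATOR_SIZE = 1          # claim 71: the indicator is one byte
MIN_ENCODED_IMAGE_SIZE = 2  # hypothetical; must exceed INDICATOR_SIZE (claim 70)


def frame_payload(encoded_image):
    """Return the bytes written for one frame; None means the frame is omitted."""
    if encoded_image is None:
        return bytes(INDICATOR_SIZE)  # content is arbitrary, only size matters
    if len(encoded_image) < MIN_ENCODED_IMAGE_SIZE:
        raise ValueError("encoded image is smaller than the minimum size")
    return encoded_image


def is_omitted(payload):
    """A decoder-side check: the payload size alone marks an omitted frame."""
    return len(payload) == INDICATOR_SIZE


payloads = [frame_payload(b"\x12\x34\x56"), frame_payload(None)]
omitted_flags = [is_omitted(p) for p in payloads]
```

Because every genuinely encoded image is at least `MIN_ENCODED_IMAGE_SIZE` bytes, a one-byte payload cannot be mistaken for real image data.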

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Srinivasan, Sridhar; Mukerjee, Kunal; Lin, Bruce Chih-Lung
Granted Patent: US 7,609,763 B2
US Class Current: 375/240.160
CPC Class Codes:
H04N 19/132   Sampling, masking or trunca...
H04N 19/56   Motion estimation with init...
H04N 19/577   Motion compensation with bi...
H04N 19/587   involving temporal sub-samp...
