Encoding method for the compression of a video sequence
Abstract
The invention relates to an encoding method for the compression of a video sequence. Said method, using a three-dimensional wavelet transform, is based on a hierarchical subband coding process in which the subbands to be encoded are scanned in an order that preserves the initial subband structure of the 3D wavelet transform. According to the invention, a temporal (resp. spatial) scalability is obtained by performing a motion estimation at each temporal resolution level (resp. at the highest spatial resolution level), and only the part of the estimated motion vectors necessary to reconstruct any given temporal (resp. spatial) resolution level is then encoded and put in the bitstream together with the bits encoding the wavelet coefficients at this given temporal (resp. spatial) level, said insertion in the bitstream being done before encoding texture coefficients at the same temporal (resp. spatial) level. Such a solution avoids encoding and sending all the motion vector fields in the bitstream, which would be a drawback when a low bitrate is targeted and the receiver only wants a reduced frame rate or spatial resolution.
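The bitstream ordering described in the abstract can be illustrated with a minimal Python sketch for the temporal case. It is not the patented encoder: the `Chunk` type, the functions `build_stream` and `truncate`, and the byte payloads are hypothetical names introduced only for illustration, and the sketch assumes that level 0 is the lowest frame rate and that each temporal level carries exactly one block of motion vectors and one block of texture bits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    kind: str       # "flag", "motion" or "texture"
    level: int      # temporal resolution level, 0 = lowest frame rate (assumption)
    payload: bytes  # already-encoded bits; contents are placeholders here


def build_stream(motion, texture):
    """For each temporal level emit a flag, then motion vectors, then texture."""
    stream = []
    for level, (mv, tex) in enumerate(zip(motion, texture)):
        stream.append(Chunk("flag", level, b""))      # marks the start of the level
        stream.append(Chunk("motion", level, mv))     # vectors come before texture
        stream.append(Chunk("texture", level, tex))
    return stream


def truncate(stream, max_level):
    """Keep only the chunks needed to reconstruct levels 0..max_level."""
    return [c for c in stream if c.level <= max_level]


if __name__ == "__main__":
    s = build_stream([b"m0", b"m1", b"m2"], [b"t0", b"t1", b"t2"])
    for chunk in truncate(s, 1):
        print(chunk.kind, chunk.level)
```

Because the motion vectors of a level sit next to the texture of that level, a receiver that wants a reduced frame rate can stop reading at the flag of the first level it does not need, and the vectors of the finer levels are never sent to it.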
2 Claims
1. An encoding method for the compression of a video sequence divided into groups of frames and decomposed by means of a three-dimensional (3D) wavelet transform leading to a given number of successive resolution levels that correspond to the decomposition levels of said transform, said method being based on a hierarchical subband encoding process leading from the original set of picture elements (pixels) of each group of frames to transform coefficients constituting a hierarchical pyramid, a spatio-temporal orientation tree (in which the roots are formed with the pixels of the approximation subband resulting from the 3D wavelet transform and the offspring of each of these pixels is formed with the pixels of the higher subbands corresponding to the image volume defined by these root pixels) defining the spatio-temporal relationship inside said hierarchical pyramid, the subbands to be encoded being scanned one after the other in an order that respects the parent-offspring dependencies formed in said tree and preserves the initial subband structure of the 3D wavelet transform, said method being further characterized in that, in view of a temporal scalability, a motion estimation is performed at each temporal resolution level, the beginning of which is indicated by flags inserted into the bitstream, and only the estimated motion vectors necessary to reconstruct any given temporal resolution level are encoded and put in the bitstream together with the bits encoding the wavelet coefficients at this given temporal level, said motion vectors being inserted into said bitstream before encoding texture coefficients at the same temporal level.
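The parent-offspring relationship of the spatio-temporal orientation tree can be sketched as follows. The claim does not give index formulas, so this example assumes the dyadic convention commonly used in 3D SPIHT-style coders: a coefficient at pyramid coordinates (t, y, x) outside the approximation subband has up to eight offspring at (2t+dt, 2y+dy, 2x+dx). The function name `offspring` and the `shape` parameter are hypothetical, and the special handling of root pixels in the approximation subband is deliberately not modeled.

```python
from itertools import product


def offspring(t, y, x, shape):
    """Return the children of a non-root coefficient (t, y, x).

    `shape` is the (frames, height, width) of the whole coefficient pyramid.
    Assumed rule (not stated in the claim): each child index is twice the
    parent index plus 0 or 1 along every axis. Children that fall outside
    the pyramid are dropped, so coefficients in the finest subbands have
    no offspring.
    """
    frames, height, width = shape
    children = []
    for dt, dy, dx in product((0, 1), repeat=3):
        child = (2 * t + dt, 2 * y + dy, 2 * x + dx)
        if child[0] < frames and child[1] < height and child[2] < width:
            children.append(child)
    return children


if __name__ == "__main__":
    print(offspring(1, 2, 3, (4, 8, 8)))   # eight children in the next finer level
    print(offspring(2, 4, 6, (4, 8, 8)))   # a finest-level coefficient has none
```

Any scan that visits a subband only after the subband holding its parents satisfies the ordering constraint stated in the claim.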
2. An encoding method for the compression of a video sequence divided into groups of frames and decomposed by means of a three-dimensional (3D) wavelet transform leading to a given number of successive resolution levels that correspond to the decomposition levels of said transform, said method being based on a hierarchical subband encoding process leading from the original set of picture elements (pixels) of each group of frames to transform coefficients constituting a hierarchical pyramid, a spatio-temporal orientation tree (in which the roots are formed with the pixels of the approximation subband resulting from the 3D wavelet transform and the offspring of each of these pixels is formed with the pixels of the higher subbands corresponding to the image volume defined by these root pixels) defining the spatio-temporal relationship inside said hierarchical pyramid, the subbands to be encoded being scanned one after the other in an order that respects the parent-offspring dependencies formed in said tree and preserves the initial subband structure of the 3D wavelet transform, said method being further characterized in that, in view of a spatial scalability, a motion estimation is performed at the highest spatial resolution level, the vectors thus obtained being divided by two in order to obtain the motion vectors for the lower spatial resolutions, and only the estimated motion vectors necessary to reconstruct any spatial resolution level are encoded and put in the bitstream together with the bits encoding the wavelet coefficients at this given spatial level, said motion vectors being inserted into said bitstream before encoding texture coefficients at the same spatial level, and said encoding operation being carried out on the motion vectors at the lowest spatial resolution, only refinement bits at each spatial resolution being then put in the bitstream bitplane by bitplane, from one resolution level to the other.
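The division of motion vectors by two and the per-resolution refinement bits of claim 2 can be sketched for a single integer vector component. This is an illustration under stated assumptions, not the patented coder: the helper names `split_vector` and `rebuild` are hypothetical, the division by two is assumed to round toward minus infinity (the claim does not specify a rounding rule), and entropy coding of the base value is left out.

```python
def split_vector(v, levels):
    """Split a full-resolution component into a base value and refinement bits.

    The base is the component halved (levels - 1) times and belongs to the
    lowest spatial resolution. Each following bit refines the value by one
    resolution level, coarsest refinement first.
    """
    base = v >> (levels - 1)                                   # floor rounding (assumption)
    bits = [(v >> k) & 1 for k in range(levels - 2, -1, -1)]   # one bit per finer level
    return base, bits


def rebuild(base, bits, upto):
    """Rebuild the component using only the first `upto` refinement bits."""
    v = base
    for b in bits[:upto]:
        v = (v << 1) | b   # double the coarse value, then add the refinement bit
    return v


if __name__ == "__main__":
    base, bits = split_vector(13, 3)
    for n in range(3):
        print(n, rebuild(base, bits, n))
```

A receiver that stops at an intermediate spatial resolution reads the base value and only the refinement bits up to that resolution, so the bits belonging to finer resolutions never need to be transmitted to it.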
Specification