Multi-subframe quantization of spectral parameters
Abstract
Speech is encoded into a frame of bits. A speech signal is digitized into a sequence of digital speech samples that are then divided into a sequence of subframes. A set of model parameters is estimated for each subframe. The model parameters include a set of spectral magnitude parameters that represent spectral information for the subframe. Two or more consecutive subframes from the sequence of subframes may be combined into a frame. The spectral magnitude parameters from both of the subframes within the frame may be jointly quantized. The joint quantization includes forming predicted spectral magnitude parameters from the quantized spectral magnitude parameters from the previous frame, computing the residual parameters as the difference between the spectral magnitude parameters and the predicted spectral magnitude parameters, combining the residual parameters from both of the subframes within the frame, and quantizing the combined residual parameters into a set of encoded spectral bits which are included in the frame of bits.
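The prediction, residual, and combination steps described in the abstract can be sketched as follows. This is a minimal illustration, not the patented quantizer: the function names, the prediction gain `rho`, and the uniform scalar quantizer are all assumptions introduced here, and both subframes are assumed to carry the same number of magnitudes as the previous frame.

```python
from typing import List


def predict(prev_quantized: List[float], rho: float) -> List[float]:
    """Form predicted magnitudes from the previous frame's quantized values.

    Scaling by a gain rho is an assumption of this sketch.
    """
    return [rho * m for m in prev_quantized]


def residual(magnitudes: List[float], predicted: List[float]) -> List[float]:
    """Residual = spectral magnitude parameters minus their prediction."""
    return [m - p for m, p in zip(magnitudes, predicted)]


def quantize_uniform(values: List[float], step: float) -> List[int]:
    """Stand-in quantizer: map each value to the index of the nearest step."""
    return [round(v / step) for v in values]


def encode_frame(sub0: List[float], sub1: List[float],
                 prev_quantized: List[float],
                 rho: float = 0.5, step: float = 0.5) -> List[int]:
    """Jointly encode two subframes of magnitudes into one list of indices."""
    predicted = predict(prev_quantized, rho)
    # Combine the residuals of both subframes before quantizing them together.
    combined = residual(sub0, predicted) + residual(sub1, predicted)
    return quantize_uniform(combined, step)


indices = encode_frame([1.0, 2.0], [1.5, 2.5], prev_quantized=[2.0, 4.0])
```

Here the combination is a plain concatenation; the claims below describe a more structured combination based on block transforms and sum and difference vectors.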
54 Claims
1. A method of encoding speech into a frame of bits, the method including:
digitizing a speech signal into a sequence of digital speech samples;
dividing the digital speech samples into a sequence of subframes, each of the subframes including multiple digital speech samples;
estimating a set of speech model parameters for each subframe, wherein the speech model parameters include a set of spectral magnitude parameters that represent spectral magnitude information for the subframe;
combining consecutive subframes from the sequence of subframes into a frame;
jointly quantizing the spectral magnitude parameters from the consecutive subframes of the frame to produce a set of encoder spectral bits, wherein:
the joint quantization includes forming predicted spectral magnitude parameters from quantized spectral magnitude parameters from a previous subframe;
a subframe of the frame includes a number of spectral magnitude parameters that may vary from a number of spectral magnitude parameters in the previous subframe; and
the joint quantization accounts for any variation between the number of spectral magnitude parameters in the subframe of the frame and the number of spectral magnitude parameters in the previous subframe; and
including the encoder spectral bits in a frame of bits.
Dependent claims: 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 17
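Claim 1 does not say how the variation in the number of spectral magnitude parameters is accounted for. One plausible approach, shown here purely as an assumption, is to resample the previous subframe's quantized magnitudes onto the current subframe's parameter count by linear interpolation before forming the prediction. The function name and the endpoint-aligned index mapping are hypothetical.

```python
from typing import List


def resample_magnitudes(prev: List[float], target_len: int) -> List[float]:
    """Linearly interpolate `prev` onto `target_len` evenly spaced positions.

    Assumption: the first and last parameters of both subframes line up.
    """
    if not prev or target_len <= 0:
        return []
    if len(prev) == 1 or target_len == 1:
        # Nothing to interpolate between; repeat the first value.
        return [prev[0]] * target_len
    out = []
    for l in range(target_len):
        # Fractional index into `prev` for parameter l of the current subframe.
        pos = l * (len(prev) - 1) / (target_len - 1)
        lo = int(pos)
        hi = min(lo + 1, len(prev) - 1)
        frac = pos - lo
        out.append((1.0 - frac) * prev[lo] + frac * prev[hi])
    return out


# Previous subframe had 3 magnitudes, current subframe has 5.
predicted_basis = resample_magnitudes([0.0, 2.0, 4.0], 5)
```

The resampled values can then be used as the input to the prediction, so that the residual has one entry per magnitude of the current subframe.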

8. A method of encoding speech into a frame of bits, the method including:
digitizing a speech signal into a sequence of digital speech samples;
dividing the digital speech samples into a sequence of subframes, each of the subframes including multiple digital speech samples;
estimating a set of speech model parameters for each subframe, wherein the speech model parameters include a set of spectral magnitude parameters that represent spectral information for the subframe;
combining consecutive subframes from the sequence of subframes into a frame;
jointly quantizing the spectral magnitude parameters from the consecutive subframes of the frame to produce a set of encoder spectral bits, wherein the joint quantization includes forming predicted spectral magnitude parameters from quantized spectral magnitude parameters from a previous frame; and
including the encoder spectral bits in a frame of bits;
wherein the joint quantization comprises:
computing residual parameters as the difference between the spectral magnitude parameters and the predicted spectral magnitude parameters;
combining the residual parameters from the consecutive subframes within the frame; and
quantizing the combined residual parameters into a set of encoder spectral bits; and
combining the residual parameters from the consecutive subframes within the frame comprises:
dividing the residual parameters from each of the subframes into frequency blocks;
performing a linear transformation on the residual parameters within each frequency block to produce a set of transformed residual coefficients for each subframe;
grouping a minority of the transformed residual coefficients from the frequency blocks for each subframe into a prediction residual block average (PRBA) vector for the subframe;
grouping the remaining transformed residual coefficients for each frequency block of each subframe into a higher order coefficient (HOC) vector for the frequency block;
transforming the PRBA vectors to produce a transformed PRBA vector for each subframe;
combining the transformed PRBA vectors for the subframes of the frame by computing generalized sum and difference vectors from the transformed PRBA vectors; and
combining the HOC vectors within each frequency block for the subframes of the frame by computing generalized sum and difference vectors from the HOC vectors for each frequency block.
Dependent claims: 15, 16, 18, 19
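The block-transform combination recited in claim 8 can be sketched as follows. Several choices here are assumptions rather than anything stated in the claim: the linear transformation is taken to be a DCT, there are four frequency blocks, one coefficient per block goes into the PRBA vector, the "generalized" sum and difference are reduced to a plain sum and difference, and both subframes are assumed to have the same number of residual parameters. All function names are hypothetical.

```python
import math
from typing import List, Tuple


def dct(x: List[float]) -> List[float]:
    """DCT-II scaled by 1/n, so coefficient 0 is the block average."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)) / n
            for k in range(n)]


def split_blocks(residual: List[float], n_blocks: int) -> List[List[float]]:
    """Divide the residual into contiguous frequency blocks of near-equal size."""
    n = len(residual)
    bounds = [b * n // n_blocks for b in range(n_blocks + 1)]
    return [residual[bounds[b]:bounds[b + 1]] for b in range(n_blocks)]


def prba_and_hoc(residual: List[float], n_blocks: int = 4,
                 n_low: int = 1) -> Tuple[List[float], List[List[float]]]:
    """Transform each block; low-order coefficients form the PRBA vector,
    the remaining coefficients form one HOC vector per block."""
    prba: List[float] = []
    hoc: List[List[float]] = []
    for block in split_blocks(residual, n_blocks):
        coeffs = dct(block)
        prba.extend(coeffs[:n_low])
        hoc.append(coeffs[n_low:])
    return prba, hoc


def sum_diff(a: List[float], b: List[float]) -> Tuple[List[float], List[float]]:
    """Plain sum and difference vectors (simplest case of a generalized form)."""
    return [x + y for x, y in zip(a, b)], [x - y for x, y in zip(a, b)]


def combine_subframes(res0: List[float], res1: List[float],
                      n_blocks: int = 4, n_low: int = 1):
    """Combine the residuals of two subframes into sum and difference vectors."""
    prba0, hoc0 = prba_and_hoc(res0, n_blocks, n_low)
    prba1, hoc1 = prba_and_hoc(res1, n_blocks, n_low)
    # The PRBA vectors are transformed again before being combined.
    prba_sum, prba_diff = sum_diff(dct(prba0), dct(prba1))
    hoc_pairs = [sum_diff(h0, h1) for h0, h1 in zip(hoc0, hoc1)]
    return prba_sum, prba_diff, hoc_pairs


prba_sum, prba_diff, hoc_pairs = combine_subframes([1.0] * 8, [1.0] * 8)
```

When the two subframes are similar, as in this usage example, the difference vectors are close to zero, which is what makes quantizing the sum and difference jointly attractive.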

20. A speech encoder for encoding speech into a frame of bits, the encoder including:
means for digitizing a speech signal into a sequence of digital speech samples;
means for dividing the digital speech samples into a sequence of subframes, each of the subframes including multiple digital speech samples;
means for estimating a set of speech model parameters for each subframe, wherein the speech model parameters include a set of spectral magnitude parameters that represent spectral magnitude information for the subframe;
means for combining consecutive subframes from the sequence of subframes into a frame;
means for jointly quantizing the spectral magnitude parameters from the consecutive subframes of the frame to produce a set of encoder spectral bits, wherein:
the means for jointly quantizing forms predicted spectral magnitude parameters from quantized spectral magnitude parameters from a previous subframe;
a subframe of the frame includes a number of spectral magnitude parameters that may vary from a number of spectral magnitude parameters in the previous subframe; and
the means for jointly quantizing accounts for any variation between the number of spectral magnitude parameters in the subframe of the frame and the number of spectral magnitude parameters in the previous subframe; and
means for forming a frame of bits including the encoder spectral bits.
Dependent claims: 21, 22, 23, 24, 25

26. A method of decoding speech from a frame of bits, the method comprising:
extracting decoder spectral bits from the frame of bits;
using the decoder spectral bits to jointly reconstruct spectral magnitude parameters for consecutive subframes within a frame of speech, wherein the joint reconstruction includes:
inverse quantizing the decoder spectral bits to reconstruct a set of combined residual parameters for the frame from which separate residual parameters for each of the subframes are computed;
forming predicted spectral magnitude parameters from reconstructed spectral magnitude parameters from a previous subframe; and
adding the separate residual parameters to the predicted spectral magnitude parameters to form the reconstructed spectral magnitude parameters for each subframe within the frame;
wherein a subframe of the frame includes a number of spectral magnitude parameters that may vary from a number of spectral magnitude parameters in the previous subframe; and
the joint reconstruction accounts for any variation between the number of spectral magnitude parameters in the subframe of the frame and the number of spectral magnitude parameters in the previous subframe; and
synthesizing digital speech samples for each subframe within the frame of speech using speech model parameters which include some or all of the reconstructed voiced/unvoiced metrics and some or all of the reconstructed spectral magnitude parameters for the subframe.
Dependent claims: 27, 28, 29, 30, 31, 33, 34, 35, 36, 37, 38, 41

32. A method of decoding speech from a frame of bits, the method comprising:
extracting decoder spectral bits from the frame of bits;
using the decoder spectral bits to jointly reconstruct spectral magnitude parameters for consecutive subframes within a frame of speech, wherein the joint reconstruction includes:
inverse quantizing the decoder spectral bits to reconstruct a set of combined residual parameters for the frame from which separate residual parameters for each of the subframes are computed;
forming predicted spectral magnitude parameters from reconstructed spectral magnitude parameters from a previous frame; and
adding the separate residual parameters to the predicted spectral magnitude parameters to form the reconstructed spectral magnitude parameters for each subframe within the frame; and
synthesizing digital speech samples for each subframe within the frame of speech using speech model parameters which include some or all of the reconstructed spectral magnitude parameters for the subframe;
wherein the computing of the separate residual parameters for each subframe from the combined residual parameters for the frame comprises:
dividing each subframe into frequency blocks;
separating the combined residual parameters for the frame into generalized sum and difference vectors representing transformed PRBA vectors combined across the subframes of the frame, and into generalized sum and difference vectors representing HOC vectors for the frequency blocks combined across the subframes of the frame;
computing PRBA vectors for each subframe from the generalized sum and difference vectors representing the transformed PRBA vectors;
computing HOC vectors for each subframe from the generalized sum and difference vectors representing the HOC vectors for each of the frequency blocks;
combining the PRBA vector and the HOC vectors for each of the frequency blocks to form transformed residual coefficients for each of the subframes; and
performing an inverse transformation on the transformed residual coefficients to produce the separate residual parameters for each subframe of the frame.
Dependent claims: 39, 40
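The decoder-side separation in claim 32 can be sketched in the same spirit. The assumptions mirror a simple encoder: plain (not generalized) sum and difference vectors, an inverse DCT as the inverse transformation, and one PRBA coefficient per frequency block. The function names are hypothetical and the inverse matches a forward DCT-II scaled by 1/n.

```python
import math
from typing import List, Tuple


def split_sum_diff(s: List[float], d: List[float]) -> Tuple[List[float], List[float]]:
    """Recover the two per-subframe vectors from their sum and difference."""
    first = [(x + y) / 2.0 for x, y in zip(s, d)]
    second = [(x - y) / 2.0 for x, y in zip(s, d)]
    return first, second


def idct(c: List[float]) -> List[float]:
    """Inverse of a DCT-II that was scaled by 1/n (a DCT-III)."""
    n = len(c)
    return [c[0] + 2.0 * sum(c[k] * math.cos(math.pi * (i + 0.5) * k / n)
                             for k in range(1, n))
            for i in range(n)] if n else []


def rebuild_residual(transformed_prba: List[float],
                     hoc_blocks: List[List[float]],
                     n_low: int = 1) -> List[float]:
    """Rebuild one subframe's residual from its PRBA and HOC vectors."""
    # Undo the extra transform that was applied to the PRBA vector.
    prba = idct(transformed_prba)
    residual: List[float] = []
    for b, hoc in enumerate(hoc_blocks):
        # Each block's coefficients are its PRBA entries followed by its HOCs.
        coeffs = prba[b * n_low:(b + 1) * n_low] + list(hoc)
        residual.extend(idct(coeffs))
    return residual


prba_a, prba_b = split_sum_diff([2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0])
residual_a = rebuild_residual(prba_a, [[0.0]] * 4)
```

In this usage example the sum vector carries all the energy and the difference vector is zero, so both subframes decode to the same flat residual.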

42. A decoder for decoding speech from a frame of bits, the decoder including:
means for extracting decoder spectral bits from the frame of bits;
means for using the decoder spectral bits to jointly reconstruct spectral magnitude parameters for consecutive subframes within a frame of speech, wherein the joint reconstruction includes:
inverse quantizing the decoder spectral bits to reconstruct a set of combined residual parameters for the frame from which separate residual parameters for each of the subframes are computed;
forming predicted spectral magnitude parameters from reconstructed spectral magnitude parameters from a previous subframe; and
adding the separate residual parameters to the predicted spectral magnitude parameters to form the reconstructed spectral magnitude parameters for each subframe within the frame;
wherein a subframe of the frame includes a number of spectral magnitude parameters that may vary from a number of spectral magnitude parameters in the previous subframe; and
the joint reconstruction accounts for any variation between the number of spectral magnitude parameters in the subframe of the frame and the number of spectral magnitude parameters in the previous subframe; and
means for synthesizing digital speech samples for each subframe within the frame of speech using speech model parameters which include some or all of the reconstructed spectral magnitude parameters for the subframe.
Dependent claims: 43, 44, 45, 46, 47, 48, 49, 50, 52, 53

51. A method of encoding a level of speech into a frame of bits, the method comprising:
digitizing a speech signal into a sequence of digital speech samples;
dividing the digital speech samples into a sequence of subframes, each of the subframes including multiple digital speech samples;
estimating a speech level parameter for each of the subframes, wherein the speech level parameter is representative of the amplitude of the digital speech samples comprising the subframe;
combining a plurality of consecutive subframes from the sequence of subframes into a frame;
jointly quantizing the speech level parameters from the plurality of consecutive subframes within the frame, characterized in that the joint quantization includes computing and quantizing an average level parameter by combining the speech level parameters over the subframes within the frame, and computing and quantizing a difference level vector between the speech level parameters for each subframe within the frame and the average level parameter; and
including quantized bits representative of the average level parameter and the difference level vector in a frame of bits.
Dependent claims: 54
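The average and difference level quantization of claim 51 can be illustrated with a short sketch. The uniform scalar quantizers, the step sizes, and the choice to take differences against the unquantized average are assumptions made here for illustration; the claim does not specify them, and the function names are hypothetical.

```python
from typing import List, Tuple


def encode_levels(levels: List[float], avg_step: float = 0.5,
                  diff_step: float = 0.5) -> Tuple[int, List[int]]:
    """Quantize the frame's average level and the per-subframe differences."""
    avg = sum(levels) / len(levels)
    # Difference level vector: each subframe level minus the frame average.
    diffs = [level - avg for level in levels]
    avg_index = round(avg / avg_step)
    diff_indices = [round(d / diff_step) for d in diffs]
    return avg_index, diff_indices


def decode_levels(avg_index: int, diff_indices: List[int],
                  avg_step: float = 0.5, diff_step: float = 0.5) -> List[float]:
    """Rebuild per-subframe levels from the quantized average and differences."""
    avg = avg_index * avg_step
    return [avg + i * diff_step for i in diff_indices]


avg_index, diff_indices = encode_levels([10.0, 14.0])
decoded = decode_levels(avg_index, diff_indices)
```

Splitting the levels this way lets the slowly varying average and the small within-frame differences be given separate bit budgets.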
Specification