Multi-subframe quantization of spectral parameters

US 6,161,089 A
Filed: 03/14/1997
Issued: 12/12/2000
Est. Priority Date: 03/14/1997
Status: Expired due to Term

Abstract

Speech is encoded into a frame of bits. A speech signal is digitized into a sequence of digital speech samples that are then divided into a sequence of subframes. A set of model parameters is estimated for each subframe. The model parameters include a set of spectral magnitude parameters that represent spectral information for the subframe. Two or more consecutive subframes from the sequence of subframes may be combined into a frame. The spectral magnitude parameters from both of the subframes within the frame may be jointly quantized. The joint quantization includes forming predicted spectral magnitude parameters from the quantized spectral magnitude parameters from the previous frame, computing the residual parameters as the difference between the spectral magnitude parameters and the predicted spectral magnitude parameters, combining the residual parameters from both of the subframes within the frame, and quantizing the combined residual parameters into a set of encoded spectral bits which are included in the frame of bits.

54 Claims

1. A method of encoding speech into a frame of bits, the method including:
- digitizing a speech signal into a sequence of digital speech samples;
  
  dividing the digital speech samples into a sequence of subframes, each of the subframes including multiple digital speech samples;
  
  estimating a set of speech model parameters for each subframe, wherein the speech model parameters include a set of spectral magnitude parameters that represent spectral magnitude information for the subframe;
  
  combining consecutive subframes from the sequence of subframes into a frame;
  
  jointly quantizing the spectral magnitude parameters from the consecutive subframes of the frame to produce a set of encoder spectral bits, wherein:
  
  the joint quantization includes forming predicted spectral magnitude parameters from quantized spectral magnitude parameters from a previous subframe;
  
  a subframe of the frame includes a number of spectral magnitude parameters that may vary from a number of spectral magnitude parameters in the previous subframe; and
  
  the joint quantization accounts for any variation between the number of spectral magnitude parameters in the subframe of the frame and the number of spectral magnitude parameters in the previous subframe; and
  
  including the encoder spectral bits in a frame of bits.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 17)
- - 2. The method of claim 1, wherein the joint quantization comprises:
    - computing residual parameters as the difference between the spectral magnitude parameters and the predicted spectral magnitude parameters;
      
      combining the residual parameters from the consecutive subframes within the frame; and
      
      quantizing the combined residual parameters into a set of encoder spectral bits.
  - 3. The method of claim 1, wherein the spectral magnitude parameters correspond to a frequency-domain representation of a spectral envelope of the subframe.
  - 4. The method of claim 1, wherein the number of spectral magnitude parameters in the subframe of the frame may vary from a number of spectral magnitude parameters in a second subframe of the frame;
    - and the joint quantization accounts for any variation between the number of spectral magnitude parameters in the subframe of the frame and the number of spectral magnitude parameters in the second subframe of the frame.
  - 5. The method of claim 4, wherein the joint quantization accounts for any variation between the number of spectral magnitude parameters in a subframe of the frame and the number of spectral magnitude parameters in a second subframe of the frame by transforming the spectral magnitude parameters for the two subframes to produce one or more output vectors and limiting the number of elements within each output vector that are used in the joint quantization.
  - 6. The method of claim 1, wherein the joint quantization accounts for any variation between the number of spectral magnitude parameters in the subframe of the frame and the number of spectral magnitude parameters in the previous subframe by interpolating and resampling spectral magnitude parameters for the previous subframe and using the interpolated and resampled spectral magnitude parameters in forming the predicted spectral magnitude parameters.
  - 7. The method of claim 1, wherein the joint quantization accounts for any variation between the number of spectral magnitude parameters in a subframe of the frame and the number of spectral magnitude parameters in a second subframe of the frame by transforming the spectral magnitude parameters for the two subframes to produce one or more output vectors and limiting the number of elements within each output vector that are used in the joint quantization.
  - 9. The method of claim 1, 2 or 8, further comprising producing additional encoder bits by quantizing additional speech model parameters other than the spectral magnitude parameters.
  - 10. The method of claim 9, wherein the additional speech model parameters include parameters representative of a fundamental frequency and parameters representative of a voicing state.
  - 11. The method of claim 1, 2 or 8, wherein the frame of bits includes redundant error control bits protecting at least some of the encoder spectral bits.
  - 12. The method of claim 1, 2 or 8, wherein the spectral magnitude parameters represent log spectral magnitudes estimated for a Multi-Band Excitation (MBE) speech model.
  - 13. The method of claim 12, wherein the spectral magnitude parameters are estimated from a computed spectrum in a manner which is independent of a voicing state.
  - 14. The method of claim 2 or 8, wherein the predicted spectral magnitude parameters are formed by applying a gain of less than unity to a linear interpolation of quantized spectral magnitudes from a last subframe in a previous frame.
  - 17. The method of claim 2 or 8, wherein quantizing the combined residual parameters includes using at least one vector quantizer.
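
As an illustration of claims 2, 6 and 14, the following Python sketch forms a predicted magnitude vector by linearly interpolating the previous subframe's quantized magnitudes onto the current subframe's parameter count, scales it by a gain below unity, and subtracts it from the current magnitudes to give the residual. The function names, the endpoint-aligned index mapping and the gain value 0.65 are assumptions made for illustration; the claims do not specify them.

```python
from typing import List, Sequence


def resample_linear(prev: Sequence[float], n_out: int) -> List[float]:
    """Linearly interpolate `prev` onto `n_out` evenly spaced points.

    Assumption: the first and last input points map onto the first and
    last output points. A real coder may map harmonic indices differently.
    """
    n_in = len(prev)
    if n_in == 0 or n_out <= 0:
        raise ValueError("need at least one input and one output point")
    if n_in == 1:
        return [float(prev[0])] * n_out
    out = []
    for k in range(n_out):
        # Position of output index k on the input index axis [0, n_in - 1].
        pos = k * (n_in - 1) / (n_out - 1) if n_out > 1 else 0.0
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append((1.0 - frac) * prev[lo] + frac * prev[hi])
    return out


def predict(prev_quantized: Sequence[float], n_out: int,
            gain: float = 0.65) -> List[float]:
    """Predicted magnitudes: gain (< 1, illustrative value) times the
    interpolated and resampled previous quantized magnitudes."""
    return [gain * v for v in resample_linear(prev_quantized, n_out)]


def residual(current: Sequence[float], prev_quantized: Sequence[float],
             gain: float = 0.65) -> List[float]:
    """Residual = current magnitudes minus predicted magnitudes."""
    pred = predict(prev_quantized, len(current), gain)
    return [c - p for c, p in zip(current, pred)]
```

Because the prediction is resampled to the length of the current subframe, the residual always has as many elements as the current subframe has magnitudes, whatever the count was in the previous subframe.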

8. A method of encoding speech into a frame of bits, the method including:
- digitizing a speech signal into a sequence of digital speech samples;
  
  dividing the digital speech samples into a sequence of subframes, each of the subframes including multiple digital speech samples;
  
  estimating a set of speech model parameters for each subframe, wherein the speech model parameters include a set of spectral magnitude parameters that represent spectral information for the subframe;
  
  combining consecutive subframes from the sequence of subframes into a frame;
  
  jointly quantizing the spectral magnitude parameters from the consecutive subframes of the frame to produce a set of encoder spectral bits, wherein the joint quantization includes forming predicted spectral magnitude parameters from quantized spectral magnitude parameters from a previous frame; and
  
  including the encoder spectral bits in a frame of bits;
  
  wherein the joint quantization comprises:
  
  computing residual parameters as the difference between the spectral magnitude parameters and the predicted spectral magnitude parameters;
  
  combining the residual parameters from the consecutive subframes within the frame; and
  
  quantizing the combined residual parameters into a set of encoder spectral bits; and
  
  combining the residual parameters from the consecutive subframes within the frame comprises:
  
  dividing the residual parameters from each of the subframes into frequency blocks;
  
  performing a linear transformation on the residual parameters within each frequency block to produce a set of transformed residual coefficients for each subframe;
  
  grouping a minority of the transformed residual coefficients from the frequency blocks for each subframe into a prediction residual block average (PRBA) vector for the subframe;
  
  grouping the remaining transformed residual coefficients for each frequency block of each subframe into a higher order coefficient (HOC) vector for the frequency block;
  
  transforming the PRBA vectors to produce a transformed PRBA vector for each subframe;
  
  combining the transformed PRBA vectors for the subframes of the frame by computing generalized sum and difference vectors from the transformed PRBA vectors; and
  
  combining the HOC vectors within each frequency block for the subframes of the frame by computing generalized sum and difference vectors from the HOC vectors for each frequency block.
- View Dependent Claims (15, 16, 18, 19)
- - 15. The method of claim 8, wherein the transformed residual coefficients are computed for each of the frequency blocks using a Discrete Cosine Transform (DCT) followed by a linear two by two transform on two lowest order DCT coefficients.
  - 16. The method of claim 15, wherein the length of each frequency block is approximately proportional to a number of spectral magnitude parameters within the subframe.
  - 18. The method of claim 8, wherein quantizing the combined residual parameters includes applying vector quantization to all or part of the generalized sum and difference vectors computed from the transformed PRBA vectors and applying vector quantization to all or part of the generalized sum and difference vectors computed from the HOC vectors.
  - 19. The method of claim 18, wherein the frame includes two consecutive subframes from the sequence of subframes.
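
The combining steps of claims 8 and 15 can be sketched roughly as follows, assuming two subframes per frame and four frequency blocks per subframe. Several details here are assumptions rather than anything the claims fix: the DCT scaling, the use of a plain sum and difference for the two by two transform and for the "generalized" sum and difference vectors, taking the two lowest-order coefficients of each block as the PRBA entries, and zero-padding or truncating HOC vectors to a fixed length `max_hoc`. The additional transform applied to the PRBA vectors before combination is omitted for brevity.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]


def split_blocks(x: Sequence[float], n_blocks: int = 4) -> List[Vec]:
    """Split x into contiguous blocks whose lengths scale with len(x)."""
    n = len(x)
    if n < 2 * n_blocks:
        raise ValueError("each block needs at least two parameters")
    edges = [(i * n) // n_blocks for i in range(n_blocks + 1)]
    return [list(x[edges[i]:edges[i + 1]]) for i in range(n_blocks)]


def dct2(x: Sequence[float]) -> Vec:
    """DCT-II with 1/N scaling (the scaling is an assumed convention)."""
    n = len(x)
    return [sum(x[j] * math.cos(math.pi * (j + 0.5) * k / n)
                for j in range(n)) / n
            for k in range(n)]


def transform_block(block: Sequence[float]) -> Vec:
    """DCT, then a 2x2 sum/difference on the two lowest-order coefficients."""
    c = dct2(block)
    c[0], c[1] = c[0] + c[1], c[0] - c[1]
    return c


def prba_and_hoc(residual: Sequence[float],
                 n_blocks: int = 4) -> Tuple[Vec, List[Vec]]:
    """Group two coefficients per block into the PRBA vector and the
    remaining coefficients of each block into that block's HOC vector."""
    prba: Vec = []
    hoc: List[Vec] = []
    for block in split_blocks(residual, n_blocks):
        c = transform_block(block)
        prba.extend(c[:2])
        hoc.append(c[2:])
    return prba, hoc


def sum_diff(a: Sequence[float], b: Sequence[float], n: int) -> Tuple[Vec, Vec]:
    """Pad or truncate both vectors to n elements, then return (sum, diff)."""
    a = (list(a) + [0.0] * n)[:n]
    b = (list(b) + [0.0] * n)[:n]
    return [p + q for p, q in zip(a, b)], [p - q for p, q in zip(a, b)]


def combine_subframes(res0: Sequence[float], res1: Sequence[float],
                      n_blocks: int = 4, max_hoc: int = 4):
    """Combine two subframes' residuals into sum/difference vectors."""
    prba0, hoc0 = prba_and_hoc(res0, n_blocks)
    prba1, hoc1 = prba_and_hoc(res1, n_blocks)
    prba_sd = sum_diff(prba0, prba1, 2 * n_blocks)
    hoc_sd = [sum_diff(h0, h1, max_hoc) for h0, h1 in zip(hoc0, hoc1)]
    return prba_sd, hoc_sd
```

Fixing the length of every output vector is one simple way to let the two subframes carry different numbers of magnitudes while still feeding fixed-size vectors to a quantizer. When the two subframes are identical, every difference vector is zero, which is the case in which joint quantization saves the most bits.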

20. A speech encoder for encoding speech into a frame of bits, the encoder including:
- means for digitizing a speech signal into a sequence of digital speech samples;
  
  means for dividing the digital speech samples into a sequence of subframes, each of the subframes including multiple digital speech samples;
  
  means for estimating a set of speech model parameters for each subframe, wherein the speech model parameters include a set of spectral magnitude parameters that represent spectral magnitude information for the subframe;
  
  means for combining consecutive subframes from the sequence of subframes into a frame;
  
  means for jointly quantizing the spectral magnitude parameters from the consecutive subframes of the frame to produce a set of encoder spectral bits, wherein:
  
  the means for jointly quantizing forms predicted spectral magnitude parameters from quantized spectral magnitude parameters from a previous subframe;
  
  a subframe of the frame includes a number of spectral magnitude parameters that may vary from a number of spectral magnitude parameters in the previous subframe; and
  
  the means for jointly quantizing accounts for any variation between the number of spectral magnitude parameters in the subframe of the frame and the number of spectral magnitude parameters in the previous subframe; and
  
  means for forming a frame of bits including the encoder spectral bits.
- View Dependent Claims (21, 22, 23, 24, 25)
- - 21. The speech encoder of claim 20, wherein the spectral magnitude parameters correspond to a frequency-domain representation of a spectral envelope of the subframe.
  - 22. The speech encoder of claim 20, wherein the number of spectral magnitude parameters in the subframe of the frame may vary from a number of spectral magnitude parameters in a second subframe of the frame;
    - and the means for jointly quantizing accounts for any variation between the number of spectral magnitude parameters in the subframe of the frame and the number of spectral magnitude parameters in the second subframe of the frame.
  - 23. The speech encoder of claim 22, wherein the means for jointly quantizing accounts for any variation between the number of spectral magnitude parameters in a subframe of the frame and the number of spectral magnitude parameters in a second subframe of the frame by transforming the spectral magnitude parameters for the two subframes to produce one or more output vectors and limiting the number of elements within each output vector that are used in the joint quantization.
  - 24. The speech encoder of claim 20, wherein the means for jointly quantizing accounts for any variation between the number of spectral magnitude parameters in the subframe of the frame and the number of spectral magnitude parameters in the previous subframe by interpolating and resampling spectral magnitude parameters for the previous subframe and using the interpolated and resampled spectral magnitude parameters in forming the predicted spectral magnitude parameters.
  - 25. The speech encoder of claim 20, wherein the means for jointly quantizing accounts for any variation between the number of spectral magnitude parameters in a subframe of the frame and the number of spectral magnitude parameters in a second subframe of the frame by transforming the spectral magnitude parameters for the two subframes to produce one or more output vectors and limiting the number of elements within each output vector that are used in the joint quantization.

26. A method of decoding speech from a frame of bits, the method comprising:
- extracting decoder spectral bits from the frame of bits;
  
  using the decoder spectral bits to jointly reconstruct spectral magnitude parameters for consecutive subframes within a frame of speech, wherein the joint reconstruction includes:
  
  inverse quantizing the decoder spectral bits to reconstruct a set of combined residual parameters for the frame from which separate residual parameters for each of the subframes are computed;
  
  forming predicted spectral magnitude parameters from reconstructed spectral magnitude parameters from a previous subframe; and
  
  adding the separate residual parameters to the predicted spectral magnitude parameters to form the reconstructed spectral magnitude parameters for each subframe within the frame;
  
  wherein a subframe of the frame includes a number of spectral magnitude parameters that may vary from a number of spectral magnitude parameters in the previous subframe; and
  
  the joint reconstruction accounts for any variation between the number of spectral magnitude parameters in the subframe of the frame and the number of spectral magnitude parameters in the previous subframe; and
  
  synthesizing digital speech samples for each subframe within the frame of speech using speech model parameters which include some or all of the reconstructed voiced/unvoiced metrics and some or all of the reconstructed spectral magnitude parameters for the subframe.
- View Dependent Claims (27, 28, 29, 30, 31, 33, 34, 35, 36, 37, 38, 41)
- - 27. The method of claim 26, wherein the spectral magnitude parameters correspond to a frequency-domain representation of a spectral envelope of the subframe.
  - 28. The method of claim 26, wherein the number of spectral magnitude parameters in the subframe of the frame may vary from a number of spectral magnitude parameters in a second subframe of the frame;
    - and the joint reconstruction accounts for any variation between the number of spectral magnitude parameters in the subframe of the frame and the number of spectral magnitude parameters in the second subframe of the frame.
  - 29. The method of claim 28, wherein the joint reconstruction accounts for any variation between the number of spectral magnitude parameters in a subframe of the frame and the number of spectral magnitude parameters in a second subframe of the frame by transforming the spectral magnitude parameters for the two subframes to produce one or more output vectors and limiting the number of elements within each output vector that are used in the joint reconstruction.
  - 30. The method of claim 26, wherein the joint reconstruction accounts for any variation between the number of spectral magnitude parameters in the subframe of the frame and the number of spectral magnitude parameters in the previous subframe by interpolating and resampling spectral magnitude parameters for the previous subframe and using the interpolated and resampled spectral magnitude parameters in forming the predicted spectral magnitude parameters.
  - 31. The method of claim 26, wherein the joint reconstruction accounts for any variation between the number of spectral magnitude parameters in a subframe of the frame and the number of spectral magnitude parameters in a second subframe of the frame by transforming the spectral magnitude parameters for the two subframes to produce one or more output vectors and limiting the number of elements within each output vector that are used in the joint reconstruction.
  - 33. The method of claim 26 or 32, wherein the frame of bits includes other decoder bits in addition to the decoder spectral bits, wherein the other decoder bits are representative of speech model parameters other than the spectral magnitude parameters.
  - 34. The method of claim 33, wherein the speech model parameters include parameters representative of a fundamental frequency and parameters representative of a voicing state.
  - 35. The method of claim 26 or 32, wherein the reconstructed spectral magnitude parameters represent log spectral magnitudes used in a Multi-Band Excitation (MBE) speech model.
  - 36. The method of claim 26 or 32, wherein the frame of bits includes redundant error control bits protecting at least some of the decoder spectral bits.
  - 37. The method of claim 26 or 32, wherein the synthesizing of speech for each subframe includes computing a set of phase parameters from the reconstructed spectral magnitude parameters.
  - 38. The method of claim 26 or 32, wherein the predicted spectral magnitude parameters are formed by applying a gain of less than unity to a linear interpolation of quantized spectral magnitudes from a last subframe of a previous frame.
  - 41. The method of claim 26 or 32, wherein the inverse quantization to reconstruct a set of combined residual parameters for the frame includes using inverse vector quantization applied to one or more vectors.

32. A method of decoding speech from a frame of bits, the method comprising:
- extracting decoder spectral bits from the frame of bits;
  
  using the decoder spectral bits to jointly reconstruct spectral magnitude parameters for consecutive subframes within a frame of speech, wherein the joint reconstruction includes:
  
  inverse quantizing the decoder spectral bits to reconstruct a set of combined residual parameters for the frame from which separate residual parameters for each of the subframes are computed;
  
  forming predicted spectral magnitude parameters from reconstructed spectral magnitude parameters from a previous frame; and
  
  adding the separate residual parameters to the predicted spectral magnitude parameters to form the reconstructed spectral magnitude parameters for each subframe within the frame; and
  
  synthesizing digital speech samples for each subframe within the frame of speech using speech model parameters which include some or all of the reconstructed spectral magnitude parameters for the subframe;
  
  wherein the computing of the separate residual parameters for each subframe from the combined residual parameters for the frame comprises:
  
  dividing each subframe into frequency blocks;
  
  separating the combined residual parameters for the frame into generalized sum and difference vectors representing transformed PRBA vectors combined across the subframes of the frame, and into generalized sum and difference vectors representing HOC vectors for the frequency blocks combined across the subframes of the frame;
  
  computing PRBA vectors for each subframe from the generalized sum and difference vectors representing the transformed PRBA vectors;
  
  computing HOC vectors for each subframe from the generalized sum and difference vectors representing the HOC vectors for each of the frequency blocks;
  
  combining the PRBA vector and the HOC vectors for each of the frequency blocks to form transformed residual coefficients for each of the subframes; and
  
  performing an inverse transformation on the transformed residual coefficients to produce the separate residual parameters for each subframe of the frame.
- View Dependent Claims (39, 40)
- - 39. The method of claim 32, wherein the separate residual parameters are computed from the transformed residual coefficients by performing on each of the frequency blocks an inverse linear two by two transform on the two lowest order transformed residual coefficients within the frequency block and then performing an Inverse Discrete Cosine Transform (IDCT) over all the transformed residual coefficients within the frequency block.
  - 40. The method of claim 39, wherein four of the frequency blocks are used per subframe and wherein the length of each frequency block is approximately proportional to a number of spectral magnitude parameters within the subframe.
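
The decoder-side steps of claims 32 and 39 can be sketched in the same spirit: separate each sum and difference pair back into per-subframe vectors, undo the two by two transform on the two lowest-order coefficients of each block, and apply an inverse DCT. The plain (non-generalized) sum and difference, the DCT scaling convention, the two-coefficients-per-block PRBA layout, and all function names are illustrative assumptions, not details taken from the claims.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]


def split_sum_diff(s: Sequence[float], d: Sequence[float]) -> Tuple[Vec, Vec]:
    """Recover the two subframe vectors a, b from s = a + b and d = a - b."""
    a = [(x + y) / 2.0 for x, y in zip(s, d)]
    b = [(x - y) / 2.0 for x, y in zip(s, d)]
    return a, b


def idct(c: Sequence[float]) -> Vec:
    """Inverse of a DCT-II that was scaled by 1/N (assumed convention)."""
    n = len(c)
    return [c[0] + 2.0 * sum(c[k] * math.cos(math.pi * (j + 0.5) * k / n)
                             for k in range(1, n))
            for j in range(n)]


def inverse_block(coeffs: Sequence[float]) -> Vec:
    """Undo a 2x2 sum/difference on the two lowest coefficients, then IDCT."""
    c = list(coeffs)
    c[0], c[1] = (c[0] + c[1]) / 2.0, (c[0] - c[1]) / 2.0
    return idct(c)


def rebuild_subframe(prba: Sequence[float], hoc_blocks: Sequence[Sequence[float]],
                     block_lengths: Sequence[int]) -> Vec:
    """Reassemble one subframe's residuals from its PRBA and HOC vectors.

    Each block takes two PRBA entries plus (length - 2) HOC entries; HOC
    vectors are zero-padded or truncated to fit the block length.
    """
    residual: Vec = []
    for i, length in enumerate(block_lengths):
        if length < 2:
            raise ValueError("each block needs at least two coefficients")
        hoc = (list(hoc_blocks[i]) + [0.0] * length)[:length - 2]
        coeffs = list(prba[2 * i:2 * i + 2]) + hoc
        residual.extend(inverse_block(coeffs))
    return residual
```

The `block_lengths` argument is where the decoder accounts for a subframe's own number of magnitudes: the same fixed-size PRBA and HOC vectors expand into however many residuals that subframe requires.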

42. A decoder for decoding speech from a frame of bits, the decoder including:
- means for extracting decoder spectral bits from the frame of bits;
  
  means for using the decoder spectral bits to jointly reconstruct spectral magnitude parameters for consecutive subframes within a frame of speech, wherein the joint reconstruction includes:
  
  inverse quantizing the decoder spectral bits to reconstruct a set of combined residual parameters for the frame from which separate residual parameters for each of the subframes are computed;
  
  forming predicted spectral magnitude parameters from reconstructed spectral magnitude parameters from a previous subframe; and
  
  adding the separate residual parameters to the predicted spectral magnitude parameters to form the reconstructed spectral magnitude parameters for each subframe within the frame;
  
  wherein a subframe of the frame includes a number of spectral magnitude parameters that may vary from a number of spectral magnitude parameters in the previous subframe; and
  
  the joint reconstruction accounts for any variation between the number of spectral magnitude parameters in the subframe of the frame and the number of spectral magnitude parameters in the previous subframe; and
  
  means for synthesizing digital speech samples for each subframe within the frame of speech using speech model parameters which include some or all of the reconstructed spectral magnitude parameters for the subframe.
- View Dependent Claims (43, 44, 45, 46, 47, 48, 49, 50, 52, 53)
- - 43. The method of claim 42, wherein the speech level parameter for each subframe is estimated as a mean of a set of spectral magnitude parameters computed for each subframe plus an offset.
  - 44. The method of claim 43, wherein the spectral magnitude parameters represent log spectral magnitudes estimated for a Multi-Band Excitation (MBE) speech model.
  - 45. The method of claim 43, wherein the offset is dependent on a number of spectral magnitude parameters in the frame.
  - 46. The decoder of claim 42, wherein the spectral magnitude parameters correspond to a frequency-domain representation of a spectral envelope of the subframe.
  - 47. The decoder of claim 42, wherein the number of spectral magnitude parameters in the subframe of the frame may vary from a number of spectral magnitude parameters in a second subframe of the frame;
    - and the joint reconstruction accounts for any variation between the number of spectral magnitude parameters in the subframe of the frame and the number of spectral magnitude parameters in the second subframe of the frame.
  - 48. The decoder of claim 47, wherein the joint reconstruction accounts for any variation between the number of spectral magnitude parameters in a subframe of the frame and the number of spectral magnitude parameters in a second subframe of the frame by transforming the spectral magnitude parameters for the two subframes to produce one or more output vectors and limiting the number of elements within each output vector that are used in the joint reconstruction.
  - 49. The decoder of claim 42, wherein the joint reconstruction accounts for any variation between the number of spectral magnitude parameters in the subframe of the frame and the number of spectral magnitude parameters in the previous subframe by interpolating and resampling spectral magnitude parameters for the previous subframe and using the interpolated and resampled spectral magnitude parameters in forming the predicted spectral magnitude parameters.
  - 50. The decoder of claim 42, wherein the joint reconstruction accounts for any variation between the number of spectral magnitude parameters in a subframe of the frame and the number of spectral magnitude parameters in a second subframe of the frame by transforming the spectral magnitude parameters for the two subframes to produce one or more output vectors and limiting the number of elements within each output vector that are used in the joint reconstruction.
  - 52. The method of claim 51 or 43, wherein the difference level vector is quantized using vector quantization.
  - 53. The method of claim 51 or 43, wherein the frame of bits includes error control bits used to protect some or all of the quantized bits representative of the average level parameter and the difference level vector.

51. A method of encoding a level of speech into a frame of bits, the method comprising:
- digitizing a speech signal into a sequence of digital speech samples;
  
  dividing the digital speech samples into a sequence of subframes, each of the subframes including multiple digital speech samples;
  
  estimating a speech level parameter for each of the subframes, wherein the speech level parameter is representative of the amplitude of the digital speech samples comprising the subframe;
  
  combining a plurality of consecutive subframes from the sequence of subframes into a frame;
  
  jointly quantizing the speech level parameters from the plurality of consecutive subframes within the frame, characterized in that the joint quantization includes computing and quantizing an average level parameter by combining the speech level parameters over the subframes within the frame, and computing and quantizing a difference level vector between the speech level parameters for each subframe within the frame and the average level parameter; and
  
  including quantized bits representative of the average level parameter and the difference level vector in a frame of bits.
- View Dependent Claims (54)
- - 54. The method of claim 51, wherein the spectral magnitude parameters correspond to a frequency-domain representation of a spectral envelope of the subframe.
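
The level quantization of claim 51 can be illustrated with a small Python sketch: compute one average level for the frame plus a vector of per-subframe differences from that average, and quantize each. The uniform scalar quantizers and their step sizes are stand-ins assumed for illustration (claim 52 contemplates vector quantization of the difference vector), and `subframe_level` takes the offset mentioned in claims 43 and 45 as a caller-supplied value because the claims do not give a formula for it.

```python
from typing import List, Sequence, Tuple


def subframe_level(log_mags: Sequence[float], offset: float = 0.0) -> float:
    """Level estimate: mean of the subframe's log magnitudes plus an offset."""
    return sum(log_mags) / len(log_mags) + offset


def encode_levels(levels: Sequence[float], avg_step: float = 0.5,
                  diff_step: float = 0.25) -> Tuple[int, List[int]]:
    """Quantize the frame's average level and the per-subframe differences.

    Uniform scalar quantizers are used here purely as placeholders.
    """
    avg = sum(levels) / len(levels)
    avg_idx = round(avg / avg_step)
    diff_idx = [round((v - avg) / diff_step) for v in levels]
    return avg_idx, diff_idx


def decode_levels(avg_idx: int, diff_idx: Sequence[int], avg_step: float = 0.5,
                  diff_step: float = 0.25) -> List[float]:
    """Rebuild per-subframe levels from the quantized average and differences."""
    avg = avg_idx * avg_step
    return [avg + i * diff_step for i in diff_idx]
```

For a two-subframe frame with levels 4.0 and 5.0, the average is 4.5 and the difference vector is (-0.5, 0.5); with the step sizes above both are represented exactly, so decoding returns the original levels.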

Current Assignee: Digital Voice Systems, Inc.
Original Assignee: Digital Voice Systems, Inc.
Inventors: Hardwick, John C.
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): ABEBE, DANIEL DEMELASH
Application Number: US08/818,130
Time in Patent Office: 1,369 Days
Field of Search: 704/203, 704/204, 704/206, 704/208, 704/230, 704/219, 704/222
US Class Current: 704/230
CPC Class Codes:
G10L 19/02 using spectral analysis, e....
G10L 19/16 Vocoder architecture
