Joint quantization of speech subframe voicing metrics and fundamental frequencies
Abstract
Speech is encoded into a frame of bits. A speech signal is digitized into a sequence of digital speech samples that are then divided into a sequence of subframes. A set of model parameters is estimated for each subframe. The model parameters include a set of voicing metrics that represent voicing information for the subframe. Two or more subframes from the sequence of subframes are designated as corresponding to a frame. The voicing metrics from the subframes within the frame are jointly quantized. The joint quantization includes forming predicted voicing information from the quantized voicing information from the previous frame, computing the residual parameters as the difference between the voicing information and the predicted voicing information, combining the residual parameters from both of the subframes within the frame, and quantizing the combined residual parameters into a set of encoded voicing information bits which are included in the frame of bits. A similar technique is used to encode fundamental frequency information.
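The predictive step described in the abstract can be illustrated with a minimal sketch. The prediction coefficient `rho`, the function names, and the choice of simple concatenation for combining two subframes are assumptions made for illustration; the abstract does not specify them.

```python
def predictive_residuals(current, prev_quantized, rho=0.5):
    """Residual = current voicing information minus a prediction formed
    from the previous frame's quantized values.
    rho is a hypothetical prediction coefficient, not from the source."""
    predicted = [rho * p for p in prev_quantized]
    return [c - p for c, p in zip(current, predicted)]


def combine_subframes(residuals_sub0, residuals_sub1):
    """Combine the residuals of both subframes into one vector so they can
    be quantized jointly (assumed here to be plain concatenation)."""
    return list(residuals_sub0) + list(residuals_sub1)


# Example: two bands per subframe, previous frame fully "on".
prev_q = [1.0, 1.0]
r0 = predictive_residuals([1.0, 0.0], prev_q)
r1 = predictive_residuals([0.5, 0.5], prev_q)
frame_vector = combine_subframes(r0, r1)
```

The combined vector would then be passed to a quantizer, whose indices form the encoded voicing information bits.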
30 Claims
-
1. A method of encoding speech into a frame of bits, the method comprising:
-
digitizing a speech signal into a sequence of digital speech samples;
dividing the digital speech samples into a sequence of subframes, each of the subframes including multiple digital speech samples;
estimating a fundamental frequency parameter for each subframe;
designating subframes from the sequence of subframes as corresponding to a frame;
jointly quantizing fundamental frequency parameters from subframes of the frame to produce a set of encoder fundamental frequency bits; and
including the encoder fundamental frequency bits in a frame of bits, wherein the joint quantization comprises:
computing fundamental frequency residual parameters as a difference between a transformed average of the fundamental frequency parameters and each fundamental frequency parameter;
combining the residual fundamental frequency parameters from the subframes of the frame; and
quantizing the combined residual parameters.
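As a rough illustration of the joint quantization steps recited in claim 1, the sketch below computes residuals against a transformed average and quantizes the combined residual vector. It assumes that the transform is log2, that the average is taken over the transformed values, and that the combined residuals are vector quantized against a small made-up codebook; none of these specifics are stated in the claim.

```python
import math

# Hypothetical 2-bit codebook of residual pairs (one value per subframe).
RESIDUAL_CODEBOOK = [(0.0, 0.0), (0.05, -0.05), (-0.05, 0.05), (0.1, -0.1)]


def fundamental_residuals(f0_subframes):
    """Return (transformed average, residuals) for the subframes of a frame."""
    logs = [math.log2(f) for f in f0_subframes]   # assumed transform: log2
    avg = sum(logs) / len(logs)                   # transformed average
    residuals = [avg - x for x in logs]           # average minus each parameter
    return avg, residuals


def quantize_residuals(residuals, codebook=RESIDUAL_CODEBOOK):
    """Return the index of the codebook entry closest (squared error)
    to the combined residual vector."""
    def dist(entry):
        return sum((r - e) ** 2 for r, e in zip(residuals, entry))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


avg, res = fundamental_residuals([100.0, 200.0])
index = quantize_residuals(res)
```

In this sketch the encoder fundamental frequency bits would be the binary form of `index`, together with whatever bits encode the average itself.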
-
-
6. A method of encoding speech into a frame of bits, the method comprising:
-
digitizing a speech signal into a sequence of digital speech samples;
estimating a set of voicing metrics parameters for a group of digital speech samples, the set including multiple voicing metrics parameters;
jointly quantizing the voicing metrics parameters to produce a set of encoder voicing metrics bits; and
including the encoder voicing metrics bits in a frame of bits.
-
-
7. The method of claim 6, further comprising:
-
dividing the digital speech samples into a sequence of subframes, each of the subframes including multiple digital speech samples; and
designating subframes from the sequence of subframes as corresponding to a frame;
wherein the group of digital speech samples corresponds to the subframes corresponding to the frame.
-
-
8. The method of claim 7, wherein jointly quantizing multiple voicing metrics parameters comprises jointly quantizing at least one voicing metrics parameter for each of multiple subframes.
-
9. The method of claim 7, wherein jointly quantizing multiple voicing metrics parameters comprises jointly quantizing multiple voicing metrics parameters for a single subframe.
-
10. The method of claim 6, wherein the joint quantization comprises:
-
computing voicing metrics residual parameters as the transformed ratios of voicing error vectors and voicing energy vectors;
combining the residual voicing metrics parameters; and
quantizing the combined residual parameters.
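The residual computation in claim 10 can be sketched as follows. The log2 transform, the small `eps` floor, the concatenation used to combine subframes, and the nearest-neighbour codebook search are all illustrative assumptions; the claim only requires a transformed ratio of voicing error to voicing energy, followed by combining and quantizing.

```python
import math


def voicing_residuals(err, ener, eps=1e-3):
    """Per-band transformed ratio of voicing error to voicing energy.
    The log2 transform and eps floor are assumptions, not from the claim."""
    return [math.log2(eps + e / max(n, 1e-12)) for e, n in zip(err, ener)]


def combine(subframe_residuals):
    """Concatenate the per-subframe residual vectors for the frame."""
    return [r for sub in subframe_residuals for r in sub]


def vq_index(vector, codebook):
    """Index of the nearest codebook entry under squared error."""
    def dist(entry):
        return sum((v - c) ** 2 for v, c in zip(vector, entry))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


# Two subframes, two bands each; eps=0.0 keeps the arithmetic exact here.
sub0 = voicing_residuals([1.0, 0.25], [1.0, 1.0], eps=0.0)
sub1 = voicing_residuals([0.5, 1.0], [1.0, 1.0], eps=0.0)
combined = combine([sub0, sub1])
codebook = [[0.0, 0.0, 0.0, 0.0], [0.0, -2.0, -1.0, 0.0]]  # hypothetical
bits_index = vq_index(combined, codebook)
```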
-
-
11. The method of claim 10, wherein combining the residual parameters includes performing a linear transformation on the residual parameters to produce a set of transformed residual coefficients for each subframe.
-
12. The method of claim 10, wherein quantizing the combined residual parameters includes using at least one vector quantizer.
-
13. The method of claim 6, wherein the frame of bits includes redundant error control bits protecting at least some of the encoder voicing metrics bits.
-
14. The method of claim 6, wherein voicing metrics parameters represent voicing states estimated for a Multi-Band Excitation (MBE) speech model.
-
15. The method of claim 6, further comprising producing additional encoder bits by quantizing additional speech model parameters other than the voicing metrics parameters and including the additional encoder bits in the frame of bits.
-
16. The method of claim 15, wherein the additional speech model parameters include parameters representative of spectral magnitudes.
-
17. The method of claim 15, wherein the additional speech model parameters include parameters representative of a fundamental frequency.
-
18. The method of claim 17, wherein the additional speech model parameters include parameters representative of the spectral magnitudes.
-
19. A method of encoding speech into a frame of bits, the method comprising:
-
digitizing a speech signal into a sequence of digital speech samples;
dividing the digital speech samples into a sequence of subframes, each of the subframes including multiple digital speech samples;
estimating a fundamental frequency parameter for each subframe;
designating subframes from the sequence of subframes as corresponding to a frame;
quantizing a fundamental frequency parameter from one subframe of the frame;
interpolating a fundamental frequency parameter for another subframe of the frame using the quantized fundamental frequency parameter from the one subframe of the frame;
combining the quantized fundamental frequency parameter and the interpolated fundamental frequency parameter to produce a set of encoder fundamental frequency bits; and
including the encoder fundamental frequency bits in a frame of bits.
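One possible reading of claim 19 is sketched below: the second subframe's fundamental frequency is scalar quantized, and the first subframe's value is represented by choosing among a few interpolation weights between the previous frame's quantized value and the current one. The 60 to 400 Hz range, the 64 levels, the log2 domain, the weight table, and the use of the previous frame as the second interpolation endpoint are all assumptions made for the example.

```python
import math

LOG_MIN, LOG_MAX, LEVELS = math.log2(60.0), math.log2(400.0), 64  # assumed range
WEIGHTS = [0.0, 0.25, 0.5, 0.75]  # hypothetical interpolation weights (2 bits)


def quantize_log_f0(f0):
    """Uniform scalar quantizer in the log2 domain; returns (index, value)."""
    step = (LOG_MAX - LOG_MIN) / (LEVELS - 1)
    idx = round((math.log2(f0) - LOG_MIN) / step)
    idx = max(0, min(LEVELS - 1, idx))
    return idx, LOG_MIN + idx * step


def encode_frame(f0_first, f0_second, prev_log_q):
    """Quantize the second subframe, then pick the interpolation weight
    whose interpolated value best matches the first subframe."""
    idx, cur_log_q = quantize_log_f0(f0_second)
    target = math.log2(f0_first)
    candidates = [w * prev_log_q + (1.0 - w) * cur_log_q for w in WEIGHTS]
    w_idx = min(range(len(WEIGHTS)), key=lambda i: abs(candidates[i] - target))
    return idx, w_idx


# Previous frame ended near 100 Hz; this frame moves from about 141 Hz to 200 Hz.
f0_index, weight_index = encode_frame(141.4, 200.0, math.log2(100.0))
```

Here the pair `(f0_index, weight_index)` stands in for the combined encoder fundamental frequency bits.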
-
-
20. A speech encoder for encoding speech into a frame of bits, the encoder comprising:
-
means for digitizing a speech signal into a sequence of digital speech samples;
means for estimating a set of voicing metrics parameters for a group of digital speech samples, the set including multiple voicing metrics parameters;
means for jointly quantizing the voicing metrics parameters to produce a set of encoder voicing metrics bits; and
means for forming a frame of bits including the encoder voicing metrics bits.
-
-
21. The speech encoder of claim 20, further comprising:
-
means for dividing the digital speech samples into a sequence of subframes, each of the subframes including multiple digital speech samples; and
means for designating subframes from the sequence of subframes as corresponding to a frame;
wherein the group of digital speech samples corresponds to the subframes corresponding to the frame.
-
-
22. The speech encoder of claim 21, wherein the means for jointly quantizing multiple voicing metrics parameters jointly quantizes at least one voicing metrics parameter for each of multiple subframes.
-
23. The speech encoder of claim 21, wherein the means for jointly quantizing multiple voicing metrics parameters jointly quantizes multiple voicing metrics parameters for a single subframe.
-
24. A method of decoding speech from a frame of bits that has been encoded by digitizing a speech signal into a sequence of digital speech samples, estimating a set of voicing metrics parameters for a group of digital speech samples, the set including multiple voicing metrics parameters, jointly quantizing the voicing metrics parameters to produce a set of encoder voicing metrics bits, and including the encoder voicing metrics bits in a frame of bits, the method of decoding speech comprising:
-
extracting decoder voicing metrics bits from the frame of bits;
jointly reconstructing voicing metrics parameters using the decoder voicing metrics bits; and
synthesizing digital speech samples using speech model parameters which include some or all of the reconstructed voicing metrics parameters.
-
-
25. The method of claim 24, wherein jointly reconstructing the voicing metrics parameters comprises:
-
inverse quantizing the decoder voicing metrics bits to reconstruct a set of combined residual parameters for the frame;
computing separate residual parameters for each subframe from the combined residual parameters; and
forming the voicing metrics parameters from the voicing metrics bits.
-
-
26. The method of claim 25, wherein the computing of the separate residual parameters for each subframe comprises:
-
separating the voicing metrics residual parameters for the frame from the combined residual parameters for the frame; and
performing an inverse transformation on the voicing metrics residual parameters for the frame to produce the separate residual parameters for each subframe of the frame.
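The decoder-side steps of claims 25 and 26 can be sketched in the same spirit. Inverse quantization is shown as a codebook lookup, the inverse transformation is assumed to be a simple split of the combined vector into equal-length subframe parts, and the threshold rule that turns residuals into voiced/unvoiced decisions is hypothetical.

```python
def decode_voicing(index, codebook, bands_per_subframe, threshold=-1.0):
    """Reconstruct per-subframe voicing residuals and voicing decisions.

    index              -- decoder voicing metrics bits, as a codebook index
    codebook           -- same (hypothetical) codebook used by the encoder
    bands_per_subframe -- number of voicing bands in each subframe
    threshold          -- assumed rule: residual below threshold means voiced
    """
    combined = codebook[index]                      # inverse quantization
    per_subframe = [
        list(combined[i:i + bands_per_subframe])    # assumed inverse transform
        for i in range(0, len(combined), bands_per_subframe)
    ]
    voiced = [[r < threshold for r in sub] for sub in per_subframe]
    return per_subframe, voiced


codebook = [[0.0, 0.0, 0.0, 0.0], [0.0, -2.0, -3.0, 0.0]]  # hypothetical
residuals, voiced = decode_voicing(1, codebook, bands_per_subframe=2)
```

The threshold direction reflects the assumption that a low ratio of voicing error to voicing energy indicates a voiced band.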
-
-
27. A decoder for decoding speech from a frame of bits that has been encoded by digitizing a speech signal into a sequence of digital speech samples, estimating a set of voicing metrics parameters for a group of digital speech samples, the set including multiple voicing metrics parameters, jointly quantizing the voicing metrics parameters to produce a set of encoder voicing metrics bits, and including the encoder voicing metrics bits in a frame of bits, the decoder comprising:
-
means for extracting decoder voicing metrics bits from the frame of bits;
means for jointly reconstructing voicing metrics parameters using the decoder voicing metrics bits; and
means for synthesizing digital speech samples using speech model parameters which include some or all of the reconstructed voicing metrics parameters.
-
-
28. Software on a processor readable medium comprising instructions for causing a processor to perform the following operations:
-
estimate a set of voicing metrics parameters for a group of digital speech samples, the set including multiple voicing metrics parameters;
jointly quantize the voicing metrics parameters to produce a set of encoder voicing metrics bits; and
form a frame of bits including the encoder voicing metrics bits.
-
-
30. A communications system comprising:
-
a transmitter configured to:
digitize a speech signal into a sequence of digital speech samples;
estimate a set of voicing metrics parameters for a group of digital speech samples, the set including multiple voicing metrics parameters;
jointly quantize the voicing metrics parameters to produce a set of encoder voicing metrics bits;
form a frame of bits including the encoder voicing metrics bits; and
transmit the frame of bits, and
a receiver configured to receive and process the frame of bits to produce a speech signal.
-
Specification