Methods for speech quantization and error correction
Abstract
The redundancy contained within the spectral amplitudes is reduced, and as a result the quantization of the spectral amplitudes is improved. The prediction of the spectral amplitudes of the current segment from the spectral amplitudes of the previous segment is adjusted to account for any change in the fundamental frequency between the two segments. The spectral amplitude prediction residuals are divided into a fixed number of blocks, each containing approximately the same number of elements. A prediction residual block average (PRBA) vector is formed; each element of the PRBA vector is equal to the average of the prediction residuals within one of the blocks. The PRBA vector is vector quantized, or it is transformed with a Discrete Cosine Transform (DCT) and scalar quantized. The perceived effect of bit errors is reduced by smoothing the voiced/unvoiced decisions. An estimate of the error rate is made by locally averaging the number of correctable bit errors within each segment. If the estimate of the error rate is greater than a threshold, then high energy spectral amplitudes are declared voiced.
25 Claims
1. A method of encoding a speech signal, the method comprising the steps:
breaking said signal into segments, each of said segments representing one of a succession of time intervals and having a spectrum of frequencies;
for each said segment, sampling said spectrum at a set of frequencies, thereby forming a set of actual spectral amplitudes, wherein the frequencies at which said spectra are sampled generally differ from one segment to the next;
producing predicted spectral amplitudes for said current segment based on the spectral amplitudes for at least one previous segment, wherein said predicted spectral amplitudes for said current segment are based at least in part on interpolating the spectral amplitudes of a previous segment to estimate the spectral amplitudes in the previous segment at the frequencies of said current segment;
producing prediction residuals based on a difference between the actual spectral amplitudes for said current segment and the predicted spectral amplitudes for said current segment; and
producing an encoded speech signal based on said prediction residuals.
Dependent claims: 2, 17, 18, 19, 20, 21
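The prediction step of claim 1 can be illustrated with a minimal sketch. It assumes that each segment is sampled at the harmonics of its own fundamental frequency, that the interpolation is linear, that indices falling outside the previous segment's harmonics are clamped to the nearest harmonic, and that the prediction is scaled by a gain `rho`. None of these details, nor the function names, are stated in the claim; they are illustrative choices only.

```python
import math


def _amp_at(prev_amps, index):
    # Harmonics are 1-indexed; out-of-range indices are clamped (an assumption).
    index = min(max(index, 1), len(prev_amps))
    return prev_amps[index - 1]


def interpolate_previous(prev_amps, prev_f0, cur_f0, num_cur):
    """Estimate the previous segment's amplitudes at the current harmonic frequencies."""
    estimates = []
    for l in range(1, num_cur + 1):
        # Position of the l-th current harmonic on the previous harmonic grid.
        k = l * cur_f0 / prev_f0
        lo = math.floor(k)
        frac = k - lo
        a_lo = _amp_at(prev_amps, lo)
        a_hi = _amp_at(prev_amps, lo + 1)
        # Linear interpolation between the two neighbouring previous harmonics.
        estimates.append((1 - frac) * a_lo + frac * a_hi)
    return estimates


def prediction_residuals(cur_amps, prev_amps, prev_f0, cur_f0, rho=0.7):
    """Residual = actual amplitude minus scaled, interpolated previous amplitude."""
    interpolated = interpolate_previous(prev_amps, prev_f0, cur_f0, len(cur_amps))
    return [c - rho * p for c, p in zip(cur_amps, interpolated)]
```

When the fundamental frequency does not change between segments, `k` equals `l` and the prediction reduces to a scaled copy of the previous amplitudes; the interpolation only matters when the sampling frequencies shift.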
3. A method of encoding a speech signal, the method comprising the steps:
breaking said signal into segments, each of said segments representing one of a succession of time intervals and having a spectrum of frequencies;
for each said segment, sampling said spectrum at a set of frequencies, thereby forming a set of spectral amplitudes, wherein the frequencies at which said spectra are sampled generally differ from one segment to the next;
producing predicted spectral amplitudes for said current segment based on the spectral amplitudes for at least one previous segment;
producing prediction residuals based on a difference between the actual spectral amplitudes for said current segment and the predicted spectral amplitudes for said current segment;
grouping said prediction residuals into a predetermined number of blocks, the number of blocks being independent of the number of said prediction residuals grouped into particular blocks; and
producing an encoded speech signal based on said blocks.
Dependent claims: 4, 14, 15, 16
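The grouping step of claim 3 can be sketched as follows. The number of blocks is fixed while the number of residuals varies from segment to segment, so the block sizes adapt. The default of six blocks and the choice to give any leftover elements to the last blocks are assumptions made for illustration; the claim only requires a predetermined number of blocks.

```python
def split_into_blocks(residuals, num_blocks=6):
    """Split residuals into a fixed number of contiguous, nearly equal blocks."""
    n = len(residuals)
    if n < num_blocks:
        raise ValueError("need at least one residual per block")
    base, extra = divmod(n, num_blocks)
    blocks = []
    start = 0
    for i in range(num_blocks):
        # The last `extra` blocks each receive one additional element (assumption).
        size = base + (1 if i >= num_blocks - extra else 0)
        blocks.append(residuals[start:start + size])
        start += size
    return blocks
```

For example, ten residuals split into four blocks yield block sizes 2, 2, 3 and 3, so block sizes never differ by more than one element.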
5. A method of encoding a speech signal, the method comprising the steps:
breaking said signal into segments, each of said segments representing one of a succession of time intervals and having a spectrum of frequencies;
for each said segment, sampling said spectrum at a set of frequencies, thereby forming a set of spectral amplitudes, wherein the frequencies at which the spectra are sampled generally differ from one segment to the next;
producing predicted spectral amplitudes for said current segment based on the spectral amplitudes for at least one previous segment;
producing prediction residuals based on a difference between the actual spectral amplitudes for said current segment and the predicted spectral amplitudes for said current segment;
grouping said prediction residuals into blocks;
forming a prediction residual block average (PRBA) vector, each value of said PRBA vector being an average of the prediction residuals of a corresponding block; and
producing an encoded speech signal based on said PRBA.
Dependent claims: 6, 7, 8, 9, 10, 11, 12, 13
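The PRBA vector of claim 5, together with the DCT and scalar quantization option mentioned in the abstract, can be sketched as below. The blocks are taken as already formed. The DCT-II normalization (dividing by the vector length) and the uniform quantizer with a single step size are assumptions chosen for simplicity, and the function names are hypothetical.

```python
import math


def prba_vector(blocks):
    """Each PRBA element is the average of the residuals in one block."""
    return [sum(block) / len(block) for block in blocks]


def dct_ii(values):
    """DCT-II scaled by 1/N, so the first coefficient is the mean (assumed scaling)."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (j + 0.5) * k / n) for j, v in enumerate(values)) / n
        for k in range(n)
    ]


def scalar_quantize(coeffs, step):
    """Uniform scalar quantizer: map each coefficient to an integer index."""
    return [round(c / step) for c in coeffs]
```

A possible use is `scalar_quantize(dct_ii(prba_vector(blocks)), step)`. Because the PRBA vector always has as many elements as there are blocks, its length does not depend on how many spectral amplitudes the segment contains.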
22. A method of synthesizing speech from a received bit stream, said bit stream representing speech segments and having bit errors, the method comprising the steps of:
estimating a bit error rate for each speech segment;
for each said speech segment, deciding whether to decode said speech segment as a voiced or an unvoiced speech segment;
smoothing the voiced/unvoiced decisions based on the estimate of the bit error rate for said speech segment; and
synthesizing a speech signal using said smoothed voiced/unvoiced decisions.
Dependent claims: 24, 25
23. A method of synthesizing speech from a received bit stream, said bit stream representing frequency bands of speech segments and having bit errors, the method comprising the steps of:
estimating a bit error rate for each speech segment;
for each frequency band of each said speech segment, deciding whether to decode the frequency band of said speech segment as voiced or unvoiced;
smoothing the voiced/unvoiced decisions based on the estimate of the bit error rate for said speech segment; and
synthesizing a speech signal using said smoothed voiced/unvoiced decisions.
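The error-rate-driven smoothing of claims 22 and 23 can be sketched as follows, guided by the abstract: the error rate is a local average of correctable bit errors, and when it exceeds a threshold, high-energy spectral amplitudes are declared voiced. The exponential form of the running average, the constant `alpha`, both thresholds, and the function names are all assumptions introduced here for illustration.

```python
def update_error_rate(prev_estimate, corrected_errors, bits_per_segment, alpha=0.95):
    """Running (exponentially weighted) average of the per-segment bit error rate."""
    segment_rate = corrected_errors / bits_per_segment
    return alpha * prev_estimate + (1 - alpha) * segment_rate


def smooth_vuv(vuv, amplitudes, error_rate, rate_threshold=0.02, amp_threshold=1.0):
    """Force high-energy amplitudes to voiced when the error rate is high."""
    if error_rate <= rate_threshold:
        # Channel looks clean: trust the received decisions as they are.
        return list(vuv)
    return [v or a > amp_threshold for v, a in zip(vuv, amplitudes)]
```

In this sketch a decision is only ever changed from unvoiced to voiced, and only for amplitudes above `amp_threshold`, so low-energy components keep their received decisions even on a noisy channel.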
Specification