Frequency domain interpolative speech codec system
Abstract
Encoding of prototype waveform components, applicable to GeoMobile and Telephony Earth Station (TES) systems, provides improved voice quality and enables a dual-channel mode of operation which permits more users to communicate over the same physical channel. A prototype waveform (PW) gain is vector quantized using a vector quantizer (VQ) that explicitly populates the codebook with representative steady state and transient vectors of PW gain, for tracking the abrupt variations in speech levels during onsets and other non-stationary events while maintaining the accuracy of the speech level during stationary conditions. The rapidly evolving waveform (REW) and slowly evolving waveform (SEW) component vectors are converted to magnitude-phase form. The variable dimension SEW magnitude vector is quantized using a hierarchical approach, i.e., a fixed dimension SEW mean vector is computed by sub-band averaging of the SEW magnitude spectrum, and only the REW magnitude is explicitly encoded. The REW magnitude vector sequence is normalized to unity RMS value, resulting in a REW magnitude shape vector and a REW gain vector. The normalized REW magnitude vectors are modeled by a multi-band sub-band model which converts the variable dimension REW magnitude shape vectors to fixed dimension, e.g., six-dimensional, REW sub-band vectors. The sub-band vectors are averaged over time, resulting in a single average REW sub-band vector for each frame. At the decoder, the full-dimension REW magnitude shape vector is obtained from the REW sub-band vector by a piecewise-constant construction. The REW phase vector is regenerated at the decoder based on the received REW gain vector and the voicing measure, which determines a weighted mixture of the SEW component and random noise that is passed through a high pass filter to generate the REW component. The high pass filter poles are adjusted based on the voicing measure to control the REW component characteristics. At the output of the filter, the magnitude of the REW component is scaled to match the received REW magnitude vector.
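The REW magnitude path described in the abstract (unit-RMS normalization, reduction to a fixed number of sub-bands, and piecewise-constant reconstruction at the decoder) can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names, the uniform band edges, and the use of a plain arithmetic mean inside each band are all assumptions made for the example.

```python
import math

def rew_shape_and_gain(mag):
    """Split a REW magnitude vector into a unit-RMS shape and a gain (its RMS)."""
    gain = math.sqrt(sum(m * m for m in mag) / len(mag))
    if gain == 0.0:
        return [0.0] * len(mag), 0.0
    return [m / gain for m in mag], gain

def band_edges(n_harmonics, n_bands):
    """Uniform band edges over harmonic indices (assumed; the source gives no band layout)."""
    return [round(b * n_harmonics / n_bands) for b in range(n_bands + 1)]

def to_subbands(shape, n_bands=6):
    """Encoder side: average a variable dimension shape vector into n_bands values."""
    edges = band_edges(len(shape), n_bands)
    sub = []
    for lo, hi in zip(edges, edges[1:]):
        band = shape[lo:hi]
        sub.append(sum(band) / len(band) if band else 0.0)
    return sub

def from_subbands(sub, n_harmonics):
    """Decoder side: piecewise-constant expansion back to n_harmonics values."""
    edges = band_edges(n_harmonics, len(sub))
    full = []
    for value, lo, hi in zip(sub, edges, edges[1:]):
        full.extend([value] * (hi - lo))
    return full
```

Because the number of pitch harmonics changes with the pitch period, the sub-band vector gives the quantizer a fixed dimension to work with regardless of the input length.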
13 Claims
1. A frequency domain interpolative coding system for low bit-rate coding of speech signals, comprising:
a linear prediction (LP) front end responsive to an input signal providing LP parameters which are quantized and encoded over predetermined intervals and used to compute a LP residual signal;
an open loop pitch estimator responsive to said LP residual signal, a pitch quantizer, and a pitch interpolator yielding a pitch contour within the predetermined interval;
a signal processor responsive to said LP residual signal and the pitch contour for extracting a prototype waveform (PW) for a number of equal subintervals within the predetermined interval;
said signal processor computing a PW gain for generating a normalized PW for each sub-interval and a PW gain vector for the predetermined interval;
a low pass filter and a decimator for the PW gain sequence, yielding a decimated PW gain vector;
a vector quantizer (VQ) operating on the decimated PW gain vector using a codebook comprising a section representative of steady state gain inputs and a section representative of transient gain inputs.
a voice activity detector (VAD) mechanism responsive to said LP parameters and open loop pitch, generating a VAD flag for each predetermined interval, quantized and transmitted;
a voicing measure, which characterizes the degree of the periodicity of the input signal, responsive to a set of parameters correlated to the degree of signal periodicity, computed using the input signal, LP residual, PW, SEW and REW;
a decoder responsive to said VAD for computing a VAD likelihood measure at the decoder by summing previously received VAD flags;
said decoder determining a degree of bandwidth broadening at the decoder based on said VAD likelihood measure and the received voicing measure; and
a signal processor for providing adaptive bandwidth broadening based on said degree of bandwidth broadening applied to the LP synthesis filter coefficients at the decoder, to mitigate artifacts in the reconstructed signal due to spurious spectral peaks.
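The decoder-side elements above (a VAD likelihood formed by summing received VAD flags, a broadening degree derived from it, and broadening applied to the LP synthesis filter coefficients) might be sketched as below. The scaling a_k -> gamma^k a_k is the conventional bandwidth expansion technique and is assumed here, as are the window length, the gamma range, the `periodicity` input in [0, 1] standing in for the voicing measure, and the rule that more speech activity or periodicity means less broadening.

```python
def vad_likelihood(vad_flags, window=3):
    """Sum of the most recently received VAD flags (1 = active speech).

    The window length is an assumed value.
    """
    return sum(vad_flags[-window:])

def broadening_factor(likelihood, periodicity, window=3,
                      gamma_min=0.90, gamma_max=1.0):
    """Map VAD likelihood and a periodicity value to a factor gamma.

    gamma = 1.0 means no broadening; smaller gamma means more broadening.
    The mapping itself is hypothetical.
    """
    activity = likelihood / window
    weight = max(activity, periodicity)
    return gamma_min + (gamma_max - gamma_min) * weight

def broaden(lp_coeffs, gamma):
    """Scale coefficient a_k by gamma**k for k = 1..M (a_0 = 1 is excluded)."""
    return [a * gamma ** (k + 1) for k, a in enumerate(lp_coeffs)]
```

Scaling the coefficients this way pulls the synthesis filter poles toward the origin, which widens formant bandwidths and flattens sharp spectral peaks.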
5. A system as recited in claim 4, comprising:
a low-pass filter for extracting a slowly evolving waveform (SEW) from the prototype waveform along each pitch harmonic track;
a high-pass filter for extracting a rapidly evolving waveform (REW) from the prototype waveform along each pitch harmonic track; and
vector quantizer for quantizing the SEW spectral magnitude vector using a mean-gain-shape method.
6. A system as recited in claim 5, comprising a vector quantizer for quantizing the REW spectral magnitude vector using a gain sub-band averaged shape method.
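The SEW/REW split recited in claim 5 filters the prototype waveform along each pitch harmonic track, that is, across successive sub-intervals for a fixed harmonic index. A rough sketch follows, assuming a causal moving average as the low-pass filter and the complementary residual as the high-pass output; the filter choice, the tap count, and the fixed harmonic count across sub-intervals are simplifications for illustration only.

```python
def split_sew_rew(pw_frames, taps=3):
    """Split PW harmonic coefficients into SEW (low-pass) and REW (high-pass).

    pw_frames[t][k] is the complex coefficient of harmonic k in sub-interval t.
    A moving average over `taps` sub-intervals is an assumed low-pass filter.
    """
    n_frames = len(pw_frames)
    n_harm = len(pw_frames[0])
    sew = [[0j] * n_harm for _ in range(n_frames)]
    rew = [[0j] * n_harm for _ in range(n_frames)]
    for k in range(n_harm):
        # Evolution of one harmonic across sub-intervals.
        track = [frame[k] for frame in pw_frames]
        for t in range(n_frames):
            window = track[max(0, t - taps + 1): t + 1]
            sew[t][k] = sum(window) / len(window)
            rew[t][k] = track[t] - sew[t][k]
    return sew, rew
```

With this complementary construction the two components always sum back to the original prototype waveform, and a perfectly stationary input yields a zero REW.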
7. A system as recited in claim 6, wherein the SEW phase component is reconstructed at the decoder for every sub-interval of a predetermined interval based on the received voicing measure, pitch contour, SEW and REW magnitudes.
8. A system as recited in claim 7, wherein the REW phase component is reconstructed at the decoder for every sub-interval as the phase of the complex output of an adaptive filter, driven by a weighted combination of the complex SEW signal and a complex random noise process with the same energy as the SEW.
9. A system as recited in claim 8, wherein said decoder generates an excitation signal derived by conversion to time-domain of the gain scaled sum of the reconstructed SEW and REW components; and wherein said signal processor reconstructs speech as the output of the adaptively bandwidth broadened LP synthesis filter, driven by said excitation signal, further comprising a filter for postfiltering the reconstructed speech using a global pole-zero postfilter, whose parameters are derived from adaptively bandwidth broadened LP synthesis filter parameters.
10. A system as recited in claim 9, wherein said decoder generates an error concealment mechanism for the line spectral frequency (LSF) parameters based on replacing the errored parameters by ones generated using a higher value for the fixed prediction coefficient in the predictive inverse-VQ; and provides an error recovery mechanism whereby the LSF parameters of the previous frame are also replaced by an average of the parameters of the current frame and parameters from two frames ago, so that the LSF parameters evolve smoothly.
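The LSF concealment and recovery steps of claim 10 might look roughly like the sketch below. The claim does not state the form of the predictor, so the first-order prediction around a long-term mean, the coefficient value, and the omission of any codebook contribution in a bad frame are all assumptions; only the averaging rule in the recovery step comes directly from the claim.

```python
def conceal_lsf(prev_lsf, mean_lsf, alpha_bad=0.9):
    """Bad frame: predict LSFs from the previous frame with a raised coefficient.

    Assumes a first-order predictor around a long-term mean, with no
    codebook contribution in the errored frame (hypothetical choices).
    """
    return [m + alpha_bad * (p - m) for p, m in zip(prev_lsf, mean_lsf)]

def recover_previous_lsf(current_lsf, lsf_two_ago):
    """First good frame after an error: replace the previous frame's LSFs
    by the average of the current frame and the frame two frames ago."""
    return [0.5 * (c + o) for c, o in zip(current_lsf, lsf_two_ago)]
```

Replacing the concealed frame with the midpoint of its two neighbours keeps the LSF trajectory smooth across the erasure.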
11. A system as recited in claim 10, wherein said decoder generates an error concealment mechanism for the open loop pitch parameter based on repetition of the pitch value of the previous frame; and provides an error recovery mechanism based on either repetition or averaging to obtain the pitch value of the previous frame, depending on the number of bad frames that have elapsed.
12. A system as recited in claim 11, wherein said decoder generates an error concealment for the PW gain in the coding system by decaying an average measure of PW gain obtained from two or more predetermined intervals, and increasing the rate of decay with the number of erased frames; and provides an error recovery mechanism.
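The PW gain concealment of claim 12 (decay an average gain, and decay faster as more frames are erased) can be illustrated with the sketch below. The decay schedule, the floor, and the use of linear-domain gains are assumed values chosen for the example; the claim specifies none of them.

```python
def concealed_gain(recent_frame_gains, erased_count,
                   base_decay=0.9, step=0.1, floor=0.3):
    """Return the PW gain to use after `erased_count` consecutive erased frames.

    recent_frame_gains: average PW gains of two or more previous good frames.
    The per-frame factor shrinks with each further erased frame, so the
    decay accelerates; base_decay, step and floor are hypothetical.
    """
    gain = sum(recent_frame_gains) / len(recent_frame_gains)
    for n in range(1, erased_count + 1):
        factor = max(floor, base_decay - step * (n - 1))
        gain *= factor
    return gain
```

For example, with the default values and recent gains of 2.0 and 4.0, the concealed gain starts from the average of 3.0 and is multiplied by 0.9 for the first erased frame and by 0.8 for the second.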
13. A system as recited in claim 12, wherein said decoder provides an error concealment mechanism for the VAD likelihood measure by setting the VAD flag for the most recently received frames to indicate active speech, thereby reducing the degree of adaptive bandwidth broadening.
3. A frequency domain interpolative coding system for low bit-rate coding of speech signals, comprising:
a linear prediction (LP) front end responsive to an input signal providing parameters which are quantized using a backward adaptive predictive multi-stage VQ for each predetermined interval and used to compute a LP residual signal;
an open loop pitch estimator responsive to said LP residual signal, a pitch quantizer, and a pitch interpolator yielding a pitch contour within the predetermined interval;
a signal processor responsive to said LP residual signal and the pitch contour for extracting a prototype waveform (PW) for a number of equal sub-intervals within the predetermined interval;
said signal processor computing a PW gain for generating a normalized PW for each sub-interval and a PW gain vector for the predetermined interval;
a low pass filter and a decimator for the PW gain sequence, yielding a decimated PW gain vector;
a vector quantizer (VQ) operating on the decimated PW gain vector using a codebook comprising a section representative of steady state gain inputs and a section representative of transient gain inputs.
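The last two elements of the claim (low-pass filtering and decimating the PW gain sequence, then vector quantizing it against a codebook with steady state and transient sections) can be sketched as follows. The three-tap smoothing filter, the decimation factor, the squared-error search, and the toy codebook entries used in the usage note are all assumptions for illustration; the claim only requires that the codebook contain both kinds of entries.

```python
def lowpass_decimate(gains, factor=2):
    """Smooth the per-sub-interval gains with a [0.25, 0.5, 0.25] filter
    (edges repeated), then keep every `factor`-th sample."""
    last = len(gains) - 1
    smoothed = []
    for i in range(len(gains)):
        left = gains[max(i - 1, 0)]
        right = gains[min(i + 1, last)]
        smoothed.append(0.25 * left + 0.5 * gains[i] + 0.25 * right)
    return smoothed[factor - 1::factor]

def quantize_gain(vec, steady_section, transient_section):
    """Nearest-neighbour search over a codebook made of two sections.

    Returns (index, codevector); indices below len(steady_section)
    refer to steady state entries, the rest to transient entries.
    """
    codebook = steady_section + transient_section

    def dist(code):
        return sum((v - c) ** 2 for v, c in zip(vec, code))

    index = min(range(len(codebook)), key=lambda i: dist(codebook[i]))
    return index, codebook[index]
```

With steady entries such as `[1.0, 1.0]` and `[2.0, 2.0]` and transient entries such as `[0.0, 2.0]` and `[2.0, 0.0]`, an onset-like input `[0.1, 1.9]` selects a transient entry, while a flat input `[1.1, 0.9]` selects a steady state entry.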
Specification