Multi-rate frequency domain interpolative speech CODEC system
Abstract
A low bit rate voice codec based on Frequency Domain Interpolation (FDI) technology is designed to operate at multiple rates of 4.0, 2.4, and 1.2 Kbps. At 4 Kbps, the codec uses a 20 ms frame size and a 20 ms lookahead for purposes of voice activity detection (VAD), noise reduction, linear prediction (LP) analysis, and open loop pitch analysis. The LP parameters are encoded using backward predictive hybrid scalar-vector quantizers in the line spectral frequency (LSF) domain after adaptive bandwidth broadening to minimize excessive peakiness in the LP spectrum. Prototype Waveforms (PWs) are extracted every subframe (2.5 ms) from the LP residual and subsequently aligned and normalized. The PW gains are encoded separately using a backward predictive vector quantizer (VQ). The normalized and aligned PWs are separated into a magnitude component and a phase component. The phase component is encoded implicitly using PW correlations and a voicing measure, which are jointly quantized using a VQ. The magnitude component is encoded using a switched (based on the voicing measure) backward predictive VQ.
At the decoder, a phase model is used to synthesize the phase component from the received PW correlations and voicing measure. The phase component is generated based on a first order vector autoregressive model in which each PW vector is generated by summing the previous PW vector, weighted by the decoded PW correlation coefficient, with a weighted combination of fixed and random phase components. The use of the PW correlations in this manner results in a sequence of PWs that exhibits the correlation characteristics measured at the encoder. The fixed phase component, obtained from a pitch pulse waveform, provides glottal-pulse-like characteristics to the resulting phase during voiced segments. Addition of the random phase component provides a means of inserting a controlled degree of variation in the PW sequence across frequency as well as across time. The phase of the resulting PW sequence is then combined with the decoded PW magnitude and scaled by the decoded PW gains to reconstruct the PWs at all the subframes. The LP residual is then synthesized from these PWs using an interpolative synthesis procedure. Speech is then obtained as the output of the decoded LP synthesis filter driven by the LP residual. The synthesized speech is postfiltered using a pole-zero filter followed by tilt correction and energy normalization.
At 2.4 Kbps, the same frame size of 20 ms and a lookahead of 20 ms for VAD, noise reduction, LP analysis, and pitch estimation are utilized. However, the LP parameters are encoded using a 3-stage 21 bit VQ with backward prediction. Furthermore, for encoding the PW parameters, an additional 20 ms of lookahead is employed to smooth the PW gains, correlations, voicing measure, and magnitude spectra so that they can be encoded using fewer bits. The 1.2 Kbps FDI codec is similar to the 2.4 Kbps FDI codec except that a 40 ms frame size is employed instead of the 20 ms frame size, with the result that all parameters are updated half as often as in the 2.4 Kbps FDI codec.
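The decoder-side phase model described in the abstract can be illustrated with a minimal sketch of one first order vector autoregressive update. The abstract states only that the previous PW vector is weighted by the decoded correlation and summed with a weighted mix of fixed and random phase components. Everything else here is an assumption made for illustration: the function names, the `fixed_weight` parameter, the `sqrt(1 - c^2)` scaling of the new component, and the use of unit-magnitude phasors.

```python
import cmath
import math
import random


def next_pw_vector(prev_pw, corr, fixed_phase, fixed_weight, rng):
    """One first-order vector autoregressive step over all harmonics.

    prev_pw      : previous PW vector (one complex value per harmonic)
    corr         : decoded PW correlation per harmonic, assumed in [0, 1]
    fixed_phase  : phase (radians) of the pitch-pulse-derived component
    fixed_weight : hypothetical mixing weight in [0, 1]; 1.0 means the new
                   component is purely the fixed phase, 0.0 purely random
    rng          : random.Random instance supplying the random phase
    """
    out = []
    for prev, c, phi in zip(prev_pw, corr, fixed_phase):
        fixed = cmath.exp(1j * phi)
        rand = cmath.exp(1j * rng.uniform(-math.pi, math.pi))
        # Weighted combination of the fixed and random phase components.
        innovation = fixed_weight * fixed + (1.0 - fixed_weight) * rand
        # Assumed scaling so that a high correlation leaves little room
        # for the new component; the source does not give this formula.
        gain = math.sqrt(max(0.0, 1.0 - c * c))
        out.append(c * prev + gain * innovation)
    return out


def phases(pw):
    """Phase of each harmonic, to be combined with the decoded magnitude."""
    return [cmath.phase(x) for x in pw]
```

With a correlation of 1.0 the previous vector is reproduced unchanged, and with a correlation of 0.0 and `fixed_weight=1.0` the output phase equals the fixed phase, which matches the qualitative behaviour the abstract describes for strongly voiced segments.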
50 Claims
1. A coding system for a coder/decoder (codec) for providing adaptive bandwidth broadening to an encoder, comprising:
a linear prediction (LP) front end, adapted to process an input signal which provides LP parameters that are computed during a predetermined interval;
an open loop pitch estimator, adapted to perform pitch frequency estimation on said input signal for substantially all of said predetermined intervals;
an adaptive bandwidth broadening module, adapted to perform the following operations:
derive a spectrum sampling frequency for said predetermined interval as the pitch frequency or its integer submultiple depending on the pitch frequency;
determine a LP power spectrum at the harmonics of said spectrum sampling frequency for said input signal for said frame;
compute a peak to average ratio of said LP spectrum based on said spectrum sampling frequency of said frame; and
adaptively bandwidth broaden said LP filter coefficients based on said peak to average ratio of said LP spectrum for all harmonic multiples of said spectral sampling frequency.
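The four operations recited in claim 1 can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: the 8 kHz sampling rate, the 200 Hz spacing threshold, the peak to average limit, the step size, and the use of the conventional `a_k * gamma**k` bandwidth expansion are all hypothetical choices, as are the function names.

```python
import cmath
import math


def sampling_frequency(pitch_hz, max_spacing_hz=200.0):
    """Pitch frequency, or an integer submultiple when the pitch is high.

    The 200 Hz threshold is a hypothetical value chosen for illustration.
    """
    divisor = 1
    while pitch_hz / divisor > max_spacing_hz:
        divisor += 1
    return pitch_hz / divisor


def lp_power_at_harmonics(a, f0_hz, fs_hz=8000.0):
    """LP power spectrum 1/|A(e^jw)|^2 at multiples of f0 below Nyquist.

    a = [1, a1, ..., ap] are the coefficients of A(z) = sum a_k z^-k.
    """
    powers = []
    k = 1
    while k * f0_hz < fs_hz / 2:
        w = 2.0 * math.pi * k * f0_hz / fs_hz
        a_w = sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(a))
        powers.append(1.0 / abs(a_w) ** 2)
        k += 1
    return powers


def peak_to_average(powers):
    return max(powers) / (sum(powers) / len(powers))


def broaden(a, gamma):
    """Conventional bandwidth expansion: a_k -> a_k * gamma**k."""
    return [c * gamma ** n for n, c in enumerate(a)]


def adaptive_broaden(a, pitch_hz, par_limit=10.0, step=0.01, min_gamma=0.9):
    """Lower gamma until the peak to average ratio falls below par_limit.

    par_limit, step and min_gamma are hypothetical tuning values.
    """
    f0 = sampling_frequency(pitch_hz)
    gamma = 1.0
    out = list(a)
    while (peak_to_average(lp_power_at_harmonics(out, f0)) > par_limit
           and gamma - step >= min_gamma):
        gamma -= step
        out = broaden(a, gamma)
    return out, gamma
```

A call such as `adaptive_broaden([1.0, -0.98], 100.0)` returns the broadened coefficients together with the expansion factor that was applied; a filter whose sampled spectrum is already flat is returned unchanged with a factor of 1.0.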
5. A coding system for a codec, comprising:
a linear prediction front end adapted to process an input signal to provide LP parameters which are quantized and encoded over predetermined intervals and are used to compute a LP residual signal;
an open loop pitch estimator adapted to process the LP residual signal, pitch information, pitch interpolation information and provide a pitch contour within the predetermined intervals;
a prototype waveform extraction module, which is adapted in response to the LP residual signal and the pitch contour to extract a prototype waveform (PW) for a number of equal subintervals within the predetermined intervals and to extract an additional approximate PW in the subinterval immediately after the ending of a previous subinterval;
a PW gain computation module, adapted to compute a PW gain for substantially all the subintervals; and
a gain vector predictive vector quantization (VQ) module, adapted to quantize and encode the PW gains for substantially all the subintervals after they are filtered by a weighted window, decimated, and after subtracting from them a predicted average PW gain value for a current predetermined interval computed from the quantized PW gain values of a preceding predetermined interval.
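The gain preprocessing recited in the last element of claim 5 (windowed filtering, decimation, removal of a predicted average) can be sketched as below. The three-tap window, the decimation factor of 2, and the prediction coefficient `alpha` are hypothetical values, and the vector quantizer search itself is not shown.

```python
def smooth_and_decimate(gains, window=(0.25, 0.5, 0.25), factor=2):
    """Filter subframe gains with a weighted window, then decimate.

    The window taps and decimation factor are assumptions. Edge samples
    are replicated so the output stays aligned with the input.
    """
    n = len(gains)
    half = len(window) // 2
    smoothed = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(window):
            idx = min(max(i + j - half, 0), n - 1)
            acc += w * gains[idx]
        smoothed.append(acc)
    return smoothed[factor - 1::factor]


def predicted_average(prev_quantized_gains, alpha=0.5):
    """Backward prediction from the previous interval's quantized gains.

    alpha is a hypothetical prediction coefficient.
    """
    return alpha * sum(prev_quantized_gains) / len(prev_quantized_gains)


def gain_vq_target(gains, prev_quantized_gains):
    """Return the mean-removed vector a VQ would search, plus the mean."""
    decimated = smooth_and_decimate(gains)
    mean = predicted_average(prev_quantized_gains)
    return [g - mean for g in decimated], mean
```

Because the prediction uses only the previous interval's quantized values, a decoder holding the same values can form the same predicted average and add it back after the VQ index is decoded.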
11. A frequency domain interpolative (FDI) coder/decoder (codec), comprising:
a PW normalization and alignment module, adapted to compute a sequence of aligned prototype waveform (PW) vectors for a frame via a low complexity alignment process; and
a PW subband correlation computation module, adapted to compute a PW correlation vector for all harmonics for the frame and average the PW correlation vector across the harmonics in five subbands in order to derive a PW subband correlation vector.
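The subband correlation computation of claim 11 can be sketched as a per-harmonic normalized correlation between successive aligned PW vectors, followed by averaging in five subbands. The normalization formula and the equal-width split of harmonic indices are assumptions; the claim does not give the correlation formula or the band edges.

```python
import math


def harmonic_correlation(pw_seq):
    """Normalized correlation between successive PW vectors, per harmonic.

    pw_seq is a list of aligned PW vectors for one frame, each a list of
    complex harmonic values of the same length.
    """
    n_harm = len(pw_seq[0])
    corr = []
    for k in range(n_harm):
        num = 0.0
        e_cur = 0.0
        e_prev = 0.0
        for prev, cur in zip(pw_seq, pw_seq[1:]):
            num += (cur[k] * prev[k].conjugate()).real
            e_cur += abs(cur[k]) ** 2
            e_prev += abs(prev[k]) ** 2
        den = math.sqrt(e_cur * e_prev)
        corr.append(num / den if den > 0.0 else 0.0)
    return corr


def subband_average(values, n_bands=5):
    """Average per-harmonic values in n_bands groups of harmonic indices.

    Equal-width grouping by index is a hypothetical choice of band edges.
    """
    n = len(values)
    out = []
    for b in range(n_bands):
        lo = b * n // n_bands
        hi = (b + 1) * n // n_bands
        band = values[lo:hi]
        out.append(sum(band) / len(band) if band else 0.0)
    return out
```

A sequence of identical PW vectors yields a subband correlation near 1.0 in every band, while a sequence whose harmonics flip sign every subframe yields values near -1.0.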
21. A frequency domain interpolative (FDI) coder/decoder (codec), comprising:
a PW magnitude quantizer, adapted to perform the following:
directly quantize a prototype waveform (PW) in a magnitude domain for substantially every frame without said PW being decomposed into complex components;
hierarchically quantize a PW magnitude vector based on a voicing classification using a mean-deviations representation;
adaptively vector quantize the mean component of the representation in multiple subbands;
derive a variable dimension deviations vector as the difference of the input PW magnitude vector and the full band representation of the quantized PW subband mean vector for all harmonics;
select a fixed dimensional deviations subvector from the said variable dimensional deviations vector based on location of speech formant frequencies for a subframe; and
provide the said fixed dimensional deviations subvector for adaptive vector quantization.
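The mean-deviations representation in claim 21 can be sketched as follows. The quantization of the subband means is left out (the caller passes in whatever quantized means it has), and the rule used to pick the fixed dimensional subvector, namely the harmonics closest to a supplied list of formant frequencies, is a hypothetical reading of "based on location of speech formant frequencies". All names and band edges are illustrative.

```python
def subband_means(mag, edges):
    """Mean magnitude in each subband; edges are harmonic index bounds."""
    return [sum(mag[lo:hi]) / (hi - lo) for lo, hi in zip(edges, edges[1:])]


def expand_means(means, edges):
    """Full band representation: repeat each subband mean over its band."""
    full = []
    for m, lo, hi in zip(means, edges, edges[1:]):
        full.extend([m] * (hi - lo))
    return full


def deviations(mag, quantized_means, edges):
    """Variable dimension deviations vector, one entry per harmonic."""
    full = expand_means(quantized_means, edges)
    return [x - f for x, f in zip(mag, full)]


def select_subvector(dev, pitch_hz, formants_hz, dim):
    """Pick the dim harmonics nearest to any formant (assumed rule).

    Harmonic k (0-based) is taken to lie at (k + 1) * pitch_hz.
    Returns the chosen harmonic indices and the deviations at them.
    """
    def distance(k):
        f = (k + 1) * pitch_hz
        return min(abs(f - formant) for formant in formants_hz)

    chosen = sorted(sorted(range(len(dev)), key=distance)[:dim])
    return chosen, [dev[k] for k in chosen]
```

The number of harmonics changes with pitch, so `deviations` returns a vector of varying length, while `select_subvector` always returns `dim` entries, which is what allows a fixed-dimension VQ codebook to be used.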
22. A coding system for a coder/decoder (codec), comprising:
a linear prediction (LP) front end, adapted to process an input signal which provides LP parameters that are computed during a predetermined interval;
an open loop pitch estimator, adapted to perform pitch estimation on said input signal for substantially all of said predetermined intervals;
a voice activity detection module, that uses the LP parameters and pitch information;
a voicing measure computation module, adapted to provide a voicing measure that characterizes a degree of voicing and is derived from a plurality of input parameters that are correlated to the degree of periodicity of the input signal for substantially all predetermined intervals;
a prototype waveform (PW) subband correlation computation module, adapted to provide a PW subband correlation vector, said PW subband correlation vector characterizing a degree of correlation between successive PW vectors as a function of frequency and computed for substantially all predetermined intervals;
an adaptive bandwidth broadening module, adapted to reduce annoying artifacts due to spurious spectral peaks by performing the following:
compute a measure of VAD likelihood based on voice activity detection (VAD) flags for a preceding, a current and a next predetermined interval; and
compute average PW gain values for inactive predetermined intervals and active unvoiced predetermined intervals.
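The last two operations of claim 22 can be sketched as below. The claim does not say how the three VAD flags are combined or how the average gains are maintained, so counting the active flags and using an exponential running average with coefficient `alpha` are both assumptions; the class and function names are hypothetical.

```python
def vad_likelihood(prev_flag, cur_flag, next_flag):
    """Number of active flags among previous, current and next intervals.

    Returns an integer from 0 (all inactive) to 3 (all active). A plain
    count is an assumed combination rule.
    """
    return int(bool(prev_flag)) + int(bool(cur_flag)) + int(bool(next_flag))


class AverageGainTracker:
    """Running average PW gain for inactive and active unvoiced intervals."""

    def __init__(self, alpha=0.9):
        self.alpha = alpha      # hypothetical smoothing coefficient
        self.inactive = None    # average gain over inactive intervals
        self.unvoiced = None    # average gain over active unvoiced intervals

    def _smooth(self, old, new):
        if old is None:
            return new
        return self.alpha * old + (1.0 - self.alpha) * new

    def update(self, frame_gains, active, voiced):
        """Fold one interval's mean PW gain into the matching average."""
        mean_gain = sum(frame_gains) / len(frame_gains)
        if not active:
            self.inactive = self._smooth(self.inactive, mean_gain)
        elif not voiced:
            self.unvoiced = self._smooth(self.unvoiced, mean_gain)
        # Active voiced intervals leave both averages unchanged.
```

Using the flag of the next interval requires that interval's VAD decision to be available, which is consistent with the lookahead described in the abstract.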
27. A low bit rate coding system for a coder/decoder (codec), comprising:
a linear prediction (LP) front end, adapted to process an input signal which provides LP parameters that are computed during a predetermined interval;
an open loop pitch estimator, adapted to perform pitch estimation on said input signal for substantially all of said predetermined intervals;
a voice activity detection module, adapted to process and provide the LP parameters and pitch information to the decoder;
a prototype waveform (PW) encoder, adapted to provide a look ahead based on said predetermined interval in order to smooth PW parameters; and
a voicing measure computation module, adapted to provide a voicing measure, said voicing measure characterizing a degree of voicing derived from a plurality of input parameters that are correlated to the degree of periodicity of the input signal for substantially all predetermined intervals.
36. A low bit rate coding system for a coder/decoder (codec), comprising:
a linear prediction (LP) front end, adapted to process an input signal which provides LP parameters that are estimated, quantized and transmitted for substantially all frames of a first duration;
an open loop pitch estimator, adapted to perform pitch estimation on said input signal for substantially all of said frames of a first duration and quantize and transmit pitch information for substantially all frames of a second duration;
a voice activity detection module, adapted to combine voice activity detection (VAD) flags associated with two successive frames of a first duration based on processing the LP parameters and the pitch information every frame of a first duration and transmitting the VAD flags to the decoder substantially every frame of a second duration; and
a prototype waveform (PW) encoder, adapted to provide a look ahead frame based on said frame of a first duration in order to smooth PW parameters including at least one of PW gain, a voicing measure, subband correlations and spectral magnitude.
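The VAD flag handling in claim 36 can be sketched by assuming that the first duration is 20 ms and the second is 40 ms, as in the 1.2 Kbps mode described in the abstract. The claim does not state the combination rule; a logical OR of the two flags is used here purely as an illustrative assumption, and the function name is hypothetical.

```python
def combine_vad_flags(flags_short):
    """Merge per-short-frame VAD flags pairwise into one flag per long frame.

    flags_short holds one flag per frame of the first duration. Each pair
    of successive flags yields one flag for a frame of the second duration.
    The OR rule (active if either half is active) is an assumption.
    """
    if len(flags_short) % 2 != 0:
        raise ValueError("need an even number of short-frame flags")
    firsts = flags_short[0::2]
    seconds = flags_short[1::2]
    return [int(bool(a) or bool(b)) for a, b in zip(firsts, seconds)]
```

For example, eight flags computed every 20 ms reduce to four flags transmitted every 40 ms, halving the update rate in line with the abstract's statement that the 1.2 Kbps codec updates all parameters half as often as the 2.4 Kbps codec.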
44. A method for providing adaptive bandwidth broadening to an encoder of a coder/decoder (codec), comprising:
processing an input signal which provides LP parameters that are computed during a predetermined interval;
performing pitch frequency estimation on said input signal for substantially all of said predetermined intervals;
deriving a spectrum sampling frequency for said predetermined interval as the pitch frequency or its integer submultiple depending on the pitch frequency;
determining a LP power spectrum at the harmonics of said spectrum sampling frequency for said input signal for said frame;
computing a peak to average ratio of said LP spectrum based on said spectrum sampling frequency of said frame; and
adaptively bandwidth broadening said LP filter coefficients based on said peak to average ratio of said LP spectrum for all harmonic multiples of said spectral sampling frequency.
45. A method of providing a coding system for a codec, comprising:
processing an input signal to provide LP parameters which are quantized and encoded over predetermined intervals and are used to compute a LP residual signal;
processing the LP residual signal, pitch information, pitch interpolation information and providing a pitch contour within the predetermined intervals;
extracting a prototype waveform (PW) for a number of equal subintervals within the predetermined intervals and extracting an additional approximate PW in the subinterval immediately after the ending of a previous subinterval in response to the LP residual signal and the pitch contour;
computing a PW gain for substantially all the subintervals; and
quantizing and encoding the PW gains for substantially all the subintervals after the subintervals are filtered by a weighted window, decimated, and subtracted from a predicted average PW gain value for a current predetermined interval which is computed from the quantized PW gain values of a preceding predetermined interval.
46. A method of providing a coding system for a coder/decoder (codec), comprising:
computing a sequence of aligned prototype waveform (PW) vectors for a frame via a low complexity alignment process; and
computing a PW correlation vector for all harmonics for the frame and averaging the PW correlation vector across the harmonics in five subbands in order to derive a PW subband correlation vector.
47. A method of providing a coding system for a frequency domain interpolative (FDI) coder/decoder (codec), comprising:
directly quantizing a prototype waveform (PW) in a magnitude domain for substantially every frame without said PW being decomposed into complex components;
hierarchically quantizing a PW magnitude vector based on a voicing classification using a mean-deviations representation;
adaptively vector quantizing the mean component of the representation in multiple subbands;
deriving a variable dimension deviations vector as the difference of the input PW magnitude vector and the full band representation of the quantized PW subband mean vector for all harmonics;
selecting a fixed dimensional deviations subvector from the said variable dimensional deviations vector based on a location of speech formant frequencies for a subframe; and
providing the said fixed dimensional deviations subvector for adaptive vector quantization.
48. A method of providing a coding system for a coder/decoder (codec), comprising:
processing an input signal which provides LP parameters that are computed during a predetermined interval;
performing a pitch estimation on said input signal for substantially all of said predetermined intervals;
processing the LP parameters and pitch information;
providing a voicing measure that characterizes a degree of voicing and is derived from a plurality of input parameters that are correlated to the degree of periodicity of the input signal for substantially all predetermined intervals;
providing a PW subband correlation vector, said PW subband correlation vector characterizing a degree of correlation between successive PW vectors as a function of frequency and computed for substantially all predetermined intervals;
reducing annoying artifacts due to spurious spectral peaks by performing the following:
computing a measure of VAD likelihood based on voice activity detection (VAD) flags for a preceding, a current and a next predetermined interval; and
computing average PW gain values for inactive predetermined intervals and active unvoiced predetermined intervals.
49. A method of providing a low bit rate coding system for a coder/decoder (codec), comprising:
processing an input signal which provides LP parameters that are computed during a predetermined interval;
performing pitch estimation on said input signal for substantially all of said predetermined intervals;
processing and providing the LP parameters and pitch information to the decoder;
providing a look ahead based on said predetermined interval in order to smooth PW parameters; and
providing a voicing measure, said voicing measure characterizing a degree of voicing derived from a plurality of input parameters that are correlated to the degree of periodicity of the input signal for substantially all predetermined intervals.
50. A method of providing a low bit rate coding system for a coder/decoder (codec), comprising:
processing an input signal which provides LP parameters that are estimated, quantized and transmitted for substantially all frames of a first duration;
performing a pitch estimation on said input signal for substantially all of said frames of a first duration and quantizing and transmitting pitch information for substantially all frames of a second duration;
combining voice activity detection (VAD) flags associated with two successive frames of a first duration;
processing the LP parameters and the pitch information every frame of a first duration and transmitting the VAD flags to the decoder substantially every frame of a second duration; and
providing a look ahead frame based on said frame of a first duration in order to smooth PW parameters including at least one of PW gain, a voicing measure, subband correlations and a spectral magnitude.
Specification