Multi-rate frequency domain interpolative speech CODEC system

US 20040002856A1
Filed: 03/05/2003
Published: 01/01/2004
Est. Priority Date: 03/08/2002
Status: Abandoned Application

First Claim


1. A coding system for a coder/decoder (codec) for providing adaptive bandwidth broadening to an encoder, comprising:

a linear prediction (LP) front end, adapted to process an input signal which provides LP parameters that are computed during a predetermined interval;

an open loop pitch estimator, adapted to perform pitch frequency estimation on said input signal for substantially all of said predetermined intervals;

an adaptive bandwidth broadening module, adapted to perform the following operations:

derive a spectrum sampling frequency for said predetermined interval as the pitch frequency or its integer submultiple depending on the pitch frequency;

determine a LP power spectrum at the harmonics of said spectrum sampling frequency for said input signal for said frame;

compute a peak to average ratio of said LP spectrum based on said spectrum sampling frequency of said frame; and

adaptively bandwidth broaden said LP filter coefficients based on said peak to average ratio of said LP spectrum for all harmonic multiples of said spectral sampling frequency.


Abstract

A low bit rate voice codec based on Frequency Domain Interpolation (FDI) technology is designed to operate at multiple rates of 4.0, 2.4, and 1.2 kbps.

At 4.0 kbps, the codec uses a 20 ms frame size and a 20 ms lookahead for voice activity detection (VAD), noise reduction, linear prediction (LP) analysis, and open loop pitch analysis. The LP parameters are encoded using backward predictive hybrid scalar-vector quantizers in the line spectral frequency (LSF) domain after adaptive bandwidth broadening to minimize excessive peakiness in the LP spectrum. Prototype waveforms (PWs) are extracted from the LP residual every subframe (2.5 ms) and subsequently aligned and normalized. The PW gains are encoded separately using a backward predictive vector quantizer (VQ). The normalized and aligned PWs are separated into a magnitude component and a phase component. The phase component is encoded implicitly using PW correlations and a voicing measure, which are jointly quantized using a VQ. The magnitude component is encoded using a backward predictive VQ that is switched based on the voicing measure.

At the decoder, a phase model is used to synthesize the phase component from the received PW correlations and voicing measure. The phase component is generated by a first order vector autoregressive model in which each PW vector is the sum of the previous PW vector, weighted by the decoded PW correlation coefficient, and a weighted combination of fixed and random phase components. Using the PW correlations in this manner yields a sequence of PWs that exhibits the correlation characteristics measured at the encoder. The fixed phase component, obtained from a pitch pulse waveform, gives the resulting phase glottal-pulse-like characteristics during voiced segments. The random phase component inserts a controlled degree of variation in the PW sequence across frequency as well as across time. The phase of the resulting PW sequence is then combined with the decoded PW magnitude and scaled by the decoded PW gains to reconstruct the PWs at all the subframes. The LP residual is then synthesized from these PWs using an interpolative synthesis procedure. Speech is obtained as the output of the decoded LP synthesis filter driven by the LP residual. The synthesized speech is postfiltered using a pole-zero filter followed by tilt correction and energy normalization.

At 2.4 kbps, the same 20 ms frame size and 20 ms lookahead are used for VAD, noise reduction, LP analysis, and pitch estimation. However, the LP parameters are encoded using a 3-stage, 21-bit VQ with backward prediction. Furthermore, an additional 20 ms of lookahead is employed when encoding the PW parameters, to smooth the PW gains, correlations, voicing measure, and magnitude spectra so that they can be encoded using fewer bits.

The 1.2 kbps FDI codec is similar to the 2.4 kbps FDI codec except that it uses a 40 ms frame size instead of 20 ms, so all parameters are updated half as often as in the 2.4 kbps FDI codec.
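
The rates and frame sizes quoted in the abstract fix the number of bits available per frame, since bits per frame equals bit rate times frame duration. The short sketch below works this out; the function names are illustrative, and applying the 2.5 ms subframe to the 2.4 and 1.2 kbps modes is an assumption, because the abstract states it only in the 4 kbps description.

```python
# Bits per frame = bit rate (kbit/s) x frame duration (ms), since 1 kbit/s x 1 ms = 1 bit.
RATES = {        # rate in kbps -> frame size in ms, as stated in the abstract
    4.0: 20,
    2.4: 20,
    1.2: 40,
}
SUBFRAME_MS = 2.5  # PW extraction interval (stated for 4 kbps; assumed for the others)


def bits_per_frame(rate_kbps: float, frame_ms: float) -> int:
    """Return the number of bits that fit in one frame at the given rate."""
    return round(rate_kbps * frame_ms)


def subframes_per_frame(frame_ms: float) -> int:
    """Return how many subframes of SUBFRAME_MS fall in one frame."""
    return round(frame_ms / SUBFRAME_MS)


for rate, frame in RATES.items():
    print(rate, "kbps:", bits_per_frame(rate, frame), "bits per frame,",
          subframes_per_frame(frame), "subframes per frame")
```

Under these figures the 4.0 kbps mode has 80 bits per 20 ms frame, while the 2.4 and 1.2 kbps modes each have 48 bits per frame (20 ms and 40 ms respectively).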

218 Citations

50 Claims

1. A coding system for a coder/decoder (codec) for providing adaptive bandwidth broadening to an encoder, comprising:
- a linear prediction (LP) front end, adapted to process an input signal which provides LP parameters that are computed during a predetermined interval;
  
  an open loop pitch estimator, adapted to perform pitch frequency estimation on said input signal for substantially all of said predetermined intervals;
  
  an adaptive bandwidth broadening module, adapted to perform the following operations:
  
  derive a spectrum sampling frequency for said predetermined interval as the pitch frequency or its integer submultiple depending on the pitch frequency;
  
  determine a LP power spectrum at the harmonics of said spectrum sampling frequency for said input signal for said frame;
  
  compute a peak to average ratio of said LP spectrum based on said spectrum sampling frequency of said frame; and
  
  adaptively bandwidth broaden said LP filter coefficients based on said peak to average ratio of said LP spectrum for all harmonic multiples of said spectral sampling frequency.
- View Dependent Claims (2, 3, 4)
- - 2. A system as recited in claim 1, wherein said predetermined interval is preferably 20 ms in duration.
  - 3. A system as recited in claim 1, wherein said codec comprises a frequency domain interpolative (FDI) codec.
  - 4. A system as recited in claim 1, wherein said harmonic multiples of the spectrum sampling frequency are within 0 to 4 kHz.
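
The operations recited in claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicants' implementation: the 8 kHz sampling rate, the 200 Hz spacing limit, the peak-to-average threshold, the step size, and the rule of scaling coefficient a_k by gamma^k until the ratio falls below the threshold are all hypothetical choices. The claim only says that broadening is driven by the peak to average ratio.

```python
import cmath
import math

FS_HZ = 8000.0  # assumed sampling rate (claim 4 mentions harmonics within 0 to 4 kHz)


def sampling_frequency(pitch_hz, max_spacing_hz=200.0):
    """Use the pitch frequency, or an integer submultiple when harmonics are sparse.

    The 200 Hz limit is a hypothetical threshold, not taken from the claims.
    """
    divisor = 1
    while pitch_hz / divisor > max_spacing_hz:
        divisor += 1
    return pitch_hz / divisor


def lp_power_at_harmonics(a, f0_hz):
    """Evaluate 1/|A(e^jw)|^2 at multiples of f0_hz below FS_HZ / 2.

    a holds a_1..a_p of A(z) = 1 + sum_k a_k z^-k.
    """
    powers = []
    k = 1
    while k * f0_hz < FS_HZ / 2:
        w = 2 * math.pi * k * f0_hz / FS_HZ
        A = 1 + sum(ak * cmath.exp(-1j * w * (i + 1)) for i, ak in enumerate(a))
        powers.append(1.0 / abs(A) ** 2)
        k += 1
    return powers


def peak_to_average(powers):
    """Ratio of the largest sampled power to the mean sampled power."""
    return max(powers) / (sum(powers) / len(powers))


def broaden(a, pitch_hz, par_limit=10.0, step=0.01, gamma_min=0.9):
    """Lower gamma until the peak to average ratio drops below par_limit (assumed rule)."""
    f0 = sampling_frequency(pitch_hz)
    gamma = 1.0
    out = list(a)
    while peak_to_average(lp_power_at_harmonics(out, f0)) > par_limit and gamma > gamma_min:
        gamma -= step
        out = [ak * gamma ** (i + 1) for i, ak in enumerate(a)]
    return out, gamma
```

Scaling a_k by gamma^k with gamma below 1 moves the poles of the synthesis filter toward the origin, which widens the formant bandwidths and flattens sharp spectral peaks.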

5. A coding system for a codec, comprising:
- a linear prediction front end adapted to process an input signal to provide LP parameters which are quantized and encoded over predetermined intervals and are used to compute a LP residual signal;
  
  an open loop pitch estimator adapted to process the LP residual signal, pitch information, pitch interpolation information and provide a pitch contour within the predetermined intervals;
  
  a prototype waveform extraction module, which is adapted in response to the LP residual signal and the pitch contour to extract a prototype waveform (PW) for a number of equal subintervals within the predetermined intervals and to extract an additional approximate PW in the subinterval immediately after the ending of a previous subinterval;
  
  a PW gain computation module, adapted to compute a PW gain for substantially all the subintervals; and
  
  a gain vector predictive vector quantization (VQ) module, adapted to quantize and encode the PW gains for substantially all the subintervals after they are filtered by a weighted window, decimated, and after subtracting from them a predicted average PW gain value for a current predetermined interval computed from the quantized PW gain values of a preceding predetermined interval.
- View Dependent Claims (6, 7, 8, 9, 10)
- - 6. A system as recited in claim 5, wherein said predetermined interval is preferably 20 ms in duration.
  - 7. A system as recited in claim 5, wherein said weighted window comprises a 3 point window.
  - 8. A system as recited in claim 5, wherein said decimation comprises a 2:1 decimation.
  - 9. A system as recited in claim 5, wherein said gain vector predictive VQ module is further adapted to perform predictive vector quantization of the decimated and smoothed PW gains based on the predicted average PW gain estimate and a codebook indicating corrections to the estimated PW gains.
  - 10. A system as recited in claim 5, further comprising:
    - a gain decoder interpolation module, adapted to decay the average PW gain value for the preceding predetermined interval in order to mitigate the effect of transmission errors on the PW gain parameter.
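
The gain path of claims 5 to 10 (window, decimate, subtract a predicted average, quantize the remainder against a codebook of corrections) can be sketched as below. The window weights, the edge handling, the decay factor, and the toy codebook are all hypothetical; the claims specify only a 3 point window, a 2:1 decimation, and a prediction formed from the preceding interval's quantized gains.

```python
def smooth_and_decimate(gains, window=(0.25, 0.5, 0.25)):
    """Apply a 3-point weighted window, then keep every second value (2:1)."""
    padded = [gains[0]] + list(gains) + [gains[-1]]  # repeat the edges (assumption)
    smoothed = [
        window[0] * padded[i] + window[1] * padded[i + 1] + window[2] * padded[i + 2]
        for i in range(len(gains))
    ]
    return smoothed[1::2]


def quantize_gains(gains, prev_quantized, codebook, decay=0.9):
    """Predictive VQ: code what remains after removing a predicted average gain.

    decay is a hypothetical leakage factor applied to the previous average.
    """
    predicted_avg = decay * sum(prev_quantized) / len(prev_quantized)
    target = [g - predicted_avg for g in smooth_and_decimate(gains)]
    # Nearest-neighbour search over the correction codebook (squared error).
    index = min(
        range(len(codebook)),
        key=lambda i: sum((t - c) ** 2 for t, c in zip(target, codebook[i])),
    )
    quantized = [predicted_avg + c for c in codebook[index]]
    return index, quantized


# Eight subframe gains in, four quantized gains out.
toy_codebook = [[0.0] * 4, [0.1] * 4, [-0.1] * 4]
index, q = quantize_gains([1.0] * 8, [1.0] * 4, toy_codebook)
print(index, q)
```

Because the predictor uses only previously quantized values, the decoder can form the same prediction from the index stream alone; a decay factor below 1 lets the influence of a corrupted frame fade over time, which is the role claim 10 assigns to the decoder.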

11. A frequency domain interpolative (FDI) coder/decoder (codec), comprising:
- a PW normalization and alignment module, adapted to compute a sequence of aligned prototype waveform (PW) vectors for a frame via a low complexity alignment process; and
  
  a PW subband correlation computation module, adapted to compute a PW correlation vector for all harmonics for the frame and average the PW correlation vector across the harmonics in five subbands in order to derive a PW subband correlation vector.
- View Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19, 20)
- - 12. A system as recited in claim 11, further comprising:
    - a voicing measure computation module, adapted to provide a voicing measure that characterizes a degree of voicing.
  - 13. A system as recited in claim 12, wherein said voicing measure is derived from input factors that are correlated to a degree of periodicity for the frame.
  - 14. A system as recited in claim 11, wherein said PW correlation vector comprises the average correlation between successive PW vectors as a function of frequency.
  - 15. A system as recited in claim 11, wherein said PW subband correlation vector comprises a degree of stationarity of successive pitch cycles of an input signal.
  - 16. A system as recited in claim 12 further comprising:
    - a PW correlation and vector measure vector quantization (VQ) module, adapted to encode a composite vector derived from said PW subband correlation vector and the voicing measure based on spectrally weighted vector quantization.
  - 17. A system as recited in claim 11, further comprising:
    - an autoregressive module, adapted to reconstruct a PW phase at the decoder substantially every sub-frame using the received voicing measure, PW subband correlation vector and pitch frequency contour information.
  - 18. A system as recited in claim 17, wherein said autoregressive module is further adapted to compute a value for the input signal via a weighted combination of a first complex vector and a second complex vector.
  - 19. A system as recited in claim 18, wherein said first complex vector is derived from a random phase vector and said second complex vector is derived from a fixed phase vector.
  - 20. A system as recited in claim 19, wherein said second complex vector is obtained by oversampling a phase spectrum of a voiced pitch pulse.
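
Claims 17 to 20 describe an autoregressive reconstruction of the PW phase from a weighted combination of a random and a fixed complex vector. One update step might look like the sketch below. The per-harmonic form, the `sqrt(1 - rho^2)` scaling of the new term, and the single `voicing_weight` mixing parameter are assumptions made for illustration; the claims do not give the weighting.

```python
import cmath
import math
import random


def next_pw_phase_vector(prev, rho, fixed_phase, voicing_weight, rng):
    """One step of a first-order vector autoregressive phase model.

    prev           -- previous complex PW vector, one entry per harmonic
    rho            -- decoded PW correlation per harmonic, in [0, 1]
    fixed_phase    -- phases (radians) of a pitch pulse, one per harmonic
    voicing_weight -- hypothetical mix in [0, 1]: 1 = fixed only, 0 = random only
    rng            -- a random.Random instance supplying the random phases
    """
    out = []
    for v_prev, r, phi in zip(prev, rho, fixed_phase):
        fixed = cmath.exp(1j * phi)
        rand = cmath.exp(1j * rng.uniform(-math.pi, math.pi))
        innovation = voicing_weight * fixed + (1.0 - voicing_weight) * rand
        out.append(r * v_prev + math.sqrt(1.0 - r * r) * innovation)
    return out


# Run the model over eight subframes and read off the phase of each harmonic.
rng = random.Random(0)
vector = [1 + 0j] * 4
for _ in range(8):
    vector = next_pw_phase_vector(vector, [0.9, 0.8, 0.6, 0.3], [0.0] * 4, 0.7, rng)
phases = [cmath.phase(v) for v in vector]
```

With a correlation near 1 the vector barely changes between subframes, as expected for steady voiced speech; with a correlation near 0 each subframe is dominated by the new fixed and random terms.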

21. A frequency domain interpolative (FDI) coder/decoder (codec), comprising:
- a PW magnitude quantizer, adapted to perform the following:
  
  directly quantize a prototype waveform (PW) in a magnitude domain for substantially every frame without said PW being decomposed into complex components;
  
  hierarchically quantize a PW magnitude vector based on a voicing classification using a mean-deviations representation;
  
  adaptively vector quantize the mean component of the representation in multiple subbands;
  
  derive a variable dimension deviations vector as the difference of the input PW magnitude vector and the full band representation of the quantized PW subband mean vector for all harmonics;
  
  select a fixed dimensional deviations subvector from the said variable dimensional deviations vector based on location of speech formant frequencies for a subframe; and
  
  provide the said fixed dimensional deviations subvector for adaptive vector quantization.
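
The mean-deviations representation of claim 21 can be illustrated with a small sketch: average the PW magnitude within subbands, expand the (quantized) means back to one value per harmonic, take the difference as the deviations vector, and keep a fixed number of its entries. The selection rule used here (the harmonics nearest to given formant positions) and the subband edges are hypothetical; the claim states only that selection depends on the location of speech formant frequencies.

```python
def subband_means(mag, edges):
    """Average the PW magnitude over each subband; edges are harmonic indices."""
    return [sum(mag[lo:hi]) / (hi - lo) for lo, hi in zip(edges[:-1], edges[1:])]


def full_band(means, edges, n):
    """Expand subband means back to one value per harmonic (piecewise constant)."""
    out = [0.0] * n
    for m, lo, hi in zip(means, edges[:-1], edges[1:]):
        for k in range(lo, hi):
            out[k] = m
    return out


def deviations_subvector(mag, quantized_means, edges, formant_harmonics, dim):
    """Deviation of mag from the full-band mean, restricted to dim harmonics.

    Keeps the dim harmonics closest to any formant position (hypothetical rule).
    """
    base = full_band(quantized_means, edges, len(mag))
    dev = [m - b for m, b in zip(mag, base)]
    ranked = sorted(range(len(mag)),
                    key=lambda k: min(abs(k - f) for f in formant_harmonics))
    chosen = sorted(ranked[:dim])
    return chosen, [dev[k] for k in chosen]


mag = [1, 3, 2, 2, 5, 7]
edges = [0, 2, 4, 6]
means = subband_means(mag, edges)  # unquantized means stand in for quantized ones
chosen, sub = deviations_subvector(mag, means, edges, formant_harmonics=[4], dim=3)
```

The number of harmonics, and therefore the length of the full deviations vector, changes with pitch; selecting a fixed number of entries gives the vector quantizer an input of constant dimension. This sketch does not handle the case where fewer than `dim` harmonics exist.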

22. A coding system for a coder/decoder (codec), comprising:
- a linear prediction (LP) front end, adapted to process an input signal which provides LP parameters that are computed during a predetermined interval;
  
  an open loop pitch estimator, adapted to perform pitch estimation on said input signal for substantially all of said predetermined intervals;
  
  a voice activity detection module, that uses the LP parameters and pitch information;
  
  a voicing measure computation module, adapted to provide a voicing measure that characterizes a degree of voicing and is derived from a plurality of input parameters that are correlated to the degree of periodicity of the input signal for substantially all predetermined intervals;
  
  a prototype waveform (PW) subband correlation computation module, adapted to provide a PW subband correlation vector, said PW subband correlation vector characterizing a degree of correlation between successive PW vectors as a function of frequency and computed for substantially all predetermined intervals;
  
  an adaptive bandwidth broadening module, adapted to reduce annoying artifacts due to spurious spectral peaks by performing the following:
  
  compute a measure of VAD likelihood based on voice activity detection (VAD) flags for a preceding, a current and a next predetermined interval; and
  
  compute average PW gain values for inactive predetermined intervals and active unvoiced predetermined intervals.
- View Dependent Claims (23, 24, 25, 26)
- - 23. A system as recited in claim 22 wherein said adaptive bandwidth broadening module is further adapted to perform the following:
    - compute a parameter α_fatt to determine the degree of bandwidth broadening necessary for the interpolated LP synthesis filter coefficients using a VAD likelihood measure, PW gain averages and the PW subband correlation quantization index.
  - 24. A system as recited in claim 22 wherein said adaptive bandwidth broadening module is further adapted to attenuate out-of-band components of a reconstructed PW vector by performing the following:
    - compute a first corner frequency for a low frequency based on a pitch frequency;
      
      compute a second corner frequency at a high frequency based on the pitch frequency and α_fatt; and
      
      determine a rate of attenuation of high frequency components as a square law function, based on α_fatt.
  - 25. A system as recited in claim 22, wherein said predetermined interval is preferably 20 ms in duration.
  - 26. A system as recited in claim 22, wherein said predetermined interval comprises a frame.
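
The two quantities computed by the adaptive bandwidth broadening module of claim 22 can be sketched as follows. The claims do not define either formula, so both are assumptions here: the VAD likelihood is taken to be a simple count of active flags over the preceding, current, and next intervals, and the gain averages are kept as exponentially smoothed running values with a hypothetical forgetting factor.

```python
def vad_likelihood(prev_flag, cur_flag, next_flag):
    """Count of active VAD flags over three consecutive intervals (0 to 3)."""
    return int(prev_flag) + int(cur_flag) + int(next_flag)


class GainAverages:
    """Running averages of PW gain for inactive and for active unvoiced intervals."""

    def __init__(self, smoothing=0.9):
        self.smoothing = smoothing  # hypothetical forgetting factor
        self.inactive = None
        self.unvoiced = None

    def update(self, gain, active, voiced):
        """Fold one interval's gain into the matching average, if any."""
        if not active:
            self.inactive = self._blend(self.inactive, gain)
        elif not voiced:
            self.unvoiced = self._blend(self.unvoiced, gain)

    def _blend(self, old, new):
        if old is None:
            return new
        return self.smoothing * old + (1.0 - self.smoothing) * new


averages = GainAverages(smoothing=0.5)
averages.update(2.0, active=False, voiced=False)
averages.update(4.0, active=False, voiced=False)
averages.update(6.0, active=True, voiced=False)
print(vad_likelihood(1, 0, 1), averages.inactive, averages.unvoiced)
```

Using the next interval's flag requires one interval of lookahead, which the abstract says the codec already has for VAD.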

27. A low bit rate coding system for a coder/decoder (codec), comprising:
- a linear prediction (LP) front end, adapted to process an input signal which provides LP parameters that are computed during a predetermined interval;
  
  an open loop pitch estimator, adapted to perform pitch estimation on said input signal for substantially all of said predetermined intervals;
  
  a voice activity detection module, adapted to process and provide the LP parameters and pitch information to the decoder;
  
  a prototype waveform (PW) encoder, adapted to provide a look ahead based on said predetermined interval in order to smooth PW parameters; and
  
  a voicing measure computation module, adapted to provide a voicing measure, said voicing measure characterizing a degree of voicing derived from a plurality of input parameters that are correlated to the degree of periodicity of the input signal for substantially all predetermined intervals.
- View Dependent Claims (28, 29, 30, 31, 32, 33, 34, 35)
- - 28. A system as recited in claim 27 wherein said PW parameters comprise at least one of gain, a voicing measure, subband correlations and spectral magnitude.
  - 29. A system as recited in claim 27 further comprising:
    - a prototype waveform (PW) subband correlation computation module, adapted to provide a PW subband correlation vector, said PW subband correlation vector characterizing a degree of correlation between successive PW vectors as a function of frequency and computed for substantially all predetermined intervals to obtain PW vectors for a current predetermined interval and a look ahead predetermined interval.
  - 30. A system as recited in claim 27 further comprising:
    - a PW gain computation module, adapted to compute a PW gain for substantially all sub-predetermined intervals including a current predetermined interval and a look ahead predetermined interval.
  - 31. A system as recited in claim 27 further comprising:
    - a voicing measure smoothing module, adapted to smooth a voicing measure by combining a voicing measure associated with a current predetermined interval and a look ahead predetermined interval.
  - 32. A system as recited in claim 27 further comprising:
    - a PW gain smoothing module, adapted to provide PW gain smoothing via a parabolic symmetric window for each predetermined interval and a 2:1 decimation, quantization and transmission to the decoder, said parabolic symmetric window is centered at an edge of the predetermined interval; and
      
      a PW magnitude smoothing module, adapted to represent a PW spectral magnitude at a frame edge via a smoothed PW subband mean approximation.
  - 33. A system as recited in claim 32 further comprising:
    - a PW magnitude quantization module, adapted to quantize and provide a smoothed PW subband mean approximation to the decoder.
  - 34. A system as recited in claim 27 further comprising:
    - an adaptive bandwidth broadening module, adapted to reduce annoying artifacts due to spurious spectral peaks by performing the following:
      
      compute a measure of VAD likelihood based on voice activity detection (VAD) flags for a preceding, a current and a next two predetermined intervals; and
      
      compute average PW gain values for inactive predetermined intervals and active unvoiced predetermined intervals.
  - 35. A system as recited in claim 27, wherein said codec operates at 2.4 kbps.
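
The gain smoothing of claim 32 uses a parabolic symmetric window and a 2:1 decimation, with look-ahead gains available so that the window can extend past the frame edge. A minimal sketch follows; the window formula, its half width, and the choice of odd subframe indices as decimation points (so that the last output sits on the final subframe of the current frame) are assumptions, not details from the claims.

```python
def parabolic_window(half_width):
    """Symmetric weights w[n] = 1 - (n / (half_width + 1))^2, normalized to sum to 1."""
    raw = [1.0 - (n / (half_width + 1)) ** 2
           for n in range(-half_width, half_width + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def smooth_at(gains, center, half_width):
    """Weighted average of gains around index center, clamping at both ends."""
    weights = parabolic_window(half_width)
    last = len(gains) - 1
    return sum(
        w * gains[min(max(center + n, 0), last)]
        for w, n in zip(weights, range(-half_width, half_width + 1))
    )


def smooth_frame(current, lookahead, half_width=2):
    """Smooth the current frame's gains with access to look-ahead, then decimate 2:1."""
    sequence = list(current) + list(lookahead)
    return [smooth_at(sequence, c, half_width) for c in range(1, len(current), 2)]


# Eight current gains plus eight look-ahead gains give four smoothed values.
smoothed = smooth_frame([1.0, 1.2, 1.1, 0.9, 1.0, 1.3, 1.4, 1.2], [1.1] * 8)
```

The look-ahead frame is what allows the window centered on the last subframe to be symmetric instead of one sided.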

36. A low bit rate coding system for a coder/decoder (codec), comprising:
- a linear prediction (LP) front end, adapted to process an input signal which provides LP parameters that are estimated, quantized and transmitted for substantially all frames of a first duration;
  
  an open loop pitch estimator, adapted to perform pitch estimation on said input signal for substantially all of said frames of a first duration and quantize and transmit pitch information for substantially all frames of a second duration;
  
  a voice activity detection module, adapted to combine voice activity detection (VAD) flags associated with two successive frames of a first duration based on processing the LP parameters and the pitch information every frame of a first duration and transmitting the VAD flags to the decoder substantially every frame of a second duration; and
  
  a prototype waveform (PW) encoder, adapted to provide a look ahead frame based on said frame of a first duration in order to smooth PW parameters including at least one of PW gain, a voicing measure, subband correlations and spectral magnitude.
- View Dependent Claims (37, 38, 39, 40, 41, 42, 43)
- - 37. A system as recited in claim 36, wherein said codec operates at 1.2 kbps.
  - 38. A system as recited in claim 36, wherein said frames of a first duration comprise 20 ms each, and frames of a second duration comprise 40 ms each.
  - 39. A system as recited in claim 36 further comprising:
    - a voicing measure computation module, adapted to provide a voicing measure, said voicing measure characterizing a degree of voicing derived from a plurality of input parameters that are correlated to the degree of periodicity of the input signal for substantially all the frames of a first duration.
  - 40. A system as recited in claim 36 further comprising:
    - a voicing measure smoothing module, adapted to combine a voicing measure associated with a second half of a current frame of a second duration and a voicing measure associated with a look ahead frame of a first duration based on their respective energies in order to smooth the voicing measures;
      
      a prototype waveform (PW) subband correlation computation module, adapted to provide a PW subband correlation vector, said PW subband correlation vector characterizing a degree of correlation between successive PW vectors as a function of frequency and computed for a current frame of a first duration in order to provide PW vectors for a current frame of a second duration and a look ahead frame of a first duration;
      
      a PW gain computation module, adapted to compute a PW gain for substantially all subframes for both the current frame of a second duration and the look ahead frame of a first duration; and
      
      said prototype waveform (PW) subband correlation computation module being further adapted to quantize and transmit a composite PW subband correlation vector and voicing measure to the decoder.
  - 41. A system as recited in claim 36 further comprising:
    - a PW gain smoothing module, adapted to provide PW gain smoothing via a parabolic symmetric window for each instant of time followed by a 4:1 decimation, quantization and transmission to the decoder for substantially all the frames of a second duration, said parabolic symmetric window is centered at an edge of the frame of a second duration; and
      
      a PW magnitude smoothing module, adapted to represent a PW spectral magnitude at the frame edge of a second duration via a smoothed PW subband mean approximation.
  - 42. A system as recited in claim 36 further comprising:
    - a PW magnitude quantization module, adapted to quantize and provide a smoothed PW subband mean approximation to the decoder.
  - 43. A system as recited in claim 36 further comprising:
    - an adaptive bandwidth broadening module at the decoder, adapted to reduce annoying artifacts due to spurious spectral peaks in inactive noise frames by performing the following:
      
      compute a measure of VAD likelihood based on the VAD flags for a preceding, a current and a next frame of a second duration; and
      
      compute average PW gain values for the inactive noise frames and active unvoiced voice frames.
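
In the system of claim 36, parameters are analyzed every 20 ms frame but transmitted every 40 ms frame, so per-frame values have to be merged. The sketch below shows two such merges: combining two VAD flags, and combining two voicing measures according to their energies as in claim 40. The logical OR and the energy-weighted mean are assumptions chosen for illustration; the claims do not state the combining rules.

```python
def combine_vad_flags(flag_a, flag_b):
    """Merge two 20 ms VAD flags into one 40 ms flag (logical OR is an assumption)."""
    return bool(flag_a) or bool(flag_b)


def smooth_voicing(v_current, e_current, v_lookahead, e_lookahead):
    """Energy-weighted mean of two voicing measures (assumed combining rule)."""
    total = e_current + e_lookahead
    if total <= 0.0:
        return 0.5 * (v_current + v_lookahead)  # no energy: fall back to a plain mean
    return (e_current * v_current + e_lookahead * v_lookahead) / total


print(combine_vad_flags(True, False))
print(smooth_voicing(0.2, 3.0, 0.6, 1.0))
```

Weighting by energy lets the louder of the two segments dominate the combined voicing measure.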

44. A method for providing adaptive bandwidth broadening to an encoder of a coder/decoder (codec), comprising:
- processing an input signal which provides LP parameters that are computed during a predetermined interval;
  
  performing pitch frequency estimation on said input signal for substantially all of said predetermined intervals;
  
  deriving a spectrum sampling frequency for said predetermined interval as the pitch frequency or its integer submultiple depending on the pitch frequency;
  
  determining a LP power spectrum at the harmonics of said spectrum sampling frequency for said input signal for said frame;
  
  computing a peak to average ratio of said LP spectrum based on said spectrum sampling frequency of said frame; and
  
  adaptively bandwidth broadening said LP filter coefficients based on said peak to average ratio of said LP spectrum for all harmonic multiples of said spectral sampling frequency.

45. A method of providing a coding system for a codec, comprising:
- processing an input signal to provide LP parameters which are quantized and encoded over predetermined intervals and are used to compute a LP residual signal;
  
  processing the LP residual signal, pitch information, pitch interpolation information and providing a pitch contour within the predetermined intervals;
  
  extracting a prototype waveform (PW) for a number of equal subintervals within the predetermined intervals and extracting an additional approximate PW in the subinterval immediately after the ending of a previous subinterval in response to the LP residual signal and the pitch contour;
  
  computing a PW gain for substantially all the subintervals; and
  
  quantizing and encoding the PW gains for substantially all the subintervals after the subintervals are filtered by a weighted window, decimated, and subtracted from a predicted average PW gain value for a current predetermined interval which is computed from the quantized PW gain values of a preceding predetermined interval.

46. A method of providing a coding system for a coder/decoder (codec), comprising:
- computing a sequence of aligned prototype waveform (PW) vectors for a frame via a low complexity alignment process; and
  
  computing a PW correlation vector for all harmonics for the frame and averaging the PW correlation vector across the harmonics in five subbands in order to derive a PW subband correlation vector.
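
The subband correlation computation of claim 46 can be sketched as a per-harmonic correlation between successive PW vectors, averaged within five subbands. The normalized correlation formula and the subband edges below are assumptions; the claims specify only that the correlation vector covers all harmonics and is averaged in five subbands.

```python
def harmonic_correlation(pws):
    """Normalized correlation between successive PW vectors, per harmonic.

    pws is a list of complex PW vectors (one per subframe), all of equal length.
    """
    n = len(pws[0])
    corr = []
    for k in range(n):
        num = sum((pws[m][k] * pws[m - 1][k].conjugate()).real
                  for m in range(1, len(pws)))
        den = sum(abs(pws[m][k]) * abs(pws[m - 1][k])
                  for m in range(1, len(pws)))
        corr.append(num / den if den > 0.0 else 0.0)
    return corr


def subband_correlation(corr, edges):
    """Average the per-harmonic correlation inside each subband."""
    return [sum(corr[lo:hi]) / (hi - lo) for lo, hi in zip(edges[:-1], edges[1:])]


# Ten harmonics, three subframes, five subbands of two harmonics each.
steady = [[complex(1.0, 0.5)] * 10 for _ in range(3)]
edges = [0, 2, 4, 6, 8, 10]
print(subband_correlation(harmonic_correlation(steady), edges))
```

A PW sequence that repeats exactly gives a correlation of 1 in every subband, while one that flips sign every subframe gives a correlation of -1.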

47. A method of providing a coding system for a frequency domain interpolative (FDI) coder/decoder (codec), comprising:
- directly quantizing a prototype waveform (PW) in a magnitude domain for substantially every frame without said PW being decomposed into complex components;
  
  hierarchically quantizing a PW magnitude vector based on a voicing classification using a mean-deviations representation;
  
  adaptively vector quantizing the mean component of the representation in multiple subbands;
  
  deriving a variable dimension deviations vector as the difference of the input PW magnitude vector and the full band representation of the quantized PW subband mean vector for all harmonics;
  
  selecting a fixed dimensional deviations subvector from the said variable dimensional deviations vector based on a location of speech formant frequencies for a subframe; and
  
  providing the said fixed dimensional deviations subvector for adaptive vector quantization.

48. A method of providing a coding system for a coder/decoder (codec), comprising:
- processing an input signal which provides LP parameters that are computed during a predetermined interval;
  
  performing a pitch estimation on said input signal for substantially all of said predetermined intervals;
  
  processing the LP parameters and pitch information;
  
  providing a voicing measure that characterizes a degree of voicing and is derived from a plurality of input parameters that are correlated to the degree of periodicity of the input signal for substantially all predetermined intervals;
  
  providing a PW subband correlation vector, said PW subband correlation vector characterizing a degree of correlation between successive PW vectors as a function of frequency and computed for substantially all predetermined intervals;
  
  reducing annoying artifacts due to spurious spectral peaks by performing the following:
  
  computing a measure of VAD likelihood based on voice activity detection (VAD) flags for a preceding, a current and a next predetermined interval; and
  
  computing average PW gain values for inactive predetermined intervals and active unvoiced predetermined intervals.

49. A method of providing a low bit rate coding system for a coder/decoder (codec), comprising:
- processing an input signal which provides LP parameters that are computed during a predetermined interval;
  
  performing pitch estimation on said input signal for substantially all of said predetermined intervals;
  
  processing the LP parameters and pitch information to the decoder;
  
  providing a look ahead based on said predetermined interval in order to smooth PW parameters; and
  
  providing a voicing measure, said voicing measure characterizing a degree of voicing derived from a plurality of input parameters that are correlated to the degree of periodicity of the input signal for substantially all predetermined intervals.

50. A method of providing a low bit rate coding system for a coder/decoder (codec), comprising:
- processing an input signal which provides LP parameters that are estimated, quantized and transmitted for substantially all frames of a first duration;
  
  performing a pitch estimation on said input signal for substantially all of said frames of a first duration and quantizing and transmitting pitch information for substantially all frames of a second duration;
  
  combining voice activity detection (VAD) flags associated with two successive frames of a first duration;
  
  processing the LP parameters and the pitch information every frame of a first duration and transmitting the VAD flags to the decoder substantially every frame of a second duration; and
  
  providing a look ahead frame based on said frame of a first duration in order to smooth PW parameters including at least one of PW gain, a voicing measure, subband correlations and a spectral magnitude.


Current Assignee
Hughes Network Systems LLC (Echostar Corporation)
Original Assignee
Hughes Network Systems LLC (Echostar Corporation)
Inventors
Bhaskar, Udaya; Swaminathan, Kumar

Application Number

US10/382,202
Publication Number

US 20040002856A1
US Class Current

704/219
CPC Class Codes

G10L 19/097 using prototype waveform de...
