Method and apparatus for speech encoding and decoding by sinusoidal analysis and waveform encoding with phase reproducibility

US 7,454,330 B1
Filed: 10/24/1996
Issued: 11/18/2008
Est. Priority Date: 10/26/1995
Status: Expired due to Fees

Abstract

A speech encoding method and apparatus in which an input speech signal is divided in terms of blocks or frames as encoding units and encoded in terms of the encoding units, whereby explosive and fricative consonants can be impeccably reproduced, while there is an attenuation of the occurrence of foreign sounds being generated at a transient portion between voiced (V) and unvoiced (UV) portions, so that the speech with high clarity devoid of “stuffed” feeling may be produced. The encoding apparatus includes a first encoding unit for finding residuals of linear predictive coding (LPC) of an input speech signal for performing harmonic coding and a second encoding unit for encoding the input speech signal by waveform coding. The first encoding unit and the second encoding unit are used for encoding a voiced (V) portion and an unvoiced (UV) portion of the input signal, respectively. Code excited linear prediction (CELP) encoding employing vector quantization by a closed loop search of an optimum vector using an analysis-by-synthesis method is used for the second encoding unit. A corresponding decoding method and apparatus is also provided.

69 Citations

28 Claims

1. A speech encoding method in which an input speech signal is divided on a time axis in terms of pre-set encoding units and encoded in terms of the pre-set encoding units, comprising the steps of:
- detecting a voiced/unvoiced sound state of the input speech signal and classifying the input speech signal into voiced portions and unvoiced portions;
  
  finding short-term prediction residuals of the voiced portions of the input speech signal;
  
  encoding the short-term prediction residuals of the voiced portions of the input speech signal by sinusoidal analytic encoding; and
  
  encoding the unvoiced portions of the input speech signal by waveform encoding.
- Dependent claims (2, 3, 4, 5):
  - 2. The speech encoding method as claimed in claim 1, wherein harmonic encoding is employed as the sinusoidal analytic encoding.
  - 3. The speech encoding method as claimed in claim 1, wherein a voiced/unvoiced sound state of each of a plurality of portions of the input speech signal is detected for classifying each of the plurality of portions of the input speech signal into one of a voiced mode and an unvoiced mode, and wherein the portions of the input speech signal classified to be in the voiced mode are encoded by said sinusoidal analytic encoding while the portions of the input speech signal classified to be in the unvoiced mode are processed with said waveform encoding, said waveform encoding including vector quantization of the time-domain waveform by a closed loop search for the optimum vector using an analysis by synthesis method.
  - 4. The speech encoding method as claimed in claim 1, wherein one of a perceptually weighted vector quantization process and matrix quantization process is used for quantization of the sinusoidal analysis encoding parameters of the short-term prediction residuals.
  - 5. The speech encoding method as claimed in claim 4, wherein weights are calculated at the time of performing one of said perceptually weighted matrix quantization process and vector quantization process based on the results of orthogonal transform of parameters derived from an impulse response of a weight transfer function.
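
The steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the voiced/unvoiced test (a normalized autocorrelation peak), the pitch-lag range, the threshold, the LPC order and the function names `is_voiced`, `lpc_residual` and `encode_frame` are all assumptions chosen for illustration. The harmonic and waveform encoders themselves are left as placeholders; only the classification, the short-term prediction residual and the dispatch are shown.

```python
def autocorr(x, order):
    """Autocorrelation r[0..order] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]

def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] with x[n] ~ sum_k a[k] * x[n-k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or numerically degenerate frame
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]

def lpc_residual(x, order=10):
    """Short-term prediction residual e[n] = x[n] - predicted x[n]."""
    a = levinson_durbin(autocorr(x, order), order)
    res = []
    for n in range(len(x)):
        pred = sum(a[k] * x[n - 1 - k] for k in range(min(order, n)))
        res.append(x[n] - pred)
    return res

def is_voiced(frame, min_lag=20, max_lag=147, threshold=0.5):
    """Assumed V/UV test: strong autocorrelation peak in the pitch-lag range."""
    r0 = sum(v * v for v in frame)
    if r0 == 0.0:
        return False
    last = min(max_lag, len(frame) - 1)
    peak = max(
        sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        for lag in range(min_lag, last + 1)
    )
    return peak / r0 > threshold

def encode_frame(frame):
    """Dispatch one encoding unit; the two encoders are placeholders."""
    if is_voiced(frame):
        # A real coder would pass this residual to sinusoidal (harmonic) analysis.
        return ("V", lpc_residual(frame))
    # A real coder would pass the frame to a waveform coder such as CELP.
    return ("UV", list(frame))
```

Calling `encode_frame` on each pre-set encoding unit yields a tag and the data that the corresponding encoder would receive.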

6. A speech encoding apparatus in which an input speech signal is divided on a time axis in terms of pre-set encoding units and encoded in terms of the pre-set encoding units, comprising:
- means for detecting a voiced/unvoiced sound state of the input speech signal and classifying the input speech signal into voiced portions and unvoiced portions;
  
  means for finding short-term prediction residuals of voiced portions of the input speech signal;
  
  means for encoding the short-term prediction residuals of voiced portions of the input speech signal by sinusoidal analytic encoding; and
  
  means for encoding unvoiced portions of the input speech signal by waveform encoding.
- Dependent claims (7, 8, 9, 10):
  - 7. The speech encoding apparatus as claimed in claim 6, wherein harmonic encoding is employed as the sinusoidal analytic encoding.
  - 8. The speech encoding apparatus as claimed in claim 6, further comprising:
    - means for discriminating if the input speech signal is voiced speech or unvoiced speech and for generating a voiced/unvoiced mode signal; and
      
      switch means responsive to the voice/unvoiced mode signal for outputting an encoded signal provided by the means for encoding the short-term prediction residuals when the voiced/unvoiced mode signal indicates that the input speech is voiced speech and for outputting an encoded signal produced by the means for encoding the input speech signal by waveform encoding when the voiced/unvoiced mode signal indicates that the input speech is unvoiced speech;
      
      wherein said waveform encoding means performs code excited linear predictive coding doing vector quantization by closed loop search of an optimum vector using an analysis by synthesis method.
  - 9. The speech encoding apparatus as claimed in claim 6, wherein said sinusoidal analytic encoding means uses one of a perceptually weighted vector quantization process and matrix quantization process for quantizing the sinusoidal analytic encoding parameters of said short-term prediction residuals.
  - 10. The speech encoding apparatus as claimed in claim 6, wherein said sinusoidal analytic encoding means calculates a weight at the time of performance of one of said perceptually weighted matrix quantization process and vector quantization process on the basis of the results of orthogonal transform of parameters derived from an impulse response of a weight transfer function.
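
The closed-loop, analysis-by-synthesis vector search named in claims 3 and 8 can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed apparatus: the codebook is a plain list of excitation vectors, the perceptual weighting that practical CELP coders apply to the error is omitted, and the names `synthesize` and `closed_loop_search` are hypothetical.

```python
def synthesize(excitation, a):
    """All-pole LPC synthesis: y[n] = e[n] + sum_k a[k] * y[n-1-k]."""
    out = []
    for n, e in enumerate(excitation):
        pred = sum(a[k] * out[n - 1 - k] for k in range(min(len(a), n)))
        out.append(e + pred)
    return out

def closed_loop_search(target, codebook, a):
    """Try every code vector, synthesize it, and keep the one whose
    gain-scaled output is closest to the target waveform.

    Returns (index, gain, squared_error), or None if every candidate
    synthesizes to silence.
    """
    best = None
    for idx, code_vector in enumerate(codebook):
        y = synthesize(code_vector, a)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        # Optimal gain for this candidate in the least-squares sense.
        gain = sum(t * v for t, v in zip(target, y)) / energy
        err = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or err < best[2]:
            best = (idx, gain, err)
    return best
```

The search is "closed loop" because each candidate is judged by the waveform it produces after synthesis, rather than by its distance to the excitation itself.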

11. A speech decoding method for decoding an encoded speech signal obtained by encoding a voiced portion of an input speech signal with first encoding comprising sinusoidal analytic encoding and by encoding an unvoiced portion of the input speech signal with second encoding employing short-term prediction residuals, comprising the steps of:
- finding first short-term prediction residuals for the voiced speech portion of the encoded speech signal by sinusoidal synthesis;
  
  finding second short-term prediction residuals for the unvoiced speech portion of the encoded speech signal; and
  
  employing predictive synthetic filtering for synthesizing first and second time-axis waveforms based on the first and second short-term prediction residuals of the voiced and unvoiced speech portions, respectively.
- Dependent claims (12, 13, 14):
  - 12. The speech decoding method as claimed in claim 11, further comprising a first post-filtering step of post-filtering the first time-axis waveform of the voiced portion, and a second post-filtering step of post-filtering the second time-axis waveform of the unvoiced portion.
  - 13. The speech decoding method as claimed in claim 12, further comprising the step of combining the first and second post-filtered time-axis waveforms of the voiced and unvoiced portions, respectively, to synthesize a third time-axis waveform.
  - 14. The speech decoding method as claimed in claim 11, wherein one of a perceptually weighted vector quantization process and matrix quantization process is used for quantizing a sinusoidal synthetic parameter of said short-term prediction residuals.

15. A speech decoding apparatus for decoding an encoded speech signal obtained by encoding voiced portions of an input speech signal with a first encoding and by encoding unvoiced portions of the input speech signal with a second encoding, comprising:
- means for finding short-term prediction residuals for the voiced portions of the input speech signal by sinusoidal analytic encoding;
  
  means for finding short-term prediction residuals for the unvoiced portions of said encoded speech signal; and
  
  predictive synthetic filtering means for synthesizing a first time-axis waveform based on said short-term prediction residuals of the voiced speech portions and for synthesizing a second time-axis waveform based on the short-term prediction residuals of the unvoiced speech portions.
- Dependent claims (16):
  - 16. The speech decoding apparatus as claimed in claim 15, wherein said predictive synthetic filtering means further comprises:
    - first predictive filtering means for synthesizing said first time-axis waveform of the voiced portion based on the short-term prediction residuals of the voiced speech portion, and
      
      second predictive filtering means for synthesizing said second time-axis waveform of the unvoiced portion based on the short-term prediction residuals of the unvoiced speech portion.

17. A speech decoding method for decoding an encoded speech signal obtained by finding short-term prediction residuals of an input speech signal and encoding resulting short-term prediction residuals with sinusoidal analytic encoding, comprising the steps of:
- finding said short-term prediction residuals of said encoded speech signal by sinusoidal synthesis;
  
  adding noise controlled in amplitude based on said encoded speech signal to said short-term prediction residuals found by said sinusoidal synthesis; and
  
  performing predictive synthetic filtering by synthesizing a time-domain waveform based on said short-term prediction residuals found by said sinusoidal synthesis added to said noise.
- Dependent claims (18, 19, 20):
  - 18. The speech decoding method as claimed in claim 17, wherein said step of adding said noise adds said noise controlled on a basis of pitch and spectral envelope obtained from said encoded speech signal.
  - 19. The speech decoding method as claimed in claim 17, wherein said noise added in said step of adding has an upper value which is limited to a pre-set value.
  - 20. The speech decoding method as claimed in claim 17, wherein said sinusoidal analytic encoding is performed on short-term prediction residuals of a voiced portion of said input speech signal and wherein vector quantization of said time-domain waveform by a closed-loop search of an optimum vector is performed on an unvoiced portion of said input speech signal by an analysis by synthesis method.
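
The decoding chain of claims 17 to 19 (sinusoidal synthesis of the residual, addition of amplitude-controlled noise with an upper limit, then predictive synthesis filtering) might look like the sketch below. The claims do not state how the noise amplitude is derived from pitch and spectral envelope, so `noise_gain` is simply passed in as a hypothetical parameter; the Gaussian noise source and all function names are likewise assumptions.

```python
import math
import random

def sinusoidal_synthesis(amplitudes, pitch_lag, n_samples):
    """Rebuild a residual as a sum of harmonics of the pitch frequency."""
    w0 = 2.0 * math.pi / pitch_lag
    return [
        sum(amp * math.cos((m + 1) * w0 * n) for m, amp in enumerate(amplitudes))
        for n in range(n_samples)
    ]

def add_noise(residual, noise_gain, max_gain, rng):
    """Add noise whose amplitude is capped at a pre-set upper value."""
    gain = min(noise_gain, max_gain)
    return [r + gain * rng.gauss(0.0, 1.0) for r in residual]

def lpc_synthesis(excitation, a):
    """Predictive synthesis filter: y[n] = e[n] + sum_k a[k] * y[n-1-k]."""
    out = []
    for n, e in enumerate(excitation):
        pred = sum(a[k] * out[n - 1 - k] for k in range(min(len(a), n)))
        out.append(e + pred)
    return out

def decode_voiced_frame(amplitudes, pitch_lag, a, noise_gain, max_gain, n_samples, rng):
    """Chain the three steps for one voiced frame."""
    residual = sinusoidal_synthesis(amplitudes, pitch_lag, n_samples)
    noisy = add_noise(residual, noise_gain, max_gain, rng)
    return lpc_synthesis(noisy, a)
```

Passing a seeded `random.Random` instance as `rng` keeps the sketch reproducible.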

21. A speech decoding apparatus for decoding an encoded speech signal obtained by finding short-term prediction residuals of an input speech signal and encoding said resulting short-term prediction residuals with sinusoidal analytic encoding, comprising:
- sinusoidal synthesis means for finding said short-term prediction residuals of said encoded speech signal by sinusoidal synthesis;
  
  noise addition means for adding noise controlled in amplitude based on said encoded speech signal to said short-term prediction residuals; and
  
  predictive synthetic filtering means for synthesizing a time-domain waveform based on said short-term prediction residuals found by said sinusoidal synthesis means added to said noise.
- Dependent claims (22, 23, 24):
  - 22. The speech decoding apparatus as claimed in claim 21, wherein said noise addition means adds said noise controlled on a basis of pitch and spectral envelope obtained from said encoded speech signal.
  - 23. The speech decoding apparatus as claimed in claim 21, wherein said noise added by said noise addition means has an upper value which is limited to a pre-set value.
  - 24. The speech decoding apparatus as claimed in claim 21, wherein said sinusoidal analytic encoding is performed on short-term prediction residuals of a voiced portion of said input speech signal and wherein vector quantization of said time-domain waveform by a closed-loop search of an optimum vector is performed on an unvoiced portion of said input speech signal by an analysis by synthesis method.

25. A method for encoding an audible signal, comprising the steps of:
- converting parameters derived from the input audible signal into a frequency-domain signal; and
  
  performing weighted vector quantization of said parameters, the weight of said weighted vector quantization being calculated based on results of an orthogonal transform of parameters derived from an impulse response of a weight transfer function.
- Dependent claims (26):
  - 26. The method for encoding an audible signal as claimed in claim 25, wherein said orthogonal transform is a fast Fourier transform, wherein a real part of a coefficient resulting from the fast Fourier transform is expressed as re, an imaginary part of the coefficient resulting from the fast Fourier transform is expressed as im, and wherein one of the group consisting of (re, im) itself, re² + im², and (re² + im²)^(1/2), as interpolated, is used as said weight.
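
The weight calculation of claims 25 and 26 can be sketched as follows, assuming the weight transfer function is a rational filter given by numerator and denominator coefficients (the claims do not fix its form). The sketch takes the impulse response, applies a direct DFT in place of a fast Fourier transform for brevity, and uses the magnitude (re² + im²)^(1/2) as the weight in a weighted vector quantization search. The interpolation step mentioned in claim 26 is omitted, and all function names are hypothetical.

```python
import cmath

def impulse_response(num, den, length):
    """Impulse response of H(z) = num(z) / den(z), with den[0] == 1."""
    h = []
    for n in range(length):
        acc = num[n] if n < len(num) else 0.0
        acc -= sum(den[k] * h[n - k] for k in range(1, min(len(den) - 1, n) + 1))
        h.append(acc)
    return h

def dft_magnitude(h, n_bins):
    """Magnitude (re^2 + im^2)^(1/2) of the first n_bins DFT coefficients."""
    size = len(h)
    mags = []
    for k in range(n_bins):
        c = sum(h[n] * cmath.exp(-2j * cmath.pi * k * n / size) for n in range(size))
        mags.append((c.real ** 2 + c.imag ** 2) ** 0.5)
    return mags

def weighted_vq_search(x, codebook, weights):
    """Index of the code vector minimizing sum_i w[i] * (x[i] - c[i])^2."""
    def distortion(c):
        return sum(w * (xi - ci) ** 2 for w, xi, ci in zip(weights, x, c))
    return min(range(len(codebook)), key=lambda i: distortion(codebook[i]))
```

A typical call chain would be `weights = dft_magnitude(impulse_response(num, den, 64), len(x))` followed by `weighted_vq_search(x, codebook, weights)`; the length 64 is an arbitrary illustrative choice.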

27. A portable radio terminal apparatus comprising:
- amplifier means for amplifying an input speech signal;
  
  A/D conversion means for performing analog to digital conversion of an output signal from said amplifier means;
  
  speech encoding means for speech-encoding an output signal from said A/D conversion means;
  
  transmission path encoding means for channel coding an output signal from said speech encoding means;
  
  modulation means for modulating an output signal from said transmission path encoding means;
  
  D/A conversion means for performing digital to analog conversion of an output signal from said modulation means; and
  
  amplifier means for amplifying an output signal from said D/A conversion means and supplying the resulting amplified signal to an antenna;
  
  wherein said speech encoding means comprises;
  
  means for detecting a voiced/unvoiced sound state of the input speech signal and classifying the input speech signal into voiced portions and unvoiced portions;
  
  predictive encoding means for finding short-term prediction residuals of voiced portions of the input speech signal;
  
  sinusoidal analytic encoding means for encoding the short-term prediction residuals of voiced portions of the input speech signal by sinusoidal analytic encoding; and
  
  waveform encoding means for waveform encoding of unvoiced portions of the input speech signal.

28. A portable radio terminal apparatus comprising:
- amplifier means for amplifying a received signal;
  
  A/D conversion means for performing analog to digital conversion of an output signal from said amplifier means;
  
  demodulating means for demodulating an output signal from said A/D conversion means;
  
  transmission path decoding means for channel decoding an output signal from said demodulating means;
  
  speech decoding means for speech-decoding an output signal from said transmission path decoding means; and
  
  D/A conversion means for performing digital to analog conversion of an output signal from said demodulating means;
  
  wherein said speech decoding means comprises;
  
  sinusoidal synthesis means for finding short-term prediction residuals of said encoded speech signal by sinusoidal synthesis;
  
  noise addition means for adding noise controlled in amplitude based on said encoded speech signal to said short-term prediction residuals; and
  
  a predictive synthetic filter for synthesizing a time-domain waveform based on the short-term prediction residuals added to the noise.

Current Assignee
Sony Corporation (Sony Group Corp.)
Original Assignee
Sony Corporation (Sony Group Corp.)
Inventors
Matsumoto, Jun; Omori, Shiro; Nishiguchi, Masayuki; Iijima, Kazuyuki
Primary Examiner(s)
Hudspeth; David R.
Assistant Examiner(s)
Opsasnick; Michael N.

Application Number

US08/736,546
Time in Patent Office

4,408 Days
Field of Search

704/219-229
US Class Current

704/224
CPC Class Codes

G10L 19/02   using spectral analysis, e....

G10L 19/0212   using orthogonal transforma...

G10L 19/04   using predictive techniques

G10L 19/06   Determination or coding of ...

G10L 19/12   the excitation function bei...

G10L 25/27   characterised by the analys...

G10L 25/93   Discriminating between voic...
