Variable rate speech coding

US 7,496,505 B2
Filed: 11/13/2006
Issued: 02/24/2009
Est. Priority Date: 12/21/1998
Status: Expired due to Fees


Abstract

A method and apparatus for the variable rate coding of a speech signal. An input speech signal is classified and an appropriate coding mode is selected based on this classification. For each classification, the coding mode that achieves the lowest bit rate with an acceptable quality of speech reproduction is selected. Low average bit rates are achieved by employing high fidelity modes (i.e., high bit rate, broadly applicable to different types of speech) only during portions of the speech where this fidelity is required for acceptable output. Lower bit rate modes are used during portions of speech where these modes produce acceptable output. The input speech signal is classified into active and inactive regions. Active regions are further classified into voiced, unvoiced, and transient regions. Various coding modes are applied to active speech, depending upon the required level of fidelity. Coding modes may be utilized according to the strengths and weaknesses of each particular mode. The apparatus dynamically switches between these modes as the properties of the speech signal vary with time. Where appropriate, regions of speech are modeled as pseudo-random noise, resulting in a significantly lower bit rate. This noise-based coding is applied dynamically whenever unvoiced speech or background noise is detected.
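
The classification-driven mode selection described in the abstract can be illustrated with a short Python sketch. It uses the class-to-mode mapping recited in claim 7 and the approximate bit rates recited in claim 4. The names `SpeechClass`, `Mode`, `select_mode`, and `average_bit_rate` are hypothetical, as is the assumption that all frames have equal duration; the patent does not prescribe this code.

```python
from enum import Enum


class SpeechClass(Enum):
    INACTIVE = "inactive"
    VOICED = "voiced"
    UNVOICED = "unvoiced"
    TRANSIENT = "transient"


class Mode(Enum):
    CELP = "CELP"
    PPP = "PPP"
    NELP = "NELP"


# Approximate rates in bits per second, as recited in claim 4.
RATE_BPS = {Mode.CELP: 8500, Mode.PPP: 3900, Mode.NELP: 1550}


def select_mode(speech_class):
    """Map a frame classification to an encoder mode (mapping of claim 7)."""
    if speech_class is SpeechClass.TRANSIENT:
        return Mode.CELP
    if speech_class is SpeechClass.VOICED:
        return Mode.PPP
    # Inactive speech and active unvoiced speech both go to NELP.
    return Mode.NELP


def average_bit_rate(frame_classes):
    """Average rate over a sequence of frames, assuming equal frame durations."""
    modes = [select_mode(c) for c in frame_classes]
    return sum(RATE_BPS[m] for m in modes) / len(modes)


# Hypothetical frame mix, chosen only for illustration.
frames = (
    [SpeechClass.TRANSIENT] * 2
    + [SpeechClass.VOICED] * 5
    + [SpeechClass.UNVOICED] * 2
    + [SpeechClass.INACTIVE] * 1
)
print(average_bit_rate(frames))  # 4115.0
```

For this made-up mix the average is (2 × 8500 + 5 × 3900 + 3 × 1550) / 10 = 4115 bits per second, less than half the CELP rate, which is the effect the abstract attributes to reserving the high-rate mode for the portions of speech that need it.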

32 Claims

1. A method of encoding a speech signal comprising:
- (a) classifying the speech signal as either active or inactive speech;
  
  (b) classifying said active speech into one of a plurality of types of active speech;
  
  (c) selecting an encoder mode from a plurality of encoder modes based on whether the speech signal is active or inactive, and if active, based further on said type of active speech, wherein said plurality of encoder modes comprises a code excited linear prediction (CELP) encoder mode, a prototype pitch period (PPP) encoder mode, and a noise excited linear prediction (NELP) encoder mode; and
  
  (d) encoding the speech signal according to said selected encoder mode to form an encoded speech signal.
  - 2. The method of claim 1, further comprising decoding said encoded speech signal according to said selected encoder mode, forming a synthesized speech signal.
  - 3. The method of claim 1, wherein each encoder mode has a predetermined bit rate.
  - 4. The method of claim 3, wherein said CELP encoder mode is associated with a bit rate of about 8500 bits per second, said PPP encoder mode is associated with a bit rate of about 3900 bits per second, and said NELP encoder mode is associated with a bit rate of about 1550 bits per second.
  - 5. The method of claim 3, wherein said plurality of encoder modes further comprises a zero rate mode.
  - 6. The method of claim 1, wherein said plurality of types of active speech comprises voiced, unvoiced, and transient active speech.
  - 7. The method of claim 6, wherein selecting the encoder mode comprises:
    - (a) selecting a CELP encoder mode if said speech is classified as active transient speech;
      
      (b) selecting a PPP encoder mode if said speech is classified as active voiced speech; and
      
      (c) selecting a NELP encoder mode if said speech is classified as inactive speech or active unvoiced speech.
  - 8. The method of claim 7, wherein said encoded speech signal comprises:
    - codebook parameters and pitch filter parameters if said CELP encoder mode is selected;
      
      codebook parameters and rotational parameters if said PPP encoder mode is selected;
      
      or codebook parameters if said NELP encoder mode is selected.
  - 9. The method of claim 1, further comprising calculating initial parameters using a look ahead function.
  - 10. The method of claim 9, wherein said initial parameters comprise linear predictive coding (LPC) coefficients.
  - 11. The method of claim 1, wherein said plurality of encoder modes comprises a NELP encoder mode, wherein the speech signal is represented by a residual signal generated by filtering the speech signal with a Linear Predictive Coding (LPC) analysis filter, and wherein said encoding comprises:
    - (i) estimating the energy of the residual signal, and
      
      (ii) selecting a codevector from a first codebook, wherein said codevector approximates said estimated energy; and
      
      wherein decoding comprises;
      
      (i) generating a random vector,
      
      (ii) retrieving said codevector from a second codebook,
      
      (iii) scaling said random vector based on said codevector, such that the energy of said scaled random vector approximates said estimated energy, and
      
      (iv) filtering said scaled random vector with a LPC synthesis filter, wherein said filtered scaled random vector forms said synthesized speech signal.
  - 12. The method of claim 11, wherein the speech signal is divided into frames, wherein each of said frames comprises two or more subframes, wherein estimating the energy comprises estimating the energy of the residual signal for each of said subframes, and wherein said codevector comprises a value approximating said estimated energy for each of said subframes.
  - 13. The method of claim 11, wherein said first codebook and said second codebook are stochastic codebooks.
  - 14. The method of claim 11, wherein said first codebook and said second codebook are trained codebooks.
  - 15. The method of claim 11, wherein said random vector comprises a unit variance random vector.
  - 16. The method of claim 1, further comprising dynamic switching between modes from one frame to another frame.
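
The NELP encoding and decoding steps of claims 11, 12, and 15 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: energy is taken as the sum of squared samples per subframe, the codevector is chosen by least squared error, the LPC synthesis filter uses the convention s[n] = e[n] + Σ a_k·s[n−k], and the codebook is a tiny made-up table rather than a stochastic or trained one. All function names are hypothetical.

```python
import math
import random


def subframe_energies(residual, num_subframes):
    """Energy (sum of squares) of each subframe of the LPC residual."""
    n = len(residual) // num_subframes
    return [sum(x * x for x in residual[i * n:(i + 1) * n])
            for i in range(num_subframes)]


def nelp_encode(residual, codebook, num_subframes):
    """Return the index of the codevector closest to the subframe energies."""
    energies = subframe_energies(residual, num_subframes)

    def distance(codevector):
        return sum((e - c) ** 2 for e, c in zip(energies, codevector))

    return min(range(len(codebook)), key=lambda i: distance(codebook[i]))


def lpc_synthesis(excitation, lpc):
    """All-pole filter: s[n] = e[n] + sum_k lpc[k-1] * s[n-k] (assumed convention)."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc += a * out[n - k]
        out.append(acc)
    return out


def nelp_decode(index, codebook, frame_len, lpc, rng):
    """Scale unit-variance noise to the decoded subframe energies, then filter."""
    codevector = codebook[index]  # the decoder holds its own copy of the codebook
    n = frame_len // len(codevector)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n * len(codevector))]
    scaled = []
    for i, target in enumerate(codevector):
        sub = noise[i * n:(i + 1) * n]
        energy = sum(x * x for x in sub)
        gain = math.sqrt(target / energy) if energy > 0 else 0.0
        scaled.extend(gain * x for x in sub)
    return lpc_synthesis(scaled, lpc)


# Hypothetical codebook: each entry holds one energy value per subframe.
codebook = [[1.0, 1.0], [4.0, 16.0], [100.0, 50.0]]
residual = [1.0, -1.0, 1.0, -1.0, 2.0, -2.0, 2.0, -2.0]
index = nelp_encode(residual, codebook, num_subframes=2)
synthesized = nelp_decode(index, codebook, len(residual), [0.5], random.Random(0))
```

Only the codebook index crosses the channel; the decoder regenerates the excitation from random noise, which is why this mode can run at the lowest of the three rates. The synthesized samples differ from the input waveform, but the excitation energy of each subframe approximates the encoder's estimate.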

17. An apparatus comprising:
- classification means for classifying a speech signal as active or inactive speech, and if active speech, for classifying the active speech as one of a plurality of types of active speech; and
  
  a plurality of encoding means for encoding the speech signal as an encoded speech signal, wherein said encoding means are dynamically selected to encode the speech signal based on whether the speech signal is active or inactive, and if active, based further on said type of active speech, wherein said plurality of encoder means comprises a code excited linear prediction (CELP) encoding means, a prototype pitch period (PPP) encoding means, and a noise excited linear prediction (NELP) encoding means.
  - 18. The apparatus of claim 17, further comprising a plurality of decoding means for decoding said encoded speech signal.
  - 19. The apparatus of claim 18, wherein said plurality of decoding means includes a CELP decoding means, a PPP decoding means, and a NELP decoding means.
  - 20. The apparatus of claim 17, wherein each of said encoding means encodes at a predetermined bit rate.
  - 21. The apparatus of claim 20, wherein said CELP encoding means encodes at a rate of about 8500 bits per second, said PPP encoding means encodes at a rate of about 3900 bits per second, and said NELP encoding means encodes at a rate of about 1550 bits per second.
  - 22. The apparatus of claim 18, wherein said plurality of encoding means further includes a zero rate encoding means, and wherein said plurality of decoding means further includes a zero rate decoding means.
  - 23. The apparatus of claim 17, wherein said plurality of types of active speech include voiced, unvoiced, and transient active speech.
  - 24. The system of claim 23, wherein said CELP encoding means is selected if said speech is classified as active transient speech, wherein said PPP encoding means is selected if said speech is classified as active voiced speech, and wherein said NELP encoding means is selected if said speech is classified as inactive speech or active unvoiced speech.
  - 25. The apparatus of claim 17, wherein said encoded speech signal comprises codebook parameters and pitch filter parameters if said CELP encoding means is selected, codebook parameters and rotational parameters if said PPP encoding means is selected, or codebook parameters if said NELP encoding means is selected.
  - 26. The apparatus of claim 17, wherein the speech signal is represented by a residual signal generated by filtering the speech signal with a Linear Predictive Coding (LPC) analysis filter, and wherein said plurality of encoding means includes a NELP encoding means comprising:
    - energy estimator means for calculating an estimate of the energy of the residual signal, and
      
      encoding codebook means for selecting a codevector from a first codebook, wherein said codevector approximates said estimated energy;
      
      and wherein said plurality of decoding means includes a NELP decoding means comprising;
      
      random number generator means for generating a random vector,
      
      decoding codebook means for retrieving said codevector from a second codebook,
      
      multiply means for scaling said random vector based on said codevector, such that the energy of said scaled random vector approximates said estimate, and
      
      means for filtering said scaled random vector with an LPC synthesis filter, wherein said filtered scaled random vector forms said synthesized speech signal.
  - 27. The apparatus of claim 26, wherein the speech signal is divided into frames, wherein each of said frames comprises two or more subframes, wherein said energy estimator means calculates an estimate of the energy of the residual signal for each of said subframes, and wherein said codevector comprises a value approximating said subframe estimate for each of said subframes.
  - 28. The apparatus of claim 26, wherein said first codebook and said second codebook are stochastic codebooks.
  - 29. The apparatus of claim 26, wherein said first codebook and said second codebook are trained codebooks.
  - 30. The apparatus of claim 26, wherein said random vector comprises a unit variance random vector.
  - 31. The apparatus of claim 17, further comprising means for dynamic switching between modes from one frame to another frame.

32. An apparatus comprising:
- a classification module configured to classify a speech signal as active or inactive speech, and if active speech, to classify the active speech as one of a plurality of types of active speech; and
  
  a plurality of encoders configured to encode the speech signal as an encoded speech signal, wherein said encoders are dynamically selected to encode the speech signal based on whether the speech signal is active or inactive, and if active, based further on said type of active speech, wherein said plurality of encoders comprises a code excited linear prediction (CELP) encoding means, a prototype pitch period (PPP) encoding means, and a noise excited linear prediction (NELP) encoding means.

Current Assignee
Qualcomm, Inc.
Original Assignee
Qualcomm, Inc.
Inventors
Gardner; William, Manjunath; Sharath
Primary Examiner(s)
Hudspeth; David R.
Assistant Examiner(s)
Serrou; Abdelali

Application Number

US11/559,274
Publication Number

US 20070179783A1
Time in Patent Office

834 Days
Field of Search

704/214, 704/223, 704/219, 704/221, 704/208, 704/258
US Class Current

704/221
CPC Class Codes

G10L 19/20   using sound class specific ...

G10L 19/24   Variable rate codecs, e.g. ...

G10L 2025/783   based on threshold decision

G10L 2025/935   Mixed voiced class; Transit...
