High quality low bit rate CELP-based speech codec

US 5,495,555 A
Filed: 06/25/1992
Issued: 02/27/1996
Est. Priority Date: 06/01/1992
Status: Expired due to Term

Abstract

Code excited linear prediction (CELP) is performed using two voiced and unvoiced sets of windows; each set is used for both linear prediction and pitch determination. The accompanying degradation in voice quality is comparable to that of the IS-54 standard 8.0 kbps voice coder employed in U.S. digital cellular systems. This is accomplished by using the same parametric model used in traditional CELP coders but determining, quantizing, encoding, and updating these parameters differently. The low bit rate speech decoder is like most CELP decoders except that it operates in two modes depending on the received mode bit. Both pitch prefiltering and global postfiltering are employed to enhance the synthesized speech. In addition, built-in error detection and error recovery schemes are used to help mitigate the effects of any uncorrectable transmission errors.

17 Claims

1. A low bit rate codec for coding and decoding a speech signal comprising:
- means for receiving the speech signal and dividing the speech signal into speech frames;
  
  linear predictive code analysis means operative on a speech frame for performing linear predictive code analysis on a first and a second linear prediction window the first linear prediction window being centered at the middle of the speech frame and the second linear prediction window being centered at the edge of the speech frame, wherein the linear predictive code analysis means generates a first set of filter coefficients for the first linear prediction window and a second set of filter coefficients for the second linear prediction window;
  
  pitch estimation means for generating a pitch estimate for each of a first and a second pitch estimation window, the first pitch estimation window being centered at the middle of the speech frame and the second pitch estimation window being centered at the edge of the speech frame;
  
  mode classification means responsive to the first and the second sets of filter coefficients and the first and the second pitch estimates, for classifying the speech frame into one of a plurality of modes, wherein a first mode is predominantly voiced and a second mode is not predominantly voiced;
  
  encoding means for encoding the speech frame based on the classified mode of the speech frame, wherein, for a speech frame classified in the first mode, the encoded speech frame encodes information derived from the second set of linear coefficients and the second pitch estimate, and for a speech frame classified in the second mode, the encoded speech frame encodes information derived from the first and the second sets of linear coefficients;
  
  transmitting means for transmitting the encoded speech frame;
  
  receiving means for receiving a transmission for an encoded speech frame and identifying the transmitted speech frame as one of a first mode and a second mode speech frame; and
  
  decoder means for decoding the transmitted speech frame in a mode-specific manner based on the identified mode of the transmitted speech frame.
- - 2. The low bit rate codec recited in claim 1 wherein said pitch estimation means comprises:
    - error computing means receiving data for computing an error function for each of the first and the second pitch estimation windows;
      
      refining means responsive to the computed error functions for refining past pitch estimates;
      
      pitch tracking means responsive to said refined past pitch estimates for producing a set of pitch candidates for each of the first and the second pitch estimation windows;
      
      a pitch selector for selecting and outputting a pitch estimate from the set of pitch candidates for each of the first and the second pitch estimation windows.
  - 3. The low bit rate codec recited in claim 2 wherein said mode classification means comprises:
    - an interpolater for generating an interpolated set of filter coefficients for the first linear prediction window based on the second set of filter coefficients;
      
      a cepstral distortion tester for comparing a cepstral distortion measure between the first set of filter coefficients and the interpolated set of filter coefficients against a threshold value;
      
      a first pitch deviation tester for comparing a refined pitch estimate for the second pitch estimation window and the first pitch estimate;
      
      a second pitch deviation tester for comparing the second pitch estimate and the first pitch estimate; and
      
      mode selection means for selecting one of the first mode and the second mode for classifying the speech frame based on the comparisons by the cepstral distortion tester and the first and second pitch deviation testers.
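
The mode classification recited in claim 3 can be illustrated with a minimal Python sketch. It is not the patented implementation. The following are assumptions made only for illustration: the interpolated set is the average of the previous and current frame-edge coefficient sets, interpolation is done directly on the filter coefficients, the cepstral distortion is a plain Euclidean distance, the three tests are combined with a logical AND, and the names `classify_frame`, `ceps_threshold`, and `pitch_tolerance` and their default values are hypothetical.

```python
import math

MODE_VOICED = 1      # "first mode" in the claims
MODE_NOT_VOICED = 2  # "second mode" in the claims

def lpc_to_cepstrum(a, n_ceps=12):
    """Cepstrum of 1/A(z), with A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] is unused
    for n in range(1, n_ceps + 1):
        acc = -a[n - 1] if n <= p else 0.0
        for j in range(1, n):
            if n - j <= p:
                acc -= (j / n) * c[j] * a[n - j - 1]
        c[n] = acc
    return c[1:]

def cepstral_distance(a_x, a_y, n_ceps=12):
    """Euclidean distance between two LPC cepstra (scaling is an assumption)."""
    cx = lpc_to_cepstrum(a_x, n_ceps)
    cy = lpc_to_cepstrum(a_y, n_ceps)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(cx, cy)))

def classify_frame(a_mid, a_edge, a_prev_edge,
                   pitch_mid, pitch_edge, pitch_edge_refined,
                   ceps_threshold=0.3, pitch_tolerance=0.15):
    # Interpolater: estimate the mid-frame filter from the edge filters
    # (assumed here to be a simple average of previous and current edges).
    a_interp = [0.5 * (x + y) for x, y in zip(a_prev_edge, a_edge)]
    # Cepstral distortion test against a (hypothetical) threshold.
    spectrum_ok = cepstral_distance(a_mid, a_interp) < ceps_threshold
    # First pitch deviation test: refined edge estimate vs. mid estimate.
    dev1 = abs(pitch_edge_refined - pitch_mid) / pitch_mid
    # Second pitch deviation test: edge estimate vs. mid estimate.
    dev2 = abs(pitch_edge - pitch_mid) / pitch_mid
    pitch_ok = dev1 <= pitch_tolerance and dev2 <= pitch_tolerance
    return MODE_VOICED if (spectrum_ok and pitch_ok) else MODE_NOT_VOICED
```

In this sketch a frame is labelled predominantly voiced only when the spectrum evolves smoothly across the frame and both pitch comparisons agree; any other outcome falls into the second mode.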

4. A method of encoding and decoding a speech signal comprising the steps of:
- receiving a speech signal and dividing the speech signal into speech frames;
  
  performing linear predictive code analysis on a speech frame in each of a first and a second linear prediction window, the first linear prediction window being centered at the middle of the speech frame and the second linear prediction window being centered at the edge of the speech frame;
  
  generating a first set of filter coefficients for the first linear prediction window and a second set of filter coefficients for the second linear prediction window;
  
  generating a first pitch estimate for a first pitch estimation window and a second pitch estimate for a second pitch estimation window, the first pitch estimation window being centered at the middle of the speech frame and the second pitch estimation window being centered at the edge of the speech frame;
  
  classifying the speech frame into one of a plurality of modes based on the first and the second sets of filter coefficients and the first and the second pitch estimates, wherein a first mode is predominantly voiced and a second mode is not predominantly voiced;
  
  encoding the speech frame based on the classified mode of the speech frame, wherein, for a speech frame classified in the first mode, the encoded speech frame encodes information derived from the second set of linear coefficients and the second pitch estimate, and for a speech frame classified in the second mode, the encoded speech frame encodes information derived from the first and the second sets of linear coefficients;
  
  transmitting the encoded speech frame;
  
  receiving a transmission for an encoded speech frame and identifying the transmitted speech frame as one of a first mode and a second mode speech frame; and
  
  decoding the transmitted speech frame in a mode-specific manner, based on the identified mode of the transmitted speech frame.
- - 5. The method of claim 4, further including the steps of:
    - synthesizing a speech signal from the decoded speech frame; and
      
      post filtering the synthesized speech signal.
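
The two-window linear predictive analysis step of claim 4 can be sketched as follows. This is a hedged illustration, not the method of the specification: it assumes the autocorrelation method with a Hamming window and the Levinson-Durbin recursion, takes the "edge" of the frame to be its end (so the second window needs look-ahead samples), and zero-pads past the signal boundaries. The function name `two_window_lpc` and the parameters `win_len` and `order` are hypothetical.

```python
import math

def hamming(n):
    """Hamming window of length n (window shape is an assumption)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def autocorr(x, order):
    """Autocorrelation lags 0..order of sequence x."""
    return [sum(x[i] * x[i + k] for i in range(len(x) - k))
            for k in range(order + 1)]

def levinson_durbin(r, order):
    """Return a[1..order] of A(z) = 1 + sum_k a[k] z^-k from autocorrelation r."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate input: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]

def two_window_lpc(signal, frame_start, frame_len, win_len, order=10):
    """Filter coefficients for a mid-frame window and a frame-edge window."""
    window = hamming(win_len)

    def analyze(center):
        lo = center - win_len // 2
        seg = [signal[i] if 0 <= i < len(signal) else 0.0
               for i in range(lo, lo + win_len)]
        r = autocorr([s * w for s, w in zip(seg, window)], order)
        return levinson_durbin(r, order)

    first = analyze(frame_start + frame_len // 2)   # centered mid-frame
    second = analyze(frame_start + frame_len)       # centered at frame edge
    return first, second
```

Calling `two_window_lpc(samples, 100, 160, 160)` would return the first and second coefficient sets for a 160-sample frame starting at sample 100; the frame and window lengths here are example values only.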

6. A coder for encoding a speech signal comprising:
- a receiver for receiving the speech signal and dividing the speech signal into speech frames;
  
  a linear predictor for performing linear predictive code analysis on a first and a second linear prediction window, the first linear prediction window being centered at the middle of the speech frame and the second linear prediction window being centered at the edge of the speech frame, wherein the linear predictor generates a first set of filter coefficients for the first linear prediction window and a second set of filter coefficients for the second linear prediction window;
  
  a pitch estimator for generating a pitch estimate for each of a first and a second pitch estimation window, the first pitch estimation window being centered at the middle of the speech frame and the second pitch estimation window being centered at the edge of the speech frame;
  
  a mode classifier responsive to the first and the second sets of filter coefficients and the first and the second pitch estimates, for classifying the speech frame into one of a plurality of modes, wherein a first mode is predominantly voiced and a second mode is not predominantly voiced; and
  
  an encoder for encoding the speech frame based on the classified mode of the speech frame.
- - 7. The coder recited in claim 6 wherein the pitch estimator comprises:
    - an error calculator for receiving data for calculating an error function for the first and the second pitch estimation windows;
      
      a refiner responsive to the calculated error functions for refining past pitch estimates;
      
      a pitch tracker responsive to the refined past pitch estimates for producing a set of pitch candidates for each of the first and the second pitch estimation windows;
      
      a pitch selector for selecting and outputting a pitch estimate from the set of pitch candidates for each of the first and the second pitch estimation windows.
  - 8. The coder recited in claim 6 wherein the mode classifier comprises:
    - an interpolater for generating an interpolated set of filter coefficients for the first linear prediction window based on the second set of filter coefficients;
      
      a cepstral distortion tester for comparing a cepstral distortion measure between the first set of filter coefficients and the interpolated set of filter coefficients against a threshold value;
      
      a first pitch deviation tester for comparing a refined pitch estimate for the second pitch estimation window and the first pitch estimate;
      
      a second pitch deviation tester for comparing the second pitch estimate and the first pitch estimate; and
      
      a mode selector for selecting one of the first mode and the second mode for classifying the speech frame, based on the comparisons by the cepstral distortion tester and the first and second pitch deviation testers.
  - 9. The coder recited in claim 6, wherein each speech frame is partitioned into subframes, and the coder further comprises a closed loop pitch estimator for estimating a pitch for each subframe of a speech frame classified in the first mode based on the second pitch estimate for the speech frame.
  - 10. The coder recited in claim 6, wherein the speech frame is partitioned into subframes, and the coder further comprises a delayed decision excitation modeler for modeling the excitation of each subframe with a set of excitation parameters by:
    - estimating M pitch estimates for each subframe;
      
      determining a set of MN excitation parameter candidates for each excitation parameter for each of the M pitch estimates based on N previously coded speech subframes; and
      
      selecting L excitation parameter estimates from each set of MN excitation parameter candidates;
      
      wherein M, N and L are positive integers variable with each subframe.
  - 11. The coder recited in claim 10, further comprising a glottal pulse fixed codebook and a multi-innovation fixed codebook, wherein for a speech frame classified in the first mode, one of the set of excitation parameters is an index into the glottal pulse fixed codebook, and for a speech frame classified in the second mode, one of the set of excitation parameters is an index into the multi-innovation fixed codebook.
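
One way to picture the delayed decision excitation modeling of claim 10 is as a survivor (beam-style) search: each of the N survivors carried over from previously coded subframes is expanded with M candidates, giving MN candidates, of which L are kept. The sketch below is an assumption-laden toy, not the claimed coder: the number of survivors kept is fixed rather than varying per subframe, and `toy_candidates` and `toy_cost` are hypothetical stand-ins for the real pitch candidates and the real excitation error measure.

```python
def delayed_decision_search(targets, candidates_fn, cost_fn, keep):
    """Keep the `keep` best parameter paths after each subframe."""
    survivors = [(0.0, [])]  # (accumulated cost, chosen value per subframe)
    for target in targets:
        expanded = []
        for acc_cost, path in survivors:          # N survivors so far
            for cand in candidates_fn(target):    # M candidates per survivor
                step = cost_fn(target, cand, path)
                expanded.append((acc_cost + step, path + [cand]))
        expanded.sort(key=lambda item: item[0])   # rank the M*N candidates
        survivors = expanded[:keep]               # retain L of them
    return survivors[0]                           # best complete path

def toy_candidates(target):
    """Hypothetical candidate set: the target lag and its two neighbours."""
    return [target - 1, target, target + 1]

def toy_cost(target, cand, path):
    """Hypothetical cost: mismatch plus a penalty on jumps between subframes."""
    mismatch = abs(cand - target)
    jump = 0.5 * abs(cand - path[-1]) if path else 0.0
    return mismatch + jump
```

Because the final choice for a subframe is deferred until later subframes have been evaluated, a locally worse candidate can survive if it leads to a lower total cost.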

12. A method of encoding a speech signal comprising the steps of:
- receiving a speech signal and dividing the speech signal into speech frames;
  
  performing linear predictive code analysis on a speech frame in a first and a second linear prediction window, the first linear prediction window being centered at the middle of the speech frame and the second linear prediction window being centered at the edge of the speech frame;
  
  generating a first set of filter coefficients for the first linear prediction window and a second set of filter coefficients for the second linear prediction window;
  
  generating a first pitch estimate for a first pitch estimation window and a second pitch estimate for a second pitch estimation window, the first pitch estimation window being centered at the middle of the speech frame and the second pitch estimation window being centered at the edge of the speech frame;
  
  classifying the speech frame into one of a plurality of modes based on the first and the second sets of filter coefficients and the first and the second pitch estimates, wherein a first mode is predominantly voiced and a second mode is predominantly not voiced;
  
  encoding the speech frame based on the classified mode of the speech frame; and
  
  transmitting the encoded speech frame.
- - 13. The encoding method recited in claim 12 wherein the pitch estimate generation step further comprises:
    - receiving data for calculating an error function for the first and the second pitch estimation windows;
      
      refining past pitch estimates responsive to the calculated error functions;
      
      producing a set of pitch candidates for each of the first and the second pitch estimation windows responsive to the refined past pitch estimates;
      
      selecting and outputting a pitch estimate from the set of pitch candidates for each of the first and the second pitch estimation windows.
  - 14. The encoding method recited in claim 12 wherein the mode classification step further comprises:
    - generating an interpolated set of filter coefficients for the first linear prediction window based on the second set of filter coefficients;
      
      comparing a cepstral distortion measure between the first set of filter coefficients and the interpolated set of filter coefficients against a threshold value;
      
      comparing a first pitch deviation between the refined pitch estimate for the second pitch estimation window and the first pitch estimate;
      
      comparing a second pitch deviation between the second pitch estimate and the first pitch estimate; and
      
      selecting one of the first mode and the second mode for classifying the speech frame, based on the comparisons of the cepstral distortion tester, and the first and second pitch deviations.
  - 15. The encoding method recited in claim 12, further comprising the steps of:
    - partitioning each speech frame into subframes; and
      
      estimating a pitch through a closed loop pitch estimation for each subframe of a speech frame classified in the first mode based on the second pitch estimate for the speech frame.
  - 16. The encoding method recited in claim 12, further comprising the steps of:
    - partitioning the speech frame into subframes; and
      
      modeling the excitation of each subframe with a set of excitation parameters by:
      
      estimating M pitch estimates for each subframe;
      
      determining a set of MN excitation parameter candidates for each excitation parameter for each of the M pitch estimates based on N previously coded speech subframes; and
      
      selecting L excitation parameter estimates from each set of MN excitation parameter candidates;
      
      wherein M, N and L are positive integers variable with each subframe.
  - 17. The encoding method recited in claim 16, further comprising the step of providing a glottal pulse fixed codebook and a multi-innovation fixed codebook, wherein for a speech frame classified in the first mode, one of the set of excitation parameters is an index into the glottal pulse fixed codebook, and for a speech frame classified in the second mode, one of the set of excitation parameters is an index into the multi-innovation fixed codebook.
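
The pitch estimate generation step of claim 13 (error function, tracking against a past estimate, candidate selection) can be sketched roughly as below. This is a simplified illustration under stated assumptions: the error function is one minus the squared normalized correlation at each lag, a single past pitch value stands in for the refined past estimates, and the tracked candidate is preferred when its error is within a margin of the global best. The names `pitch_error_function`, `select_pitch`, `radius`, and `margin` are hypothetical.

```python
import math

def pitch_error_function(window, min_lag, max_lag):
    """Map each lag to an error in [0, 1]; lower means more periodic."""
    errors = {}
    n = len(window)
    for lag in range(min_lag, max_lag + 1):
        num = den_a = den_b = 0.0
        for i in range(lag, n):
            num += window[i] * window[i - lag]
            den_a += window[i] ** 2
            den_b += window[i - lag] ** 2
        den = den_a * den_b
        gain = (num * num / den) if (den > 0.0 and num > 0.0) else 0.0
        errors[lag] = 1.0 - gain
    return errors

def select_pitch(errors, past_pitch=None, radius=5, margin=0.05):
    """Pick a lag, favouring one close to the past estimate when available."""
    best = min(errors, key=errors.get)
    if past_pitch is not None:
        near = [lag for lag in errors if abs(lag - past_pitch) <= radius]
        if near:
            tracked = min(near, key=errors.get)
            # Prefer continuity if the tracked lag is almost as good.
            if errors[tracked] <= errors[best] + margin:
                return tracked
    return best
```

Tracking against a past estimate is one common way to avoid pitch doubling: for a perfectly periodic input, lags at integer multiples of the true period give almost identical errors, and the past estimate breaks the tie.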

Current Assignee: Hughes Network Systems LLC (Echostar Corporation)
Original Assignee: Hughes Aircraft Company (Rtx Corporation)
Inventors: Swaminathan, Kumar
Primary Examiner(s): Knepper, David D.
Application Number: US07/905,992
Time in Patent Office: 1,342 Days
Field of Search: 395/2, 395/2.16, 395/2.17, 395/2.2, 395/2.23, 395/2.28, 395/2.38, 395/2.39, 395/2.3-2.32, 381/29, 381/30, 381/36, 381/38, 381/49
US Class Current: 704/207
CPC Class Codes:
G10L 19/12   the excitation function bei...
G10L 19/26   Pre-filtering or post-filte...
G10L 2019/0002   Codebook adaptations
G10L 2019/0003   Backward prediction of gain
G10L 25/90   Pitch determination of spee...
G10L 25/93   Discriminating between voic...
