High quality low bit rate CELP-based speech codec
Abstract
Code excited linear prediction (CELP) is performed using two sets of windows, voiced and unvoiced, with each set used for both linear prediction and pitch determination. The accompanying degradation in voice quality is comparable to that of the IS-54 standard 8.0 kbps voice coder employed in U.S. digital cellular systems. This is accomplished by using the same parametric model used in traditional CELP coders but determining, quantizing, encoding, and updating these parameters differently. The low bit rate speech decoder is like most CELP decoders except that it operates in one of two modes, depending on the received mode bit. Both pitch prefiltering and global postfiltering are employed to enhance the synthesized speech. In addition, built-in error detection and error recovery schemes help mitigate the effects of any uncorrectable transmission errors.
17 Claims
1. A low bit rate codec for coding and decoding a speech signal comprising:
means for receiving the speech signal and dividing the speech signal into speech frames;
linear predictive code analysis means operative on a speech frame for performing linear predictive code analysis on a first and a second linear prediction window, the first linear prediction window being centered at the middle of the speech frame and the second linear prediction window being centered at the edge of the speech frame, wherein the linear predictive code analysis means generates a first set of filter coefficients for the first linear prediction window and a second set of filter coefficients for the second linear prediction window;
pitch estimation means for generating a pitch estimate for each of a first and a second pitch estimation window, the first pitch estimation window being centered at the middle of the speech frame and the second pitch estimation window being centered at the edge of the speech frame;
mode classification means responsive to the first and the second sets of filter coefficients and the first and the second pitch estimates, for classifying the speech frame into one of a plurality of modes, wherein a first mode is predominantly voiced and a second mode is not predominantly voiced;
encoding means for encoding the speech frame based on the classified mode of the speech frame, wherein, for a speech frame classified in the first mode, the encoded speech frame encodes information derived from the second set of linear coefficients and the second pitch estimate, and for a speech frame classified in the second mode, the encoded speech frame encodes information derived from the first and the second sets of linear coefficients;
transmitting means for transmitting the encoded speech frame;
receiving means for receiving a transmission for an encoded speech frame and identifying the transmitted speech frame as one of a first mode and a second mode speech frame; and
decoder means for decoding the transmitted speech frame in a mode-specific manner based on the identified mode of the transmitted speech frame.
Dependent claims: 2, 3
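The window placement and linear predictive analysis recited in claim 1 can be illustrated with a minimal sketch. Everything numeric here is an assumption made for illustration and does not come from the claims: the frame length, window length, predictor order, the Hamming taper, the reading of "edge" as the trailing edge of the frame, and the use of the autocorrelation method with the Levinson-Durbin recursion. The function names are hypothetical.

```python
import math

def window_bounds(frame_start, frame_len, win_len, center):
    """Return (start, end) sample indices of a window centered at the frame middle or edge."""
    if center == "middle":
        c = frame_start + frame_len // 2
    elif center == "edge":
        # Assumption: "edge" is the trailing edge, so the window needs lookahead samples.
        c = frame_start + frame_len
    else:
        raise ValueError("center must be 'middle' or 'edge'")
    start = c - win_len // 2
    return start, start + win_len

def autocorrelation(x, max_lag):
    """r[k] = sum over n of x[n] * x[n-k], for k = 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] such that x[n] ~ sum over k of a[k] * x[n-k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate window: leave remaining coefficients at zero
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]

def lpc_for_frame(signal, frame_start, frame_len, win_len, order):
    """Return (first_set, second_set): middle-centered and edge-centered coefficients."""
    sets = []
    for center in ("middle", "edge"):
        start, end = window_bounds(frame_start, frame_len, win_len, center)
        # Samples outside the available signal are treated as zero (assumption).
        seg = [signal[n] if 0 <= n < len(signal) else 0.0 for n in range(start, end)]
        # Hamming taper: a common choice, assumed here, not taken from the claims.
        seg = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (win_len - 1)))
               for i, s in enumerate(seg)]
        sets.append(levinson_durbin(autocorrelation(seg, order), order))
    return sets[0], sets[1]
```

With these illustrative sizes, a 320-sample frame and a 240-sample window give a middle window that lies inside the frame and an edge window that straddles the frame boundary, which is why the second window depends on samples from the following frame.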
4. A method of encoding and decoding a speech signal comprising the steps of:
receiving a speech signal and dividing the speech signal into speech frames;
performing linear predictive code analysis on a speech frame in each of a first and a second linear prediction window, the first linear prediction window being centered at the middle of the speech frame and the second linear prediction window being centered at the edge of the speech frame;
generating a first set of filter coefficients for the first linear prediction window and a second set of filter coefficients for the second linear prediction window;
generating a first pitch estimate for a first pitch estimation window and a second pitch estimate for a second pitch estimation window, the first pitch estimation window being centered at the middle of the speech frame and the second pitch estimation window being centered at the edge of the speech frame;
classifying the speech frame into one of a plurality of modes based on the first and the second sets of filter coefficients and the first and the second pitch estimates, wherein a first mode is predominantly voiced and a second mode is not predominantly voiced;
encoding the speech frame based on the classified mode of the speech frame, wherein, for a speech frame classified in the first mode, the encoded speech frame encodes information derived from the second set of linear coefficients and the second pitch estimate, and for a speech frame classified in the second mode, the encoded speech frame encodes information derived from the first and the second sets of linear coefficients;
transmitting the encoded speech frame;
receiving a transmission for an encoded speech frame and identifying the transmitted speech frame as one of a first mode and a second mode speech frame; and
decoding the transmitted speech frame in a mode-specific manner, based on the identified mode of the transmitted speech frame.
Dependent claims: 5
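The mode-dependent encoding and decoding steps of claim 4 can be sketched as a payload that carries a mode indicator plus the parameters the claim names for that mode. This is a hypothetical illustration only: the field names, the use of a dictionary as the "encoded frame", and the absence of any quantization are assumptions, and the claim does not say what else a frame of either mode may carry.

```python
from dataclasses import dataclass
from typing import List

MODE_VOICED = 1      # the "first mode" (predominantly voiced)
MODE_NOT_VOICED = 0  # the "second mode" (not predominantly voiced)

@dataclass
class FrameAnalysis:
    """Hypothetical container for the per-frame analysis results."""
    lpc_mid: List[float]   # first set: window centered at the frame middle
    lpc_edge: List[float]  # second set: window centered at the frame edge
    pitch_mid: int         # first pitch estimate
    pitch_edge: int        # second pitch estimate
    mode: int

def encode_frame(fa: FrameAnalysis) -> dict:
    """Select the parameters to encode according to the classified mode."""
    if fa.mode == MODE_VOICED:
        # First mode: second coefficient set and second pitch estimate.
        return {"mode": MODE_VOICED,
                "lpc_edge": list(fa.lpc_edge),
                "pitch_edge": fa.pitch_edge}
    # Second mode: both coefficient sets.
    return {"mode": MODE_NOT_VOICED,
            "lpc_mid": list(fa.lpc_mid),
            "lpc_edge": list(fa.lpc_edge)}

def decode_frame(payload: dict) -> dict:
    """Identify the mode first, then read only the fields that mode carries."""
    if payload["mode"] == MODE_VOICED:
        return {"mode": MODE_VOICED,
                "lpc": [payload["lpc_edge"]],
                "pitch": payload["pitch_edge"]}
    return {"mode": MODE_NOT_VOICED,
            "lpc": [payload["lpc_mid"], payload["lpc_edge"]]}
```

A real coder would quantize these parameters into a fixed bit budget per frame; the sketch only shows which parameters each mode selects and that the decoder branches on the identified mode.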
6. A coder for encoding a speech signal comprising:
a receiver for receiving the speech signal and dividing the speech signal into speech frames;
a linear predictor for performing linear predictive code analysis on a first and a second linear prediction window, the first linear prediction window being centered at the middle of the speech frame and the second linear prediction window being centered at the edge of the speech frame, wherein the linear predictor generates a first set of filter coefficients for the first linear prediction window and a second set of filter coefficients for the second linear prediction window;
a pitch estimator for generating a pitch estimate for each of a first and a second pitch estimation window, the first pitch estimation window being centered at the middle of the speech frame and the second pitch estimation window being centered at the edge of the speech frame;
a mode classifier responsive to the first and the second sets of filter coefficients and the first and the second pitch estimates, for classifying the speech frame into one of a plurality of modes, wherein a first mode is predominantly voiced and a second mode is not predominantly voiced; and
an encoder for encoding the speech frame based on the classified mode of the speech frame.
Dependent claims: 7, 8, 9, 10, 11
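The pitch estimator of claim 6 produces one estimate per pitch estimation window. The claim does not state how the estimate is computed; the sketch below assumes a normalized autocorrelation peak search, a common open-loop technique swapped in purely for illustration. The function name, lag range, and window positions are hypothetical.

```python
import math

def estimate_pitch(window, min_lag, max_lag):
    """Return (lag, score): the lag in [min_lag, max_lag] with the highest
    normalized correlation between the window and its delayed copy."""
    best_lag, best_score = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        idx = range(lag, len(window))
        num = sum(window[n] * window[n - lag] for n in idx)
        e_now = sum(window[n] ** 2 for n in idx)
        e_past = sum(window[n - lag] ** 2 for n in idx)
        denom = math.sqrt(e_now * e_past)
        score = num / denom if denom > 0.0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score

# Usage: a synthetic tone with a 50-sample period, analyzed in two windows.
signal = [math.sin(2 * math.pi * n / 50) for n in range(480)]
mid_window = signal[40:280]    # assumed window centered at the frame middle
edge_window = signal[200:440]  # assumed window centered at the frame edge
pitch_mid, _ = estimate_pitch(mid_window, 20, 80)
pitch_edge, _ = estimate_pitch(edge_window, 20, 80)
```

For strongly periodic input, integer multiples of the true period score almost as well as the period itself, so a practical estimator usually adds a rule that favors the shorter lag. The usage example sidesteps this by keeping the search range below twice the period.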
12. A method of encoding a speech signal comprising the steps of:
receiving a speech signal and dividing the speech signal into speech frames;
performing linear predictive code analysis on a speech frame in a first and a second linear prediction window, the first linear prediction window being centered at the middle of the speech frame and the second linear prediction window being centered at the edge of the speech frame;
generating a first set of filter coefficients for the first linear prediction window and a second set of filter coefficients for the second linear prediction window;
generating a first pitch estimate for a first pitch estimation window and a second pitch estimate for a second pitch estimation window, the first pitch estimation window being centered at the middle of the speech frame and the second pitch estimation window being centered at the edge of the speech frame;
classifying the speech frame into one of a plurality of modes based on the first and the second sets of filter coefficients and the first and the second pitch estimates, wherein a first mode is predominantly voiced and a second mode is predominantly not voiced;
encoding the speech frame based on the classified mode of the speech frame; and
transmitting the encoded speech frame.
Dependent claims: 13, 14, 15, 16, 17
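The classifying step of claim 12 takes both coefficient sets and both pitch estimates as inputs, but the claim does not give the decision rule. The sketch below uses a hypothetical rule chosen only to show how the four inputs could be combined: a frame is treated as predominantly voiced when the two pitch estimates agree and the two coefficient sets are close. The thresholds, the Euclidean distance on raw coefficients, and the function name are all assumptions, not the patented method.

```python
import math

MODE_VOICED = 1      # "first mode"
MODE_NOT_VOICED = 0  # "second mode"

def classify_mode(lpc_mid, lpc_edge, pitch_mid, pitch_edge,
                  max_pitch_dev=0.15, max_coef_dist=0.5):
    """Hypothetical two-way classifier over the four analysis results."""
    # Pitch agreement: relative deviation between the two estimates (assumed rule).
    pitch_stable = abs(pitch_mid - pitch_edge) <= max_pitch_dev * max(pitch_mid, pitch_edge)
    # Spectral agreement: crude Euclidean distance between the coefficient sets (assumed rule).
    coef_dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(lpc_mid, lpc_edge)))
    spectrum_stable = coef_dist <= max_coef_dist
    return MODE_VOICED if (pitch_stable and spectrum_stable) else MODE_NOT_VOICED
```

Distance between raw predictor coefficients is a weak proxy for spectral change; a practical classifier would more likely compare the sets in a domain such as line spectral frequencies or use a cepstral distance.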
Specification