Variable rate speech coding
Abstract
A method and apparatus for the variable rate coding of a speech signal. An input speech signal is classified, and an appropriate coding mode is selected based on this classification. For each classification, the coding mode that achieves the lowest bit rate with an acceptable quality of speech reproduction is selected. Low average bit rates are achieved by employing high-fidelity modes (i.e., high bit rate, broadly applicable to different types of speech) only during portions of the speech where this fidelity is required for acceptable output. Lower bit rate modes are used during portions of speech where these modes produce acceptable output. The input speech signal is classified into active and inactive regions. Active regions are further classified into voiced, unvoiced, and transient regions. Various coding modes are applied to active speech, depending upon the required level of fidelity. Coding modes may be utilized according to the strengths and weaknesses of each particular mode. The apparatus dynamically switches between these modes as the properties of the speech signal vary with time. Where appropriate, regions of speech are modeled as pseudo-random noise, resulting in a significantly lower bit rate. This coding is used in a dynamic fashion whenever unvoiced speech or background noise is detected.
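The bit-rate saving from modeling a region as pseudo-random noise can be illustrated with a minimal sketch. It is not the patent's NELP coder: it keeps only a single gain per frame and omits the linear prediction filter that the NELP name implies. The function names, the 160-sample frame length, and the fixed seed are all assumptions made for illustration.

```python
import math
import random


def rms(frame):
    """Root-mean-square level of a frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def encode_noise_frame(frame):
    """Keep only the frame's gain and length; the waveform itself is discarded."""
    return {"gain": rms(frame), "length": len(frame)}


def decode_noise_frame(params, seed=0):
    """Regenerate a frame as pseudo-random noise scaled to the stored gain."""
    rng = random.Random(seed)  # hypothetical fixed seed, for repeatability only
    noise = [rng.gauss(0.0, 1.0) for _ in range(params["length"])]
    scale = params["gain"] / rms(noise)  # normalise so output RMS equals the gain
    return [scale * x for x in noise]


# Usage: a hypothetical 160-sample frame of noise-like input.
source = random.Random(1)
frame = [0.05 * source.gauss(0.0, 1.0) for _ in range(160)]
params = encode_noise_frame(frame)
decoded = decode_noise_frame(params)
```

The decoded frame matches the input only in level, not sample by sample, which is why such a mode suits noise-like regions and not voiced speech.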
32 Claims
1. A method of encoding a speech signal comprising:
(a) classifying the speech signal as either active or inactive speech;
(b) classifying said active speech into one of a plurality of types of active speech;
(c) selecting an encoder mode from a plurality of encoder modes based on whether the speech signal is active or inactive, and if active, based further on said type of active speech, wherein said plurality of encoder modes comprises a code excited linear prediction (CELP) encoder mode, a prototype pitch period (PPP) encoder mode, and a noise excited linear prediction (NELP) encoder mode; and
(d) encoding the speech signal according to said selected encoder mode to form an encoded speech signal.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
17. An apparatus comprising:
classification means for classifying a speech signal as active or inactive speech, and if active speech, for classifying the active speech as one of a plurality of types of active speech; and
a plurality of encoding means for encoding the speech signal as an encoded speech signal, wherein said encoding means are dynamically selected to encode the speech signal based on whether the speech signal is active or inactive, and if active, based further on said type of active speech, wherein said plurality of encoder means comprises a code excited linear prediction (CELP) encoding means, a prototype pitch period (PPP) encoding means, and a noise excited linear prediction (NELP) encoding means.
Dependent claims: 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31
32. An apparatus comprising:
a classification module configured to classify a speech signal as active or inactive speech, and if active speech, to classify the active speech as one of a plurality of types of active speech; and
a plurality of encoders configured to encode the speech signal as an encoded speech signal, wherein said encoders are dynamically selected to encode the speech signal based on whether the speech signal is active or inactive, and if active, based further on said type of active speech, wherein said plurality of encoders comprises a code excited linear prediction (CELP) encoding means, a prototype pitch period (PPP) encoding means, and a noise excited linear prediction (NELP) encoding means.
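The classify-then-select structure recited in the independent claims can be sketched as follows. This is an illustrative sketch only. The classifier heuristics (frame energy, zero-crossing rate, energy jump) and every threshold are hypothetical and not taken from the claims. The mapping of voiced speech to PPP and transient speech to CELP is likewise an assumption; only the use of noise modeling for unvoiced speech and background noise is stated in the abstract.

```python
from enum import Enum


class SpeechClass(Enum):
    INACTIVE = "inactive"
    VOICED = "voiced"
    UNVOICED = "unvoiced"
    TRANSIENT = "transient"


class EncoderMode(Enum):
    CELP = "CELP"
    PPP = "PPP"
    NELP = "NELP"


# Assumed mapping from classification to encoder mode (see lead-in).
MODE_FOR_CLASS = {
    SpeechClass.INACTIVE: EncoderMode.NELP,
    SpeechClass.UNVOICED: EncoderMode.NELP,
    SpeechClass.VOICED: EncoderMode.PPP,
    SpeechClass.TRANSIENT: EncoderMode.CELP,
}

# Hypothetical thresholds, chosen only so the example runs.
ENERGY_FLOOR = 1e-4   # below this mean-square level a frame counts as inactive
JUMP_RATIO = 4.0      # energy rise over the previous frame that marks a transient
ZCR_UNVOICED = 0.3    # zero-crossing rate above which a frame counts as unvoiced


def frame_energy(frame):
    """Mean-square energy of a frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)


def classify(frame, prev_energy=None):
    """Steps (a) and (b): active/inactive, then type of active speech."""
    energy = frame_energy(frame)
    if energy < ENERGY_FLOOR:
        return SpeechClass.INACTIVE
    if prev_energy is not None and energy > JUMP_RATIO * prev_energy:
        return SpeechClass.TRANSIENT
    if zero_crossing_rate(frame) > ZCR_UNVOICED:
        return SpeechClass.UNVOICED
    return SpeechClass.VOICED


def select_mode(speech_class):
    """Step (c): choose the encoder mode for the classified frame."""
    return MODE_FOR_CLASS[speech_class]
```

For example, `select_mode(classify([0.0] * 160))` selects the NELP mode for a silent frame. Keeping the mapping in a table separate from the classifier lets either be changed without touching the other.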
Specification