Enhancement of speech coding in background noise for low-rate speech coder
First Claim
1. In a method of low-bit-rate speech coding of input speech occurring in a noisy environment, for a system which employs linear predictive coding (LPC) analysis of input speech frames to generate reflection coefficients, conversion of the reflection coefficients to vectors representing spectral parameters of the input speech frames, and matching of the spectral parameter vectors against reference vectors of a vocabulary of codewords generated in a training sequence in order to select the corresponding index of an optimally matching codeword for transmission,the improvement comprising the steps of:
- selecting a set of at least two features which are characterized by a probability distribution which is not strongly affected in the noisy environment and which allow discrimination between voiced and unvoiced input speech, wherein said selected features include the feature of zero-crossing counts which are based on average noise energy;
measuring the selected features for input speech frames; and
using said feature measurements to make voiced/unvoiced speech decisions in order to select the voice/unvoiced excitation for speech synthesis in the receiver;
using noise estimates to update the reference vectors of the vocabulary of codewords, wherein new reference vectors are generated corresponding to said vocabulary of codewords in the noisy environment, said noise estimates including noise amplitude and noise reflection coefficients, wherein said noise estimate for speech frame I is performed only if the ith speech frame is unvoiced and more than a given number L of continuous unvoiced speech frames are accumulated, in order to prevent using voiced or unvoiced speech in the noise estimate.
1 Assignment
0 Petitions
Accused Products
Abstract
A speech coding system employs measurements of robust features of speech frames whose distribution are not strongly affected by noise/levels to make voicing decisions for input speech occurring in a noisy environment. Linear programing analysis of the robust features and respective weights are used to determine an optimum linear combination of these features. The input speech vectors are matched to a vocabulary of codewords in order to select the corresponding, optimally matching codeword. Adaptive vector quantization is used in which a vocabulary of words obtained in a quiet environment is updated based upon a noise estimate of a noisy environment in which the input speech occurs, and the "noisy" vocabulary is then searched for the best match with an input speech vector. The corresponding clean codeword index is then selected for transmission and for synthesis at the receiver end. The results are better spectral reproduction and significant intelligibility enhancement over prior coding approaches. Robust features found to allow robust voicing decisions include: low-band energy; zero-crossing counts adapted for noise level; AMDF ratio (speech periodicity) measure; low-pass filtered backward correlation; low-pass filtered forward correlation; inverse-filtered backward correlation; and inverse-filtered pitch prediction gain measure.
191 Citations
13 Claims
-
1. In a method of low-bit-rate speech coding of input speech occurring in a noisy environment, for a system which employs linear predictive coding (LPC) analysis of input speech frames to generate reflection coefficients, conversion of the reflection coefficients to vectors representing spectral parameters of the input speech frames, and matching of the spectral parameter vectors against reference vectors of a vocabulary of codewords generated in a training sequence in order to select the corresponding index of an optimally matching codeword for transmission,
the improvement comprising the steps of: -
selecting a set of at least two features which are characterized by a probability distribution which is not strongly affected in the noisy environment and which allow discrimination between voiced and unvoiced input speech, wherein said selected features include the feature of zero-crossing counts which are based on average noise energy; measuring the selected features for input speech frames; and using said feature measurements to make voiced/unvoiced speech decisions in order to select the voice/unvoiced excitation for speech synthesis in the receiver; using noise estimates to update the reference vectors of the vocabulary of codewords, wherein new reference vectors are generated corresponding to said vocabulary of codewords in the noisy environment, said noise estimates including noise amplitude and noise reflection coefficients, wherein said noise estimate for speech frame I is performed only if the ith speech frame is unvoiced and more than a given number L of continuous unvoiced speech frames are accumulated, in order to prevent using voiced or unvoiced speech in the noise estimate. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
-
-
12. In a method of low-bit-rate speech coding of input speech occurring in a noisy environment, for a system which employs linear predictive coding (LPC) analysis of input speech frames to generate reflection coefficients, conversion of the reflection coefficients to vectors representing spectral parameters of the input speech frames, and matching of the spectral parameter vectors against reference vectors of a vocabulary of codewords generated in a training sequence in order to select the corresponding index of an optimally matching codeword for transmission,
the improvement comprising the steps of: -
selecting a set of features which are characterized by a probability distribution which is not strongly affected in the noisy environment and which allow discrimination between voiced and unvoiced input speech; measuring the selected features for input speech frames; and using said feature measurements to make voiced/unvoiced speech decisions in order to select the voice/unvoiced excitation for speech synthesis in the receiver; using noise estimates to update the reference vectors of the vocabulary of codewords, wherein new reference vectors are generated corresponding to said vocabulary of codewords in the noisy environment, said noise estimates including noise amplitude and noise reflection coefficients, wherein said noise estimate for speech frame I is performed only if the ith speech frame is unvoiced and more than a given number L of continuous unvoiced speech frames are accumulated, in order to prevent using voiced or unvoiced speech in the noise estimate. - View Dependent Claims (13)
-
Specification