Nonlinear mapping for feature extraction in automatic speech recognition
First Claim
1. A method of combining neural-net discriminative feature processing with Gaussian-mixture distribution modeling in automatic speech recognition comprising:
training at least one neural network to estimate a plurality of phone posterior probabilities from at least a portion of an audio stream containing speech;
transforming the distribution of the plurality of posterior probabilities into a Gaussian distribution;
de-correlating the transformed posterior probabilities; and
applying the de-correlated and transformed posterior probabilities as features to a Gaussian mixture model automatic speech recognition system.
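The claimed pipeline (posteriors, then Gaussianization, then de-correlation) can be sketched as below for a toy two-class case. This is an illustrative reading of the claim, not the patent's implementation: the log transform is one common way to Gaussianize softmax outputs, and the de-correlation step is shown as a 2-D Karhunen-Loeve rotation. All function names and the synthetic data are hypothetical.

```python
import math
import random

def gaussianize(post, eps=1e-10):
    """Log transform: spreads skewed softmax posteriors toward a Gaussian shape."""
    return [math.log(p + eps) for p in post]

def covariance(frames):
    """Sample covariance of a list of 2-D feature frames."""
    n = len(frames)
    means = [sum(f[d] for f in frames) / n for d in (0, 1)]
    c = [[0.0, 0.0], [0.0, 0.0]]
    for f in frames:
        dx, dy = f[0] - means[0], f[1] - means[1]
        c[0][0] += dx * dx
        c[0][1] += dx * dy
        c[1][0] += dy * dx
        c[1][1] += dy * dy
    return [[v / (n - 1) for v in row] for row in c]

def decorrelate(frames):
    """Rotate 2-D features so their covariance is diagonal (a 2-D KLT/PCA)."""
    c = covariance(frames)
    # Jacobi rotation angle that zeroes the off-diagonal covariance term.
    theta = 0.5 * math.atan2(2.0 * c[0][1], c[0][0] - c[1][1])
    ct, st = math.cos(theta), math.sin(theta)
    return [(ct * x + st * y, -st * x + ct * y) for x, y in frames]

# Synthetic posteriors from a stand-in 2-class net (sigmoid of random logits).
random.seed(0)
frames = []
for _ in range(500):
    logit = random.gauss(0.0, 2.0)
    p1 = 1.0 / (1.0 + math.exp(-logit))
    frames.append(gaussianize([p1, 1.0 - p1]))

tandem_features = decorrelate(frames)
# The rotated features have (near-)zero off-diagonal covariance, which is
# what makes them suitable inputs for diagonal-covariance Gaussian mixtures.
```

De-correlation matters here because conventional GMM recognizers typically model features with diagonal covariances, so strongly correlated raw posteriors would be poorly fit.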
Abstract
The present invention successfully combines neural-net discriminative feature processing with Gaussian-mixture distribution modeling (GMM). By training one or more neural networks to generate subword posterior probabilities, then using transformations of these estimates as the base features for a conventionally-trained Gaussian-mixture based system, substantial error rate reductions may be achieved. The present invention effectively has two acoustic models in tandem—first a neural net and then a GMM. By using a variety of combination schemes available for connectionist models, various systems based upon multiple feature streams can be constructed with even greater error rate reductions.
20 Claims
1. A method of combining neural-net discriminative feature processing with Gaussian-mixture distribution modeling in automatic speech recognition comprising:
training at least one neural network to estimate a plurality of phone posterior probabilities from at least a portion of an audio stream containing speech;
transforming the distribution of the plurality of posterior probabilities into a Gaussian distribution;
de-correlating the transformed posterior probabilities; and
applying the de-correlated and transformed posterior probabilities as features to a Gaussian mixture model automatic speech recognition system.
Dependent claims: 2-8.
9. A computer program product, stored on a computer readable medium, comprising instructions operable to cause a programmable processor to:
receive a plurality of subword posterior probabilities from at least one neural network trained to estimate subword posterior probabilities from at least a portion of an audio stream;
transform a distribution of the plurality of posterior probabilities into a Gaussian distribution;
de-correlate the transformed posterior probabilities; and
supply the de-correlated and transformed posterior probabilities as features to a Gaussian mixture model speech recognition system.
Dependent claims: 10-16.
17. A method of using neural-net discriminative feature processing with Gaussian-mixture distribution modeling for use in automatic speech recognition comprising:
training a first plurality of neural networks to generate a set of pluralities of subword posterior probabilities from at least portions of an audio stream;
non-linearly merging the set of pluralities of posterior probabilities into a merged plurality of posterior probabilities using a second neural network;
transforming the distribution of the merged plurality of posterior probabilities into a Gaussian distribution;
de-correlating the transformed merged plurality of posterior probabilities; and
applying the de-correlated and transformed merged plurality of posterior probabilities as features to an automatic speech recognition system.
Dependent claims: 18-20.
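The multi-stream claim above adds one stage to the claim-1 pipeline: a second neural network that non-linearly merges the posterior streams before the Gaussianize/de-correlate steps. The data flow of that merging stage might be sketched as follows. The weights are random stand-ins (no training is shown), the stream labels are examples only, and all names are hypothetical rather than taken from the patent.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def merge_streams(stream_a, stream_b, weights):
    """Second neural net: one tanh hidden layer over the concatenated
    posterior streams, with a softmax output giving merged posteriors."""
    x = stream_a + stream_b
    w_hid, w_out = weights
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hid]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    return softmax(logits)

random.seed(1)
n_classes, n_hidden = 4, 8
w_hid = [[random.gauss(0, 0.5) for _ in range(2 * n_classes)]
         for _ in range(n_hidden)]
w_out = [[random.gauss(0, 0.5) for _ in range(n_hidden)]
         for _ in range(n_classes)]

# Two posterior streams for one frame, e.g. from nets trained on
# different front-end features.
stream_a = softmax([random.gauss(0, 1) for _ in range(n_classes)])
stream_b = softmax([random.gauss(0, 1) for _ in range(n_classes)])

merged = merge_streams(stream_a, stream_b, (w_hid, w_out))
# merged is itself a posterior distribution: non-negative, summing to 1,
# ready for the same Gaussianization and de-correlation steps as claim 1.
```

Merging with a second network (rather than, say, averaging) lets the combination itself be non-linear and discriminatively trained, which is what distinguishes claim 17 from claim 1.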
Specification