Nonlinear mapping for feature extraction in automatic speech recognition
First Claim
1. A method of combining neural-net discriminative feature processing with Gaussian-mixture distribution modeling in automatic speech recognition comprising:
training at least one neural network to estimate a plurality of phone posterior probabilities from at least a portion of an audio stream containing speech;
transforming the distribution of the plurality of posterior probabilities into a Gaussian distribution;
de-correlating the transformed posterior probabilities; and
applying the de-correlated and transformed posterior probabilities as features to a Gaussian mixture model automatic speech recognition system.
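The claimed pipeline (posteriors, then Gaussianization, then de-correlation) can be sketched as below for a toy two-class case. This is an illustrative reading of the claim, not the patent's implementation: the log transform is one common way to Gaussianize softmax outputs, and the de-correlation step is shown as a 2-D Karhunen-Loeve rotation. All function names and the synthetic data are hypothetical.

```python
import math
import random

def gaussianize(post, eps=1e-10):
    """Log transform: spreads skewed softmax posteriors toward a Gaussian shape."""
    return [math.log(p + eps) for p in post]

def covariance(frames):
    """Sample covariance of a list of 2-D feature frames."""
    n = len(frames)
    means = [sum(f[d] for f in frames) / n for d in (0, 1)]
    c = [[0.0, 0.0], [0.0, 0.0]]
    for f in frames:
        dx, dy = f[0] - means[0], f[1] - means[1]
        c[0][0] += dx * dx
        c[0][1] += dx * dy
        c[1][0] += dy * dx
        c[1][1] += dy * dy
    return [[v / (n - 1) for v in row] for row in c]

def decorrelate(frames):
    """Rotate 2-D features so their covariance is diagonal (a 2-D KLT/PCA)."""
    c = covariance(frames)
    # Jacobi rotation angle that zeroes the off-diagonal covariance term.
    theta = 0.5 * math.atan2(2.0 * c[0][1], c[0][0] - c[1][1])
    ct, st = math.cos(theta), math.sin(theta)
    return [(ct * x + st * y, -st * x + ct * y) for x, y in frames]

# Synthetic posteriors from a stand-in 2-class net (sigmoid of random logits).
random.seed(0)
frames = []
for _ in range(500):
    logit = random.gauss(0.0, 2.0)
    p1 = 1.0 / (1.0 + math.exp(-logit))
    frames.append(gaussianize([p1, 1.0 - p1]))

tandem_features = decorrelate(frames)
# The rotated features have (near-)zero off-diagonal covariance, which is
# what makes them suitable inputs for diagonal-covariance Gaussian mixtures.
```

De-correlation matters here because conventional GMM recognizers typically model features with diagonal covariances, so strongly correlated raw posteriors would be poorly fit.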
Abstract
The present invention successfully combines neural-net discriminative feature processing with Gaussian-mixture distribution modeling (GMM). By training one or more neural networks to generate subword posterior probabilities, then using transformations of these estimates as the base features for a conventionally-trained Gaussian-mixture based system, substantial error rate reductions may be achieved. The present invention effectively has two acoustic models in tandem—first a neural net and then a GMM. By using a variety of combination schemes available for connectionist models, various systems based upon multiple feature streams can be constructed with even greater error rate reductions.
20 Claims
1. A method of combining neural-net discriminative feature processing with Gaussian-mixture distribution modeling in automatic speech recognition comprising:
training at least one neural network to estimate a plurality of phone posterior probabilities from at least a portion of an audio stream containing speech;
transforming the distribution of the plurality of posterior probabilities into a Gaussian distribution;
de-correlating the transformed posterior probabilities; and
applying the de-correlated and transformed posterior probabilities as features to a Gaussian mixture model automatic speech recognition system.
Dependent claims: 2-8.
9. A computer program product, stored on a computer readable medium, comprising instructions operable to cause a programmable processor to:
receive a plurality of subword posterior probabilities from at least one neural network trained to estimate subword posterior probabilities from at least a portion of an audio stream;
transform a distribution of the plurality of posterior probabilities into a Gaussian distribution;
de-correlate the transformed posterior probabilities; and
supply the de-correlated and transformed posterior probabilities as features to a Gaussian mixture model speech recognition system.
Dependent claims: 10-16.
17. A method of using neural-net discriminative feature processing with Gaussian-mixture distribution modeling for use in automatic speech recognition comprising:
training a first plurality of neural networks to generate a set of pluralities of subword posterior probabilities from at least portions of an audio stream;
non-linearly merging the set of pluralities of posterior probabilities into a merged plurality of posterior probabilities using a second neural network;
transforming the distribution of the merged plurality of posterior probabilities into a Gaussian distribution;
de-correlating the transformed merged plurality of posterior probabilities; and
applying the de-correlated and transformed merged plurality of posterior probabilities as features to an automatic speech recognition system.
Dependent claims: 18-20.
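The multi-stream claim above adds one stage to the claim-1 pipeline: a second neural network that non-linearly merges the posterior streams before the Gaussianize/de-correlate steps. The data flow of that merging stage might be sketched as follows. The weights are random stand-ins (no training is shown), the stream labels are examples only, and all names are hypothetical rather than taken from the patent.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def merge_streams(stream_a, stream_b, weights):
    """Second neural net: one tanh hidden layer over the concatenated
    posterior streams, with a softmax output giving merged posteriors."""
    x = stream_a + stream_b
    w_hid, w_out = weights
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hid]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    return softmax(logits)

random.seed(1)
n_classes, n_hidden = 4, 8
w_hid = [[random.gauss(0, 0.5) for _ in range(2 * n_classes)]
         for _ in range(n_hidden)]
w_out = [[random.gauss(0, 0.5) for _ in range(n_hidden)]
         for _ in range(n_classes)]

# Two posterior streams for one frame, e.g. from nets trained on
# different front-end features.
stream_a = softmax([random.gauss(0, 1) for _ in range(n_classes)])
stream_b = softmax([random.gauss(0, 1) for _ in range(n_classes)])

merged = merge_streams(stream_a, stream_b, (w_hid, w_out))
# merged is itself a posterior distribution: non-negative, summing to 1,
# ready for the same Gaussianization and de-correlation steps as claim 1.
```

Merging with a second network (rather than, say, averaging) lets the combination itself be non-linear and discriminatively trained, which is what distinguishes claim 17 from claim 1.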
Specification