Method and apparatus for context-dependent estimation of multiple probability distributions of phonetic classes with multilayer perceptrons in a speech recognition system
First Claim
1. In a speech recognition apparatus having a hidden Markov model speech recognizer, a method for using a multilayer perceptron (MLP) for recognizing speech by context-dependent estimation of a plurality of state-dependent observation probability distributions of phonetic (phone) classes which has weights that have been obtained based on a training set of speech vectors, wherein said training set of said speech vectors has been used to create context-dependent phone classes for use in said method, said speech vectors being characterized by phone classes, the method comprising the steps of:
applying input speech vectors containing unknown data to a single input layer of a multilayer perceptron, said multilayer perceptron having a single input layer, a single hidden layer, a single set of weights between said input layer and said hidden layer, and a plurality of output layers with an associated plurality of sets of weights between said hidden layer and said output layers, each one of said output layers having a plurality of output units for storing a plurality of probability values;
forward propagating each input speech vector through said multilayer perceptron to produce an activation level representative of a probability value at each output unit within each one of said output layers;
determining likelihood of observing each said input speech vector, assuming a specific state of a hidden Markov model by factoring, according to Bayes rule, said likelihood of observing being in terms of posterior probabilities of phone classes of the speech vector assuming context and the input speech vector, thereby obtaining values representative of context-dependent estimation; and
employing as input to said hidden Markov model speech recognizer said values representative of context-dependent estimation as state-dependent observation probabilities to identify a specific estimated word sequence from said input speech vectors.
Abstract
In a hidden Markov model-based speech recognition system, multilayer perceptrons (MLPs) are used in context-dependent estimation of a plurality of state-dependent observation probability distributions of phonetic classes. Estimation is obtained by the Bayesian factorization of the observation likelihood in terms of posterior probabilities of phone classes assuming the context and the input speech vector. The context-dependent estimation is employed as the state-dependent observation probabilities needed as parameter input to a hidden Markov model speech processor to identify the word sequence representing the unknown speech input of input speech vectors. Within the speech processor, models are provided which employ the observation probabilities in the recognition process. The number of context-dependent nets is reduced to a single net by sharing the units of the input layer and the hidden layer and the weights connecting them in the multilayer perceptron while providing one output layer for each relevant context. Each output layer is trained as an independent network on the specific examples of the corresponding context it represents. Training may be optimized at an intermediate set of weights between the context-independent-associated weights and the context-dependent associated weights to which training would normally converge.
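The Bayesian factorization named in the abstract can be written out as a short derivation. The notation here is assumed for illustration and is not taken from the text: $Y$ is an input speech vector, $q_j$ a phone class tied to an HMM state, and $c_k$ a context class.

```latex
\begin{align}
p(Y \mid q_j, c_k) &= \frac{P(q_j \mid Y, c_k)\, p(Y \mid c_k)}{P(q_j \mid c_k)}, \\
p(Y \mid c_k)      &= \frac{P(c_k \mid Y)\, p(Y)}{P(c_k)}, \\
\text{hence}\quad
p(Y \mid q_j, c_k) &= \frac{P(q_j \mid Y, c_k)\, P(c_k \mid Y)}{P(q_j \mid c_k)\, P(c_k)}\; p(Y).
\end{align}
```

Under this reading, $P(q_j \mid Y, c_k)$ is the activation of output unit $j$ in the output layer for context $k$. The factor $p(Y)$ does not depend on the state, so it can be dropped when comparing state sequences for the same input. How $P(c_k \mid Y)$ and the priors are obtained is not stated in the abstract; a separate context estimator and training-set counts are plausible sources.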
20 Claims
1. In a speech recognition apparatus having a hidden Markov model speech recognizer, a method for using a multilayer perceptron (MLP) for recognizing speech by context-dependent estimation of a plurality of state-dependent observation probability distributions of phonetic (phone) classes which has weights that have been obtained based on a training set of speech vectors, wherein said training set of said speech vectors has been used to create context-dependent phone classes for use in said method, said speech vectors being characterized by phone classes, the method comprising the steps of:
applying input speech vectors containing unknown data to a single input layer of a multilayer perceptron, said multilayer perceptron having a single input layer, a single hidden layer, a single set of weights between said input layer and said hidden layer, and a plurality of output layers with an associated plurality of sets of weights between said hidden layer and said output layers, each one of said output layers having a plurality of output units for storing a plurality of probability values;
forward propagating each input speech vector through said multilayer perceptron to produce an activation level representative of a probability value at each output unit within each one of said output layers;
determining likelihood of observing each said input speech vector, assuming a specific state of a hidden Markov model by factoring, according to Bayes rule, said likelihood of observing being in terms of posterior probabilities of phone classes of the speech vector assuming context and the input speech vector, thereby obtaining values representative of context-dependent estimation; and
employing as input to said hidden Markov model speech recognizer said values representative of context-dependent estimation as state-dependent observation probabilities to identify a specific estimated word sequence from said input speech vectors.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A speech recognition apparatus comprising:
a hidden Markov model speech recognizer means;
a multilayer perceptron means (MLP), said MLP comprising;
a single input layer for receiving a plurality of input speech vectors from a source of speech vectors, a single hidden layer, a single set of weights between said input layer and said hidden layer, and a plurality of output layers with an associated plurality of sets of weights between said hidden layer and said output layers, each one of said output layers having a plurality of output units for storing a plurality of probability values; and
means for forward propagating each input speech vector through said multilayer perceptron means to produce an activation level representative of a probability value at each output unit within each one of said output layers;
means coupled to said MLP for determining likelihood of observing each speech vector assuming a specific state of a hidden Markov model by factoring, according to Bayes rule, said likelihood of observing being in terms of posterior probabilities of phone classes of the speech vector assuming context and the input speech vector, thereby obtaining values representative of context-dependent estimation; and
wherein said hidden Markov model speech recognizer means employs said values representative of context-dependent estimation as state-dependent observation probabilities to identify a specific estimated word sequence from said input speech vectors, for recognizing speech by context-dependent estimation of a plurality of state-dependent observation probability distributions of phone classes which has weights that have been obtained based on a training set of speech vectors.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
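The structure recited in claim 11 (one input layer, one hidden layer, one shared set of input-to-hidden weights, and one output layer per context) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the patented implementation: the class name `SharedHiddenMLP`, the function `scaled_likelihood`, the sigmoid hidden units, the softmax outputs, the random weights, and the externally supplied context posteriors and priors are all hypothetical choices made for this sketch.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()


class SharedHiddenMLP:
    """Hypothetical MLP: shared input and hidden layers, one output layer per context."""

    def __init__(self, n_in, n_hidden, n_phones, n_contexts, seed=0):
        rng = np.random.default_rng(seed)
        # Single set of weights between the input layer and the hidden layer.
        self.W_ih = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b_h = np.zeros(n_hidden)
        # One set of hidden-to-output weights per context (untrained here).
        self.W_ho = rng.normal(0.0, 0.1, (n_contexts, n_phones, n_hidden))
        self.b_o = np.zeros((n_contexts, n_phones))

    def forward(self, y):
        """Return an array of shape (n_contexts, n_phones).

        Row k is read as an estimate of P(phone | y, context k).
        Softmax outputs are an assumption of this sketch.
        """
        h = sigmoid(self.W_ih @ y + self.b_h)   # hidden layer computed once
        logits = self.W_ho @ h + self.b_o       # all output layers at once
        return np.stack([softmax(row) for row in logits])


def scaled_likelihood(post_q, post_c, prior_q_given_c, prior_c):
    """Bayes-rule factorization with the state-independent p(y) dropped:

    p(y | q, c) / p(y) = P(q | y, c) * P(c | y) / (P(q | c) * P(c))

    post_c, prior_q_given_c and prior_c are assumed to be supplied by
    the caller; the claim text does not say how they are estimated.
    """
    return post_q * post_c[:, None] / (prior_q_given_c * prior_c[:, None])


net = SharedHiddenMLP(n_in=4, n_hidden=5, n_phones=3, n_contexts=2)
frame = np.ones(4)                               # stand-in speech vector
posteriors = net.forward(frame)
obs = scaled_likelihood(
    posteriors,
    post_c=np.array([0.5, 0.5]),                 # hypothetical P(c | y)
    prior_q_given_c=np.full((2, 3), 1.0 / 3.0),  # hypothetical P(q | c)
    prior_c=np.array([0.5, 0.5]),                # hypothetical P(c)
)
```

In a hybrid recognizer, values such as `obs[k, j]` would stand in for the state-dependent observation probabilities passed to the HMM decoder. Computing the hidden layer once and reusing it for every output layer is what makes the shared design cheaper than running one full network per context.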
Specification