Minimum classification error training with growth transformation optimization
First Claim
1. A method comprising:
setting parameter values for a set of acoustic models used in speech recognition;
for each of a set of utterances, decoding the utterance using the set of acoustic models to identify a set of competitor word sequences for the utterance and to determine a probability of each word sequence given the utterance;
setting a weight with a positive value that is different than one for at least one competitor word sequence of each utterance;
updating a parameter value in the set of acoustic models through steps comprising:
for each competitor word sequence for a selected utterance, using the weight for the word sequence, the probability of the word sequence given the selected utterance and an occupation probability that describes the probability of being in a particular state of the acoustic model at a particular time given the selected utterance and the word sequence to form a score for the word sequence;
summing the scores for the word sequences as part of forming a score for the selected utterance; and
using the score for the selected utterance to update a parameter value.
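
A minimal Python sketch of how the per-sequence scores of claim 1 might be accumulated into statistics for a parameter update. The data layout, function names, and the smoothed mean update are illustrative assumptions, not the patent's actual implementation.

```python
import numpy as np

def accumulate_statistics(frames, competitors, num_states):
    """Accumulate per-state numerator/denominator sums for one utterance.

    frames      : (T, dim) array of acoustic feature vectors (assumed layout).
    competitors : list of dicts, one per competitor word sequence, with
        'weight'    : positive real weight for the word sequence,
        'posterior' : P(word sequence | utterance),
        'gamma'     : (T, num_states) occupation probabilities
                      P(state i at time t | utterance, word sequence).
    """
    T, dim = frames.shape
    num = np.zeros((num_states, dim))
    den = np.zeros(num_states)
    for t in range(T):
        for i in range(num_states):
            # Score for each competitor word sequence: its weight, its
            # posterior given the utterance, and its occupation probability.
            scores = [c['weight'] * c['posterior'] * c['gamma'][t, i]
                      for c in competitors]
            # Summing the per-sequence scores forms the utterance-level score.
            utt_score = sum(scores)
            num[i] += utt_score * frames[t]
            den[i] += utt_score
    return num, den

def update_mean(old_mean, num_i, den_i, D=100.0):
    """Growth-transformation-style smoothed update of one Gaussian mean;
    the constant D and this exact form are illustrative assumptions."""
    return (num_i + D * old_mean) / (den_i + D)
```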
Abstract
Hidden Markov Model (HMM) parameters are updated using update equations based on growth transformation optimization of a minimum classification error objective function. Using the list of N-best competitor word sequences obtained by decoding the training data with the current-iteration HMM parameters, the current HMM parameters are updated iteratively. The updating procedure involves using weights for each competitor word sequence that can take any positive real value. The updating procedure is further extended to the case where a decoded lattice of competitors is used. In this case, updating the model parameters relies on determining the probability for a state at a time point based on the word that spans the time point instead of the entire word sequence. This word-bound span of time is shorter than the duration of the entire word sequence and thus reduces the computing time.
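
For orientation, the LaTeX sketch below shows a commonly used smoothed minimum classification error objective and an extended-Baum-Welch-style growth-transformation mean update; the symbols and exact forms are illustrative assumptions and are not quoted from the patent.

```latex
% Misclassification measure for utterance r with reference word sequence S_r
% and N competitor word sequences s_{r,n} (a common formulation):
\[
  d_r(\Lambda) = -\log p(X_r, S_r \mid \Lambda)
    + \log\Big[ \frac{1}{N} \sum_{n=1}^{N} p(X_r, s_{r,n} \mid \Lambda)^{\eta} \Big]^{1/\eta},
  \qquad
  \ell_r(\Lambda) = \frac{1}{1 + e^{-\alpha\, d_r(\Lambda) + \beta}} .
\]
% Growth-transformation (extended Baum-Welch) style update of the mean of
% Gaussian state i, driven by weighted occupation statistics and a
% per-state smoothing constant D_i:
\[
  \hat{\mu}_i =
    \frac{\sum_r \sum_t \Delta\gamma_r(i,t)\, x_{r,t} + D_i\, \mu_i}
         {\sum_r \sum_t \Delta\gamma_r(i,t) + D_i} .
\]
```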
Claims (20)
1. A method comprising:
setting parameter values for a set of acoustic models used in speech recognition;
for each of a set of utterances, decoding the utterance using the set of acoustic models to identify a set of competitor word sequences for the utterance and to determine a probability of each word sequence given the utterance;
setting a weight with a positive value that is different than one for at least one competitor word sequence of each utterance;
updating a parameter value in the set of acoustic models through steps comprising:
for each competitor word sequence for a selected utterance, using the weight for the word sequence, the probability of the word sequence given the selected utterance and an occupation probability that describes the probability of being in a particular state of the acoustic model at a particular time given the selected utterance and the word sequence to form a score for the word sequence;
summing the scores for the word sequences as part of forming a score for the selected utterance; and
using the score for the selected utterance to update a parameter value.
Dependent claims: 2, 3, 4, 5, 6, 7.
8. A method comprising:
setting parameter values for a set of acoustic models used in speech recognition;
for each of a set of utterances, decoding the utterance using the set of acoustic models to identify a set of competitor word sequences for the utterance and for each word sequence determining a probability of the utterance given the word sequence;
updating a parameter value in the set of acoustic models through steps comprising:
constructing a recognition lattice for each utterance, the recognition lattice comprising a set of arcs with one word for each arc;
determining a probability of a selected state at a selected time point given a selected arc in a recognition lattice associated with a selected utterance;
determining a joint probability of the selected arc and the selected utterance;
determining a probability of the selected utterance; and
using the probability of the selected state, the joint probability of the selected arc and the selected utterance and the probability of the selected utterance to compute a term used in updating a model parameter for the selected state.
Dependent claims: 9, 10, 11, 12, 13, 14, 15, 16.
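
A minimal Python sketch of the per-arc term described in claim 8, assuming a hypothetical arc record that already carries its word-bound occupation probabilities and its joint probability with the utterance.

```python
import numpy as np

def arc_update_term(arc, p_utterance):
    """Combine the three quantities named in claim 8 for one lattice arc.

    arc (assumed layout):
        'gamma'   : (span_len, num_states) array of
                    P(state at time t | arc, utterance), computed only over
                    the arc's word-bound time span,
        'p_joint' : joint probability p(arc, utterance).
    p_utterance : p(utterance), e.g. the total lattice probability.

    Returns the arc's occupation probabilities weighted by its posterior
    p(arc, utterance) / p(utterance), a (span_len, num_states) array that
    can be accumulated into the statistics of the affected states.
    """
    arc_posterior = arc['p_joint'] / p_utterance
    return np.asarray(arc['gamma']) * arc_posterior
```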
17. A computer-readable medium having computer-executable instructions for performing steps comprising:
forming a recognition lattice of possible word sequences that were decoded from an utterance using an acoustic model;
determining a probability for the utterance;
for each arc in the recognition lattice that spans a selected time point:
determining a probability of a selected state given the arc and the utterance;
determining a joint probability of the arc and the utterance;
using the probability of the state given the arc and the utterance, the joint probability of the arc and the utterance and the probability of the utterance to form a value for the arc; and
using a sum of the values for the arcs that span the selected time point as part of updating a model parameter for the selected state.
Dependent claims: 18, 19, 20.
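
A short Python sketch of the per-frame accumulation in claim 17, under the same hypothetical arc layout as above plus assumed 'start'/'end' frame indices for each arc.

```python
def frame_value(arcs, t, state, p_utterance):
    """Sum, over every arc that spans frame t, the value formed from the
    state occupation probability given the arc, the joint probability of
    the arc and the utterance, and the utterance probability (claim 17)."""
    total = 0.0
    for arc in arcs:
        if arc['start'] <= t < arc['end']:
            local_t = t - arc['start']
            gamma = arc['gamma'][local_t][state]
            total += gamma * arc['p_joint'] / p_utterance
    return total
```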
Specification