Minimum classification error training with growth transformation optimization
Abstract
Hidden Markov Model (HMM) parameters are updated using update equations based on growth transformation optimization of a minimum classification error objective function. Using the list of N-best competitor word sequences obtained by decoding the training data with the current-iteration HMM parameters, the current HMM parameters are updated iteratively. The updating procedure involves using weights for each competitor word sequence that can take any positive real value. The updating procedure is further extended to the case where a decoded lattice of competitors is used. In this case, updating the model parameters relies on determining the probability for a state at a time point based on the word that spans the time point instead of the entire word sequence. This word-bound span of time is shorter than the duration of the entire word sequence and thus reduces the computing time.
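As a hedged sketch of the optimization the abstract describes (the notation below is assumed for illustration and is not quoted from the patent): the smoothed minimum classification error criterion can be arranged as a rational function of the model parameters Λ, which is the form that growth transformation (extended Baum-Welch) optimization requires.

```latex
% Smoothed per-utterance error as a rational function (assumed notation;
% competitor weights w_{r,s} > 0 need not equal 1):
l_r(\Lambda) =
  \frac{\sum_{s \neq S_r} w_{r,s}\, p_\Lambda(X_r, s)}
       {p_\Lambda(X_r, S_r) + \sum_{s \neq S_r} w_{r,s}\, p_\Lambda(X_r, s)}

% Training objective, a ratio of polynomials in the model parameters,
% O(\Lambda) = G(\Lambda) / H(\Lambda):
O(\Lambda) = \sum_r \bigl( 1 - l_r(\Lambda) \bigr)

% Growth transformation: each iteration maximizes the auxiliary function
%   F(\Lambda; \Lambda') = G(\Lambda) - O(\Lambda')\, H(\Lambda) + D,
% which guarantees O(\Lambda) \ge O(\Lambda') for a sufficiently large
% constant D, so the objective never decreases across iterations.
```

The weights w_{r,s} here correspond to the per-competitor weights that the abstract says may take any positive real value.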
Claims
1. A method comprising:
setting parameter values for a set of acoustic models used in speech recognition;
for each of a set of utterances, a decoder in a computing device decoding the utterance using the set of acoustic models to identify a set of competitor word sequences for the utterance and to determine a probability of each word sequence given the utterance;
setting a weight with a positive value that is different than one for at least one competitor word sequence of each utterance;
updating a parameter value in the set of acoustic models using a trainer through steps comprising:
for each competitor word sequence for a selected utterance, using the weight for the word sequence, the probability of the word sequence given the selected utterance, and an occupation probability that describes the probability of being in a particular state of the acoustic model at a particular time given the selected utterance and the word sequence, to form a score for the word sequence;
summing the scores for the word sequences as part of forming a score for the selected utterance, wherein forming the score for the selected utterance further comprises determining a term
(Dependent claims: 2, 3, 4, 5)
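The N-best score that claim 1 builds multiplies three quantities per competitor word sequence. Below is a minimal Python sketch under that reading; the type and field names (Competitor, posterior, occupancy) are illustrative assumptions, not the patent's notation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Competitor:
    weight: float                   # w_s > 0, not necessarily 1 (claim 1)
    posterior: float                # P(s | X_r): word-sequence probability given the utterance
    occupancy: List[List[float]]    # occupancy[i][t]: P(state i at time t | X_r, s)

def utterance_score(competitors: List[Competitor], state: int, time: int) -> float:
    """Per claim 1: form a score for each competitor word sequence from its
    weight, its posterior, and the occupation probability, then sum them."""
    return sum(c.weight * c.posterior * c.occupancy[state][time]
               for c in competitors)
```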
6. A method comprising:
setting parameter values for a set of acoustic models used in speech recognition;
for each of a set of utterances, decoding the utterance using the set of acoustic models to identify a set of competitor word sequences for the utterance and, for each word sequence, determining a probability of the utterance given the word sequence;
updating a parameter value in the set of acoustic models using a trainer in a computing device through steps comprising:
constructing a recognition lattice for each utterance, the recognition lattice comprising a set of arcs with one word for each arc;
for each arc that spans a selected time point:
determining a probability, γ_{i,r,c}(t), of a selected state at the selected time point given the arc and a selected utterance, using a first forward-backward recursion with the recursions starting at the beginning and the end of the arc;
determining a joint probability, p_{Λ′}(c, X_r), of the arc and the selected utterance, using a second forward-backward recursion with the forward recursion starting at the beginning of the utterance and the backward recursion starting at the end of the utterance;
determining a probability, p_{Λ′}(X_r), of the selected utterance; and
using the probability of the selected state given the arc and the selected utterance, the joint probability of the arc and the selected utterance, and the probability of the selected utterance to form a value for the arc; and
summing the values for the arcs that span the selected time point to compute a term
(Dependent claims: 7, 8, 9, 10, 11, 12, 13)
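One plausible reading of the arc value in claim 6 is the arc-local state occupancy γ_{i,r,c}(t) scaled by the arc posterior p_{Λ′}(c, X_r) / p_{Λ′}(X_r). A minimal Python sketch under that reading follows; all structure and field names are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Arc:
    start: int                          # first frame spanned by the arc's word
    end: int                            # last frame spanned by the arc's word
    joint_prob: float                   # p_{Λ'}(c, X_r), from the lattice-wide recursion
    occupancy: Dict[int, List[float]]   # occupancy[i][t - start]: γ_{i,r,c}(t), arc-local recursion

def term_at_time(arcs: List[Arc], utterance_prob: float, state: int, t: int) -> float:
    """Sum values over the arcs spanning time t; each value is the arc-local
    occupancy scaled by the arc posterior p(c, X_r) / p(X_r)."""
    total = 0.0
    for c in arcs:
        if c.start <= t <= c.end:       # only arcs that span the selected time point
            gamma = c.occupancy[state][t - c.start]
            total += gamma * c.joint_prob / utterance_prob
    return total
```

Because γ_{i,r,c}(t) comes from a recursion bounded to the arc rather than the whole word sequence, the per-frame work shrinks, which is the computational saving the abstract points to.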
14. A computer storage medium having computer-executable instructions for performing steps comprising:
forming a recognition lattice of possible word sequences that were decoded from an utterance using an acoustic model;
determining a probability, p_{Λ′}(X_r), for the utterance;
for each arc in the recognition lattice that spans a selected time point:
determining a probability, γ_{i,r,c}(t), of a selected state given the arc and the utterance, using a forward-backward recursion with the recursions starting at the beginning and the end of the arc;
determining a joint probability, p_{Λ′}(c, X_r), of the arc and the utterance, using a forward-backward recursion with the forward recursion starting at the beginning of the lattice and the backward recursion starting at the end of the lattice;
using the probability of the state given the arc and the utterance, the joint probability of the arc and the utterance, and the probability of the utterance to form a value for the arc; and
using a sum
(Dependent claims: 15, 16, 17)
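Claims 6 and 14 obtain the joint probability p_{Λ′}(c, X_r) from a lattice-wide forward-backward pass. A hedged sketch of that pass over a topologically sorted arc list follows; the node/arc structure and the single combined likelihood score are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class LatticeArc:
    src: int            # start node in the lattice
    dst: int            # end node in the lattice
    likelihood: float   # score of the arc's word over the frames it spans

def arc_joint_probs(arcs: List[LatticeArc], start: int, end: int) -> Tuple[List[float], float]:
    """Forward-backward over lattice nodes (arcs assumed topologically sorted).
    Returns p(c, X_r) = alpha[src] * likelihood * beta[dst] for each arc,
    plus p(X_r) = alpha[end], the total lattice probability."""
    alpha = defaultdict(float)
    alpha[start] = 1.0
    for a in arcs:                      # forward pass, from the start of the lattice
        alpha[a.dst] += alpha[a.src] * a.likelihood
    beta = defaultdict(float)
    beta[end] = 1.0
    for a in reversed(arcs):            # backward pass, from the end of the lattice
        beta[a.src] += a.likelihood * beta[a.dst]
    joint = [alpha[a.src] * a.likelihood * beta[a.dst] for a in arcs]
    return joint, alpha[end]
```

Dividing each joint value by p(X_r) yields the arc posterior consumed by the per-time-point sum sketched after claim 6.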
Specification