Minimum classification error training with growth transformation optimization
Abstract
Hidden Markov Model (HMM) parameters are updated using update equations based on growth transformation optimization of a minimum classification error objective function. Using the list of N-best competitor word sequences obtained by decoding the training data with the current-iteration HMM parameters, the current HMM parameters are updated iteratively. The updating procedure involves using weights for each competitor word sequence that can take any positive real value. The updating procedure is further extended to the case where a decoded lattice of competitors is used. In this case, updating the model parameters relies on determining the probability for a state at a time point based on the word that spans the time point instead of the entire word sequence. This word-bound span of time is shorter than the duration of the entire word sequence and thus reduces the computing time.
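As a hedged sketch of the optimization the abstract describes (the notation below is assumed for illustration and is not quoted from the patent): the smoothed minimum classification error criterion can be arranged as a rational function of the model parameters Λ, which is the form that growth transformation (extended Baum-Welch) optimization requires.

```latex
% Smoothed per-utterance error as a rational function (assumed notation;
% competitor weights w_{r,s} > 0 need not equal 1):
l_r(\Lambda) =
  \frac{\sum_{s \neq S_r} w_{r,s}\, p_\Lambda(X_r, s)}
       {p_\Lambda(X_r, S_r) + \sum_{s \neq S_r} w_{r,s}\, p_\Lambda(X_r, s)}

% Training objective, a ratio of polynomials in the model parameters,
% O(\Lambda) = G(\Lambda) / H(\Lambda):
O(\Lambda) = \sum_r \bigl( 1 - l_r(\Lambda) \bigr)

% Growth transformation: each iteration maximizes the auxiliary function
%   F(\Lambda; \Lambda') = G(\Lambda) - O(\Lambda')\, H(\Lambda) + D,
% which guarantees O(\Lambda) \ge O(\Lambda') for a sufficiently large
% constant D, so the objective never decreases across iterations.
```

The weights w_{r,s} here correspond to the per-competitor weights that the abstract says may take any positive real value.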
Claims
1. A method comprising:
setting parameter values for a set of acoustic models used in speech recognition;
for each of a set of utterances, a decoder in a computing device decoding the utterance using the set of acoustic models to identify a set of competitor word sequences for the utterance and to determine a probability of each word sequence given the utterance;
setting a weight with a positive value that is different than one for at least one competitor word sequence of each utterance;
updating a parameter value in the set of acoustic models using a trainer through steps comprising:
for each competitor word sequence for a selected utterance, using the weight for the word sequence, the probability of the word sequence given the selected utterance, and an occupation probability that describes the probability of being in a particular state of the acoustic model at a particular time given the selected utterance and the word sequence, to form a score for the word sequence;
summing the scores for the word sequences as part of forming a score for the selected utterance, wherein forming the score for the selected utterance further comprises determining a term
(Dependent claims: 2, 3, 4, 5)
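The N-best score that claim 1 builds multiplies three quantities per competitor word sequence. Below is a minimal Python sketch under that reading; the type and field names (Competitor, posterior, occupancy) are illustrative assumptions, not the patent's notation.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Competitor:
    weight: float                   # w_s > 0, not necessarily 1 (claim 1)
    posterior: float                # P(s | X_r): word-sequence probability given the utterance
    occupancy: List[List[float]]    # occupancy[i][t]: P(state i at time t | X_r, s)

def utterance_score(competitors: List[Competitor], state: int, time: int) -> float:
    """Per claim 1: form a score for each competitor word sequence from its
    weight, its posterior, and the occupation probability, then sum them."""
    return sum(c.weight * c.posterior * c.occupancy[state][time]
               for c in competitors)
```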
6. A method comprising:
setting parameter values for a set of acoustic models used in speech recognition;
for each of a set of utterances, decoding the utterance using the set of acoustic models to identify a set of competitor word sequences for the utterance and, for each word sequence, determining a probability of the utterance given the word sequence;
updating a parameter value in the set of acoustic models using a trainer in a computing device through steps comprising:
constructing a recognition lattice for each utterance, the recognition lattice comprising a set of arcs with one word for each arc;
for each arc that spans a selected time point:
determining a probability, γ_{i,r,c}(t), of a selected state at the selected time point given the arc and a selected utterance, using a first forward-backward recursion with the recursions starting at the beginning and the end of the arc;
determining a joint probability, p_{Λ′}(c, X_r), of the arc and the selected utterance, using a second forward-backward recursion with the forward recursion starting at the beginning of the utterance and the backward recursion starting at the end of the utterance;
determining a probability, p_{Λ′}(X_r), of the selected utterance; and
using the probability of the selected state given the arc and the selected utterance, the joint probability of the arc and the selected utterance, and the probability of the selected utterance to form a value for the arc; and
summing the values for the arcs that span the selected time point to compute a term
(Dependent claims: 7, 8, 9, 10, 11, 12, 13)
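One plausible reading of the arc value in claim 6 is the arc-local state occupancy γ_{i,r,c}(t) scaled by the arc posterior p_{Λ′}(c, X_r) / p_{Λ′}(X_r). A minimal Python sketch under that reading follows; all structure and field names are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Arc:
    start: int                          # first frame spanned by the arc's word
    end: int                            # last frame spanned by the arc's word
    joint_prob: float                   # p_{Λ'}(c, X_r), from the lattice-wide recursion
    occupancy: Dict[int, List[float]]   # occupancy[i][t - start]: γ_{i,r,c}(t), arc-local recursion

def term_at_time(arcs: List[Arc], utterance_prob: float, state: int, t: int) -> float:
    """Sum values over the arcs spanning time t; each value is the arc-local
    occupancy scaled by the arc posterior p(c, X_r) / p(X_r)."""
    total = 0.0
    for c in arcs:
        if c.start <= t <= c.end:       # only arcs that span the selected time point
            gamma = c.occupancy[state][t - c.start]
            total += gamma * c.joint_prob / utterance_prob
    return total
```

Because γ_{i,r,c}(t) comes from a recursion bounded to the arc rather than the whole word sequence, the per-frame work shrinks, which is the computational saving the abstract points to.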
14. A computer storage medium having computer-executable instructions for performing steps comprising:
forming a recognition lattice of possible word sequences that were decoded from an utterance using an acoustic model;
determining a probability, p_{Λ′}(X_r), for the utterance;
for each arc in the recognition lattice that spans a selected time point:
determining a probability, γ_{i,r,c}(t), of a selected state given the arc and the utterance, using a forward-backward recursion with the recursions starting at the beginning and the end of the arc;
determining a joint probability, p_{Λ′}(c, X_r), of the arc and the utterance, using a forward-backward recursion with the forward recursion starting at the beginning of the lattice and the backward recursion starting at the end of the lattice;
using the probability of the state given the arc and the utterance, the joint probability of the arc and the utterance, and the probability of the utterance to form a value for the arc; and
using a sum
(Dependent claims: 15, 16, 17)
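Claims 6 and 14 obtain the joint probability p_{Λ′}(c, X_r) from a lattice-wide forward-backward pass. A hedged sketch of that pass over a topologically sorted arc list follows; the node/arc structure and the single combined likelihood score are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class LatticeArc:
    src: int            # start node in the lattice
    dst: int            # end node in the lattice
    likelihood: float   # score of the arc's word over the frames it spans

def arc_joint_probs(arcs: List[LatticeArc], start: int, end: int) -> Tuple[List[float], float]:
    """Forward-backward over lattice nodes (arcs assumed topologically sorted).
    Returns p(c, X_r) = alpha[src] * likelihood * beta[dst] for each arc,
    plus p(X_r) = alpha[end], the total lattice probability."""
    alpha = defaultdict(float)
    alpha[start] = 1.0
    for a in arcs:                      # forward pass, from the start of the lattice
        alpha[a.dst] += alpha[a.src] * a.likelihood
    beta = defaultdict(float)
    beta[end] = 1.0
    for a in reversed(arcs):            # backward pass, from the end of the lattice
        beta[a.src] += a.likelihood * beta[a.dst]
    joint = [alpha[a.src] * a.likelihood * beta[a.dst] for a in arcs]
    return joint, alpha[end]
```

Dividing each joint value by p(X_r) yields the arc posterior consumed by the per-time-point sum sketched after claim 6.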
Specification