Minimum classification error training with growth transformation optimization
First Claim
1. A method comprising:
setting parameter values for a set of acoustic models used in speech recognition;
for each of a set of utterances, decoding the utterance using the set of acoustic models to identify a set of competitor word sequences for the utterance and to determine a probability of each word sequence given the utterance;
setting a weight with a positive value that is different than one for at least one competitor word sequence of each utterance;
updating a parameter value in the set of acoustic models through steps comprising:
for each competitor word sequence for a selected utterance, using the weight for the word sequence, the probability of the word sequence given the selected utterance and an occupation probability that describes the probability of being in a particular state of the acoustic model at a particular time given the selected utterance and the word sequence to form a score for the word sequence;
summing the scores for the word sequences as part of forming a score for the selected utterance; and
using the score for the selected utterance to update a parameter value.
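
A minimal Python sketch of how the per-sequence scores of claim 1 might be accumulated into statistics for a parameter update. The data layout, function names, and the smoothed mean update are illustrative assumptions, not the patent's actual implementation.

```python
import numpy as np

def accumulate_statistics(frames, competitors, num_states):
    """Accumulate per-state numerator/denominator sums for one utterance.

    frames      : (T, dim) array of acoustic feature vectors (assumed layout).
    competitors : list of dicts, one per competitor word sequence, with
        'weight'    : positive real weight for the word sequence,
        'posterior' : P(word sequence | utterance),
        'gamma'     : (T, num_states) occupation probabilities
                      P(state i at time t | utterance, word sequence).
    """
    T, dim = frames.shape
    num = np.zeros((num_states, dim))
    den = np.zeros(num_states)
    for t in range(T):
        for i in range(num_states):
            # Score for each competitor word sequence: its weight, its
            # posterior given the utterance, and its occupation probability.
            scores = [c['weight'] * c['posterior'] * c['gamma'][t, i]
                      for c in competitors]
            # Summing the per-sequence scores forms the utterance-level score.
            utt_score = sum(scores)
            num[i] += utt_score * frames[t]
            den[i] += utt_score
    return num, den

def update_mean(old_mean, num_i, den_i, D=100.0):
    """Growth-transformation-style smoothed update of one Gaussian mean;
    the constant D and this exact form are illustrative assumptions."""
    return (num_i + D * old_mean) / (den_i + D)
```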
Abstract
Hidden Markov Model (HMM) parameters are updated using update equations based on growth transformation optimization of a minimum classification error objective function. Using the list of N-best competitor word sequences obtained by decoding the training data with the current-iteration HMM parameters, the current HMM parameters are updated iteratively. The updating procedure involves using weights for each competitor word sequence that can take any positive real value. The updating procedure is further extended to the case where a decoded lattice of competitors is used. In this case, updating the model parameters relies on determining the probability for a state at a time point based on the word that spans the time point instead of the entire word sequence. This word-bound span of time is shorter than the duration of the entire word sequence and thus reduces the computing time.
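
For orientation, the LaTeX sketch below shows a commonly used smoothed minimum classification error objective and an extended-Baum-Welch-style growth-transformation mean update; the symbols and exact forms are illustrative assumptions and are not quoted from the patent.

```latex
% Misclassification measure for utterance r with reference word sequence S_r
% and N competitor word sequences s_{r,n} (a common formulation):
\[
  d_r(\Lambda) = -\log p(X_r, S_r \mid \Lambda)
    + \log\Big[ \frac{1}{N} \sum_{n=1}^{N} p(X_r, s_{r,n} \mid \Lambda)^{\eta} \Big]^{1/\eta},
  \qquad
  \ell_r(\Lambda) = \frac{1}{1 + e^{-\alpha\, d_r(\Lambda) + \beta}} .
\]
% Growth-transformation (extended Baum-Welch) style update of the mean of
% Gaussian state i, driven by weighted occupation statistics and a
% per-state smoothing constant D_i:
\[
  \hat{\mu}_i =
    \frac{\sum_r \sum_t \Delta\gamma_r(i,t)\, x_{r,t} + D_i\, \mu_i}
         {\sum_r \sum_t \Delta\gamma_r(i,t) + D_i} .
\]
```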
Claims (20)
1. A method comprising:
setting parameter values for a set of acoustic models used in speech recognition;
for each of a set of utterances, decoding the utterance using the set of acoustic models to identify a set of competitor word sequences for the utterance and to determine a probability of each word sequence given the utterance;
setting a weight with a positive value that is different than one for at least one competitor word sequence of each utterance;
updating a parameter value in the set of acoustic models through steps comprising:
for each competitor word sequence for a selected utterance, using the weight for the word sequence, the probability of the word sequence given the selected utterance and an occupation probability that describes the probability of being in a particular state of the acoustic model at a particular time given the selected utterance and the word sequence to form a score for the word sequence;
summing the scores for the word sequences as part of forming a score for the selected utterance; and
using the score for the selected utterance to update a parameter value.
Dependent claims: 2, 3, 4, 5, 6, 7.
8. A method comprising:
setting parameter values for a set of acoustic models used in speech recognition;
for each of a set of utterances, decoding the utterance using the set of acoustic models to identify a set of competitor word sequences for the utterance and for each word sequence determining a probability of the utterance given the word sequence;
updating a parameter value in the set of acoustic models through steps comprising:
constructing a recognition lattice for each utterance, the recognition lattice comprising a set of arcs with one word for each arc;
determining a probability of a selected state at a selected time point given a selected arc in a recognition lattice associated with a selected utterance;
determining a joint probability of the selected arc and the selected utterance;
determining a probability of the selected utterance; and
using the probability of the selected state, the joint probability of the selected arc and the selected utterance and the probability of the selected utterance to compute a term used in updating a model parameter for the selected state.
Dependent claims: 9, 10, 11, 12, 13, 14, 15, 16.
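
A minimal Python sketch of the per-arc term described in claim 8, assuming a hypothetical arc record that already carries its word-bound occupation probabilities and its joint probability with the utterance.

```python
import numpy as np

def arc_update_term(arc, p_utterance):
    """Combine the three quantities named in claim 8 for one lattice arc.

    arc (assumed layout):
        'gamma'   : (span_len, num_states) array of
                    P(state at time t | arc, utterance), computed only over
                    the arc's word-bound time span,
        'p_joint' : joint probability p(arc, utterance).
    p_utterance : p(utterance), e.g. the total lattice probability.

    Returns the arc's occupation probabilities weighted by its posterior
    p(arc, utterance) / p(utterance), a (span_len, num_states) array that
    can be accumulated into the statistics of the affected states.
    """
    arc_posterior = arc['p_joint'] / p_utterance
    return np.asarray(arc['gamma']) * arc_posterior
```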
17. A computer-readable medium having computer-executable instructions for performing steps comprising:
forming a recognition lattice of possible word sequences that were decoded from an utterance using an acoustic model;
determining a probability for the utterance;
for each arc in the recognition lattice that spans a selected time point:
determining a probability of a selected state given the arc and the utterance;
determining a joint probability of the arc and the utterance;
using the probability of the state given the arc and the utterance, the joint probability of the arc and the utterance and the probability of the utterance to form a value for the arc; and
using a sum of the values for the arcs that span the selected time point as part of updating a model parameter for the selected state.
Dependent claims: 18, 19, 20.
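
A short Python sketch of the per-frame accumulation in claim 17, under the same hypothetical arc layout as above plus assumed 'start'/'end' frame indices for each arc.

```python
def frame_value(arcs, t, state, p_utterance):
    """Sum, over every arc that spans frame t, the value formed from the
    state occupation probability given the arc, the joint probability of
    the arc and the utterance, and the utterance probability (claim 17)."""
    total = 0.0
    for arc in arcs:
        if arc['start'] <= t < arc['end']:
            local_t = t - arc['start']
            gamma = arc['gamma'][local_t][state]
            total += gamma * arc['p_joint'] / p_utterance
    return total
```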
Specification