Generic framework for large-margin MCE training in speech recognition

US 8,423,364 B2
Filed: 02/20/2007
Issued: 04/16/2013
Est. Priority Date: 02/20/2007
Status: Active Grant

Abstract

A method and apparatus for training an acoustic model are disclosed. A training corpus is accessed and converted into an initial acoustic model. Scores are calculated for a correct class and competitive classes, respectively, for each token given the initial acoustic model. Also, a sample-adaptive window bandwidth is calculated for each training token. From the calculated scores and the sample-adaptive window bandwidth values, loss values are calculated based on a loss function. The loss function, which may be derived from a Bayesian risk minimization viewpoint, can include a margin value that moves a decision boundary such that token-to-boundary distances for correct tokens that are near the decision boundary are maximized. The margin can either be a fixed margin or can vary monotonically as a function of algorithm iterations. The acoustic model is updated based on the calculated loss values. This process can be repeated until an empirical convergence is met.
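The loss described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a sigmoid loss (the form named in claim 10) and a log-mean-exp misclassification score, which is a common MCE choice that the abstract does not specify. The function names, the `alpha` slope parameter, and the sign convention (a positive score means the token is misclassified) are all assumptions made for this example.

```python
import math


def misclassification_score(correct_score, competing_scores):
    """Assumed MCE-style score: d > 0 when competitors outscore the correct class."""
    # Log-mean-exp of the competing-class scores, computed stably.
    top = max(competing_scores)
    mean_exp = sum(math.exp(s - top) for s in competing_scores) / len(competing_scores)
    return top + math.log(mean_exp) - correct_score


def margin_sigmoid_loss(d, margin=0.0, alpha=1.0):
    """Sigmoid loss with the decision boundary shifted by `margin` (assumed form).

    With margin > 0, a correctly classified token (d < 0) that lies within
    `margin` of the boundary still receives a loss above 0.5.
    """
    return 1.0 / (1.0 + math.exp(-alpha * (d + margin)))
```

Under these assumptions, a token with `d = -0.5` has a loss below 0.5 when `margin = 0.0` and a loss above 0.5 when `margin = 1.0`, which is the sense in which the margin pushes near-boundary correct tokens further from the boundary during training.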

19 Claims

1. A method of training an acoustic model in a speech recognition system, comprising:
- utilizing a training corpus, having training tokens, to calculate an initial acoustic model;
- computing, using the initial acoustic model, a plurality of scores for each training token with regard to a correct class and a plurality of competing classes;
- utilizing a symmetric kernel function that is based on an exponent of the plurality of scores to calculate a sample-adaptive window bandwidth for each training token;
- utilizing a loss function to calculate a margin for each training token;
- gradually increasing the margin for each training token over a number of iterations until a minimum word error rate is achieved;
- determining derivatives of the loss function based on the computed scores, based on the calculated sample-adaptive window bandwidth for each training token, and based on the iteratively increased margin for each training token;
- calculating a Bayes risk value that includes a margin-free Bayes risk component and a margin-bound Bayes risk component, the margin-free Bayes risk component being based on an integral computed from zero to infinity, and the margin-bound Bayes risk component being based on an integral computed from a negative value of a discriminative margin to zero;
- updating parameters in the initial acoustic model to create a revised acoustic model based upon the derivatives of the loss function and the Bayes Risk value; and
- outputting the revised acoustic model.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The method of claim 1 wherein deriving the loss function from a Bayesian viewpoint further comprises utilizing a margin-free Bayes risk function.
  - 3. The method of claim 2 wherein deriving the loss function from a Bayesian viewpoint further comprises incorporating a margin-bound Bayes risk function in addition to utilizing the margin-free Bayes risk function.
  - 4. The method of claim 1 wherein the margin for each training token increases with each iteration.
  - 5. The method of claim 4 and further comprising:
    - repeating the steps of computing, calculating, determining and updating until an empirical convergence has been met for the revised acoustic model.
  - 6. The method of claim 4 wherein the margin for each training token is initially set to zero.
  - 7. The method of claim 4 wherein the margin for each training token is initially set to a value greater than zero.
  - 8. The method of claim 4 wherein the margin for each training token is initially set to a value less than zero.
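The iteration recited in claims 1 and 4 through 8 (a margin that grows with each iteration, starting at zero, above zero, or below zero, until the word error rate stops improving) can be sketched as a simple loop. This is an illustrative outline only: the linear growth rule, the `step` size, and the `update_model` and `evaluate_wer` callbacks are hypothetical placeholders, since the claims do not fix a schedule or an evaluation procedure.

```python
def train_with_growing_margin(update_model, evaluate_wer, model,
                              initial_margin=0.0, step=0.1, max_iterations=20):
    """Raise the margin each iteration; stop once word error rate stops improving.

    update_model(model, margin) -> new model   (hypothetical callback)
    evaluate_wer(model) -> float               (hypothetical callback)
    """
    best_wer = evaluate_wer(model)
    best_margin = None  # margin that produced the best model, if any
    for k in range(max_iterations):
        # Assumed linear schedule; the claims only require a gradual increase.
        margin = initial_margin + step * k
        candidate = update_model(model, margin)
        wer = evaluate_wer(candidate)
        if wer >= best_wer:
            break  # no further improvement: keep the previous model
        model, best_wer, best_margin = candidate, wer, margin
    return model, best_wer, best_margin
```

Passing `initial_margin=0.0`, a positive value, or a negative value corresponds to the starting points named in claims 6, 7, and 8 respectively.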

9. A system for training an acoustic model comprising:
- a training corpus having training tokens;
- a classifier that utilizes a zero-one risk function to calculate a cost of classifying each of the training tokens into an incorrect class, the zero-one risk function having a discriminant function and an anti-discriminant function that convert the costs for the training tokens from a feature domain to a score domain;
- enrollment data that includes an acoustic representation of a user voice input and a corresponding transcription of the user voice input;
- a Bayes risk value that includes a margin-free Bayes risk component and a margin-bound Bayes risk component, the margin-free Bayes risk component being based on an integral computed from zero to infinity, and the margin-bound Bayes risk component being based on an integral computed from a negative value of a discriminative margin to zero;
- a training component that utilizes a discriminative training algorithm to generate an acoustic model based on the training corpus, the enrollment data, the classifier, the Bayes risk value, and a loss function that is calculated based on calculated scores of closeness and a calculated sample-adaptive window bandwidth for each training token, the sample-adaptive window bandwidth for each training token being estimated utilizing a distribution of scores for the training tokens, the discriminative training algorithm optimizing an error rate associated with the training tokens based on a slope of the loss function that is adapted to each training token, and the acoustic model including Hidden Markov Model parameters that are updated based on derivatives of the loss function.
- Dependent claims (10, 11, 12, 13, 14, 15):
  - 10. The system of claim 9 wherein the loss function is calculated based additionally on a margin, and wherein the loss function comprises a sigmoid loss function.
  - 11. The system of claim 9 wherein the training component is configured to generate a series of revised acoustic models until an empirical convergence is achieved, and wherein the loss function is based on an exponent of the sample-adaptive window bandwidth.
  - 12. The system of claim 9 further comprising:
    - speaker independent data;
    - incrementally collected cohort data; and
    - wherein the training component is configured to use the speaker independent data in conjunction with the training corpus, the incrementally corrected cohort data, and the enrollment data to generate the acoustic model.
  - 13. The system of claim 12 wherein the training component generates a custom acoustic model for each speaker.
  - 14. The system of claim 10 wherein the margin is a fixed value.
  - 15. The system of claim 10 wherein the margin is greater than zero.
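One way to read the link between the sigmoid loss of claim 10, the symmetric kernel of claim 1, and the per-token slope of claim 9 is that the derivative of a sigmoid with respect to the score is a symmetric, bell-shaped kernel (the logistic density) whose width acts as a window bandwidth. The sketch below illustrates only that mathematical fact. Treating the bandwidth as the reciprocal of the sigmoid slope is an assumption of this example, and the claims do not say how the bandwidth is estimated from the score distribution, so no estimator is shown.

```python
import math


def logistic_kernel(d, bandwidth):
    """Derivative of sigmoid(d / bandwidth) with respect to d.

    The result is symmetric in d, peaks at d = 0 with height 1 / (4 * bandwidth),
    and integrates to 1 over the real line.
    """
    s = 1.0 / (1.0 + math.exp(-d / bandwidth))
    return s * (1.0 - s) / bandwidth
```

A smaller `bandwidth` gives a narrower, taller kernel, so only tokens very close to the decision boundary contribute noticeably to the gradient; a larger `bandwidth` spreads the influence over tokens further away.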

16. A method comprising:
- calculating an initial acoustic model for a speech recognition system;
- developing, from a Bayes risk minimization viewpoint, a generic framework for incorporating a margin into a differential kernel function, the differential kernel function being a symmetric kernel function that is exponentially based on a misclassification score, the misclassification score being estimated utilizing a bandwidth of the differential kernel function in a score domain;
- deriving, using the developed generic framework, a loss function that incorporates the margin, the misclassification score, and the bandwidth of the differential kernel function in the score domain;
- calculating a Bayes risk value that includes a margin-free Bayes risk component and a margin-bound Bayes risk component, the margin-free Bayes risk component being based on an integral computed from zero to infinity, and the margin-bound Bayes risk component being based on an integral computed from a negative value of a discriminative margin to zero;
- utilizing the derived loss function and the Bayes risk value to update parameters of the initial acoustic model to create a revised acoustic model; and
- outputting the revised acoustic model.
- Dependent claims (17, 18, 19):
  - 17. The method of claim 16 wherein the initial acoustic model includes Hidden Markov Model parameters, wherein the revised acoustic model includes revised Hidden Markov Model parameters, wherein the margin-free Bayes risk component is calculated utilizing an equation,
    $$\sum_{j=1}^{C} P(C_j) \int_{0}^{\infty} p_{D_j}(D \mid C_j)\, dD$$
    and wherein the margin-bound Bayes risk component is calculated utilizing an equation,
    $$\sum_{j=1}^{C} P(C_j) \int_{-m}^{0} p_{D_j}(D \mid C_j)\, dD.$$
  - 18. The method of claim 16 wherein utilizing the derived loss function to update the parameters of the initial acoustic model comprises carrying out, using the derived loss function, an iterative training process in which the revised acoustic model is further refined until an empirical convergence is achieved, and wherein the derived loss function is calculated utilizing an equation,
  - 19. The speech recognition system of claim 18 wherein the margin of the loss function varies with each iteration of the training process.
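The two Bayes risk components recited in claims 16 and 17 can be evaluated numerically once a density for the misclassification score D is chosen. The sketch below assumes, purely for illustration, that D is Gaussian within each class; the claims do not specify the density, and the function names, the per-class `means` and `stds`, and the Gaussian choice are all hypothetical. Under that assumption each integral reduces to a difference of Gaussian cumulative distribution values.

```python
import math


def gaussian_cdf(x, mean, std):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def bayes_risk_components(priors, means, stds, margin):
    """Return (margin_free, margin_bound) risk under an assumed Gaussian score density.

    margin_free  = sum_j P(C_j) * P(D > 0 | C_j)          (integral from 0 to infinity)
    margin_bound = sum_j P(C_j) * P(-margin < D < 0 | C_j) (integral from -margin to 0)
    """
    margin_free = 0.0
    margin_bound = 0.0
    for prior, mean, std in zip(priors, means, stds):
        cdf_at_zero = gaussian_cdf(0.0, mean, std)
        margin_free += prior * (1.0 - cdf_at_zero)
        margin_bound += prior * (cdf_at_zero - gaussian_cdf(-margin, mean, std))
    return margin_free, margin_bound
```

With `margin = 0` the margin-bound term vanishes and only the conventional error mass (scores above zero) remains; increasing `margin` adds the mass of correctly classified tokens that sit within the margin of the decision boundary.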

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Yu, Dong; Acero, Alejandro; Deng, Li; He, Xiaodong
Primary Examiner(s): Dorvil, Richemond
Assistant Examiner(s): Adesanya, Olujimi A
Application Number: US11/708,440
Publication Number: US 20080201139A1
Time in Patent Office: 2,247 Days
Field of Search: 704/231, 704/233-234, 704/240, 704/256
US Class Current: 704/256
CPC Class Codes:
G10L 15/063 Training
G10L 2015/0631 Creating reference template...
