Generic framework for large-margin MCE training in speech recognition

US 20080201139A1
Filed: 02/20/2007
Published: 08/21/2008
Est. Priority Date: 02/20/2007
Status: Active Grant

Abstract

A method and apparatus for training an acoustic model are disclosed. A training corpus is accessed and converted into an initial acoustic model. Scores are calculated for a correct class and competitive classes, respectively, for each token given the initial acoustic model. Also, a sample-adaptive window bandwidth is calculated for each training token. From the calculated scores and the sample-adaptive window bandwidth values, loss values are calculated based on a loss function. The loss function, which may be derived from a Bayesian risk minimization viewpoint, can include a margin value that moves a decision boundary such that token-to-boundary distances for correct tokens that are near the decision boundary are maximized. The margin can either be a fixed margin or can vary monotonically as a function of algorithm iterations. The acoustic model is updated based on the calculated loss values. This process can be repeated until an empirical convergence is met.
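The loss described in the abstract can be illustrated with a minimal sketch. It assumes a sigmoid-smoothed loss, a misclassification measure taken as the best competing score minus the correct score, and a bandwidth that scales the sigmoid slope. These functional forms, and the names `misclassification_measure` and `large_margin_loss`, are illustrative assumptions rather than the formulas of the patent specification.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def misclassification_measure(correct_score, competing_scores):
    """Best competing score minus the correct score (assumed form).

    A positive value means the token would be misrecognized.
    """
    return max(competing_scores) - correct_score


def large_margin_loss(correct_score, competing_scores, bandwidth, margin=0.0):
    """Smoothed 0/1 loss for one training token (illustrative only).

    bandwidth: per-token window width; assumed here to divide the sigmoid argument.
    margin:    shifts the decision boundary, so correct tokens that sit close
               to the boundary still incur a noticeable loss.
    """
    d = misclassification_measure(correct_score, competing_scores)
    return sigmoid((d + margin) / bandwidth)


# A correctly recognized token that lies close to the boundary (d = -1).
print(large_margin_loss(2.0, [1.0, 0.5], bandwidth=1.0))              # margin-free
print(large_margin_loss(2.0, [1.0, 0.5], bandwidth=1.0, margin=1.0))  # with margin
```

With a positive margin the same token receives a larger loss, which is what pushes the parameter update to move near-boundary correct tokens further from the boundary.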

20 Claims

1. A method of training an acoustic model in a speech recognition system, comprising:
- utilizing a training corpus, having training tokens, to calculate an initial acoustic model;
- computing, using the initial acoustic model, a plurality of scores for each training token with regard to a correct class and a plurality of competing classes;
- calculating a sample-adaptive window bandwidth for each training token;
- determining a value for a loss function based on the computed scores and the calculated sample-adaptive window bandwidth for each training token;
- updating parameters in the current acoustic model to create a revised acoustic model based upon the loss value; and
- outputting the revised acoustic model.

2. The method of claim 1 and further comprising:
- deriving the loss function from a Bayesian viewpoint.

3. The method of claim 2 wherein deriving the loss function from a Bayesian viewpoint further comprises utilizing a margin-free Bayes risk function.

4. The method of claim 3 wherein deriving the loss function from a Bayesian viewpoint further comprises incorporating a margin-bound Bayes risk function in addition to utilizing the margin-free Bayes risk function.

5. The method of claim 1 wherein determining a value of a loss function is additionally based on a margin.

6. The method of claim 5 and further comprising:
- repeating the steps of computing, calculating, determining and updating until an empirical convergence has been met for the revised acoustic model.

7. The method of claim 5 wherein the margin is a fixed value.

8. The method of claim 5 wherein the margin is a fixed value greater than zero.

9. The method of claim 6 wherein the margin varies with each iteration.
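
The steps of claims 1 and 6 (initial model, scores, per-token bandwidth, loss, update, repeat until convergence) can be walked through on a toy problem. The sketch below is not the patented method: it assumes one-dimensional tokens, class means standing in for the acoustic model, negative squared distance as the score, a sigmoid loss, and a placeholder bandwidth rule (the clipped magnitude of the misclassification measure). The update treats the bandwidth as a constant when differentiating. All names and numeric values are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def initial_model(corpus, num_classes):
    """Initial model: per-class mean of the training tokens."""
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for x, label in corpus:
        sums[label] += x
        counts[label] += 1
    return [s / c for s, c in zip(sums, counts)]


def class_scores(x, means):
    """Score of token x against every class (assumed: negative squared distance)."""
    return [-(x - mu) ** 2 for mu in means]


def train(corpus, num_classes, margin=0.5, lr=0.05, tol=1e-6, max_iters=200):
    means = initial_model(corpus, num_classes)
    prev_total = None
    total = 0.0
    for _ in range(max_iters):
        grads = [0.0] * num_classes
        total = 0.0
        for x, c in corpus:
            g = class_scores(x, means)
            # Best competing class for this token.
            j = max((k for k in range(num_classes) if k != c), key=lambda k: g[k])
            d = g[j] - g[c]  # > 0 means the token is misclassified
            # Hypothetical sample-adaptive bandwidth: clipped |d|.
            h = min(2.0, max(0.5, abs(d)))
            loss = sigmoid((d + margin) / h)
            total += loss
            # Derivative of the loss w.r.t. d, holding h fixed.
            slope = loss * (1.0 - loss) / h
            grads[j] += slope * 2.0 * (x - means[j])
            grads[c] -= slope * 2.0 * (x - means[c])
        means = [mu - lr * gr for mu, gr in zip(means, grads)]
        # Empirical convergence: total loss has stopped changing.
        if prev_total is not None and abs(prev_total - total) < tol:
            break
        prev_total = total
    return means, total


corpus = [(0.0, 0), (0.4, 0), (0.9, 0), (1.1, 1), (1.6, 1), (2.0, 1)]
revised_means, final_loss = train(corpus, num_classes=2)
print(revised_means, final_loss)
```

Here `margin` is held fixed, as in claims 7 and 8. A margin that changes per iteration, as in claim 9, would replace the constant with a value looked up inside the loop.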

10. A system for training an acoustic model comprising:
- a training corpus having training tokens;
- a training component; and
- wherein the training component is configured to generate the acoustic model based on the training corpus and a loss function that is calculated based on calculated scores of closeness and a calculated sample-adaptive window bandwidth for each training token.

11. The system of claim 10 wherein the loss function is calculated based additionally on a margin.

12. The system of claim 10 wherein the training component is configured to generate a series of revised acoustic models until an empirical convergence is achieved.

13. The system of claim 10 further comprising:
- speaker independent data; and
- wherein the training component is configured to use the speaker independent data in conjunction with the training corpus to generate the acoustic model.

14. The system of claim 13 wherein the training component generates a custom acoustic model for each speaker.

15. The system of claim 11 wherein the margin is a fixed value.

16. The system of claim 11 wherein the margin is greater than zero.

17. A method comprising:
- developing, from a Bayes risk minimization viewpoint, a generic framework for incorporating a margin into a differential kernel function;
- utilizing the developed generic framework for training an acoustic model in a speech recognition system; and
- outputting the trained acoustic model.

18. The method of claim 17 wherein utilizing the generic framework for training an acoustic model comprises:
- deriving, using the developed generic framework, a loss function that incorporates the margin; and
- utilizing the derived loss function for training the acoustic model.

19. The method of claim 18 wherein utilizing the derived loss function for training the acoustic model comprises carrying out, using the derived loss function, an iterative training process in which a first iteration involves the refining of an initial acoustic model and subsequent iterations further refine refined acoustic models from previous iterations.

20. The speech recognition system of claim 19 wherein the margin of the loss function varies with at least some of the iterations.
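
A margin that varies with the training iteration, as recited in claim 20, can be expressed as a simple schedule. The sketch below assumes a linear increase that is held at a cap. The function name `margin_schedule` and the values of `start`, `step` and `cap` are hypothetical, since the claims state only that the margin varies with at least some of the iterations.

```python
def margin_schedule(iteration, start=0.0, step=0.1, cap=1.0):
    """Margin for a 0-based iteration: grows linearly from start, then holds at cap.

    All default values are illustrative, not taken from the patent.
    """
    if iteration < 0:
        raise ValueError("iteration must be non-negative")
    return min(cap, start + step * iteration)


# Margins that an iterative trainer would use on successive passes.
margins = [margin_schedule(i) for i in range(15)]
print(margins)
```

A schedule of this shape never decreases, so it is one example of a margin that varies monotonically with the iteration count. A fixed margin corresponds to `step=0`.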

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Corporation
Inventors: Deng, Li; Yu, Dong; Acero, Alejandro; He, Xiaodong
Granted Patent: US 8,423,364 B2
US Class Current: 704/231
CPC Class Codes:
G10L 15/063 Training
G10L 2015/0631 Creating reference template...
