Generic framework for large-margin MCE training in speech recognition
Abstract
A method and apparatus for training an acoustic model are disclosed. A training corpus is accessed and converted into an initial acoustic model. Scores are calculated for a correct class and competing classes, respectively, for each token given the initial acoustic model. Also, a sample-adaptive window bandwidth is calculated for each training token. From the calculated scores and the sample-adaptive window bandwidth values, loss values are calculated based on a loss function. The loss function, which may be derived from a Bayesian risk minimization viewpoint, can include a margin value that moves a decision boundary such that token-to-boundary distances for correct tokens that are near the decision boundary are maximized. The margin can either be a fixed margin or can vary monotonically as a function of algorithm iterations. The acoustic model is updated based on the calculated loss values. This process can be repeated until an empirical convergence criterion is met.
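The loss computation summarized above can be illustrated with a minimal sketch. It is not taken from the specification: it assumes the conventional MCE misclassification measure (correct-class score against a smoothed maximum of the competing scores) and assumes that the sample-adaptive window bandwidth acts as the inverse slope of a sigmoid loss, with the margin added as an offset. The function names, the smoothing parameter `eta`, and the example values are hypothetical.

```python
import math


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def misclassification_measure(correct_score, competing_scores, eta=1.0):
    """Assumed MCE-style measure: d = -g_correct + (1/eta) * log(mean(exp(eta * g_j))).

    d < 0 means the token is on the correct side of the decision boundary.
    """
    m = max(competing_scores)
    mean_exp = sum(math.exp(eta * (g - m)) for g in competing_scores) / len(competing_scores)
    return -correct_score + m + math.log(mean_exp) / eta


def large_margin_mce_loss(d, bandwidth, margin=0.0):
    """Assumed loss form: sigmoid((d + margin) / bandwidth).

    A positive margin shifts the decision boundary, so correct tokens that sit
    close to it (d slightly below 0) still incur a noticeable loss.
    """
    return _sigmoid((d + margin) / bandwidth)


# Hypothetical token: correct-class score 2.0, two competitors at 1.0.
d = misclassification_measure(2.0, [1.0, 1.0])
loss_no_margin = large_margin_mce_loss(d, bandwidth=1.0)
loss_with_margin = large_margin_mce_loss(d, bandwidth=1.0, margin=0.5)
```

In this sketch a smaller bandwidth makes the loss closer to a hard 0/1 error count for that token, while a larger bandwidth spreads its influence over a wider region around the boundary.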
20 Claims
1. A method of training an acoustic model in a speech recognition system, comprising:
utilizing a training corpus, having training tokens, to calculate an initial acoustic model;
computing, using the initial acoustic model, a plurality of scores for each training token with regard to a correct class and a plurality of competing classes;
calculating a sample-adaptive window bandwidth for each training token;
determining a value for a loss function based on the computed scores and the calculated sample-adaptive window bandwidth for each training token;
updating parameters in the current acoustic model to create a revised acoustic model based upon the loss value; and
outputting the revised acoustic model.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A system for training an acoustic model comprising:
a training corpus having training tokens;
a training component; and
wherein the training component is configured to generate the acoustic model based on the training corpus and a loss function that is calculated based on calculated scores of closeness and a calculated sample-adaptive window bandwidth for each training token.
Dependent claims: 11, 12, 13, 14, 15, 16
17. A method comprising:
developing, from a Bayes risk minimization viewpoint, a generic framework for incorporating a margin into a differential kernel function;
utilizing the developed generic framework for training an acoustic model in a speech recognition system; and
outputting the trained acoustic model.
Dependent claims: 18, 19, 20
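The iterative procedure recited in the claims (score each token, form a loss from the scores and a per-token bandwidth, update the model, repeat) can be sketched on a toy problem. Everything here is an illustrative assumption rather than the patented implementation: the "acoustic model" is a linear scorer, only the best competing class is used, the loss is a sigmoid with slope 1/bandwidth, the bandwidths are supplied by the caller, and `margin_schedule` is a hypothetical non-decreasing schedule standing in for a margin that varies monotonically with iterations.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def margin_schedule(iteration, start=0.0, step=0.1, cap=1.0):
    # Hypothetical schedule: the margin grows linearly, then stays at `cap`.
    return min(cap, start + step * iteration)


def scores(weights, x):
    # One linear score per class (toy stand-in for acoustic model scores).
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]


def train(tokens, labels, bandwidths, num_classes, epochs=20, lr=0.5):
    dim = len(tokens[0])
    weights = [[0.0] * dim for _ in range(num_classes)]  # initial model
    for iteration in range(epochs):
        margin = margin_schedule(iteration)
        for x, c, h in zip(tokens, labels, bandwidths):
            g = scores(weights, x)
            # Best competing class (any class other than the correct one).
            j = max((k for k in range(num_classes) if k != c), key=lambda k: g[k])
            d = g[j] - g[c]                   # d < 0: correctly classified
            loss = sigmoid((d + margin) / h)  # assumed large-margin loss
            slope = loss * (1.0 - loss) / h   # d(loss)/d(d)
            # Gradient descent: raise the correct score, lower the competitor.
            for i in range(dim):
                weights[c][i] += lr * slope * x[i]
                weights[j][i] -= lr * slope * x[i]
    return weights


def classify(weights, x):
    g = scores(weights, x)
    return max(range(len(g)), key=lambda k: g[k])


# Hypothetical two-class toy corpus with caller-supplied bandwidths.
tokens = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [0.0, 2.0]]
labels = [0, 0, 1, 1]
bandwidths = [1.0, 0.5, 1.0, 0.5]
model = train(tokens, labels, bandwidths, num_classes=2)
```

A fixed number of epochs stands in for the convergence check; a real trainer would more likely stop when the total loss or a held-out error rate stops improving.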
Specification