Generic framework for large-margin MCE training in speech recognition
Abstract
A method and apparatus for training an acoustic model are disclosed. A training corpus is accessed and converted into an initial acoustic model. Scores are calculated for a correct class and competitive classes, respectively, for each token given the initial acoustic model. Also, a sample-adaptive window bandwidth is calculated for each training token. From the calculated scores and the sample-adaptive window bandwidth values, loss values are calculated based on a loss function. The loss function, which may be derived from a Bayesian risk minimization viewpoint, can include a margin value that moves a decision boundary such that token-to-boundary distances for correct tokens that are near the decision boundary are maximized. The margin can either be a fixed margin or can vary monotonically as a function of algorithm iterations. The acoustic model is updated based on the calculated loss values. This process can be repeated until an empirical convergence is met.
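The loop described in the abstract can be sketched on a toy problem. This is only an illustrative sketch under stated assumptions, not the patented implementation: linear class scores stand in for acoustic-model scores, the misclassification measure is taken as the best competing score minus the correct score, the loss is a sigmoid of that measure shifted by the margin, a single fixed `bandwidth` replaces the sample-adaptive bandwidth, and the margin grows by a constant `margin_step` per iteration. All function and parameter names are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_tokens(per_class=20, noise=0.3, seed=0):
    # Synthetic "training tokens": three 2-D clusters plus a bias feature.
    rng = random.Random(seed)
    centers = [(3.0, 0.0), (-3.0, 0.0), (0.0, 3.0)]
    tokens = []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(per_class):
            tokens.append(([rng.gauss(cx, noise), rng.gauss(cy, noise), 1.0], label))
    return tokens


def scores(weights, x):
    # One linear score per class (assumed stand-in for model log-likelihoods).
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]


def misclassification(score_list, label):
    # Returns (d, rival): d >= 0 means the best competitor ties or beats the correct class.
    best, rival = max((s, k) for k, s in enumerate(score_list) if k != label)
    return best - score_list[label], rival


def train(tokens, n_classes, dim, iters=30, lr=0.5, bandwidth=1.0, margin_step=0.05):
    weights = [[0.0] * dim for _ in range(n_classes)]
    margin = 0.0
    for _ in range(iters):
        grads = [[0.0] * dim for _ in range(n_classes)]
        for x, label in tokens:
            d, rival = misclassification(scores(weights, x), label)
            loss = sigmoid((d + margin) / bandwidth)
            slope = loss * (1.0 - loss) / bandwidth  # derivative of the loss w.r.t. d
            for i in range(dim):
                grads[rival][i] += slope * x[i]   # d rises with the rival score
                grads[label][i] -= slope * x[i]   # d falls with the correct score
        for k in range(n_classes):
            for i in range(dim):
                weights[k][i] -= lr * grads[k][i] / len(tokens)
        margin += margin_step  # margin varies monotonically with the iteration count
    return weights, margin


def error_rate(weights, tokens):
    wrong = sum(1 for x, label in tokens
                if misclassification(scores(weights, x), label)[0] >= 0)
    return wrong / len(tokens)
```

Calling `train(make_tokens(), n_classes=3, dim=3)` returns the updated weights and the final margin; `error_rate` can then be compared before and after training. A real system would stop on an empirical convergence criterion rather than a fixed iteration count.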
19 Claims
1. A method of training an acoustic model in a speech recognition system, comprising:
utilizing a training corpus, having training tokens, to calculate an initial acoustic model;
computing, using the initial acoustic model, a plurality of scores for each training token with regard to a correct class and a plurality of competing classes;
utilizing a symmetric kernel function that is based on an exponent of the plurality of scores to calculate a sample-adaptive window bandwidth for each training token;
utilizing a loss function to calculate a margin for each training token;
gradually increasing the margin for each training token over a number of iterations until a minimum word error rate is achieved;
determining derivatives of the loss function based on the computed scores, based on the calculated sample-adaptive window bandwidth for each training token, and based on the iteratively increased margin for each training token;
calculating a Bayes risk value that includes a margin-free Bayes risk component and a margin-bound Bayes risk component, the margin-free Bayes risk component being based on an integral computed from zero to infinity, and the margin-bound Bayes risk component being based on an integral computed from a negative value of a discriminative margin to zero;
updating parameters in the initial acoustic model to create a revised acoustic model based upon the derivatives of the loss function and the Bayes Risk value; and
outputting the revised acoustic model.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
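The two integration ranges recited in claim 1 can be checked numerically. The sketch below is an illustration under assumptions the claim does not state: the symmetric kernel is taken to be 1 / (exp(-u/2) + exp(u/2))^2, the density of the misclassification score is a one-sample Parzen-window estimate centred on the token's own score `d_token`, and "infinity" is truncated at a finite `upper` limit. The names `kernel`, `integrate`, and `risk_components` are hypothetical.

```python
import math


def kernel(u):
    # Assumed symmetric kernel built from exponentials; it integrates to 1.
    return 1.0 / (math.exp(-u / 2.0) + math.exp(u / 2.0)) ** 2


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def integrate(f, lo, hi, steps=20000):
    # Composite trapezoid rule on [lo, hi].
    width = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * width)
    return total * width


def risk_components(d_token, bandwidth, margin, upper=60.0):
    # One-sample Parzen-window density of the misclassification score.
    def density(d):
        return kernel((d - d_token) / bandwidth) / bandwidth

    margin_free = integrate(density, 0.0, upper)     # zero to (truncated) infinity
    margin_bound = integrate(density, -margin, 0.0)  # minus the margin to zero
    return margin_free, margin_bound
```

Under these assumptions the margin-free term matches `sigmoid(d_token / bandwidth)` and the sum of both terms matches `sigmoid((d_token + margin) / bandwidth)` up to integration error, which is why a sigmoid-shaped loss with a shifted argument is a natural fit for this kind of risk.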
9. A system for training an acoustic model comprising:
a training corpus having training tokens;
a classifier that utilizes a zero-one risk function to calculate a cost of classifying each of the training tokens into an incorrect class, the zero-one risk function having a discriminant function and an anti-discriminant function that convert the costs for the training tokens from a feature domain to a score domain;
enrollment data that includes an acoustic representation of a user voice input and a corresponding transcription of the user voice input;
a Bayes risk value that includes a margin-free Bayes risk component and a margin-bound Bayes risk component, the margin-free Bayes risk component being based on an integral computed from zero to infinity, and the margin-bound Bayes risk component being based on an integral computed from a negative value of a discriminative margin to zero;
a training component that utilizes a discriminative training algorithm to generate an acoustic model based on the training corpus, the enrollment data, the classifier, the Bayes risk value, and a loss function that is calculated based on calculated scores of closeness and a calculated sample-adaptive window bandwidth for each training token, the sample-adaptive window bandwidth for each training token being estimated utilizing a distribution of scores for the training tokens, the discriminative training algorithm optimizing an error rate associated with the training tokens based on a slope of the loss function that is adapted to each training token, and the acoustic model including Hidden Markov Model parameters that are updated based on derivatives of the loss function.
Dependent claims: 10, 11, 12, 13, 14, 15
16. A method comprising:
calculating an initial acoustic model for a speech recognition system;
developing, from a Bayes risk minimization viewpoint, a generic framework for incorporating a margin into a differential kernel function, the differential kernel function being a symmetric kernel function that is exponentially based on a misclassification score, the misclassification score being estimated utilizing a bandwidth of the differential kernel function in a score domain;
deriving, using the developed generic framework, a loss function that incorporates the margin, the misclassification score, and the bandwidth of the differential kernel function in the score domain;
calculating a Bayes risk value that includes a margin-free Bayes risk component and a margin-bound Bayes risk component, the margin-free Bayes risk component being based on an integral computed from zero to infinity, and the margin-bound Bayes risk component being based on an integral computed from a negative value of a discriminative margin to zero;
utilizing the derived loss function and the Bayes risk value to update parameters of the initial acoustic model to create a revised acoustic model; and
outputting the revised acoustic model.
Dependent claims: 17, 18, 19
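One way a loss function with a margin, a misclassification score, and a kernel bandwidth might arise from the recited Bayes risk is sketched below. The notation is an assumption for illustration and is not taken from the claims: $d_r$ is the misclassification score of token $r$, $h_r$ its kernel bandwidth, $m$ the margin, $W$ an assumed symmetric exponential kernel, and $\sigma$ the logistic function.

```latex
% Assumed kernel and one-sample Parzen-window density in the score domain
W(u) = \frac{1}{\left(e^{-u/2} + e^{u/2}\right)^{2}} = \sigma(u)\,\bigl(1 - \sigma(u)\bigr),
\qquad
\hat{p}_r(d) = \frac{1}{h_r}\, W\!\left(\frac{d - d_r}{h_r}\right)

% Per-token risk: margin-free part plus margin-bound part
\ell_r
= \underbrace{\int_{0}^{\infty} \hat{p}_r(d)\,\mathrm{d}d}_{\text{margin-free}}
+ \underbrace{\int_{-m}^{0} \hat{p}_r(d)\,\mathrm{d}d}_{\text{margin-bound}}
= \sigma\!\left(\frac{d_r + m}{h_r}\right)

% Slope used in the parameter update
\frac{\partial \ell_r}{\partial d_r} = \frac{1}{h_r}\,\ell_r\,(1 - \ell_r)
```

Under these assumptions, setting $m = 0$ leaves only the margin-free term, a plain sigmoid loss, while a positive $m$ shifts the sigmoid so that correctly classified tokens lying within $m$ of the decision boundary still incur a noticeable loss.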
Specification