Speech models generated using competitive training, asymmetric training, and data boosting
1 Assignment
0 Petitions
Abstract
Speech models are trained using one or more of three different training systems: competitive training, which reduces a distance between a recognized result and a true result; data boosting, which divides and weights training data; and asymmetric training, which trains different model components differently.
17 Citations
20 Claims
1. A method of training a speech model, comprising:
- obtaining model parameters for the speech model;
- processing a known speech input using the speech model with the model parameters to generate a process result;
- calculating a distance between a true result and the process result, given the model parameters and the known speech input, the true result comprising a true transcription, the true transcription corresponding to only the following waveform states: silence, noise, onset and speech, instead of a phonetic transcription; and
- modifying the model parameters to reduce the distance between the true result and the process result, to obtain a modified model, wherein reducing the distance between the true result and the process result comprises maximizing a function comprising a parameter set for an acoustic model and a super utterance, the super utterance comprising a feature vector sequence, the true result and the process result.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
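The four steps of claim 1 can be illustrated with a toy example. The sketch below is not taken from the patent, which does not give the functional form of the distance or of the maximized function. It assumes one scalar feature per frame, a unit-variance Gaussian per waveform state, frame-by-frame recognition, and a distance defined as the log-likelihood gap between the recognized and the true state sequence. The function names (`recognize`, `distance`, `competitive_step`) and the learning rate `lr` are hypothetical.

```python
# Toy illustration only: the model, distance and update rule are assumptions.
STATES = ("silence", "noise", "onset", "speech")  # the four waveform states

def log_like(mean, x):
    """Log-likelihood of frame x under a unit-variance Gaussian (constant dropped)."""
    return -0.5 * (x - mean) ** 2

def recognize(means, feats):
    """Process result: index of the most likely state for each frame."""
    return [max(range(len(means)), key=lambda s: log_like(means[s], x))
            for x in feats]

def distance(means, feats, true_ids):
    """Score gap between recognized and true state sequences (never negative)."""
    rec_ids = recognize(means, feats)
    return sum(log_like(means[r], x) - log_like(means[t], x)
               for x, r, t in zip(feats, rec_ids, true_ids))

def competitive_step(means, feats, true_ids, lr=0.05):
    """One gradient-ascent step on the negative distance; returns modified means."""
    grad = [0.0] * len(means)
    for x, r, t in zip(feats, recognize(means, feats), true_ids):
        if r != t:                   # only misrecognized frames contribute
            grad[t] += x - means[t]  # pull the true state's mean toward the frame
            grad[r] -= x - means[r]  # push the competing state's mean away
    return [m + lr * g for m, g in zip(means, grad)]
```

With initial means `[0.0, 1.0, 2.0, 3.0]`, a frame valued `1.4` whose true state is onset is first recognized as noise. A single call to `competitive_step` shrinks the distance for that frame without yet correcting the label.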
9. A computer-readable storage device with computer-executable instructions stored thereon which, when executed by a computer, perform a method for training a speech model, the method comprising:
- obtaining model parameters for the speech model;
- processing a known speech input using the speech model with the model parameters to generate a process result;
- calculating a distance between a true result and the process result, given the model parameters and the known speech input, the true result comprising a true transcription, the true transcription corresponding to only the following waveform states: silence, noise, onset and speech, instead of a phonetic transcription;
- modifying the model parameters to reduce the distance between the true result and the process result, to obtain a modified model, wherein reducing the distance between the true result and the process result comprises maximizing a function comprising a parameter set for an acoustic model and a super utterance, the super utterance comprising a feature vector sequence, the true result and the process result; and
- iterating on the steps of processing, calculating and modifying until the model parameters reach a desired convergence.
Dependent claims: 10, 11, 12, 13, 14, 15
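Claim 9 adds an iteration over the processing, calculating and modifying steps. A minimal sketch of such a loop follows, under the same toy assumptions as before: scalar features, unit-variance Gaussian states, and a log-likelihood gap as the distance. The claim does not define "desired convergence"; the stopping rule used here (change in distance below `tol`, or `max_iters` reached) is an assumption, and all names are hypothetical.

```python
# Toy illustration only: the stopping rule and model are assumptions.
def log_like(mean, x):
    """Log-likelihood of frame x under a unit-variance Gaussian (constant dropped)."""
    return -0.5 * (x - mean) ** 2

def recognize(means, feats):
    """Processing step: most likely state index per frame."""
    return [max(range(len(means)), key=lambda s: log_like(means[s], x))
            for x in feats]

def distance(means, feats, true_ids):
    """Calculating step: score gap between recognized and true sequences."""
    rec_ids = recognize(means, feats)
    return sum(log_like(means[r], x) - log_like(means[t], x)
               for x, r, t in zip(feats, rec_ids, true_ids))

def step(means, feats, true_ids, lr):
    """Modifying step: move means so that the distance shrinks."""
    new = list(means)
    for x, r, t in zip(feats, recognize(means, feats), true_ids):
        if r != t:
            new[t] += lr * (x - means[t])  # true state moves toward the frame
            new[r] -= lr * (x - means[r])  # competing state moves away
    return new

def train(means, feats, true_ids, lr=0.05, tol=1e-6, max_iters=500):
    """Repeat process/calculate/modify until the distance stops changing."""
    prev = distance(means, feats, true_ids)
    history = [prev]
    for _ in range(max_iters):
        means = step(means, feats, true_ids, lr)
        cur = distance(means, feats, true_ids)
        history.append(cur)
        if abs(prev - cur) < tol:  # assumed convergence criterion
            break
        prev = cur
    return means, history
```

Returning the distance history alongside the parameters makes it easy to check whether training actually reduced the distance before accepting the modified model.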
16. A method of training a speech model, comprising:
- obtaining model parameters for the speech model, wherein the speech model comprises an acoustic model and wherein obtaining model parameters for the speech model comprises performing maximum likelihood training on training data to obtain an initial model with initial model parameters;
- processing a known speech input using the speech model with the model parameters to generate a process result, wherein processing a known speech input to generate a process result comprises performing speech recognition on acoustic data indicative of a known acoustic input, using the acoustic model with the model parameters, to generate a speech recognition result;
- calculating a distance between a true result and the process result, given the model parameters and the known speech input, the true result comprising a true transcription, the true transcription corresponding to only the following waveform states: silence, noise, onset and speech, instead of a phonetic transcription;
- modifying the model parameters to reduce the distance between the true result and the process result, to obtain a modified model, wherein reducing the distance between the true result and the process result comprises maximizing a logarithmic function comprising a parameter set for the acoustic model and a super utterance, the super utterance comprising a feature vector sequence, the true result and the process result; and
- iterating on the steps of processing, calculating and modifying until the model parameters reach a desired convergence.
Dependent claims: 17, 18, 19, 20
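Claim 16 starts from an initial model obtained by maximum likelihood training. The claim does not say how that training is carried out. As a hedged illustration, the sketch below assumes frames already labeled with one of the four waveform states and a unit-variance Gaussian per state, in which case the maximum likelihood estimate of each state's mean is simply the average of its frames. The names `ml_init_means` and `total_log_like` are hypothetical.

```python
# Toy illustration only: labeled frames and unit-variance Gaussians are assumed.
STATES = ("silence", "noise", "onset", "speech")

def ml_init_means(feats, state_ids, n_states=len(STATES)):
    """Maximum likelihood means: per-state average of the labeled frames."""
    sums = [0.0] * n_states
    counts = [0] * n_states
    for x, s in zip(feats, state_ids):
        sums[s] += x
        counts[s] += 1
    # States with no frames keep a default mean of 0.0 (an arbitrary choice).
    return [sums[s] / counts[s] if counts[s] else 0.0 for s in range(n_states)]

def total_log_like(means, feats, state_ids):
    """Log-likelihood of the labeled frames (additive constant dropped)."""
    return sum(-0.5 * (x - means[s]) ** 2 for x, s in zip(feats, state_ids))
```

The resulting means would then serve as the initial model parameters for the iterative steps of the claim. Any shift away from them lowers `total_log_like` on the same labeled data, which is what makes them the maximum likelihood starting point under these assumptions.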
Specification