Speech recognition method
First Claim
1. A speech recognition method based on recognizing words, comprising the steps of:
defining, for each word, a probabilistic model including (i) a plurality of states, (ii) at least one transition, each transition extending from a state to a state, (iii) a plurality of generated labels indicative of time between states, and (iv) probabilities of outputting each label in each of said transitions;
generating a first label string of said labels for each of said words from initial data thereof;
for each of said words, iteratively updating the probabilities of the corresponding probabilistic model, comprising the steps of:
(a) inputting a first label string into a corresponding probabilistic model;
(b) obtaining a first frequency of each of said labels being output at each of said transitions over the time in which the corresponding first label string is input into the corresponding probabilistic model;
(c) obtaining a second frequency of each of said states occurring over the time in which the corresponding first label string is inputted into the corresponding probabilistic model; and
(d) obtaining each of a plurality of new probabilities of said corresponding probabilistic model by dividing the corresponding first frequency by the corresponding second frequency;
storing the first and second frequencies obtained in the last step of said iterative updating;
determining which of said words require adaptation to recognize different speakers or the same speaker at different times;
generating, for each of said words requiring adaptation, a second label string from adaptation data comprising the probabilistic model of the word to be adapted;
obtaining, for each of said words requiring adaptation, a third frequency of each of said labels being outputted at each of said transitions over the time in which the corresponding second label string is inputted into the corresponding probabilistic model;
obtaining, for each of said words requiring adaptation, a fourth frequency of each of said states occurring over the time in which the corresponding second label string is inputted into the corresponding probabilistic model;
obtaining fifth frequencies by interpolation of the corresponding first and third frequencies;
obtaining sixth frequencies by interpolation of the corresponding second and fourth frequencies; and
obtaining adapted probabilities for said adaptation data by dividing the corresponding fifth frequency by the corresponding sixth frequency.
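The adaptation steps at the end of the claim can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the claim says "interpolation" without fixing a formula, so the sketch assumes a linear weighted sum with a single hypothetical `weight` parameter, and the function names, dictionary layout, and toy numbers are invented for the example. The denominator is formed from the two state-frequency tables (stored and adaptation), an assumption that keeps the adapted probabilities leaving each state summing to one.

```python
def interpolate(stored, new, weight):
    """Linear interpolation of two frequency tables (assumed form of 'interpolation')."""
    keys = set(stored) | set(new)
    return {
        k: (1.0 - weight) * stored.get(k, 0.0) + weight * new.get(k, 0.0)
        for k in keys
    }


def adapt(first, second, third, fourth, source_state, weight=0.5):
    """Return adapted probabilities keyed by (transition, label).

    first / third : label-at-transition frequencies from training / adaptation data
    second / fourth: state frequencies from training / adaptation data
    source_state  : maps each transition to the state it leaves (hypothetical layout)
    """
    fifth = interpolate(first, third, weight)    # interpolated label frequencies
    sixth = interpolate(second, fourth, weight)  # interpolated state frequencies
    adapted = {}
    for (transition, label), count in fifth.items():
        denom = sixth.get(source_state[transition], 0.0)
        if denom > 0.0:  # skip states never visited in either data set
            adapted[(transition, label)] = count / denom
    return adapted


# Toy example: one state (0) with a single self-loop transition "t0".
source_state = {"t0": 0}
first = {("t0", "a"): 6.0, ("t0", "b"): 2.0}   # stored from the last training iteration
second = {0: 8.0}
third = {("t0", "a"): 1.0, ("t0", "b"): 3.0}   # obtained from the new speaker's data
fourth = {0: 4.0}

adapted = adapt(first, second, third, fourth, source_state, weight=0.5)
```

In this sketch a weight of 0 reproduces the stored model (first divided by second), while a weight of 1 uses only the new speaker's frequencies (third divided by fourth).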
Abstract
Speaker adaptation which enables a person to use a Hidden Markov model type recognizer previously trained by another person or persons. During initial training, parameters of Markov models are calculated iteratively by, for example, using the Forward-Backward algorithm. Adapting the recognizer to a new speaker involves (a) storing and utilizing intermediate results or probabilistic frequencies of a last iteration of training parameters, and (b) calculating new parameters by computing a weighted sum of the probabilistic frequencies stored during training and frequencies obtained from adaptation data derived from known utterances of words made by the new speaker.
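To make the "probabilistic frequencies" mentioned in the abstract concrete, the following Python sketch shows one way a Forward-Backward pass could accumulate them for a small model whose labels are output on transitions. It is a minimal sketch, not the patented implementation: the model layout (a list of `(source, destination)` transitions with a per-transition dictionary of joint transition-and-label probabilities), the function names, and the toy values are all assumptions, every state is treated as a valid end state, and no scaling is applied, so it is suitable only for short label strings.

```python
def expected_counts(transitions, probs, labels, n_states, start=0):
    """Accumulate expected frequencies for one label string.

    transitions: list of (src, dst) pairs
    probs[t][y]: P(take transition t and output label y | being in src)
    Returns (label_counts, state_counts).
    """
    n = len(labels)
    # Forward pass: alpha[i][s] = P(first i labels, in state s after them).
    alpha = [[0.0] * n_states for _ in range(n + 1)]
    alpha[0][start] = 1.0
    for i, y in enumerate(labels):
        for t, (src, dst) in enumerate(transitions):
            alpha[i + 1][dst] += alpha[i][src] * probs[t].get(y, 0.0)
    # Backward pass: beta[i][s] = P(remaining labels | in state s after i labels).
    beta = [[0.0] * n_states for _ in range(n + 1)]
    beta[n] = [1.0] * n_states  # assumption: any state may end the string
    for i in range(n - 1, -1, -1):
        y = labels[i]
        for t, (src, dst) in enumerate(transitions):
            beta[i][src] += probs[t].get(y, 0.0) * beta[i + 1][dst]
    likelihood = sum(alpha[n])
    if likelihood == 0.0:
        raise ValueError("label string has zero probability under the model")
    # Expected number of times each label is output on each transition,
    # and expected number of times each state is left.
    label_counts = [dict() for _ in transitions]
    state_counts = [0.0] * n_states
    for i, y in enumerate(labels):
        for t, (src, dst) in enumerate(transitions):
            g = alpha[i][src] * probs[t].get(y, 0.0) * beta[i + 1][dst] / likelihood
            label_counts[t][y] = label_counts[t].get(y, 0.0) + g
            state_counts[src] += g
    return label_counts, state_counts


def reestimate(transitions, probs, label_counts, state_counts):
    """New probability = label-at-transition frequency / state frequency."""
    new_probs = []
    for t, (src, _dst) in enumerate(transitions):
        if state_counts[src] > 0.0:
            new_probs.append({y: c / state_counts[src] for y, c in label_counts[t].items()})
        else:
            new_probs.append(dict(probs[t]))  # state never visited: keep old values
    return new_probs


# Toy two-state model: self-loop on 0, 0 -> 1, self-loop on 1.
TRANSITIONS = [(0, 0), (0, 1), (1, 1)]
PROBS = [{"a": 0.4, "b": 0.1}, {"a": 0.3, "b": 0.2}, {"a": 0.5, "b": 0.5}]

label_counts, state_counts = expected_counts(TRANSITIONS, PROBS, ["a", "a", "b"], n_states=2)
new_probs = reestimate(TRANSITIONS, PROBS, label_counts, state_counts)
```

The two tables returned by `expected_counts` in the final training iteration are the kind of intermediate result the abstract describes storing, so that they can later be combined with frequencies computed from the new speaker's utterances.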
4 Claims
Specification