Speech recognition system employing discriminatively trained models
Abstract
A speech recognition system has vocabulary word models in which each word model state has both a discrete probability distribution function and a continuous probability distribution function. Word models are initially aligned with an input utterance using the discrete probability distribution functions, and an initial matching is performed. For the well-scoring word models, a ranked scoring is generated using the respective continuous probability distribution functions. After each utterance, preselected continuous probability distribution function parameters are discriminatively adjusted to increase the difference in scoring between the best scoring and the next ranking models.
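The two-pass scoring described above can be illustrated with a minimal Python sketch. It is not the patented implementation. Everything specific in it is an assumption made for illustration: the dictionary layout of a word model (`"discrete"` and `"means"`), the uniform alignment used as a stand-in for a real alignment procedure, the unit-variance Gaussian used as the continuous function, and all function names.

```python
import math

def nearest_prototype(frame, prototypes):
    # Vector quantization: index of the closest prototype vector.
    return min(range(len(prototypes)),
               key=lambda k: sum((x - p) ** 2 for x, p in zip(frame, prototypes[k])))

def uniform_alignment(n_frames, n_states):
    # Assumed stand-in for a real alignment: spread frames evenly over states.
    return [min(t * n_states // n_frames, n_states - 1) for t in range(n_frames)]

def discrete_log_score(model, labels, alignment):
    # First pass: log-probability of each quantized label under its aligned state.
    return sum(math.log(model["discrete"][s][lab]) for s, lab in zip(alignment, labels))

def continuous_log_score(model, frames, alignment):
    # Second pass (assumed unit-variance Gaussian, constants dropped):
    # minus half the squared distance of each frame to its aligned state mean.
    return sum(-0.5 * sum((x - m) ** 2 for x, m in zip(f, model["means"][s]))
               for s, f in zip(alignment, frames))

def recognize(frames, models, prototypes, threshold):
    labels = [nearest_prototype(f, prototypes) for f in frames]
    first = {}
    for word, model in models.items():
        align = uniform_alignment(len(frames), len(model["means"]))
        first[word] = (discrete_log_score(model, labels, align), align)
    best_first = max(score for score, _ in first.values())
    # Rescore only models close to the best first-pass score, reusing their alignments.
    rescored = {word: continuous_log_score(models[word], frames, align)
                for word, (score, align) in first.items()
                if best_first - score <= threshold}
    # Ranked list of (word, continuous score), best first.
    return sorted(rescored.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `recognize(frames, models, prototypes, threshold)` returns the shortlisted words ranked by their continuous score. The cheap discrete pass prunes the vocabulary, so the more expensive continuous scoring runs only on plausible candidates.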
If a user subsequently corrects a prior recognition event by selecting a different word model from the one generated by the recognition system, the continuous probability distribution function parameters are re-adjusted in two steps: the current parameters are adjusted opposite to the adjustment performed for the original recognition event, and then adjusted as they would have been if the word associated with the user correction had been the best scoring model.
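The correction handling described above amounts to keeping a record of each adjustment so that it can later be reversed. The following Python sketch shows one way this bookkeeping could look. It is an assumption-laden illustration, not the method of the patent: models are reduced to lists of per-state mean vectors, the adjustment is a simple fixed-step shift toward or away from the aligned frames, and the names `adjust_pair`, `undo`, and `on_user_correction` are hypothetical.

```python
def apply_deltas(means, deltas, sign=1.0):
    # Add (sign=+1) or subtract (sign=-1) recorded per-state deltas in place.
    for state, delta in deltas:
        means[state] = [m + sign * d for m, d in zip(means[state], delta)]

def discriminative_deltas(means, frames, alignment, step):
    # Positive step pulls state means toward their aligned frames;
    # negative step pushes them away.
    return [(s, [step * (x - m) for x, m in zip(f, means[s])])
            for s, f in zip(alignment, frames)]

def adjust_pair(models, favored, rival, frames, alignments, step=0.1):
    # Favor one word model and penalize a rival; return a log that allows undoing.
    log = []
    for word, sign in ((favored, 1.0), (rival, -1.0)):
        deltas = discriminative_deltas(models[word], frames, alignments[word], sign * step)
        apply_deltas(models[word], deltas)
        log.append((word, deltas))
    return log

def undo(models, log):
    # Apply the opposite of a logged adjustment to the current parameters.
    for word, deltas in log:
        apply_deltas(models[word], deltas, sign=-1.0)

def on_user_correction(models, log, corrected, rival, frames, alignments, step=0.1):
    # Reverse the original adjustment, then adjust as if the corrected word had won.
    undo(models, log)
    return adjust_pair(models, corrected, rival, frames, alignments, step)
```

Because the log stores the deltas that were actually applied, the undo step subtracts them from whatever the parameters are at correction time, even if other utterances have adjusted the same models in between.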
9 Claims
1. A method for a speech recognition system with word models having descriptive parameters and associated continuous probability density functions (PDFs) to dynamically adjust the word model descriptive parameters, the method comprising:
a. converting an input utterance into a sequence of representative vectors;
b. comparing the sequence of representative vectors with a plurality of word model state sequences and using the continuous PDFs to score each word model state sequence for a likelihood that such state sequence represents the sequence of representative vectors;
c. selecting the word model state sequence having the best score as a recognition result for output to a user;
d. automatically performing a discriminative adjustment to the descriptive parameters of the best scoring word model state sequence and the descriptive parameters of at least one inferior scoring word model state sequence; and
e. if the user corrects the recognition result by selecting a different word sequence, i. automatically performing an adjustment to the descriptive parameters modified in step (d) that substantially undoes the discriminative adjustment performed in step (d), and ii. automatically performing a discriminative adjustment to the descriptive parameters of the word model state sequences for the words in the user corrected word sequence and the descriptive parameters of at least one other word model state sequence.
5. A method for a speech recognition system to convert an input utterance into a representative word sequence text, the method comprising:
a. converting the input utterance into a sequence of representative vectors;
b. quantizing the sequence of representative vectors into a sequence of standard prototype vectors;
c. using discrete probability distribution functions (PDFs) of vocabulary word models to generate an alignment of the sequence of standard prototype vectors with a plurality of word model state sequences and to calculate initial match scores representative of a likelihood that a given word model state sequence alignment represents the sequence of standard prototype vectors;
d. while retaining the alignment established in step (c), rescoring word model state sequences having an initial match score within a selected threshold value of the word model state sequence having the best score by comparing the word model state sequences to be rescored with the sequence of representative vectors using continuous PDFs of the word models; and
e. selecting the word model state sequence having the best rescore as a recognition result for output to a user.
6. A method as in claim 5, further comprising:
f. automatically performing a discriminative adjustment to descriptive parameters of the best rescored word model state sequence and the descriptive parameters of an inferior scoring word model state sequence; and
g. if the user corrects the recognition result by selecting a different word sequence, i. automatically performing an adjustment to the descriptive parameters modified in step (f) that substantially undoes the discriminative adjustment performed in step (f), and ii. automatically performing a discriminative adjustment to the descriptive parameters of the word model state sequences for the words in the user corrected word sequence and the descriptive parameters of at least one other word model state sequence.
7. A method as in claim 6, wherein in step (f) the at least one inferior scoring word model state sequence is the word model state sequence having the second best score.
8. A method as in claim 6, wherein in step (g)(ii) the at least one other word model state sequence is the word model state sequence having the next best score to the word model state sequence of the user corrected word sequence.
9. A method as in claim 5, wherein the discriminative adjustment uses a gradient descent technique.
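The claims state only that a gradient descent technique is used. As one hedged illustration of what a gradient-descent discriminative adjustment can look like, the Python sketch below descends a sigmoid loss on the score difference between a rival model and the correct model. The loss function, the unit-variance Gaussian score, the learning rate `lr`, and the function names are all assumptions chosen for the example and are not taken from the patent.

```python
import math

def score(means, frames, alignment):
    # Assumed score: unit-variance Gaussian log-likelihood, constants dropped.
    return -0.5 * sum((x - m) ** 2
                      for s, f in zip(alignment, frames)
                      for x, m in zip(f, means[s]))

def gradient_step(correct, rival, frames, align_c, align_r, lr=0.1):
    # Assumed loss: sigmoid(d) with d = score(rival) - score(correct).
    # The loss is small when the correct model wins by a clear margin.
    d = score(rival, frames, align_r) - score(correct, frames, align_c)
    sig = 1.0 / (1.0 + math.exp(-d))
    weight = lr * sig * (1.0 - sig)  # learning rate times d(loss)/d(d)
    new_correct = [list(m) for m in correct]
    new_rival = [list(m) for m in rival]
    for s, f in zip(align_c, frames):
        for i, x in enumerate(f):
            # d(score)/d(mean) = (x - mean); descending the loss raises this score.
            new_correct[s][i] += weight * (x - correct[s][i])
    for s, f in zip(align_r, frames):
        for i, x in enumerate(f):
            # The rival's means move the opposite way, lowering its score.
            new_rival[s][i] -= weight * (x - rival[s][i])
    return new_correct, new_rival
```

With this loss, the update is largest when the two scores are close (d near zero) and shrinks when one model already wins by a wide margin, so confusable word pairs receive the most correction.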
Specification