Method and apparatus for discriminative utterance verification using multiple confidence measures
Abstract
A multiple confidence measures subsystem of an automated speech recognition system allows otherwise independent confidence measures to be integrated and used for both training and testing on a consistent basis. Speech to be recognized is input to a speech recognizer and a recognition verifier of the multiple confidence measures subsystem. The speech recognizer generates one or more confidence measures, preferably including a misclassification error (MCE) distance. The recognized speech output by the speech recognizer is input to the recognition verifier, which outputs one or more confidence measures, preferably including a misverification error (MVE) distance. The confidence measures output by the speech recognizer and the recognition verifier are normalized and then input to an integrator. The integrator integrates the various confidence measures both during a training phase for the hidden Markov models implemented in the speech recognizer and the recognition verifier and during testing of the input speech. The integrator is preferably implemented using a multi-layer perceptron (MLP). The output of the integrator, rather than that of the recognition verifier, determines whether the recognized utterance hypothesis generated by the speech recognizer should be accepted or rejected.
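The normalize-then-integrate flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the z-score normalization, the single tanh hidden layer, the 0.5 acceptance threshold, and every numeric value (raw scores, means, standard deviations, weights) are assumptions chosen only for illustration, and the names `normalize` and `mlp_integrate` are hypothetical.

```python
import math

def normalize(measures, means, stds):
    """Z-score each raw confidence measure (an assumed normalization scheme)."""
    return [(m - mu) / sd for m, mu, sd in zip(measures, means, stds)]

def mlp_integrate(x, w_hidden, b_hidden, w_out, b_out):
    """One-hidden-layer perceptron: tanh hidden units, sigmoid output in (0, 1)."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical raw scores for one utterance: [MCE distance, MVE distance].
raw = [2.4, -0.3]
x = normalize(raw, means=[1.0, 0.0], stds=[2.0, 0.5])

# Hypothetical weights; in practice these would be learned during training.
score = mlp_integrate(
    x,
    w_hidden=[[1.0, 0.5], [-0.5, 1.0]],
    b_hidden=[0.0, 0.0],
    w_out=[1.0, 1.0],
    b_out=0.0,
)
accept = score >= 0.5  # the integrator output, not the verifier alone, decides
```

Normalizing first puts measures from different knowledge sources on a comparable scale, so that neither dominates the integrator input simply because of its units.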
23 Claims
1. An automated speech recognition system comprising:
- a plurality of confidence measurement generating devices that generate a plurality of confidence measurements, at least one of the plurality of confidence measurements being a first type of confidence measurement and at least one of the plurality of confidence measurements being a second type of confidence measurement, where the first and second types of confidence measurements are different types of confidence measurements, the first and second types of confidence measurements corresponding to separate knowledge sources, wherein:
  the automated speech recognition system inputs a signal representing an utterance to be recognized, the signal comprising at least one portion, and
  each confidence measurement generating device inputs that signal and outputs at least one confidence measurement of at least one of at least the first and second types of confidence measurements for each portion of that signal;
- a normalizing device that inputs the plurality of confidence measurements comprising at least the first and second types of confidence measurements and outputs a plurality of normalized confidence measurements of at least the first and second types for each portion of the utterance; and
- an integrator that inputs, for each portion of the utterance, the plurality of normalized confidence measurements of at least the first and second types and outputs, based on the plurality of normalized confidence measurements of at least the first and second types for that portion of the utterance, a signal indicating whether that portion of the utterance has been correctly recognized.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.

11. A method for automatically recognizing speech, comprising:
- inputting a signal based on an utterance to be recognized, the signal having a plurality of portions;
- generating a proposed recognition for each portion;
- generating for each portion of the signal, a plurality of confidence measurements from the proposed recognition for that portion of the signal, comprising:
  generating at least one confidence measurement of a first type, and
  generating at least one confidence measurement of a second type,
  wherein at least the first and second types of confidence measurements are different types of confidence measurements that correspond to separate knowledge sources and the at least first and second types of confidence measurements are generated in parallel relative to the signal;
- normalizing, for each portion of the signal, the plurality of confidence measurements, comprising at least the first and second types of confidence measurements, for that portion of the signal;
- integrating, for each portion of the signal, the plurality of normalized confidence measurements of at least the first and second types to generate an integrated confidence measurement for that portion of the signal; and
- determining, for each portion of the signal, if the proposed recognition for that portion of the signal is acceptable based on the integrated confidence measurements of at least the first and second types.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20, 21.
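The per-portion flow of claim 11 (normalize, integrate, then accept or reject each portion) can be sketched as follows. This is an illustration under stated assumptions, not the claimed method itself: the two raw scores are taken as already computed for each portion, min-max scaling stands in for the unspecified normalization, a single logistic unit stands in for the integrator, and `Portion`, `min_max`, `verify`, and all numeric values are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class Portion:
    hypothesis: str     # proposed recognition for this portion
    type1_score: float  # raw confidence of the first type (hypothetical value)
    type2_score: float  # raw confidence of the second type (hypothetical value)

def min_max(value, lo, hi):
    """Scale a raw score to [0, 1], clamping out-of-range values (assumed scheme)."""
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))

def verify(portions, ranges, weights, bias, threshold=0.5):
    """Return (hypothesis, accepted) for each portion of the signal."""
    decisions = []
    for p in portions:
        normalized = [
            min_max(p.type1_score, *ranges[0]),
            min_max(p.type2_score, *ranges[1]),
        ]
        # Logistic unit as a stand-in integrator over the normalized measures.
        z = sum(w * n for w, n in zip(weights, normalized)) + bias
        integrated = 1.0 / (1.0 + math.exp(-z))
        decisions.append((p.hypothesis, integrated >= threshold))
    return decisions

portions = [Portion("five", 8.0, 3.0), Portion("nine", 1.0, -2.0)]
decisions = verify(
    portions,
    ranges=[(0.0, 10.0), (-5.0, 5.0)],  # assumed score ranges per measure type
    weights=[4.0, 4.0],
    bias=-4.0,
)
```

With these made-up values, the first portion scores high on both measures and is accepted, while the second scores low on both and is rejected.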

22. An automated speech recognition system comprising:
- a plurality of confidence measurement generating devices, each confidence measurement generating device inputting a signal based on a portion of an utterance to be recognized and outputting at least one confidence measurement for the portion of the utterance, each confidence measurement corresponding to a quality of a recognition function performed by the corresponding confidence measurement generating device on the portion of the utterance, each recognition function having a set of parameters, the plurality of confidence measurement generating devices comprising:
  a speech recognizer, wherein the signal input by the speech recognizer comprises a spectral coefficient signal including a plurality of spectral coefficients determined from the portion of the utterance, the speech recognizer outputting, for the portion of the utterance, a recognition hypothesis and at least a minimum characterization error confidence measurement for the hypothesis, and
  a recognition verifier, wherein the signal input by the recognition verifier comprises at least the recognition hypothesis output by the speech recognizer, the recognition verifier outputting, for the portion of the speech, at least a minimum verification error confidence measurement for the recognition hypothesis;
- a normalizing device that inputs the plurality of confidence measurements from both the speech recognizer and the recognition verifier and outputs a plurality of normalized confidence measurements; and
- an integrator that inputs the plurality of normalized confidence measurements of both the speech recognizer and the recognition verifier and outputs a signal indicating whether the portion of the utterance has been correctly recognized,
  wherein, during a training phase, the signal output by the integrator is input to the plurality of confidence measurement generating devices, each confidence measurement generating device modifying the set of parameters of its recognition function based on the signal.
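The training-phase feedback in claim 22, where a correctness signal drives parameter updates, can be illustrated with a deliberately simplified sketch. The claim does not specify the update rule. The code below shows one gradient-descent step on a single logistic unit with cross-entropy loss as a stand-in, and the function `train_step`, the learning rate, and the input values are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(weights, bias, normalized, correct, lr=0.1):
    """One gradient-descent step on cross-entropy loss for a logistic unit.

    `correct` is the training label: whether the proposed recognition
    for this portion was in fact right.
    """
    target = 1.0 if correct else 0.0
    output = sigmoid(sum(w * x for w, x in zip(weights, normalized)) + bias)
    error = output - target  # gradient of the loss w.r.t. the pre-activation
    new_weights = [w - lr * error * x for w, x in zip(weights, normalized)]
    new_bias = bias - lr * error
    return new_weights, new_bias, output

# Hypothetical normalized measures for a correctly recognized portion.
sample = [1.0, -1.0]
w, b, before = train_step([0.0, 0.0], 0.0, sample, correct=True)
_, _, after = train_step(w, b, sample, correct=True)
```

After one step the unit's output for the same sample moves toward the "correct" label, which is the behavior a training signal is meant to produce. In the claimed system the updated parameters belong to the recognizer and verifier themselves, which this sketch does not model.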

23. A method for automatically recognizing speech, comprising:
- inputting a signal based on an utterance to be recognized, the signal having a plurality of portions;
- generating a proposed recognition for each portion, comprising generating a recognition hypothesis for each portion;
- generating, for each portion of the signal, a plurality of distinct confidence measurements from the proposed recognition based on a plurality of sets of parameters;
- normalizing the plurality of distinct confidence measurements for each portion;
- integrating the plurality of normalized distinct confidence measurements to determine, for each portion, if the proposed recognition is acceptable; and
- during a training phase, modifying, for each portion, at least one of the plurality of sets of parameters based on a correctness of the proposed recognition;
  wherein generating, for each portion of the signal, the plurality of distinct confidence measurements comprises:
  generating at least a minimum characterization error confidence measurement based on the recognition hypothesis,
  generating at least one alternative recognition hypothesis for the recognition hypothesis, and
  generating at least a minimum verification error confidence measurement based on the recognition hypothesis and the at least one alternative recognition hypothesis.
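Claim 23 derives a confidence measurement from the recognition hypothesis together with at least one alternative hypothesis, but the formula is not given here. One form commonly used in the discriminative-training literature compares the hypothesis score against a smoothed maximum of the competitor scores. The sketch below assumes that form, and the name `error_distance`, the smoothing parameter `eta`, and the log-likelihood values are hypothetical.

```python
import math

def error_distance(hyp_score, alt_scores, eta=1.0):
    """Smoothed-max competitor score minus hypothesis score.

    Negative values mean the hypothesis outscores its alternatives.
    As eta grows, the smoothed term approaches max(alt_scores).
    """
    m = max(alt_scores)  # subtract the max before exponentiating for stability
    mean_exp = sum(math.exp(eta * (s - m)) for s in alt_scores) / len(alt_scores)
    soft_max = m + math.log(mean_exp) / eta
    return soft_max - hyp_score

# Hypothetical log-likelihood scores for one portion.
d_single = error_distance(-10.0, [-12.0])
d_sharp = error_distance(0.0, [-1.0, -5.0], eta=50.0)
```

Here `d_single` is negative because the hypothesis scores higher than its only alternative, and `d_sharp` shows that a large `eta` makes the distance track the best competitor almost exactly.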
Specification