System and method for speech recognition using dynamically adjusted confidence measure
3 Assignments
0 Petitions
Abstract
A computer-implemented method of recognizing an input speech utterance compares the input speech utterance to a plurality of hidden Markov models to obtain, for each hidden Markov model, a constrained acoustic score that reflects the probability that the model matches the input speech utterance. The method computes a confidence measure for each hidden Markov model that reflects the probability that its constrained acoustic score is correct. The computed confidence measure is then used to adjust the constrained acoustic score. Preferably, the confidence measure is computed from the difference between the constrained acoustic score and an unconstrained acoustic score that is computed independently of any language context. In addition, a new confidence measure is preferably computed for each input speech frame of the input speech utterance, so that the constrained acoustic score is adjusted frame by frame.
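The scoring pipeline in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract only says that the confidence measure is based on the difference between the constrained and unconstrained scores, so the exponential mapping, the log-domain combination, the `alpha` and `weight` values, and the function names `confidence_measure`, `combined_score`, and `recognize` are all hypothetical. Scores are treated as log-likelihoods, and the constrained scores are taken as given rather than computed by an HMM search.

```python
import math
from typing import Dict, Optional, Tuple


def confidence_measure(constrained: float, unconstrained: float,
                       alpha: float = 0.05) -> float:
    """Hypothetical confidence in (0, 1] derived from the score difference.

    Both scores are log-likelihoods. The difference is clipped at zero, so a
    model that scores at least as well as the unconstrained path gets 1.0,
    and confidence decays exponentially as the model falls behind it.
    """
    gap = min(constrained - unconstrained, 0.0)
    return math.exp(alpha * gap)


def combined_score(constrained: float, confidence: float,
                   weight: float = 10.0) -> float:
    """Add a weighted log-confidence term to the constrained score (assumed form)."""
    # Clamp so that a confidence that underflowed to 0.0 does not break math.log.
    return constrained + weight * math.log(max(confidence, 1e-300))


def recognize(constrained_scores: Dict[str, float],
              unconstrained: float) -> Tuple[str, float]:
    """Return the linguistic expression whose model has the best combined score."""
    best_expr: Optional[str] = None
    best_score = -math.inf
    for expr, score in constrained_scores.items():
        conf = confidence_measure(score, unconstrained)
        total = combined_score(score, conf)
        if total > best_score:
            best_expr, best_score = expr, total
    if best_expr is None:
        raise ValueError("no acoustic utterance models were supplied")
    return best_expr, best_score


# Usage with made-up scores for two stored models.
print(recognize({"yes": -120.0, "no": -125.0}, unconstrained=-110.0))
```

Because a single unconstrained score is shared by every model here, this particular mapping is monotone in the constrained score and leaves the ranking of the raw constrained scores unchanged; the sketch shows the data flow rather than a tuned scoring rule.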
127 Citations
30 Claims
1. A computer-implemented method of recognizing an input speech utterance, comprising:
receiving the input speech utterance;
comparing the input speech utterance with a plurality of stored acoustic utterance models, each stored acoustic utterance model corresponding to a linguistic expression, the comparing including an output probability analysis and a transition probability analysis of the input speech utterance with respect to each of the stored acoustic utterance models and resulting in a constrained acoustic score for each of the stored acoustic utterance models;
for each stored acoustic utterance model, computing a confidence measure based on the constrained acoustic score for the stored acoustic utterance model, the confidence measure reflecting a probability that the constrained acoustic score correctly reflects how accurately the stored acoustic utterance model matches the input speech utterance;
for each stored acoustic utterance model, computing a combined score based on the constrained acoustic score and the confidence measure computed for the stored acoustic utterance model; and
determining which of the linguistic expressions corresponding to the stored acoustic utterance models most closely matches the input speech utterance based on the combined scores, thereby recognizing the input speech utterance.
Dependent claims: 2, 3, 4, 5, 6, 14, 15, 16.
7. A computer-implemented method of recognizing an input speech utterance that includes a sequence of input speech frames, the method comprising:
receiving the sequence of input speech frames;
for each input speech frame:
comparing the input speech frame to a plurality of acoustic hidden Markov models, each hidden Markov model corresponding to a linguistic expression, the comparing including an output probability analysis and a transition probability analysis of the input speech frame with respect to each of the hidden Markov models and resulting in a constrained acoustic score for each of the hidden Markov models; and
for each hidden Markov model:
computing a confidence measure based on the constrained acoustic score for the hidden Markov model, the confidence measure reflecting a probability that the constrained acoustic score correctly reflects how accurately the hidden Markov model matches the input speech frame;
adjusting the constrained acoustic score for the hidden Markov model based on the confidence measure computed for the hidden Markov model; and
determining which of the linguistic expressions corresponding to the hidden Markov models most closely matches the input speech utterance based on the adjusted constrained acoustic scores, thereby recognizing the input speech utterance.
Dependent claims: 8, 9, 10, 17, 18.
11. A computer system for recognizing an input speech utterance that includes a sequence of input speech frames, the system comprising:
a hidden Markov model storage unit storing a plurality of context-dependent hidden Markov models, each hidden Markov model corresponding to a linguistic expression;
a context-independent acoustic model storage unit storing a plurality of context-independent acoustic models;
means for comparing the sequence of input speech frames to the hidden Markov models, including means for performing an output probability analysis and means for performing a transition probability analysis of the sequence of input speech frames with respect to each of the hidden Markov models and resulting in a constrained acoustic score for each of the hidden Markov models; and
means for comparing each input speech frame to the context-independent acoustic models, for computing an incremental unconstrained acoustic score for each input speech frame that reflects a probability that the input frame matches whichever context-independent acoustic model most closely matches the input speech frame, and for accumulating the incremental unconstrained acoustic scores into an unconstrained acoustic score for the sequence of input speech frames;
means for computing a confidence measure for each hidden Markov model based on a difference between the constrained acoustic score for the hidden Markov model and the unconstrained acoustic score;
means for adjusting the constrained acoustic score for each hidden Markov model based on the confidence measure computed for the hidden Markov model; and
means for determining which of the linguistic expressions corresponding to the hidden Markov models most closely matches the input speech utterance based on the adjusted constrained acoustic scores, thereby recognizing the input speech utterance.
Dependent claims: 12, 13.
19. A computer-implemented method of recognizing an input speech utterance, comprising:
receiving the input speech utterance;
for each of a plurality of stored acoustic utterance models, computing a context-based constrained acoustic score based on context information of the stored acoustic utterance model, each stored acoustic utterance model corresponding to a linguistic expression;
for each stored acoustic utterance model, computing a confidence measure based on the constrained acoustic score for the stored acoustic utterance model, the confidence measure reflecting a probability that the constrained acoustic score correctly reflects how accurately the stored acoustic utterance model matches the input speech utterance;
for each stored acoustic utterance model, computing a combined score based on the constrained acoustic score and the confidence measure computed for the stored acoustic utterance model; and
determining which of the linguistic expressions corresponding to the stored acoustic utterance models most closely matches the input speech utterance based on the combined scores, thereby recognizing the input speech utterance.
Dependent claims: 20.
21. A computer-implemented method of recognizing an input speech utterance, comprising:
receiving the input speech utterance;
for each of a plurality of stored acoustic utterance models, computing a context-based constrained acoustic score based on context information of the stored acoustic utterance model, each stored acoustic utterance model corresponding to a linguistic expression;
computing an unconstrained acoustic score for the input speech utterance without regard for the context information of the stored acoustic utterance models;
for each stored acoustic utterance model, computing a confidence measure based on the unconstrained acoustic score and the constrained acoustic score for the stored acoustic utterance model, the confidence measure reflecting a probability that the constrained acoustic score correctly reflects how accurately the stored acoustic utterance model matches the input speech utterance;
for each stored acoustic utterance model, adjusting the constrained acoustic score for the stored acoustic utterance model based on the confidence measure determined for the stored acoustic utterance model; and
determining which of the linguistic expressions corresponding to the stored acoustic utterance models most closely matches the input speech utterance based on the constrained acoustic scores, thereby recognizing the input speech utterance.
Dependent claims: 22.
23. A computer storage medium for controlling a computer to recognize an input speech utterance received by the computer, the storage medium comprising:
a plurality of acoustic utterance models, each acoustic utterance model corresponding to a linguistic expression;
computer instructions for comparing the input speech utterance with each of the acoustic utterance models to obtain a constrained acoustic score for each of the acoustic utterance models;
computer instructions that, for each of the acoustic utterance models, cause the computer to compute a confidence measure based on the constrained acoustic score for the stored acoustic utterance model, the confidence measure reflecting a probability that the constrained acoustic score correctly reflects how accurately the acoustic utterance model matches the input speech utterance;
computer instructions that, for each of the acoustic utterance models, cause the computer to compute a combined score based on the constrained acoustic score and the confidence measure computed for the acoustic utterance model; and
computer instructions for determining which of the linguistic expressions corresponding to the acoustic utterance models most closely matches the input speech utterance based on the combined scores, thereby recognizing the input speech utterance.
Dependent claims: 24, 25, 26, 27, 28, 29, 30.
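Claims 7 and 11 move the computation to the frame level: claim 11 builds the unconstrained score by taking, for each input speech frame, the context-independent acoustic model that best matches it and accumulating those incremental scores, and claim 7 recomputes the confidence measure and adjusts the constrained score for every frame. The Python sketch below illustrates that flow under stated assumptions: the context-independent models are hypothetical scalar Gaussians (real systems score multi-dimensional feature vectors), the running constrained scores of one hidden Markov model are supplied as input rather than computed by a Viterbi pass, and the exponential confidence mapping with its `alpha` and `weight` values is invented for illustration, not taken from the claims. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical context-independent acoustic models: name -> (mean, variance)
# of a scalar Gaussian. Scalar frames keep the sketch short.
CIModels = Dict[str, Tuple[float, float]]


def log_gaussian(x: float, mean: float, var: float) -> float:
    """Log density of a scalar Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def incremental_unconstrained(frame: float, ci_models: CIModels) -> float:
    """Score of whichever context-independent model best matches the frame."""
    return max(log_gaussian(frame, mean, var) for mean, var in ci_models.values())


def accumulate_unconstrained(frames: Sequence[float],
                             ci_models: CIModels) -> List[float]:
    """Running unconstrained score after each frame (no language context used)."""
    running: List[float] = []
    total = 0.0
    for frame in frames:
        total += incremental_unconstrained(frame, ci_models)
        running.append(total)
    return running


def adjust_per_frame(constrained: Sequence[float], unconstrained: Sequence[float],
                     alpha: float = 0.05, weight: float = 10.0) -> List[float]:
    """Recompute the confidence at every frame and adjust that frame's score.

    constrained[t] is the (hypothetical) running score of one HMM after frame t;
    unconstrained[t] is the running unconstrained score after the same frame.
    """
    adjusted: List[float] = []
    for c_t, u_t in zip(constrained, unconstrained):
        gap = min(c_t - u_t, 0.0)              # how far the HMM trails the best path
        conf = math.exp(alpha * gap)           # assumed mapping, confidence in (0, 1]
        adjusted.append(c_t + weight * math.log(max(conf, 1e-300)))
    return adjusted


# Usage with made-up frames and made-up running HMM scores.
ci_models = {"sil": (0.0, 1.0), "ah": (5.0, 1.0)}
u = accumulate_unconstrained([0.2, 4.9, 5.1], ci_models)
print(adjust_per_frame([u[0] - 1.0, u[1] - 3.0, u[2] - 6.0], u))
```

Recomputing the confidence from the running scores at each frame, rather than once at the end, mirrors the per-frame adjustment described in the abstract; how the adjusted scores feed back into the search is left open here.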
Specification