Utterance verification method and apparatus for isolated word N-best recognition result
First Claim
1. An utterance verification method for an isolated word N-best speech recognition result, comprising:
calculating log likelihoods of a context-dependent phoneme and an anti-phoneme model based on an N-best speech recognition result for an input utterance;
measuring a confidence score of an N-best speech-recognized word using the log likelihoods;
calculating a distance between phonemes for the N-best speech-recognized word;
comparing the confidence score with a threshold and the distance with a mean of distances; and
accepting the N-best speech-recognized word when the compared results for the confidence score and the distance correspond to acceptance,
wherein the log likelihood of the context-dependent phoneme is calculated by:
log likelihood of context-dependent phoneme = {(log likelihood of current phoneme) − (mean of base phonemes of current phoneme)} / (standard deviation of base phonemes of current phoneme).
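The normalization recited in the claim has the form of a standard score, and a minimal Python sketch may make that concrete. It rests on assumptions that are not in the claim text: the function name `normalized_log_likelihood` is hypothetical, "mean of base phonemes" is read as the mean of the base phonemes' log likelihoods, and the population standard deviation is used. The numbers are made up for illustration.

```python
from statistics import mean, pstdev
from typing import Sequence


def normalized_log_likelihood(current_ll: float, reference_lls: Sequence[float]) -> float:
    """Return (current_ll - mean(reference_lls)) / std(reference_lls).

    ASSUMPTION: reference_lls holds the log likelihoods of the base phonemes
    of the current phoneme; the claim does not say how they are gathered.
    """
    if not reference_lls:
        raise ValueError("reference_lls must not be empty")
    mu = mean(reference_lls)
    sigma = pstdev(reference_lls)  # ASSUMPTION: population standard deviation
    if sigma == 0.0:
        raise ValueError("reference log likelihoods have zero spread")
    return (current_ll - mu) / sigma


# Hypothetical log likelihoods, for illustration only.
score = normalized_log_likelihood(-40.0, [-50.0, -45.0, -55.0])
```

A positive result means the current phoneme scored above the average of its reference set, measured in units of that set's standard deviation.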
Abstract
An utterance verification method for an isolated word N-best speech recognition result includes: calculating log likelihoods of a context-dependent phoneme and an anti-phoneme model based on an N-best speech recognition result for an input utterance; measuring a confidence score of an N-best speech-recognized word using the log likelihoods; calculating distance between phonemes for the N-best speech-recognized word; comparing the confidence score with a threshold and the distance with a predetermined mean of distances; and accepting the N-best speech-recognized word when the compared results for the confidence score and the distance correspond to acceptance.
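The two-part acceptance test described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: the abstract does not say how the confidence score is derived from the log likelihoods, so the sketch uses an average log-likelihood ratio, and it does not say which direction of the distance comparison counts as acceptance, so the sketch assumes "at or above the mean". The names `PhonemeScore`, `confidence_score`, and `verify` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class PhonemeScore:
    phoneme_ll: float  # normalized log likelihood of the context-dependent phoneme
    anti_ll: float     # normalized log likelihood of the anti-phoneme model


def confidence_score(scores: Sequence[PhonemeScore]) -> float:
    # ASSUMPTION: average log-likelihood ratio over the word's phonemes.
    return mean(s.phoneme_ll - s.anti_ll for s in scores)


def verify(scores: Sequence[PhonemeScore], distance: float,
           threshold: float, mean_distance: float) -> bool:
    """Accept the word only when both comparisons indicate acceptance."""
    confidence_ok = confidence_score(scores) >= threshold
    # ASSUMPTION: a distance at or above the mean counts as acceptance.
    distance_ok = distance >= mean_distance
    return confidence_ok and distance_ok


# Hypothetical scores for a two-phoneme word.
word = [PhonemeScore(1.0, -1.0), PhonemeScore(0.5, -0.5)]
accepted = verify(word, distance=3.0, threshold=1.0, mean_distance=2.0)
```

Requiring both checks to pass means a word with a high confidence score is still rejected when its phoneme distance fails the comparison, which is the conjunction the abstract describes.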
31 Citations
16 Claims
1. An utterance verification method for an isolated word N-best speech recognition result, comprising:
calculating log likelihoods of a context-dependent phoneme and an anti-phoneme model based on an N-best speech recognition result for an input utterance;
measuring a confidence score of an N-best speech-recognized word using the log likelihoods;
calculating a distance between phonemes for the N-best speech-recognized word;
comparing the confidence score with a threshold and the distance with a mean of distances; and
accepting the N-best speech-recognized word when the compared results for the confidence score and the distance correspond to acceptance,
wherein the log likelihood of the context-dependent phoneme is calculated by:
log likelihood of context-dependent phoneme = {(log likelihood of current phoneme) − (mean of base phonemes of current phoneme)} / (standard deviation of base phonemes of current phoneme).
Dependent claims: 2, 3, 4, 5, 6, 7
8. An utterance verification method for an isolated word N-best speech recognition result, comprising:
calculating log likelihoods of a context-dependent phoneme and an anti-phoneme model based on an N-best speech recognition result for an input utterance;
measuring a confidence score of an N-best speech-recognized word using the log likelihoods;
calculating a distance between phonemes for the N-best speech-recognized word;
comparing the confidence score with a threshold and the distance with a mean of distances; and
accepting the N-best speech-recognized word when the compared results for the confidence score and the distance correspond to acceptance,
wherein the log likelihood of the anti-phoneme model is calculated by:
log likelihood of anti-phoneme model = {(log likelihood of anti-phoneme model of current phoneme) − (mean of anti-phoneme model of current phoneme)} / (standard deviation of anti-phoneme model of current phoneme).
9. An utterance verification system for an isolated word N-best speech recognition result comprising:
a computer comprising:
a pre-processor extracting a feature vector of an input utterance and performing endpoint detection;
an N-best speech recognizer performing N-best speech recognition through Viterbi search by referring to the context-dependent phoneme model extracted from the feature vector; and
an N-best utterance verifier calculating log likelihoods of a context-dependent phoneme and an anti-phoneme model for the N-best speech-recognized word, comparing a confidence score measured for the N-best speech-recognized word with a threshold, comparing a distance measured for the N-best speech-recognized word with a mean of distances, and accepting the N-best speech-recognized word when the compared results for the confidence score and the distance correspond to acceptance,
wherein the likelihood of the context-dependent phonemes is calculated by:
log likelihood of context-dependent phoneme = {(log likelihood of current phoneme) − (mean of base phonemes of current phoneme)} / (standard deviation of base phonemes of current phoneme).
Dependent claims: 10, 11, 12, 13, 14, 15
16. An utterance verification system for an isolated word N-best speech recognition result comprising:
a computer comprising:
a pre-processor extracting a feature vector of an input utterance and performing endpoint detection;
an N-best speech recognizer performing N-best speech recognition through Viterbi search by referring to the context-dependent phoneme model extracted from the feature vector; and
an N-best utterance verifier calculating log likelihoods of a context-dependent phoneme and an anti-phoneme model for the N-best speech-recognized word, comparing a confidence score measured for the N-best speech-recognized word with a threshold, comparing a distance measured for the N-best speech-recognized word with a mean of distances, and accepting the N-best speech-recognized word when the compared results for the confidence score and the distance correspond to acceptance,
wherein the likelihood of the anti-phoneme model is calculated by:
likelihood of anti-phoneme model = {(log likelihood of anti-phoneme model of current phoneme) − (mean of anti-phoneme model of current phoneme)} / (standard deviation of anti-phoneme model of current phoneme).
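For readers who prefer symbols, the two normalizations recited in the independent claims can be written compactly. The notation is an assumption introduced here and does not appear in the claims: L_cd(p) and L_anti(p) denote the log likelihoods of the current phoneme p under the context-dependent phoneme model and the anti-phoneme model, and the mu and sigma terms denote the corresponding means and standard deviations named in the claims.

```latex
\tilde{L}_{\mathrm{cd}}(p) = \frac{L_{\mathrm{cd}}(p) - \mu_{\mathrm{base}}(p)}{\sigma_{\mathrm{base}}(p)},
\qquad
\tilde{L}_{\mathrm{anti}}(p) = \frac{L_{\mathrm{anti}}(p) - \mu_{\mathrm{anti}}(p)}{\sigma_{\mathrm{anti}}(p)}
```

The first expression corresponds to the formula in claims 1 and 9, the second to the formula in claims 8 and 16.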
Specification