Rejection method for speech recognition
Abstract
A speech recognizer for recognizing unknown utterances in isolated-word, small-vocabulary speech has improved rejection of out-of-vocabulary utterances. Both a usual spectral representation, including a dynamic component, and an equalized representation are used to match unknown utterances to templates for in-vocabulary words. In a preferred embodiment, the representations use mel-based cepstra, with the dynamic components being signed vector differences between pairs of primary cepstra. The equalized representation is the signed difference obtained by subtracting an average value of the coefficients from each cepstral coefficient. Factors are generated from the ordered lists of templates to determine the probability that the top choice is a correct acceptance, with different methods applied when the usual and equalized representations yield different matches. For additional enhancement, the rejection method may use templates corresponding to non-vocabulary utterances, or decoys. If the top choice corresponds to a decoy, the input is rejected.
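The two representations described in the abstract can be illustrated with a minimal pure-Python sketch, assuming the mel cepstra for each frame have already been computed. The pairing used for the dynamic components (frames `t + offset` and `t - offset`, clamped at the utterance edges) and the choice to average each coefficient over the whole utterance are assumptions, since the abstract specifies neither. All function names are hypothetical.

```python
from typing import List

Frame = List[float]


def dynamic_components(cepstra: List[Frame], offset: int = 2) -> List[Frame]:
    """Signed vector difference c[t + offset] - c[t - offset] for every frame t.

    The offset and the edge clamping are assumptions; the abstract only says the
    dynamic components are signed differences between pairs of primary cepstra.
    """
    n = len(cepstra)
    deltas = []
    for t in range(n):
        ahead = cepstra[min(t + offset, n - 1)]
        behind = cepstra[max(t - offset, 0)]
        deltas.append([a - b for a, b in zip(ahead, behind)])
    return deltas


def equalize(cepstra: List[Frame]) -> List[Frame]:
    """Subtract each coefficient's average over the utterance (assumed averaging window)."""
    n = len(cepstra)
    means = [sum(frame[i] for frame in cepstra) / n for i in range(len(cepstra[0]))]
    return [[c - m for c, m in zip(frame, means)] for frame in cepstra]


def parameter_frames(cepstra: List[Frame], offset: int = 2) -> List[Frame]:
    """Primary cepstra followed by their dynamic (secondary) components, frame by frame."""
    return [p + d for p, d in zip(cepstra, dynamic_components(cepstra, offset))]


if __name__ == "__main__":
    c = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]  # toy 2-coefficient cepstra, 3 frames
    print(parameter_frames(c, offset=1))
    # [[1.0, 0.0, 1.0, 1.0], [2.0, 1.0, 2.0, 2.0], [3.0, 2.0, 1.0, 1.0]]
    print(parameter_frames(equalize(c), offset=1))
    # [[-1.0, -1.0, 1.0, 1.0], [0.0, 0.0, 2.0, 2.0], [1.0, 1.0, 1.0, 1.0]]
```

Subtracting the same per-coefficient average from both frames of a pair cancels in their difference, so in this sketch equalization changes only the primary part of each frame; the dynamic components of the two sequences are identical.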
19 Claims
1. A method for speech recognition comprising the steps of:
representing an unknown utterance as a first sequence of parameter frames, each parameter frame including a set of primary and secondary parameters and an equalized second sequence of parameter frames derived from the first sequence of parameter frames;
comparing each of the primary and secondary parameters in the sequence of parameter frames of the representation of the unknown utterance to each of a plurality of reference representations expressed in the same kind of parameters, to determine how closely each reference representation resembles the representation of the unknown utterance;
ranking the reference representations in order from best to worst choice in dependence upon their relative closeness to the representation of the unknown utterance, for each of the first and second sequences of parameters;
computing a probability that the best choice is a correct match for the unknown utterance; and
rejecting the best choice as a match for the unknown utterance if the probability is below a predetermined value.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
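The steps of claim 1 can be sketched in Python as follows. The claim does not say how closeness is measured or how the probability is computed, so this sketch assumes dynamic time warping with a Euclidean frame distance (a common choice for template matching, not necessarily the patent's) and uses a made-up acceptance score in place of the probability: the relative gap between the best and second-best distances, averaged over the two rankings and halved when the usual and equalized representations pick different words. When they disagree, the sketch reports the usual representation's top choice, which is also an assumption. The abstract's optional decoy templates are modelled as a set of template names. Every function name, the score formula, the halving penalty, and the default threshold are hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Optional, Sequence, Tuple

Frame = Sequence[float]
Utterance = Sequence[Frame]
Ranking = List[Tuple[float, str]]  # (distance, template name), best first


def frame_distance(x: Frame, y: Frame) -> float:
    """Euclidean distance between two parameter frames."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dtw_distance(u: Utterance, r: Utterance) -> float:
    """Assumed closeness measure: dynamic-time-warping cost of aligning u with r."""
    n, m = len(u), len(r)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = frame_distance(u[i - 1], r[j - 1]) + step
    return cost[n][m]


def rank(u: Utterance, templates: Dict[str, Utterance]) -> Ranking:
    """Order reference templates from best (smallest cost) to worst."""
    return sorted((dtw_distance(u, t), name) for name, t in templates.items())


def accept_probability(usual: Ranking, equalized: Ranking) -> float:
    """HYPOTHETICAL score in [0, 1] standing in for the patent's unspecified factors."""
    def margin(ranking: Ranking) -> float:
        # Relative gap between best and runner-up; 0 when there is no usable runner-up.
        if len(ranking) < 2 or ranking[1][0] == 0.0:
            return 0.0
        return 1.0 - ranking[0][0] / ranking[1][0]

    score = (margin(usual) + margin(equalized)) / 2.0
    if usual[0][1] != equalized[0][1]:
        score *= 0.5  # assumed penalty when the two representations disagree
    return score


def recognize(
    usual_u: Utterance,
    equalized_u: Utterance,
    usual_refs: Dict[str, Utterance],
    equalized_refs: Dict[str, Utterance],
    threshold: float = 0.3,
    decoys: FrozenSet[str] = frozenset(),
) -> Optional[str]:
    """Return the best choice, or None if it is rejected."""
    usual = rank(usual_u, usual_refs)
    equalized = rank(equalized_u, equalized_refs)
    best = usual[0][1]  # assumed: report the usual representation's top choice
    if best in decoys:
        return None  # top choice is a non-vocabulary decoy template
    if accept_probability(usual, equalized) < threshold:
        return None  # score below the predetermined value
    return best


if __name__ == "__main__":
    refs = {"yes": [[0.0], [1.0]], "no": [[5.0], [6.0]]}
    unknown = [[0.0], [1.0]]
    print(recognize(unknown, unknown, refs, refs))  # yes
    print(recognize(unknown, unknown, refs, refs, decoys=frozenset({"yes"})))  # None
```

Ranking both sequences separately, as the claim requires, is what lets the sketch detect a disagreement between the usual and equalized matches and lower its confidence accordingly.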
10. Apparatus for speech recognition, comprising:
means for representing an unknown utterance as a first sequence of parameter frames, each parameter frame including a set of primary and secondary parameters and an equalized second sequence of parameter frames derived from the first sequence of parameter frames;
means for comparing each of the primary and secondary parameters in the sequence of parameter frames of the representation of the unknown utterance to each of a plurality of reference representations expressed in the same kind of parameters, to determine how closely each reference representation resembles the representation of the unknown utterance;
means for ranking the reference representations in order from best to worst choice in dependence upon their relative closeness to the representation of the unknown utterance, for each of the first and second sequences of parameters;
means for computing a probability that the best choice is a correct match for the unknown utterance; and
means for rejecting the best choice as a match for the unknown utterance if the probability is below a predetermined value.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18, 19.
Specification