Method and system for dynamically adjusted training for speech recognition
First Claim
1. A method in a computer system for dynamically selecting words for training a speech recognition system, the speech recognition system for recognizing a plurality of words, the speech recognition system having indications of phonemes that compose each word, the speech recognition system representing each spoken utterance by a sequence of codewords, the speech recognition system having a model for each phoneme, each model for generating a probability that each possible sequence of codewords corresponds to the modeled phoneme, the method comprising:
for each codeword, ranking the phonemes according to the probability that the codeword will be spoken as part of the phoneme;
collecting a plurality of spoken utterances for which the corresponding word is known;
for each collected utterance, converting the collected utterance to a sequence of codewords; and
aligning each codeword in the sequence of codewords with a phoneme of the known word to which the collected utterance corresponds based on the phoneme models;
for each phoneme, accumulating the ranks of that phoneme with all codewords with which it is aligned in each of the collected utterances; and
calculating an average rank of the phoneme by dividing the accumulated rank by a total number of codewords that are aligned with that phoneme in the collected utterances;
identifying a phoneme with low average rank; and
selecting words that contain the identified phoneme as words for training the speech recognition system.
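The steps of this claim can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the toy probabilities, and the toy lexicon are all hypothetical, rank 1 is assumed to mean "most probable phoneme for this codeword", and "low average rank" is read as the numerically largest average (the phoneme that is, on average, furthest from the top of the ranking). The codeword-to-phoneme alignment is taken as given here.

```python
from collections import defaultdict

def rank_phonemes(codeword_probs):
    """For each codeword, rank phonemes by probability (rank 1 = most probable; assumed convention)."""
    ranks = {}
    for codeword, probs in codeword_probs.items():
        ordered = sorted(probs, key=probs.get, reverse=True)
        ranks[codeword] = {phoneme: i + 1 for i, phoneme in enumerate(ordered)}
    return ranks

def average_ranks(ranks, aligned_pairs):
    """Accumulate the rank of each phoneme over its aligned codewords, then divide by the count."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for codeword, phoneme in aligned_pairs:
        totals[phoneme] += ranks[codeword][phoneme]
        counts[phoneme] += 1
    return {phoneme: totals[phoneme] / counts[phoneme] for phoneme in totals}

def select_training_words(avg_rank, lexicon):
    """Pick the phoneme with the worst (numerically largest) average rank and the words containing it."""
    worst = max(avg_rank, key=avg_rank.get)
    return worst, sorted(word for word, phonemes in lexicon.items() if worst in phonemes)

# Hypothetical toy data: P(codeword is spoken as part of phoneme) for three codewords.
codeword_probs = {
    0: {"k": 0.6, "ae": 0.3, "t": 0.1},
    1: {"k": 0.2, "ae": 0.5, "t": 0.3},
    2: {"k": 0.5, "ae": 0.1, "t": 0.4},
}
# (codeword, phoneme) pairs pooled from the alignments of all collected utterances.
aligned_pairs = [(0, "k"), (1, "ae"), (1, "ae"), (2, "t"), (1, "t")]
lexicon = {"cat": ["k", "ae", "t"], "cab": ["k", "ae", "b"], "tack": ["t", "ae", "k"]}

ranks = rank_phonemes(codeword_probs)
avg_rank = average_ranks(ranks, aligned_pairs)
worst, words = select_training_words(avg_rank, lexicon)
print(avg_rank)      # {'k': 1.0, 'ae': 1.0, 't': 2.0}
print(worst, words)  # t ['cat', 'tack']
```

In this toy run the codewords aligned with "t" never rank "t" first, so "t" gets the worst average rank and the words containing it are proposed for training.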
Abstract
A method and system for dynamically selecting words for training a speech recognition system. The speech recognition system models each phoneme using a hidden Markov model and represents each word as a sequence of phonemes. The training system ranks each phoneme for each frame according to the probability that the corresponding codeword will be spoken as part of the phoneme. The training system collects spoken utterances for which the corresponding word is known. The training system then aligns each codeword of each utterance with the phoneme that it is recognized to be part of. The training system then calculates an average rank for each phoneme using the codewords of the frames aligned with that phoneme. Finally, the training system selects words for training that contain phonemes with a low average rank.
116 Claims
1. A method in a computer system for dynamically selecting words for training a speech recognition system, the speech recognition system for recognizing a plurality of words, the speech recognition system having indications of phonemes that compose each word, the speech recognition system representing each spoken utterance by a sequence of codewords, the speech recognition system having a model for each phoneme, each model for generating a probability that each possible sequence of codewords corresponds to the modeled phoneme, the method comprising:
for each codeword, ranking the phonemes according to the probability that the codeword will be spoken as part of the phoneme;
collecting a plurality of spoken utterances for which the corresponding word is known;
for each collected utterance, converting the collected utterance to a sequence of codewords; and
aligning each codeword in the sequence of codewords with a phoneme of the known word to which the collected utterance corresponds based on the phoneme models;
for each phoneme, accumulating the ranks of that phoneme with all codewords with which it is aligned in each of the collected utterances; and
calculating an average rank of the phoneme by dividing the accumulated rank by a total number of codewords that are aligned with that phoneme in the collected utterances;
identifying a phoneme with low average rank; and
selecting words that contain the identified phoneme as words for training the speech recognition system.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A method in a computer system for dynamically selecting words for training a speech recognition system, the speech recognition system for recognizing a plurality of words, the speech recognition system having indications of phonetic units that compose each word, the speech recognition system representing speech as a sequence of feature vectors, the speech recognition system having a model for each phonetic unit, each model for generating a probability that each possible sequence of feature vectors corresponds to the modeled phonetic unit, the method comprising:
collecting a plurality of spoken utterances for which the corresponding word is known;
for each collected utterance, converting the collected utterance to a sequence of feature vectors; and
aligning each feature vector in the sequence of feature vectors with a phonetic unit of the known word to which the collected utterance corresponds based on the models of the phonetic units of the known word;
identifying from the feature vectors aligned with each phonetic unit a phonetic unit that is not accurately modeled; and
selecting words that contain the identified phonetic unit as words for training the speech recognition system.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27
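The alignment step in this claim can be illustrated with a simplified Viterbi-style dynamic program. The sketch below is a stand-in under stated assumptions, not the claimed method: it assumes each frame already has a log-probability under each phonetic unit of the known word, it ignores HMM state transition probabilities, and it only enforces that frames are assigned to units in left-to-right order with at least one frame per unit. The function name `align_frames` and the toy probabilities are hypothetical.

```python
import math

def align_frames(frame_logprobs):
    """Assign each frame to one unit of the known word, left to right.

    frame_logprobs[t][u] is the log-probability of frame t under the u-th
    phonetic unit of the word. Returns the unit index chosen for each frame.
    """
    n_frames = len(frame_logprobs)
    n_units = len(frame_logprobs[0])
    if n_frames < n_units:
        raise ValueError("need at least one frame per phonetic unit")
    neg_inf = -math.inf
    best = [[neg_inf] * n_units for _ in range(n_frames)]
    back = [[0] * n_units for _ in range(n_frames)]
    best[0][0] = frame_logprobs[0][0]  # the first frame must belong to the first unit
    for t in range(1, n_frames):
        for u in range(n_units):
            stay = best[t - 1][u]                          # remain in the same unit
            move = best[t - 1][u - 1] if u > 0 else neg_inf  # advance from the previous unit
            if move > stay:
                prev, back[t][u] = move, u - 1
            else:
                prev, back[t][u] = stay, u
            if prev > neg_inf:
                best[t][u] = prev + frame_logprobs[t][u]
    # Trace back from the last unit at the last frame.
    path = [n_units - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

# Hypothetical example: the word "at" (units "ae", "t") spoken over four frames.
probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
frame_logprobs = [[math.log(p) for p in row] for row in probs]
alignment = align_frames(frame_logprobs)
print(alignment)  # [0, 0, 1, 1]
```

The first two frames are assigned to "ae" and the last two to "t". The feature vectors grouped under each unit by such an alignment are what the later steps of the claim examine to decide which unit is not accurately modeled.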
28. A method in a computer system for tutoring a speaker on pronunciation of words, each word being phonetically represented by phonetic units, each spoken utterance being represented by a sequence of feature vectors, each phonetic unit having a model for generating a probability that various sequences of feature vectors correspond to the modeled phonetic unit, the method comprising:
collecting a plurality of spoken utterances from the speaker for which the corresponding word is known;
for each collected utterance, converting the collected utterance into a sequence of feature vectors; and
aligning each feature vector in the sequence of feature vectors with a phonetic unit of the known word to which the collected utterance corresponds based on the model of the phonetic units of the known word;
identifying, from the feature vectors aligned with each phonetic unit, a phonetic unit that is inaccurately spoken by the speaker; and
selecting words that contain the identified phonetic units as words for tutoring the speaker.
Dependent claims: 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46
47. A method in a computer system for selecting words for training a speech recognition system, the speech recognition system for recognizing a plurality of words, each word being spoken with phonetic units, the method comprising:
receiving a plurality of spoken utterances for which the corresponding word is determined;
for each phonetic unit of each of the determined words for which a spoken utterance is received, determining a context-dependent accuracy of the speech recognition system at recognizing the phonetic unit within the determined word;
for each phonetic unit, determining a context-independent accuracy of the speech recognition system at recognizing the phonetic unit based on the context-dependent accuracies; and
selecting words that contain phonetic units that are determined to have a lowest context-independent accuracy for training the speech recognition system.
Dependent claims: 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58
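One way to picture the step from context-dependent to context-independent accuracy is a simple pooling over words. The claim only says the context-independent accuracy is "based on" the context-dependent accuracies, so the unweighted mean used below is an assumption, as are the function names, the accuracy values, and the toy lexicon.

```python
from collections import defaultdict

def context_independent_accuracy(cd_accuracies):
    """Average per-word (context-dependent) accuracies into one value per phonetic unit.

    cd_accuracies is a list of (word, unit, accuracy) triples, one per phonetic
    unit of each word for which an utterance was received.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _word, unit, accuracy in cd_accuracies:
        sums[unit] += accuracy
        counts[unit] += 1
    return {unit: sums[unit] / counts[unit] for unit in sums}

def words_with_lowest_accuracy(ci_accuracy, lexicon):
    """Return the least accurate unit and the words that contain it."""
    lowest = min(ci_accuracy, key=ci_accuracy.get)
    return lowest, sorted(word for word, units in lexicon.items() if lowest in units)

# Hypothetical context-dependent accuracies (unit accuracy within a specific word).
cd_accuracies = [
    ("cat", "k", 0.9), ("cat", "ae", 0.8), ("cat", "t", 0.4),
    ("tack", "t", 0.6), ("tack", "ae", 0.7), ("tack", "k", 0.8),
    ("cab", "k", 1.0), ("cab", "ae", 0.9), ("cab", "b", 0.7),
]
lexicon = {"cat": ["k", "ae", "t"], "tack": ["t", "ae", "k"], "cab": ["k", "ae", "b"]}

ci_accuracy = context_independent_accuracy(cd_accuracies)
lowest, words = words_with_lowest_accuracy(ci_accuracy, lexicon)
print(lowest, words)  # t ['cat', 'tack']
```

A weighted mean (for example, weighting each word by the number of frames aligned with the unit) would fit the claim language equally well; the choice here is only for brevity.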
59. A computer-readable medium containing instructions for causing a computer system to tutor a speaker on pronunciation of words, each word being spoken with phonetic units, by performing the steps of:
receiving from the speaker a plurality of spoken utterances for which the corresponding word is known;
identifying from the spoken utterances which phonetic units are inaccurately spoken by the speaker; and
selecting words that contain an identified phonetic unit for tutoring the speaker.
Dependent claims: 60, 61, 62, 63
64. A computer-readable medium containing instructions for causing a computer system to select words for training a speech recognition system, the speech recognition system for recognizing a plurality of words, the speech recognition system having an indication of phonetic units that compose each word, the speech recognition system having a model for each phonetic unit, the speech recognition system representing speech as a sequence of feature vectors, each model for indicating a probability that each possible sequence of feature vectors corresponds to the modeled phonetic unit, by performing the steps of:
receiving a plurality of spoken utterances for which the corresponding word is determined;
for each collected utterance, converting the collected utterance to a sequence of feature vectors; and
aligning each feature vector in the sequence of feature vectors with a phonetic unit of the determined word to which the collected utterance corresponds based on the model of the phonetic units of the determined word;
identifying from the feature vectors aligned with each phonetic unit which phonetic units are least accurately modeled; and
selecting words that contain the identified phonetic units as words for training the speech recognition system.
Dependent claims: 65, 66, 67, 68, 69
70. A method in a computer system for dynamically selecting words for training a speech recognition system, each word comprising phonetic units, the method comprising:
collecting a plurality of spoken utterances for which the corresponding word is known;
identifying from the spoken utterances which phonetic units are inaccurately modeled by the speech recognition system; and
selecting words that contain an identified phonetic unit for training of the speech recognition system.
Dependent claims: 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97
98. A computer system for dynamically selecting words for training a speech recognition system, each word comprising phonetic units, comprising:
a sample collecting component that collects a plurality of spoken utterances for which the corresponding word is known and converts the spoken utterances to codewords;
an aligning component that aligns codewords with phonetic units of each word;
a phonetic unit ranking component that identifies from the spoken utterances which phonetic units are inaccurately modeled by the speech recognition system based on the aligned codewords; and
a word selection component that selects words that contain a phonetic unit identified by the phonetic unit ranking component for training of the speech recognition system.
Dependent claims: 99, 100, 101
102. A method in a computer recognition system for evaluating accuracy of the recognition system at recognizing words, each word comprising phonetic units, the method comprising:
collecting a plurality of spoken utterances for which the corresponding word is known; and
identifying an accuracy of each phonetic unit by aligning frames of the spoken utterances with phonetic units and calculating a frame accuracy measure for each frame based on a probability that the spoken utterance of that frame is contained within the phonetic unit with which it is aligned.
Dependent claims: 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114
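A probability-based frame accuracy measure of the kind this claim describes can be sketched as follows. The claim leaves the exact measure open, so this example makes a simple assumption: the measure for a frame is the probability itself, and a unit's accuracy is the mean of the measures of its aligned frames. The names `frame_prob`, `aligned_frames`, and `unit_accuracy` and the toy values are hypothetical.

```python
from collections import defaultdict

def unit_accuracy(aligned_frames, frame_prob):
    """Average a per-frame accuracy measure over the frames aligned with each unit.

    aligned_frames is a list of (codeword, unit) pairs, one per frame.
    frame_prob[codeword][unit] is the probability that the frame's codeword is
    contained within that unit; it is used directly as the frame measure here.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for codeword, unit in aligned_frames:
        sums[unit] += frame_prob[codeword][unit]
        counts[unit] += 1
    return {unit: sums[unit] / counts[unit] for unit in sums}

# Hypothetical probabilities for two codewords and two phonetic units.
frame_prob = {0: {"k": 0.6, "t": 0.1}, 1: {"k": 0.2, "t": 0.3}}
aligned_frames = [(0, "k"), (1, "k"), (1, "t")]

accuracy = unit_accuracy(aligned_frames, frame_prob)
print(accuracy)
```

A rank-based measure, as in claim 1, could be substituted for the raw probability without changing the structure of the loop.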
115. A method in a computer recognition system for evaluating accuracy of the recognition system at recognizing words, each word comprising phonetic units, the method comprising:
collecting a plurality of spoken utterances for which the corresponding word is known; and
identifying an accuracy of each phonetic unit by aligning frames of the spoken utterances with phonetic units and counting the number of times that a phonetic unit was not recognized in a correct word during recognition.
116. A method in a computer recognition system for evaluating accuracy of the recognition system at recognizing words, each word comprising phonetic units, the method comprising:
collecting a plurality of spoken utterances for which the corresponding word is known; and
identifying an accuracy of each phonetic unit by aligning frames of the spoken utterances with phonetic units and counting the number of times that a phonetic unit is not recognized in a misrecognized word during recognition.
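The count-based measures in claims 115 and 116 can be illustrated with two tallies, one for units missed inside correctly recognized words and one for units missed inside misrecognized words. The sketch assumes that a per-unit "recognized" flag is already available for each utterance after alignment; how that flag is derived is not specified here, and the record layout and function name are hypothetical.

```python
from collections import Counter

def count_unit_misses(results):
    """Tally unrecognized phonetic units, split by whether the whole word was recognized.

    results is a list of (word_recognized, units) records, where units is a list
    of (unit, unit_recognized) pairs for one utterance.
    """
    missed_in_correct = Counter()
    missed_in_misrecognized = Counter()
    for word_recognized, units in results:
        tally = missed_in_correct if word_recognized else missed_in_misrecognized
        for unit, unit_recognized in units:
            if not unit_recognized:
                tally[unit] += 1
    return missed_in_correct, missed_in_misrecognized

# Hypothetical recognition results for three utterances.
results = [
    (True, [("k", True), ("ae", True), ("t", False)]),
    (False, [("t", False), ("ae", False), ("k", True)]),
    (False, [("k", True), ("ae", True), ("b", False)]),
]

missed_in_correct, missed_in_misrecognized = count_unit_misses(results)
print(dict(missed_in_correct))        # {'t': 1}
print(dict(missed_in_misrecognized))  # {'t': 1, 'ae': 1, 'b': 1}
```

Keeping the two tallies separate mirrors the split between the two claims: a unit that is often missed even when the word comes out right may point to a different modeling problem than one that is missed only when the word is wrong.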
Specification