System and Method for Learning Alternate Pronunciations for Speech Recognition
First Claim
1. A method for generating candidate pronunciations for a selected word for learning alternative pronunciations of the word in a given language utilizing a grammar-based recognizer in a speech recognition system, the method comprising the steps of:
a. training an acoustic model for use by the grammar-based recognizer on a large speech corpus to distinguish phonemes;
b. constructing a phoneme confusion matrix for application by the grammar-based recognizer to find similar phonemes of mispronounced phonemes in the selected word;
c. constructing a phoneme replacement candidate list of the selected word for each phoneme in a set of speech data containing pronunciations for recognition, using the phoneme confusion matrix;
d. learning, by the grammar-based recognizer, candidate pronunciations of the word, using input from the acoustic model;
e. combining said learned candidate pronunciations with a linguistic dictionary to create a pooled dictionary; and
f. pruning said pooled dictionary to limit the number of learned candidate pronunciations in order to create an improved dictionary.
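Steps b, c, e, and f above can be illustrated with a minimal Python sketch. It is not the patented implementation. The confusion table `CONFUSION`, its scores, the function names, and the `max_learned` cap are all hypothetical. The recognizer's learning step (d) is stood in for by a list of already-scored candidates.

```python
from itertools import product

# Hypothetical confusion table: phoneme -> {similar phoneme: similarity score}.
# A real table would be estimated from recognizer errors on a speech corpus.
CONFUSION = {
    "AE": {"EH": 0.6, "AH": 0.3},
    "T": {"D": 0.5},
}

def replacement_candidates(pron, confusion, top_k=2):
    """Step c: for each phoneme, list itself plus its top_k most similar phonemes."""
    slots = []
    for ph in pron:
        similar = sorted(confusion.get(ph, {}).items(), key=lambda kv: -kv[1])[:top_k]
        slots.append([ph] + [p for p, _ in similar])
    return slots

def candidate_pronunciations(pron, confusion, top_k=2):
    """Expand the per-phoneme replacement lists into whole-word candidates."""
    return [tuple(c) for c in product(*replacement_candidates(pron, confusion, top_k))]

def pool_and_prune(word, learned, dictionary, max_learned=2):
    """Steps e and f: pool scored learned candidates with the dictionary entry,
    then keep only the max_learned best-scoring new pronunciations."""
    base = list(dictionary.get(word, []))
    novel = []
    for pron, _score in sorted(learned, key=lambda ps: -ps[1]):
        if pron not in base and pron not in novel:
            novel.append(pron)
    return {**dictionary, word: base + novel[:max_learned]}

candidates = candidate_pronunciations(("K", "AE", "T"), CONFUSION)
# Assumed recognizer scores for some candidates (stand-in for step d).
learned = [
    (("K", "EH", "T"), 0.9),
    (("K", "AH", "D"), 0.2),
    (("K", "AE", "T"), 0.95),
    (("K", "AE", "D"), 0.5),
]
improved = pool_and_prune("cat", learned, {"cat": [("K", "AE", "T")]})
```

Capping the number of learned entries per word is one simple pruning rule. The claim only requires that pruning limit the number of learned candidate pronunciations, so other criteria would fit equally well.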
Abstract
A system and method for learning alternate pronunciations for speech recognition is disclosed. Alternative name pronunciations may be covered, through pronunciation learning, that have not been previously covered in a general pronunciation dictionary. In an embodiment, the detection of phone-level and syllable-level mispronunciations in words and sentences may be based on acoustic models trained by Hidden Markov Models. Mispronunciations may be detected by comparing the likelihood of the potential state of the targeting pronunciation unit with a pre-determined threshold through a series of tests. It is also within the scope of an embodiment to detect accents.
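The threshold comparison described in the abstract might be sketched as follows. This is only an illustration under stated assumptions. Each pronunciation unit is assumed to come with per-frame log-likelihoods from an acoustic model. The score is assumed to be their average, and the threshold value is arbitrary. The function names are hypothetical and the abstract does not specify any of these details.

```python
def unit_score(frame_log_likelihoods):
    """Average per-frame log-likelihood of one pronunciation unit (assumed scoring rule)."""
    return sum(frame_log_likelihoods) / len(frame_log_likelihoods)

def detect_mispronunciations(units, threshold):
    """Return labels of units whose score falls below the pre-determined threshold.

    units: list of (label, frame_log_likelihoods) pairs for the target pronunciation.
    """
    return [label for label, frames in units if unit_score(frames) < threshold]

# Toy input: the middle phoneme fits the acoustic model poorly.
units = [("K", [-1.0, -1.2]), ("AE", [-6.0, -5.0]), ("T", [-2.0])]
flagged = detect_mispronunciations(units, threshold=-3.0)
```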
29 Claims
1. A method for generating candidate pronunciations for a selected word for learning alternative pronunciations of the word in a given language utilizing a grammar-based recognizer in a speech recognition system, the method comprising the steps of:
a. training an acoustic model for use by the grammar-based recognizer on a large speech corpus to distinguish phonemes;
b. constructing a phoneme confusion matrix for application by the grammar-based recognizer to find similar phonemes of mispronounced phonemes in the selected word;
c. constructing a phoneme replacement candidate list of the selected word for each phoneme in a set of speech data containing pronunciations for recognition, using the phoneme confusion matrix;
d. learning, by the grammar-based recognizer, candidate pronunciations of the word, using input from the acoustic model;
e. combining said learned candidate pronunciations with a linguistic dictionary to create a pooled dictionary; and
f. pruning said pooled dictionary to limit the number of learned candidate pronunciations in order to create an improved dictionary.
Dependent claims: 2-22
23. A method for learning alternative pronunciations for a selected word instance in a given language in a speech recognition system, wherein the speech recognition system comprises at least a grammar-based recognizer, the method comprising the steps of:
a. performing a first test, by the grammar-based recognizer, on the selected word instance to determine a baseline recognition result;
b. performing, by the grammar-based recognizer, hierarchical pronunciation learning on the word instance and selecting a pronunciation that is similar to the word instance; and
c. performing, by the grammar-based recognizer, an other test to determine if the selected pronunciation is recognized as the word instance wherein if the word is recognized, adding the selected pronunciation to a dictionary, otherwise, discarding the selected pronunciation.
Dependent claims: 24-27
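The test, learn, and re-test sequence of this claim can be sketched in Python as below. This is a toy illustration, not the claimed recognizer. `recognize` and `propose` are hypothetical callables standing in for the grammar-based recognizer and for hierarchical pronunciation learning. "Audio" is represented as a phoneme tuple so that the example stays self-contained.

```python
def learn_pronunciation(word, audio, dictionary, recognize, propose):
    """Return (dictionary, baseline); the candidate is added only if it is verified.

    recognize(audio, dictionary) -> recognized word or None   (hypothetical)
    propose(word, audio)         -> candidate pronunciation   (hypothetical)
    """
    baseline = recognize(audio, dictionary)            # step a: baseline test
    candidate = propose(word, audio)                   # step b: learn a candidate
    trial = {**dictionary, word: dictionary.get(word, []) + [candidate]}
    if recognize(audio, trial) == word:                # step c: second test
        return trial, baseline                         # recognized: keep it
    return dictionary, baseline                        # otherwise: discard it

def toy_recognize(audio, dictionary):
    """Toy recognizer: a word is recognized only on an exact pronunciation match."""
    for w, prons in dictionary.items():
        if audio in prons:
            return w
    return None

base_dict = {"cat": [("K", "AE", "T")]}
heard = ("K", "EH", "T")

# A proposal matching what was heard passes the second test and is added.
accepted, baseline = learn_pronunciation(
    "cat", heard, base_dict, toy_recognize, lambda w, a: a)

# A proposal that does not match what was heard fails and is discarded.
rejected, _ = learn_pronunciation(
    "cat", heard, base_dict, toy_recognize, lambda w, a: ("K", "AH", "D"))
```

Building a trial copy of the dictionary keeps the original untouched when the candidate is discarded, which mirrors the add-or-discard branch in step c.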
28. A system for language learning of mispronunciation detection for a word in a speech recognition system comprising:
a. a lexicon builder, wherein the lexicon builder is capable of integrating one or more of: pronunciation dictionaries, spelling-to-pronunciation interpretations, and text normalizations;
b. a speech corpus comprising audio data of pronunciations of the word for recognition;
c. an acoustic model for recognizing pronunciations of the word as phoneme sequences;
d. a word lexicon, wherein the word lexicon provides reference pronunciations of the word;
e. a word grammar, wherein the word grammar specifies words for recognition;
f. a grammar-based recognizer which provides a hypothesized word to a means for scoring, based on input from: the speech corpus, the acoustic model, the word lexicon, and the word grammar; and
g. a means for scoring which indicates accuracy of the hypothesized word from the grammar-based recognizer, wherein the means for scoring utilizes input from the speech corpus in indicating accuracy.
Dependent claims: 29
Specification