Method and apparatus for constructing and using syllable-like unit language models
Abstract
A method and computer-readable medium use syllable-like units (SLUs) to decode a pronunciation into a phonetic description. The syllable-like units are generally larger than a single phoneme but smaller than a word. The present invention provides a means for defining these syllable-like units and for generating a language model based on them that can be used in the decoding process. Because SLUs are longer than phonemes, they carry more acoustic context and impose stronger lexical constraints on speech recognition. As a result, SLU recognition yields much higher phoneme accuracy than all-phone sequence recognition.
369 Citations
5 Claims
1. A speech recognition system having a language model generated through a process comprising:
a processor breaking each word in a dictionary into units, wherein breaking each word into units comprises:
breaking each word in the dictionary into initial units by dividing each word into the largest units possible that each include at most one vowel sound;
for each initial unit, setting a frequency for the initial unit by summing the unigram probabilities of the words in which the initial unit was identified;
breaking at least one of the initial units into smaller units by preferring smaller units that occur more frequently in the dictionary over smaller units that occur less frequently and by preferring smaller units that group together sequences of phonetic units that appear in a word and where each sequence of phonetic units comprises phonetic units that the speech recognition system typically fails to recognize individually;
for each word, grouping the smaller units of the word into n-grams;
counting the total number of n-gram occurrences in the dictionary; and
for each n-gram, counting the number of occurrences of the n-gram in the dictionary and dividing this count by the total number of n-gram occurrences to form a language model probability for the n-gram.
Dependent claims: 2, 3.
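The segmentation, unit-frequency, and n-gram probability steps of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each dictionary word is assumed to be given as a list of phoneme symbols, the vowel set is a hypothetical ARPAbet-style inventory, "largest units possible" is interpreted as a greedy left-to-right split, and all function names are hypothetical. The claim's third step (splitting initial units into smaller units, preferring frequent units and hard-to-recognize phoneme sequences) is not modeled here, so the n-grams are formed directly over the initial units.

```python
from collections import Counter, defaultdict

# Hypothetical ARPAbet-style vowel inventory; the claim only says "vowel sound",
# so this set is an illustrative assumption.
VOWELS = {"aa", "ae", "ah", "ao", "aw", "ay", "eh", "er", "ey",
          "ih", "iy", "ow", "oy", "uh", "uw"}


def split_initial_units(phonemes):
    """Greedy left-to-right split into the longest runs holding at most one vowel."""
    units, current, has_vowel = [], [], False
    for p in phonemes:
        if p in VOWELS and has_vowel:
            # Adding p would put a second vowel in the unit, so close the unit.
            units.append(tuple(current))
            current, has_vowel = [], False
        current.append(p)
        has_vowel = has_vowel or p in VOWELS
    if current:
        units.append(tuple(current))
    return units


def unit_frequencies(dictionary, unigram):
    """Frequency of a unit = sum of unigram probabilities of the words containing it."""
    freq = defaultdict(float)
    for word, phonemes in dictionary.items():
        # Count each word once per unit, even if the unit repeats inside the word.
        for unit in set(split_initial_units(phonemes)):
            freq[unit] += unigram[word]
    return dict(freq)


def ngram_probabilities(segmented, n=2):
    """P(g) = count(g) / total n-gram occurrences; n-grams never cross word boundaries."""
    counts = Counter()
    for units in segmented.values():
        for i in range(len(units) - n + 1):
            counts[tuple(units[i:i + n])] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {g: c / total for g, c in counts.items()}
```

With this greedy reading, the phonemes `b ae s k ih t` split into `(b, ae, s, k)` and `(ih, t)`. Words with fewer than `n` units contribute no n-grams in this sketch; a real system might pad words with boundary markers instead.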
4. A method comprising:
for each word in a dictionary of words, dividing the word into units to produce a first set of units for each word, wherein for at least one word, dividing the word into units comprises dividing the word into units smaller than the word and wherein dividing the word into units comprises dividing the word into the largest units possible that each include at most one vowel sound;
a processor setting a frequency of each unit by summing unigram probabilities of the words in which the unit appears, wherein each unigram probability comprises the probability of the word appearing in a corpus of text;
a processor applying a constraint to the units to identify at least one unit, wherein the constraint requires that each unit have fewer than a selected number of phonemes and wherein the unit is identified because it has at least the selected number of phonemes;
for at least one word in the dictionary of words, dividing the word into units to form a second set of units by dividing into smaller units at least one unit of the first set of units for the word such that none of the words in the dictionary is divided into units that include the identified unit;
transforming the units of each word into a set of n-grams for each word; and
forming a language model by determining frequency counts for each n-gram in the sets of n-grams for the words in the dictionary of words.
Dependent claims: 5.
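The constraint, re-splitting, and n-gram counting steps of claim 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the input is a hypothetical mapping from each word to its first set of units (tuples of phonemes), `freq` is a unit-frequency table such as one built by summing unigram probabilities, and the function names are hypothetical. The claim does not say how an identified unit is divided; here an oversized unit is cut at whichever point gives the most frequent pieces (borrowing the frequency preference stated in claim 1), recursing until every piece is shorter than `max_phonemes`.

```python
from collections import Counter


def oversized_units(segmented, max_phonemes):
    """Units that violate the constraint, i.e. have at least max_phonemes phonemes."""
    return {u for units in segmented.values() for u in units if len(u) >= max_phonemes}


def best_split(unit, max_phonemes, freq):
    """Recursively cut `unit` into pieces shorter than max_phonemes.

    Assumed rule: choose the 2-way cut whose pieces have the highest combined
    frequency in `freq` (unknown pieces count as 0); ties go to the earliest cut.
    """
    if len(unit) < max_phonemes:
        return [unit]
    best, best_score = None, -1.0
    for cut in range(1, len(unit)):
        left, right = unit[:cut], unit[cut:]
        score = freq.get(left, 0.0) + freq.get(right, 0.0)
        if score > best_score:
            best, best_score = (left, right), score
    left, right = best
    return best_split(left, max_phonemes, freq) + best_split(right, max_phonemes, freq)


def resplit(segmented, max_phonemes, freq):
    """Form the second set of units so that no word keeps an identified unit."""
    if max_phonemes < 2:
        # A 1-phoneme unit cannot be cut further, so the constraint would be unsatisfiable.
        raise ValueError("max_phonemes must be at least 2")
    return {
        word: [piece for u in units for piece in best_split(u, max_phonemes, freq)]
        for word, units in segmented.items()
    }


def ngram_counts(segmented, n=2):
    """Frequency counts for every within-word n-gram of units."""
    counts = Counter()
    for units in segmented.values():
        for i in range(len(units) - n + 1):
            counts[tuple(units[i:i + n])] += 1
    return counts
```

Because every piece returned by `best_split` is shorter than `max_phonemes`, no word in the re-split dictionary contains an identified unit, which is the condition claim 4 places on the second set of units. For example, with `max_phonemes=4` the unit `(b, ae, s, k)` is identified, and if `(b, ae)` and `(s, k)` are the most frequent candidate pieces it is replaced by those two units.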
Specification