Training and using pronunciation guessers in speech recognition
First Claim
1. A method of training acoustic models for use in phonetically spelled word models comprising:
using a training pronunciation guesser to generate a phonetic spelling, each including a sequence of phonemes, from the text spelling of each of a set of acoustic training words;
mapping sequences of sound associated with utterances from each of multiple speakers of each of a plurality of the training words against the corresponding sequence of phonemes defined by the phonetic spelling associated with the training word by the pronunciation guesser; and
for each of a plurality of said phonemes, using the sounds of the utterances from multiple speakers mapped against a given phoneme in one or more of said phonetic spellings to develop at least one multi-speaker acoustic phoneme model for the given phoneme;
further including using the multi-speaker acoustic phoneme models, or acoustic models derived from them, in speech recognition performed against acoustic word models of words, where the acoustic word model of a given word is composed of a sequence of the acoustic phoneme models corresponding to a phonetic spelling generated for the given word by a recognition pronunciation guesser; and
wherein the recognition pronunciation guesser is sufficiently similar to the training pronunciation guesser that it would make a majority of the same phonetic spelling errors made by the training pronunciation guesser in the acoustic training words if it were to generate phonetic spellings for the set of acoustic training words.
Abstract
The error rate of a pronunciation guesser that guesses the phonetic spelling of words used in speech recognition is reduced by causing its training to weight the letter-to-phoneme mappings used as training data as a function of the frequency of the words in which such mappings occur. Preferably the ratio of the weight to word frequency increases as word frequency decreases. Acoustic phoneme models for use in speech recognition with phonetic spellings generated by a pronunciation guesser that makes errors are trained against word models whose phonetic spellings have been generated by a pronunciation guesser that makes similar errors. As a result, the acoustic models represent blends of phoneme sounds that reflect the spelling errors made by the pronunciation guessers. Speech recognition enabled systems are made by storing in them both a pronunciation guesser and a corresponding set of such blended acoustic models.
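The frequency weighting described in the abstract can be illustrated with a minimal sketch. The abstract only requires that the weight be a function of word frequency and that the ratio of weight to frequency grow as frequency falls. The square-root weighting, the names `mapping_weight` and `weighted_mapping_counts`, and the toy data format are assumptions made for illustration and are not taken from the patent.

```python
import math


def mapping_weight(word_frequency: float) -> float:
    """Hypothetical weight: grows with frequency, but sub-linearly (square root)."""
    if word_frequency <= 0:
        raise ValueError("word frequency must be positive")
    return math.sqrt(word_frequency)


def weighted_mapping_counts(aligned_words):
    """Accumulate weighted letter-to-phoneme counts.

    aligned_words: iterable of (word_frequency, [(letter, phoneme), ...]).
    Returns a dict mapping (letter, phoneme) to its summed weight.
    """
    counts = {}
    for frequency, pairs in aligned_words:
        weight = mapping_weight(frequency)
        for pair in pairs:
            counts[pair] = counts.get(pair, 0.0) + weight
    return counts
```

With this choice, a word seen 10,000 times contributes more total weight than a word seen 4 times, but far less weight per occurrence, so rare words are not drowned out when the guesser is trained.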
413 Citations
50 Claims
-
1. A method of training acoustic models for use in phonetically spelled word models comprising:
-
using a training pronunciation guesser to generate a phonetic spelling, each including a sequence of phonemes, from the text spelling of each of a set of acoustic training words;
mapping sequences of sound associated with utterances from each of multiple speakers of each of a plurality of the training words against the corresponding sequence of phonemes defined by the phonetic spelling associated with the training word by the pronunciation guesser; and
for each of a plurality of said phonemes, using the sounds of the utterances from multiple speakers mapped against a given phoneme in one or more of said phonetic spellings to develop at least one multi-speaker acoustic phoneme model for the given phoneme;
further including using the multi-speaker acoustic phoneme models, or acoustic models derived from them, in speech recognition performed against acoustic word models of words, where the acoustic word model of a given word is composed of a sequence of the acoustic phoneme models corresponding to a phonetic spelling generated for the given word by a recognition pronunciation guesser; and
wherein the recognition pronunciation guesser is sufficiently similar to the training pronunciation guesser that it would make a majority of the same phonetic spelling errors made by the training pronunciation guesser in the acoustic training words if it were to generate phonetic spellings for the set of acoustic training words.
View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 14, 15, 16)
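The training steps of claim 1 can be sketched in simplified form. This is an illustration under strong assumptions: each sound frame is a single number, the time alignment is a naive even split (`uniform_align`) rather than a real alignment against acoustic models, and a phoneme model is just the mean of the frames pooled for that phoneme. All function names are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def uniform_align(frames, phonemes):
    """Naive stand-in for time alignment: split frames evenly across phonemes."""
    n, k = len(frames), len(phonemes)
    return [(phonemes[i * k // n], frame) for i, frame in enumerate(frames)]


def train_phoneme_models(utterances, guess_spelling):
    """Pool frames per guessed phoneme across all speakers.

    utterances: iterable of (speaker, word, frames), frames being a list of floats.
    guess_spelling: the training pronunciation guesser, word -> list of phonemes.
    Returns {phoneme: mean frame value}, a toy multi-speaker phoneme model.
    """
    pooled = defaultdict(list)
    for _speaker, word, frames in utterances:
        for phoneme, frame in uniform_align(frames, guess_spelling(word)):
            pooled[phoneme].append(frame)
    return {phoneme: fmean(values) for phoneme, values in pooled.items()}
```

Because frames are pooled under whatever phoneme the guesser produced, a guesser error places the sounds of the phoneme actually spoken into the model of the guessed phoneme. Using a similar guesser at recognition time keeps the word models consistent with models trained this way.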
-
-
9. A method of training acoustic models for use in phonetically spelled word models comprising:
-
using a training pronunciation guesser to generate a phonetic spelling, each including a sequence of phonemes, from the text spelling of each of a set of acoustic training words;
mapping sequences of sound associated with utterances of each of the training words against the corresponding sequence of phonemes defined by the phonetic spelling associated with the training word by the pronunciation guesser; and
for each of a plurality of said phonemes, using the sounds mapped against a given phoneme in one or more of said phonetic spellings to develop at least one acoustic phoneme model for the given phoneme;
wherein 5% or more of the occurrences of vowel phonemes placed in the phonetic spellings of the acoustic training words by the training pronunciation guesser are phonetic spelling errors;
further including using the acoustic phoneme models in speech recognition performed against acoustic word models of words, where the acoustic word model of a given word is composed of a sequence of the acoustic phoneme models corresponding to a phonetic spelling generated for the given word by a recognition pronunciation guesser; and
wherein the recognition pronunciation guesser would make a majority of the same phonetic spelling errors made by the training pronunciation guesser in the acoustic training words if it were to generate phonetic spellings for the set of acoustic training words;
wherein the words whose guessed phonetic spellings are used in the speech recognition are people's names;
wherein the speech recognition is used in telephone name dialing in which the speech recognition of a name is used to select a telephone number associated with that name that can be automatically dialed; and
wherein the speech recognition and name dialing are performed on a cellphone; and
further including;
responding to the entry of a name by a user by having the recognition pronunciation guesser generate a phonetic spelling for the user-entered name; and
using the phonetic spelling of the user-entered name in the speech recognition; and
for each of a plurality of common names, testing if the phonetic spelling produced for the name by the recognition pronunciation guesser is correct; and
for each of a plurality of said common names which are found not to have correct phonetic spellings generated for them by the recognition pronunciation guesser, storing on said cellphone a phonetic spelling of the name that comes from a source more accurate than the recognition pronunciation guesser; and
wherein said responding to the entry of a name by a user includes;
checking to see if the name is one for which a phonetic spelling from the more accurate source has been stored;
if so, using the more accurate spelling as the phonetic spelling for the user entered word in speech recognition; and
if not, using the recognition pronunciation guesser to generate the phonetic spelling of the word and using that generated spelling in speech recognition.
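The last steps of claim 9 (testing common names, storing corrected spellings, and falling back to the guesser) can be sketched as follows. The function names, the lower-casing of names, and the representation of the more accurate source as a dictionary (`reference_lexicon`) are assumptions for illustration only.

```python
def build_exception_list(common_names, reference_lexicon, guesser):
    """Keep a reference spelling only for names the guesser spells incorrectly.

    reference_lexicon: {name: phoneme list} from a source assumed to be more
    accurate than the guesser. guesser: name -> phoneme list.
    """
    return {
        name: reference_lexicon[name]
        for name in common_names
        if name in reference_lexicon and guesser(name) != reference_lexicon[name]
    }


def phonetic_spelling_for(name, stored_spellings, guesser):
    """Use a stored accurate spelling if one exists, otherwise guess."""
    key = name.strip().lower()
    if key in stored_spellings:
        return stored_spellings[key]
    return guesser(key)
```

Storing only the names the guesser gets wrong keeps the stored list small, which matters on a device with limited memory such as the cellphone named in the claim.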
-
-
17. A method of making a speech recognition enabled computing system comprising:
-
training a set of acoustic phoneme models by;
using a training pronunciation guesser to generate a phonetic spelling, each including a sequence of phonemes, from the text spelling of each of a set of acoustic training words;
mapping sequences of sound from utterances of multiple speakers against the sequence of phonemes defined by the phonetic spelling associated with training words by the pronunciation guesser; and
for each of a plurality of said phonemes, using the sounds of the utterances from multiple speakers mapped against a given phoneme in one or more of said phonetic spellings to develop at least one multi-speaker acoustic phoneme model for the given phoneme; and
storing in machine readable memory of the computing system being made the following;
recognition pronunciation guessing programming for generating a phonetic spelling, comprised of a sequence of phonemes, from a textual spelling of a word;
at least one acoustic phoneme model for modeling the speech sounds associated with each phoneme used in the phonetic spellings generated by the recognition pronunciation guessing programming, including said multi-speaker acoustic phoneme models, or acoustic models derived from them;
speech recognition programming for recognizing an utterance by scoring the match between a sequence of the utterance's speech sounds and a sequence of said acoustic phoneme models associated with the phonetic spelling of each of a plurality of words; and
programming for enabling the speech recognition programming to perform recognition against a sequence of said acoustic phoneme models associated with a phonetic spelling generated by the pronunciation guessing programming;
wherein;
5% or more of the occurrences of vowel phonemes placed in the phonetic spellings of the acoustic training words by the training pronunciation guesser are phonetic spelling errors; and
the recognition pronunciation guessing programming would make a majority of the same phonetic spelling errors as are made by the training pronunciation guesser when generating phonetic spellings for the acoustic training words.
View Dependent Claims (18, 19, 20, 21, 22, 23)
-
-
24. A speech recognition system comprising:
-
machine readable memory storing;
pronunciation guessing programming for generating a phonetic spelling, comprised of a sequence of phonemes, from a textual spelling of a word;
a set of acoustic phoneme models, including at least one for modeling the speech sounds associated with each phoneme used in the phonetic spellings generated by the pronunciation guessing programming, where each of a plurality of said acoustic phoneme models are multi-speaker models that each have been derived from utterances made by multiple speakers, or acoustic models that have been adapted from such multi-speaker models;
speech recognition programming for recognizing an utterance by scoring the match between a sequence of the utterance's speech sounds and a sequence of said acoustic phoneme models associated with the phonetic spelling of each of a plurality of word models; and
programming for enabling the speech recognition programming to perform recognition against phonetic spellings generated by the pronunciation guessing programming;
wherein;
each of said acoustic models represents a phoneme in a phonetic context;
each of a plurality of said acoustic models is a blended acoustic model that represents a given phoneme in a given phonetic context as a distribution of sounds corresponding to utterances of the given phoneme and utterances of an associated set of one or more other phonemes, where both the sounds corresponding to the utterances of the given phoneme and to utterances of one or more associated phonemes have each been derived from the utterances of multiple speakers; and
over the plurality of blended acoustic models, the relative weight allocated, in a given acoustic model representing a given phoneme in a given phonetic context, between sounds of utterances of the given phoneme and sounds of utterances of a specific one of the given phoneme's associated set of phonemes is correlated with the frequency with which the pronunciation guessing programming places the given phoneme in a position in a phonetic spelling in the given phonetic context where the correct phoneme for the position is said specific associated phoneme.
View Dependent Claims (25, 26, 27, 28)
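The frequency that the blend weights of claim 24 are said to correlate with can be computed directly when guessed and correct spellings are available. The sketch below assumes the guessed and correct phonemes have already been aligned position by position; the function name and input format are hypothetical. In the claimed system the blending would arise from training on guessed spellings rather than from an explicit table like this one.

```python
from collections import Counter, defaultdict


def blend_weights(aligned_positions):
    """Estimate, per guessed phoneme in context, the share of each correct phoneme.

    aligned_positions: iterable of (guessed_phoneme, context, correct_phoneme),
    where context is any hashable description of the neighbouring phonemes.
    Returns {(guessed_phoneme, context): {correct_phoneme: fraction}}.
    """
    counts = defaultdict(Counter)
    for guessed, context, correct in aligned_positions:
        counts[(guessed, context)][correct] += 1
    weights = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        weights[key] = {phoneme: n / total for phoneme, n in counter.items()}
    return weights
```

For example, if the guesser writes "ae" between "k" and "t" four times and the correct phoneme is "ah" in one of those cases, the resulting table assigns 0.75 to "ae" and 0.25 to "ah" for that context.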
-
-
29. A speech recognition system comprising:
-
a pronunciation guesser for generating a phonetic spelling, comprised of a sequence of phonemes, from a textual spelling of a word;
machine readable memory storing a set of acoustic phoneme models, including at least one for modeling the speech sounds associated with each phoneme used in the phonetic spellings generated by the pronunciation guesser, where each of a plurality of said acoustic phoneme models are multi-speaker models that each have been derived from utterances made by multiple speakers, or acoustic models that have been adapted from such multi-speaker models;
a speech recognizer for recognizing an utterance by scoring the match between a sequence of the utterance's speech sounds and a sequence of said acoustic phoneme models associated with the phonetic spelling of each of a plurality of word models; and
circuitry for enabling the speech recognizer to perform recognition against phonetic spellings generated by the pronunciation guesser;
wherein;
each of said acoustic models represents a phoneme in a phonetic context;
each of a plurality of said acoustic models is a blended acoustic model that represents a given phoneme in a given phonetic context as a distribution of sounds corresponding to utterances of the given phoneme and utterances of an associated set of one or more other phonemes, where both the sounds corresponding to the utterances of the given phoneme and to utterances of one or more associated phonemes have each been derived from the utterances of multiple speakers; and
over the plurality of blended acoustic models, the relative weight allocated, in a given acoustic model representing a given phoneme in a given phonetic context, between sounds of utterances of the given phoneme and sounds of utterances of a specific one of the given phoneme's associated set of phonemes is correlated with the frequency with which the pronunciation guesser places the given phoneme in a position in a phonetic spelling in the given phonetic context where the correct phoneme for the position is said specific associated phoneme.
View Dependent Claims (30, 31, 32, 33, 34)
-
-
35. A speech recognition system comprising:
-
machine readable memory storing;
pronunciation guessing programming for generating a phonetic spelling, comprised of a sequence of phonemes, from a textual spelling of a word;
a set of acoustic phoneme models, including at least one for modeling the speech sounds associated with each phoneme used in the phonetic spellings generated by the pronunciation guessing programming, where each of a plurality of said acoustic phoneme models are multi-speaker models that each have been derived from utterances made by multiple speakers, or acoustic models that have been adapted from such multi-speaker models;
speech recognition programming for recognizing an utterance by scoring the match between a sequence of the utterance's speech sounds and a sequence of said acoustic phoneme models associated with the phonetic spelling of each of a plurality of word models; and
programming for enabling the speech recognition programming to perform recognition against phonetic spellings generated by the pronunciation guessing programming;
wherein;
the pronunciation guessing programming would produce phonetic spellings in which 5% or more of the individual occurrences of vowel phonemes are phonetic misspellings when generating the phonetic spellings of a given vocabulary for which the pronunciation guesser has been trained to generate phonetic spellings;
each of said acoustic models represents a phoneme in a phonetic context;
each of a plurality of said acoustic models, including at least one acoustic model for at least a plurality of vowel phonemes used by the pronunciation guessing programming, is a blended acoustic model that represents a given phoneme in a given phonetic context as a distribution of sounds corresponding to utterances of the given phoneme and utterances of an associated set of one or more other phonemes, where both the sounds corresponding to the utterances of the given phoneme and to utterances of one or more associated phonemes have each been derived from the utterances of multiple speakers; and
over the plurality of blended acoustic models, the relative weight allocated, in a given acoustic model representing a given phoneme in a given phonetic context, between sounds of utterances of the given phoneme and sounds of utterances of a specific one of the given phoneme's associated set of phonemes is correlated with the frequency with which the pronunciation guessing programming would place, when generating phonetic spelling for the given vocabulary, the given phoneme in a position in a phonetic spelling within the given phonetic context where the correct phoneme for the position is said specific associated phoneme.
View Dependent Claims (36, 37, 38, 39, 40, 41, 44, 45, 46, 47, 48, 49, 50)
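The 5% vowel misspelling threshold recited in claim 35 can be checked with a short calculation. This sketch assumes each guessed spelling is already aligned position by position with its correct spelling (a real comparison would need an alignment step for insertions and deletions), and the `VOWELS` set is a hypothetical phoneme inventory chosen for the example.

```python
VOWELS = {"aa", "ae", "ah", "eh", "ih", "iy", "ow", "uw"}  # hypothetical inventory


def vowel_error_rate(spelling_pairs):
    """Fraction of guessed vowel occurrences that differ from the correct phoneme.

    spelling_pairs: iterable of (guessed, correct) phoneme lists of equal length.
    """
    total = errors = 0
    for guessed, correct in spelling_pairs:
        if len(guessed) != len(correct):
            raise ValueError("spellings must be aligned to equal length")
        for g, c in zip(guessed, correct):
            if g in VOWELS:
                total += 1
                if g != c:
                    errors += 1
    return errors / total if total else 0.0
```

A guesser would fall within the claim's wording when `vowel_error_rate(...) >= 0.05` over the vocabulary for which it was trained.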
-
-
42. A system comprising:
-
machine readable memory storing;
pronunciation guessing programming for generating a phonetic spelling, comprised of a sequence of phonemes, from a textual spelling of a word;
a set of acoustic phoneme models, including at least one for modeling the speech sounds associated with each phoneme used in the phonetic spellings generated by the pronunciation guessing programming;
speech recognition programming for recognizing an utterance by scoring the match between a sequence of the utterance's speech sounds and a sequence of said acoustic phoneme models associated with the phonetic spelling of each of a plurality of word models; and
programming for enabling the speech recognition programming to perform recognition against phonetic spellings generated by the pronunciation guessing programming;
wherein;
the pronunciation guessing programming would produce phonetic spellings in which 5% or more of the individual occurrences of vowel phonemes are phonetic misspellings when generating the phonetic spellings of a given vocabulary for which the pronunciation guesser has been trained to generate phonetic spellings;
each of said acoustic models represents a phoneme in a phonetic context;
each of a plurality of said acoustic models, including at least one acoustic model for at least a plurality of vowel phonemes used by the pronunciation guessing programming, is a blended acoustic model that represents a given phoneme in a given phonetic context as a distribution of sounds corresponding to utterances of the given phoneme and utterances of an associated set of one or more other phonemes; and
over the plurality of blended acoustic models, the relative weight allocated, in a given acoustic model representing a given phoneme in a given phonetic context, between sounds of utterances of the given phoneme and each of the given phoneme's associated phonemes is correlated with the frequency with which the pronunciation guessing programming would place, when generating phonetic spelling for the given vocabulary, the given phoneme in a position in a phonetic spelling within the given phonetic context where the correct phoneme for the position is, respectively, the given phoneme and each of said associated phonemes;
wherein said machine readable memory further stores programming for;
enabling a user to enter the text spelling of a name into the system in association with an item upon which the system can perform a given function;
responding to such a user's entry of a name into the system by causing the pronunciation guessing programming to generate a phonetic spelling from the text spelling of the entered name;
responding to a user's utterance by having the speech recognition programming score the match between the sound of the utterance and sequences of said acoustic phoneme models corresponding to the phonetic spellings generated by the pronunciation guessing programming for each of one or more user entered names; and
determining whether to perform the given function on the item associated with a given user-entered name as a function of the score produced by the speech recognition programming for the utterance against the given user-entered name; and
wherein;
said machine readable memory further stores correct phonetic spellings for a plurality of names the pronunciation guessing programming phonetically misspells; and
said responding to a user's entry of a name into the system responds to the user's entry of a given name for which a correct phonetic spelling has been stored by causing said correct phonetic spelling to be used as the phonetic spelling for the given user-entered name in the matching performed by the speech recognition programming.
View Dependent Claims (43)
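The score-based decision in claim 42 can be sketched as a selection over per-name match scores. The claim only says the decision is a function of the score; the best-score-above-threshold rule, the function name `select_entry`, and the convention that higher scores mean better matches are assumptions made for this example.

```python
def select_entry(scores, threshold):
    """Pick the user-entered name whose word model best matches the utterance.

    scores: {name: match score}, higher meaning a better match (assumed).
    Returns the best-scoring name, or None if no score reaches the threshold,
    in which case the associated function (such as dialing) is not performed.
    """
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None
```

Returning `None` for a weak match lets the caller prompt the user to repeat the name instead of acting on an unreliable recognition result.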
-
Specification