Speech recognition based on pronunciation modeling
Abstract
A system and method for performing speech recognition are disclosed. The method comprises receiving an utterance, applying the utterance to a recognizer with a language model having pronunciation probabilities associated with unique word identifiers for words given their pronunciations, and presenting a recognition result for the utterance. Recognition is improved by moving the pronunciation model from the dictionary to the language model.
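The idea of moving pronunciation probabilities out of the dictionary and into the language model can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the patented implementation: the corpus `observed`, the label scheme `word#n`, and the helper names `label_variants` and `build_models` are all hypothetical, and a unigram model stands in for whatever language model a real recognizer would use.

```python
from collections import Counter

# Hypothetical toy data: (word, phonemic transcription) pairs as they
# might appear in speaker-dependent phonemic transcriptions.
observed = [
    ("the", "dh ah"), ("the", "dh iy"), ("the", "dh ah"),
    ("data", "d ey t ah"), ("data", "d ae t ah"),
    ("the", "dh ah"),
]

def label_variants(pairs):
    """Give each distinct pronunciation of a word its own label, e.g. 'the#1'."""
    labels = {}
    variants_seen = Counter()
    for word, pron in pairs:
        if (word, pron) not in labels:
            variants_seen[word] += 1
            labels[(word, pron)] = f"{word}#{variants_seen[word]}"
    return labels

def build_models(pairs):
    """Return (unigram LM over labels, probability-free pronunciation dictionary)."""
    labels = label_variants(pairs)
    counts = Counter(labels[p] for p in pairs)
    total = sum(counts.values())
    # The pronunciation probabilities now live in the language model ...
    unigram_lm = {label: count / total for label, count in counts.items()}
    # ... and the dictionary only maps each label to its phoneme string.
    dictionary = {label: pron for (_, pron), label in labels.items()}
    return unigram_lm, dictionary

lm, lexicon = build_models(observed)
```

Here `lm` carries the relative frequency of each labeled pronunciation variant, while `lexicon` holds one unweighted entry per label, which mirrors the split the abstract describes.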
18 Claims
1. A method comprising:
training a language model by approximating large quantities of transcribed speech using a plurality of speaker dependent phonemic transcription based datasets that are based on pronunciation models obtained for each of a plurality of speakers;
incorporating, in the language model, pronunciation probabilities associated with respective unique labels for each different pronunciation of a word, wherein the respective unique labels for most frequent words and most frequent word pairs indicate a special status in the language model;
removing, from a pronunciation dictionary, the pronunciation probabilities;
applying, via a processor, an utterance to a recognizer with the language model to yield a recognition result; and
presenting the recognition result for the utterance.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A non-transitory computer-readable storage medium having stored therein instructions which, when executed by a processor, cause the processor to perform a method comprising:
training a language model by approximating large quantities of transcribed speech using a plurality of speaker dependent phonemic transcription based datasets that are based on pronunciation models obtained for each of a plurality of speakers;
incorporating, in the language model, pronunciation probabilities associated with respective unique labels for each different pronunciation of a word, wherein the respective unique labels for most frequent words and most frequent word pairs indicate a special status in the language model;
removing, from a pronunciation dictionary, the pronunciation probabilities;
applying an utterance to a recognizer with the language model to yield a recognition result; and
presenting the recognition result for the utterance.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A system comprising:
a processor, and
a non-transitory computer-readable medium having stored therein instructions which, when executed by the processor, cause the processor to perform a method comprising:
training a language model by approximating large quantities of transcribed speech using a plurality of speaker dependent phonemic transcription based datasets that are based on pronunciation models obtained for each of a plurality of speakers;
incorporating, in the language model, pronunciation probabilities associated with respective unique labels for each different pronunciation of a word, wherein the respective unique labels for most frequent words and most frequent word pairs indicate a special status in the language model;
removing, from a pronunciation dictionary, the pronunciation probabilities;
applying an utterance to a recognizer with the language model to yield a recognition result; and
presenting the recognition result for the utterance.
Dependent claims: 16, 17, 18
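The claims recite labels that give the most frequent words and most frequent word pairs a special status, and a final step that presents the recognition result. One possible reading is sketched below in Python. Everything here is an assumption made for illustration: the claims do not say how the special status is encoded, so the `@FREQ` and `@PAIR` suffixes, the toy `corpus`, and the helpers `special_items`, `tag`, and `present` are hypothetical.

```python
from collections import Counter

# Hypothetical toy training text, already tokenized into words.
corpus = [
    ["i", "want", "to", "go"],
    ["we", "want", "to", "eat"],
    ["go", "to", "sleep"],
]

def special_items(sentences, top_words=1, top_pairs=1):
    """Find the most frequent words and the most frequent adjacent word pairs."""
    words = Counter(w for s in sentences for w in s)
    pairs = Counter(p for s in sentences for p in zip(s, s[1:]))
    return ({w for w, _ in words.most_common(top_words)},
            {p for p, _ in pairs.most_common(top_pairs)})

def tag(sentence, frequent_words, frequent_pairs):
    """Relabel a sentence so frequent words and pairs carry a special marker."""
    out, i = [], 0
    while i < len(sentence):
        pair = tuple(sentence[i:i + 2])
        if len(pair) == 2 and pair in frequent_pairs:
            out.append(f"{pair[0]}_{pair[1]}@PAIR")  # pair becomes one LM token
            i += 2
        elif sentence[i] in frequent_words:
            out.append(f"{sentence[i]}@FREQ")
            i += 1
        else:
            out.append(sentence[i])
            i += 1
    return out

def present(labels):
    """Strip the markers so the recognition result is shown as plain words."""
    words = []
    for label in labels:
        words.extend(label.split("@")[0].split("_"))
    return " ".join(words)

freq_words, freq_pairs = special_items(corpus)
tagged = tag(corpus[0], freq_words, freq_pairs)
```

In this sketch the language model would be trained on the tagged token stream, and `present(tagged)` maps a recognizer hypothesis over labels back to ordinary text for display.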
Specification