System and method for discriminative pronunciation modeling for voice search
First Claim
1. A method comprising:
determining a context associated with an utterance received via a microphone that converts audible signals into electrical signals;
determining, via a processor, phoneme possibilities for a unit of speech in the utterance;
assigning weights to each phoneme possibility in the phoneme possibilities, to yield weighted phonemes, wherein the weights are based on a rate of occurrence of the phoneme possibility in utterances associated with the context and a likelihood of classification errors;
receiving additional utterances via the microphone; and
converting the additional utterances into text via a speech recognizer that uses the weighted phonemes.
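The weighting step of claim 1 can be illustrated with a minimal sketch. The claim says only that the weights depend on a rate of occurrence within the context and on a likelihood of classification errors. The particular combination used here (rate multiplied by one minus the error likelihood, then renormalized), the function name `weight_phonemes`, and the sample data are all assumptions made for illustration and are not taken from the patent.

```python
from collections import Counter


def weight_phonemes(observed, error_likelihood):
    """Weight the phoneme possibilities for one unit of speech in one context.

    observed: phoneme sequences seen for the unit in utterances from the context.
    error_likelihood: hypothetical mapping from phoneme sequence to an estimated
        probability that using it leads to a classification error.
    """
    counts = Counter(observed)
    total = sum(counts.values())
    # Assumed combination rule: occurrence rate discounted by error likelihood.
    raw = {
        p: (c / total) * (1.0 - error_likelihood.get(p, 0.0))
        for p, c in counts.items()
    }
    norm = sum(raw.values())
    if norm == 0:
        raise ValueError("all phoneme possibilities were fully discounted")
    # Renormalize so the weights for this unit of speech sum to 1.
    return {p: v / norm for p, v in raw.items()}


# Illustrative data only: two pronunciations of one word in a single context.
weights = weight_phonemes(
    ["t ah m ey t ow"] * 3 + ["t ah m aa t ow"],
    {"t ah m ey t ow": 0.1, "t ah m aa t ow": 0.4},
)
```

A recognizer could then consult `weights` when scoring competing pronunciations of later utterances from the same context.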
Abstract
Disclosed herein is a method for speech recognition. The method includes: receiving speech utterances; assigning a pronunciation weight to each unit of speech in the speech utterances, each respective pronunciation weight being normalized at the unit-of-speech level to sum to 1; for each received speech utterance, optimizing the pronunciation weight by identifying word and phone alignments and corresponding likelihood scores, and discriminatively adapting the pronunciation weight to minimize classification errors; and recognizing additional received speech utterances using the optimized pronunciation weights. A unit of speech can be a sentence, a word, a context-dependent phone, a context-independent phone, or a syllable. The method can further include discriminatively adapting pronunciation weights based on an objective function. The objective function can be maximum mutual information, maximum likelihood training, minimum classification error training, or other functions known to those of skill in the art.
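As a rough illustration of the normalization and discriminative adaptation described in the abstract, the sketch below keeps the pronunciation weights of one unit of speech as a softmax over free parameters, so they always sum to 1, and takes one gradient step on a maximum-mutual-information style objective (the log posterior of the pronunciation found in the reference alignment). The softmax parameterization, the names `softmax` and `mmi_step`, the learning rate, and the likelihood scores are assumptions for illustration. The patent does not specify this update rule.

```python
import math


def softmax(theta):
    """Map free parameters to pronunciation weights that sum to 1."""
    m = max(theta.values())
    exps = {k: math.exp(v - m) for k, v in theta.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def mmi_step(theta, scores, correct, lr=0.5):
    """One gradient-ascent step on log P(correct pronunciation | utterance).

    scores: hypothetical acoustic likelihoods of each pronunciation variant,
        as would be obtained from word and phone alignments.
    correct: the variant present in the reference alignment.
    """
    w = softmax(theta)
    joint = {k: w[k] * scores[k] for k in theta}
    z = sum(joint.values())
    posterior = {k: j / z for k, j in joint.items()}
    # For softmax weights the gradient is (indicator of correct) - posterior.
    return {
        k: theta[k] + lr * ((1.0 if k == correct else 0.0) - posterior[k])
        for k in theta
    }


# Illustrative data only: two variants of one word, initially equally weighted.
theta = {"ey": 0.0, "aa": 0.0}
before = softmax(theta)
theta = mmi_step(theta, scores={"ey": 0.2, "aa": 0.6}, correct="ey")
after = softmax(theta)
```

In this example the acoustic scores alone favor the wrong variant, so the step shifts weight toward the reference pronunciation while the weights remain normalized.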
20 Claims
1. A method comprising:
determining a context associated with an utterance received via a microphone that converts audible signals into electrical signals;
determining, via a processor, phoneme possibilities for a unit of speech in the utterance;
assigning weights to each phoneme possibility in the phoneme possibilities, to yield weighted phonemes, wherein the weights are based on a rate of occurrence of the phoneme possibility in utterances associated with the context and a likelihood of classification errors;
receiving additional utterances via the microphone; and
converting the additional utterances into text via a speech recognizer that uses the weighted phonemes.
10. A system comprising:
a processor; and
a computer-readable storage medium having instructions stored which, when executed by the processor, cause the processor to perform operations comprising:
determining a context associated with an utterance received via a microphone that converts audible signals into electrical signals;
determining, via a processor, phoneme possibilities for a unit of speech in the utterance;
assigning weights to each phoneme possibility in the phoneme possibilities, to yield weighted phonemes, wherein the weights are based on a rate of occurrence of the phoneme possibility in utterances associated with the context and a likelihood of classification errors;
receiving additional utterances via the microphone; and
converting the additional utterances into text via a speech recognizer that uses the weighted phonemes.
16. A non-transitory computer-readable storage device having instructions stored which, when executed by a computing device, cause the computing device to perform operations comprising:
determining a context associated with an utterance received via a microphone that converts audible signals into electrical signals;
determining, via a processor, phoneme possibilities for a unit of speech in the utterance;
assigning weights to each phoneme possibility in the phoneme possibilities, to yield weighted phonemes, wherein the weights are based on a rate of occurrence of the phoneme possibility in utterances associated with the context and a likelihood of classification errors;
receiving additional utterances via the microphone; and
converting the additional utterances into text via a speech recognizer that uses the weighted phonemes.
Specification