PRONUNCIATION LEARNING THROUGH CORRECTION LOGS
Abstract
A new pronunciation learning system for dynamically learning new pronunciations assisted by user correction logs. The user correction logs provide a record of speech recognition events and subsequent user behavior that implicitly confirms or rejects the recognition result and/or shows the user's intended words via subsequent input. The system analyzes the correction logs and distills them down to a set of words which lack acceptable pronunciations. Hypothetical pronunciations, constrained by spelling and other linguistic knowledge, are generated for each of the words. Offline recognition determines the hypothetical pronunciations with a good acoustical match to the audio data likely to contain the words. The matching pronunciations are aggregated and adjudicated to select new pronunciations for the words to improve general or personalized recognition models.
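The final step in the abstract, aggregating the matching pronunciations and adjudicating among them, can be illustrated with a simple voting scheme. The sketch below is only one possible reading: the function name `adjudicate`, the thresholds `min_votes` and `min_share`, and the phoneme strings are all assumptions made for illustration and do not come from the patent.

```python
from collections import Counter


def adjudicate(matches, min_votes=3, min_share=0.6):
    """Select at most one new pronunciation per word by majority vote.

    matches: iterable of (word, pronunciation) pairs, one pair per utterance
    in which offline recognition matched a hypothetical pronunciation.
    min_votes and min_share are hypothetical acceptance thresholds.
    """
    # Aggregate: count how often each pronunciation matched, per word.
    counts_by_word = {}
    for word, pronunciation in matches:
        counts_by_word.setdefault(word, Counter())[pronunciation] += 1

    # Adjudicate: accept the top pronunciation only with enough support.
    accepted = {}
    for word, counts in counts_by_word.items():
        pronunciation, votes = counts.most_common(1)[0]
        share = votes / sum(counts.values())
        if votes >= min_votes and share >= min_share:
            accepted[word] = pronunciation
    return accepted


# Made-up example data: four utterances agree on one pronunciation of
# "gnocchi"; "qatar" has too few matches to be accepted.
example_matches = (
    [("gnocchi", "n y oh k iy")] * 4
    + [("gnocchi", "g n aa ch iy")]
    + [("qatar", "k ah t aa r")] * 2
)
new_pronunciations = adjudicate(example_matches)
```

Requiring both a minimum vote count and a minimum share keeps a single noisy utterance from introducing a pronunciation, which matters more for a general model than for a personalized one.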
20 Claims
1. A method for dynamically learning new pronunciations for speech recognition assisted by subsequent user inputs, the method comprising:
generating hypothetical pronunciations for a misrecognized word in spoken utterances based on a predicted intended word derived from corresponding subsequent user inputs;
recognizing misrecognized spoken utterances using a language model containing the hypothetical pronunciations to find matching hypothetical pronunciations; and
accepting a new pronunciation for the predicted intended word from the matching hypothetical pronunciations.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A speech recognition system for assisted dynamic learning of new pronunciations for user input, the speech recognition system comprising:
a recognition event store storing recognition event data for tasks initiated by spoken utterances, the recognition event data including audio data of the spoken utterances, the recognition results obtained by decoding the spoken utterances, subsequent user inputs, and indicators of whether outcomes of the recognition results and the subsequent user inputs were accepted or rejected by users;
an event classifier operable to classify recognition results as misrecognized spoken utterances based on an indication that outcomes of recognition results were not accepted while outcomes of subsequent user inputs were successful and a determination that a subsequent user input and recognition result pair from a single source have significant similarity, and operable to identify misrecognized portions of the recognition results from the subsequent user inputs;
a pronunciation generator operable to generate hypothetical pronunciations for the misrecognized portions using the corresponding portions of the subsequent user inputs;
a speech recognizer operable to match hypothetical pronunciations with the audio data of spoken utterances that produced recognition results classified as misrecognized spoken utterances; and
an aggregation adjudicator operable to select new pronunciations for the misrecognized words from the matching pronunciations.
Dependent claims: 14, 15, 16, 17
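The recognition event store and event classifier of claim 13 can be pictured as a record type plus a predicate over it. The following is a minimal sketch under stated assumptions: the field names, the use of Python's `difflib.SequenceMatcher` as the similarity measure, and the `min_similarity` threshold are illustrative choices, not details taken from the claim.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class RecognitionEvent:
    """One stored recognition event (field names are hypothetical)."""
    audio_path: str            # where the utterance audio is kept
    recognition_result: str    # text decoded from the utterance
    result_accepted: bool      # did the user accept the outcome?
    subsequent_input: str      # what the user entered afterwards
    subsequent_accepted: bool  # was the follow-up input successful?


def is_misrecognition(event, min_similarity=0.6):
    """Classify an event as a likely misrecognition.

    The result must have been rejected, the follow-up must have succeeded,
    and the two texts must be similar but not identical. The 0.6 threshold
    is an assumed value for illustration.
    """
    if event.result_accepted or not event.subsequent_accepted:
        return False
    similarity = SequenceMatcher(
        None,
        event.recognition_result.lower(),
        event.subsequent_input.lower(),
    ).ratio()
    return min_similarity <= similarity < 1.0


# Made-up events: a near-miss correction versus an unrelated follow-up.
corrected = RecognitionEvent(
    "utt1.wav", "call john smyth", False, "call jon smith", True)
unrelated = RecognitionEvent(
    "utt2.wav", "play jazz", False, "weather tomorrow", True)
```

The similarity check is what separates a correction of the same request from a user who simply moved on to a different task after a failed one.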
18. A computer readable medium containing computer executable instructions which, when executed by a computer, perform a method for assisted dynamic learning of new pronunciations for user input, the method comprising:
determining that a spoken utterance is likely to have been misrecognized based on implicit indications that a voice input based on the spoken utterance was unsuccessful and a subsequent user input similar to the voice input was successful;
identifying a difference in linguistic unit between the voice input and the subsequent user input;
generating hypothetical pronunciations for misrecognized word in spoken utterances based on a similar word from corresponding subsequent user inputs;
recognizing misrecognized spoken utterances using a language model containing the hypothetical pronunciations to find matching hypothetical pronunciations; and
accepting a new pronunciation for the similar word from the matching hypothetical pronunciations.
Dependent claims: 19, 20
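The second step of claim 18, identifying a difference in linguistic unit between the voice input and the subsequent user input, can be sketched as a word-level alignment. This example assumes that the linguistic unit is a whitespace-delimited word and uses `difflib.SequenceMatcher` for the alignment; the function name `differing_words` is hypothetical and the claim does not prescribe any particular alignment technique.

```python
from difflib import SequenceMatcher


def differing_words(voice_input, subsequent_input):
    """Return (misrecognized, intended) word pairs from two similar inputs.

    Both inputs are lowercased and split on whitespace, then aligned; each
    replaced span yields one pair. Pure insertions and deletions are ignored
    in this simplified sketch.
    """
    heard = voice_input.lower().split()
    intended = subsequent_input.lower().split()
    matcher = SequenceMatcher(None, heard, intended, autojunk=False)

    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            pairs.append((" ".join(heard[i1:i2]), " ".join(intended[j1:j2])))
    return pairs


# Made-up example: the typed follow-up reveals the intended word.
pairs = differing_words("directions to la hoya", "directions to la jolla")
```

Each resulting pair supplies a candidate word (here the intended spelling) for which hypothetical pronunciations would then be generated and tested against the stored audio.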
Specification