Automatic language model update
Abstract
A method for generating a speech recognition model includes accessing a baseline speech recognition model, obtaining information related to recent language usage from search queries, and modifying the speech recognition model to revise probabilities of a portion of a sound occurrence based on the information. The portion of a sound may include a word. Also, a method for generating a speech recognition model includes receiving, at a search engine from a remote device, an audio recording and a transcript that substantially represents at least a portion of the audio recording; synchronizing the transcript with the audio recording; extracting one or more letters from the transcript and the associated pronunciation of the one or more letters from the audio recording; and generating a dictionary entry in a pronunciation dictionary.
87 Citations
17 Claims
1. A method comprising:
accessing a baseline language model that associates a respective baseline probability of occurrence with each of multiple different terms;
obtaining information related to recent language usage from recent search queries that were submitted by multiple users of a search engine within a predetermined period of time;
determining a quantity of occurrences of a particular term in the recent search queries that were submitted by the multiple users of the search engine within the predetermined period of time;
selectively modifying the baseline language model to independently revise the baseline probability of occurrence associated with the particular term based at least on the quantity of occurrences of the particular term in the recent search queries that were submitted by the multiple users of the search engine within the predetermined period of time while maintaining unchanged a baseline probability of occurrence associated with a different term that does not occur in the recent search queries that were submitted by the multiple users of the search engine within the predetermined period of time by assigning a first probability to the particular term in the recent search queries that were submitted by the multiple users of the search engine within the predetermined period of time that is greater than a second probability for the different term that does not occur in the recent search queries that were submitted by the multiple users of the search engine within the predetermined period of time; and
generating, by an automated speech recognizer using the modified language model, a transcription of one or more utterances of one or more different users of the search engine.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
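The counting and selective-update steps of claim 1 can be illustrated with a minimal sketch. The claim does not specify an update formula, so everything here is an assumption made for illustration: the function names `count_recent_terms` and `update_language_model`, the whitespace tokenization, the linear interpolation rule, the `weight` parameter, and the sample data are all hypothetical. The sketch covers only the language-model update, not the speech recognizer that would consume the modified model.

```python
from collections import Counter
from datetime import datetime, timedelta


def count_recent_terms(query_log, now, window):
    """Count term occurrences in queries submitted within `window` before `now`.

    `query_log` is an iterable of (submitted_at, query_text) pairs (assumed shape).
    """
    counts = Counter()
    for submitted_at, query in query_log:
        if now - window <= submitted_at <= now:
            # Naive whitespace tokenization, chosen only for illustration.
            counts.update(query.lower().split())
    return counts


def update_language_model(baseline, recent_counts, weight=0.5):
    """Return a copy of `baseline` with probabilities revised only for recent terms.

    Assumed rule: interpolate the baseline probability with the term's relative
    frequency in recent queries. Terms absent from `recent_counts` are not touched.
    """
    total = sum(recent_counts.values())
    updated = dict(baseline)  # leave the caller's baseline model unmodified
    if total == 0:
        return updated
    for term, count in recent_counts.items():
        recent_freq = count / total
        updated[term] = (1 - weight) * baseline.get(term, 0.0) + weight * recent_freq
    return updated


# Hypothetical data: two queries inside a 7-day window, one far outside it.
now = datetime(2024, 1, 8)
query_log = [
    (datetime(2024, 1, 7), "vuvuzela sound"),
    (datetime(2024, 1, 6), "vuvuzela"),
    (datetime(2023, 6, 1), "gramophone"),
]
baseline = {"vuvuzela": 0.001, "gramophone": 0.002, "sound": 0.05}

recent_counts = count_recent_terms(query_log, now, timedelta(days=7))
updated = update_language_model(baseline, recent_counts)
```

In this example "gramophone" does not occur in the recent queries, so its baseline probability is carried over unchanged, while "vuvuzela" ends up with a higher probability than "gramophone". Interpolation does not guarantee that ordering for arbitrary inputs, and the updated values are not renormalized to sum to one; a real system would need to handle both points.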
14. A system comprising:
a request processor to receive search terms, each of the search terms being terms in a language;
an extractor for obtaining information related to recent usage in the language from recent search terms of the search terms, the recent search terms from recent search queries that were submitted by multiple users of a search engine within a predetermined period of time;
means for selectively modifying a language model for the language that associates a respective baseline probability of occurrence with each of multiple different terms in the language, to independently revise the baseline probability of occurrence associated with a particular term based at least on a quantity of occurrences of the particular term in the recent search queries that were submitted by the multiple users of the search engine within the predetermined period of time while maintaining unchanged a baseline probability of occurrence associated with a different term that does not occur in the recent search queries that were submitted by the multiple users of the search engine within the predetermined period of time by assigning a first probability to the particular term in the recent search queries that were submitted by the multiple users of the search engine within the predetermined period of time that is greater than a second probability for the different term that does not occur in the recent search queries that were submitted by the multiple users of the search engine within the predetermined period of time; and
means for generating, by an automated speech recognizer using the modified language model, a transcription of one or more utterances of one or more different users of the search engine.
(Dependent claims: 17)
15. A computer implemented method comprising:
transmitting, from a remote device to a server device, recent search terms from recent search queries that were submitted by multiple users of a search engine within a predetermined period of time, each of the recent search terms being terms in a language, wherein the server device:
generates word occurrence data associated with the recent search terms from recent search queries that were submitted by multiple users of the search engine within the predetermined period of time, the word occurrence data including at least a quantity of occurrences of a particular term in the recent search queries that were submitted by the multiple users of the search engine within the predetermined period of time,
selectively modifies a baseline language model for the language that associates respective word occurrence data with each of multiple different terms in the language to independently revise the word occurrence data associated with the recent search terms from recent search queries that were submitted by the multiple users of the search engine within the predetermined period of time, while maintaining unchanged respective word occurrence data associated with different terms that do not occur in the recent search queries that were submitted by the multiple users of the search engine within the predetermined period of time by assigning a first probability to a particular term in the recent search queries that were submitted by the multiple users of the search engine within the predetermined period of time that is greater than a second probability for a different term that does not occur in the recent search queries that were submitted by the multiple users of the search engine within the predetermined period of time, and
generates, by an automated speech recognizer using the modified language model, a transcription of one or more utterances of one or more different users of the search engine.
(Dependent claims: 16)
Specification