Language model biasing system
3 Assignments
0 Petitions
Abstract
Methods, systems, and apparatus for receiving audio data corresponding to a user utterance and context data, identifying an initial set of one or more n-grams from the context data, generating an expanded set of one or more n-grams based on the initial set of n-grams, adjusting a language model based at least on the expanded set of n-grams, determining one or more speech recognition candidates for at least a portion of the user utterance using the adjusted language model, adjusting a score for a particular speech recognition candidate determined to be included in the expanded set of n-grams, determining a transcription of the user utterance that includes at least one of the one or more speech recognition candidates, and providing the transcription of the user utterance for output.
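The expansion and rescoring steps in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical related-terms table (`RELATED_TERMS`) as the expansion source and takes the speech recognition candidates as given (text, log score) pairs, as if already produced by a decoder using the adjusted language model; the function names and the fixed `bonus` value are illustrative assumptions, not the patented implementation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical related-terms table standing in for whatever expansion
# source (synonyms, entity aliases, co-occurrence data) a real system uses.
RELATED_TERMS: Dict[str, List[str]] = {
    "jazz": ["bebop", "swing"],
    "play": ["playing", "played"],
}


def initial_ngrams(context: Dict[str, List[str]]) -> Set[str]:
    """Collect n-grams from non-speech context signals such as on-screen text."""
    grams: Set[str] = set()
    for values in context.values():
        grams.update(value.lower() for value in values)
    return grams


def expand_ngrams(initial: Set[str]) -> Set[str]:
    """Return the initial n-grams plus related n-grams that differ from them."""
    expanded = set(initial)
    for gram in initial:
        expanded.update(RELATED_TERMS.get(gram, []))
    return expanded


def rescore(candidates: List[Tuple[str, float]], expanded: Set[str],
            bonus: float = 2.0) -> List[Tuple[str, float]]:
    """Add a fixed bonus to each candidate whose text is in the expanded set."""
    return [(text, score + bonus if text in expanded else score)
            for text, score in candidates]


def choose_transcription(candidates: List[Tuple[str, float]],
                         context: Dict[str, List[str]]) -> str:
    """Expand the context n-grams, rescore the candidates, return the best one."""
    expanded = expand_ngrams(initial_ngrams(context))
    rescored = rescore(candidates, expanded)
    return max(rescored, key=lambda item: item[1])[0]


if __name__ == "__main__":
    context = {"screen_text": ["Jazz", "Play"]}
    # Candidate (text, log score) pairs, assumed to come from a biased decoder.
    candidates = [("chess", -1.0), ("bebop", -2.5)]
    print(choose_transcription(candidates, context))  # prints "bebop"
```

Here "bebop" never appears in the context itself; it enters only through expansion, and the post-recognition bonus lifts it from -2.5 to -0.5, above "chess" at -1.0.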
212 Citations
20 Claims
1. A computer-implemented method comprising:
receiving audio data corresponding to a user utterance and context data for the user utterance;
identifying, based on the context data, an initial set of one or more n-grams including one or more n-grams that do not represent speech preceding the user utterance;
generating an expanded set of one or more n-grams based at least on the initial set of n-grams, the expanded set of n-grams comprising one or more n-grams that are different from the n-grams in the initial set of n-grams;
based at least on the expanded set of n-grams, adjusting a language model trained to predict a first set of n-grams to be able to predict an additional n-gram in the expanded set of n-grams;
determining one or more speech recognition candidates for at least a portion of the user utterance using the adjusted language model, wherein each speech recognition candidate comprises one or more words;
after determining the one or more speech recognition candidates, adjusting a score for a particular speech recognition candidate based on determining that the particular speech recognition candidate is included in the expanded set of n-grams;
after adjusting the score for the particular speech recognition candidate, determining a transcription for the user utterance that includes at least one of the one or more speech recognition candidates; and
providing the transcription of the user utterance for output.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16.
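The step of adjusting a language model "trained to predict a first set of n-grams to be able to predict an additional n-gram" can be sketched with a toy unigram model. This assumes the model is a plain dictionary of natural-log probabilities and that biasing means multiplying known n-grams by a hypothetical `boost` factor and giving unseen n-grams a small hypothetical `new_mass` before renormalizing; real systems may bias very differently (for example, inside a decoding graph).

```python
import math
from typing import Dict, Iterable


def bias_language_model(logprobs: Dict[str, float], expanded: Iterable[str],
                        boost: float = 5.0,
                        new_mass: float = 1e-3) -> Dict[str, float]:
    """Return a biased copy of a toy unigram model stored as natural-log probabilities.

    Known n-grams in the expanded set have their probability multiplied by
    `boost`; n-grams the model was not trained on are added with probability
    `new_mass`. The result is renormalized so probabilities sum to 1.
    """
    probs = {gram: math.exp(lp) for gram, lp in logprobs.items()}
    for gram in set(expanded):
        if gram in probs:
            probs[gram] *= boost
        else:
            # The adjusted model can now predict this previously unseen n-gram.
            probs[gram] = new_mass
    total = sum(probs.values())
    return {gram: math.log(p / total) for gram, p in probs.items()}
```

For example, biasing `{"chess": math.log(0.6), "jazz": math.log(0.4)}` toward `{"jazz", "bebop"}` adds "bebop" to the vocabulary and raises the probability of "jazz" from 0.4 to about 0.77. The function returns a new dictionary and leaves the input untouched, which keeps the unbiased model available for the next utterance.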
17. A system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
receiving audio data corresponding to a user utterance and context data for the user utterance;
identifying, based on the context data, an initial set of one or more n-grams including one or more n-grams that do not represent speech preceding the user utterance;
generating an expanded set of one or more n-grams based at least on the initial set of n-grams, the expanded set of n-grams comprising one or more n-grams that are different from the n-grams in the initial set of n-grams;
based at least on the expanded set of n-grams, adjusting a language model trained to predict a first set of n-grams to be able to predict an additional n-gram in the expanded set of n-grams;
determining one or more speech recognition candidates for at least a portion of the user utterance using the adjusted language model, wherein each speech recognition candidate comprises one or more words;
after determining the one or more speech recognition candidates, adjusting a score for a particular speech recognition candidate based on determining that the particular speech recognition candidate is included in the expanded set of n-grams;
after adjusting the score for the particular speech recognition candidate, determining a transcription for the user utterance that includes at least one of the one or more speech recognition candidates; and
providing the transcription of the user utterance for output.
Dependent claims: 18, 19.
20. A non-transitory computer-readable storage device storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising:
receiving audio data corresponding to a user utterance and context data for the user utterance;
identifying an initial set of one or more n-grams from the context data;
generating an expanded set of one or more n-grams based at least on the initial set of n-grams, the expanded set of n-grams comprising one or more n-grams that are different from the n-grams in the initial set of n-grams;
adjusting a language model based at least on the expanded set of n-grams;
determining one or more speech recognition candidates for at least a portion of the user utterance using the adjusted language model, wherein each speech recognition candidate comprises one or more words;
after determining the one or more speech recognition candidates for at least a portion of the user utterance using the adjusted language model, undoing the adjustment to the language model;
after determining the one or more speech recognition candidates, adjusting a score for a particular speech recognition candidate determined to be included in the expanded set of n-grams;
after adjusting the score for the particular speech recognition candidate, determining a transcription for the user utterance that includes at least one of the one or more speech recognition candidates; and
providing the transcription of the user utterance for output.
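The "undoing the adjustment to the language model" step in claim 20 can be sketched as a context manager that biases a model in place for the duration of decoding and restores a snapshot afterwards. `ToyLanguageModel`, `temporarily_biased`, and the `boost` and `floor` values are hypothetical; scores are left unnormalized so the apply-then-restore pattern stays easy to follow.

```python
import math
from contextlib import contextmanager
from typing import Dict, Iterable, Iterator


class ToyLanguageModel:
    """Hypothetical unigram model mapping n-grams to natural-log scores."""

    def __init__(self, logprobs: Dict[str, float]) -> None:
        self.logprobs = dict(logprobs)

    def score(self, gram: str) -> float:
        # Unknown n-grams cannot be predicted at all.
        return self.logprobs.get(gram, float("-inf"))


@contextmanager
def temporarily_biased(model: ToyLanguageModel, expanded: Iterable[str],
                       boost: float = 1.0, floor: float = math.log(1e-4)
                       ) -> Iterator[ToyLanguageModel]:
    """Raise the log score of each expanded n-gram, then restore the model on exit.

    Scores are not renormalized here, to keep the undo step easy to see.
    """
    snapshot = dict(model.logprobs)
    try:
        for gram in set(expanded):
            # Unseen n-grams start at `floor` so the model can predict them.
            model.logprobs[gram] = model.logprobs.get(gram, floor) + boost
        yield model
    finally:
        model.logprobs = snapshot  # undo the adjustment, even if decoding fails


if __name__ == "__main__":
    lm = ToyLanguageModel({"chess": math.log(0.6), "jazz": math.log(0.4)})
    with temporarily_biased(lm, {"bebop"}) as biased_lm:
        print(biased_lm.score("bebop") > float("-inf"))  # True while biased
    print(lm.score("bebop"))  # -inf after the adjustment is undone
```

Restoring in a `finally` clause means the shared model returns to its unbiased state even if decoding raises, so context-specific biasing for one utterance does not leak into the next.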
Specification