Language model biasing modulation
Abstract
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for modulating language model biasing. In some implementations, context data is received. A likely context associated with a user is determined based on at least a portion of the context data. One or more language model biasing parameters are selected based at least on the likely context associated with the user. A context confidence score associated with the likely context is determined based on at least a portion of the context data. One or more of the language model biasing parameters are adjusted based at least on the context confidence score. A baseline language model is biased based at least on one or more of the adjusted language model biasing parameters. The biased language model is provided for use by an automated speech recognizer (ASR).
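The flow summarized in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the patent: the `CONTEXT_BIASES` table, the three function names, the toy unigram baseline model, and the choice to treat a biasing parameter as a log-domain boost that is scaled linearly by the confidence score.

```python
import math

# Hypothetical table mapping a likely context to word boosts (log-domain bonuses).
# The contexts, words, and values are illustrative assumptions only.
CONTEXT_BIASES = {
    "navigation": {"directions": 2.0, "route": 1.5},
    "music": {"play": 2.0, "album": 1.5},
}


def select_biasing_parameters(likely_context):
    """Return a copy of the boosts for the likely context (empty if unknown)."""
    return dict(CONTEXT_BIASES.get(likely_context, {}))


def adjust_biasing_parameters(params, confidence):
    """Scale each boost by the context confidence score, clamped to [0, 1]."""
    c = min(max(confidence, 0.0), 1.0)
    return {word: boost * c for word, boost in params.items()}


def bias_language_model(baseline, params):
    """Multiply each word probability by exp(boost), then renormalize."""
    scored = {w: p * math.exp(params.get(w, 0.0)) for w, p in baseline.items()}
    total = sum(scored.values())
    return {w: s / total for w, s in scored.items()}


# Toy unigram baseline model standing in for a real language model.
baseline = {"directions": 0.1, "play": 0.1, "the": 0.8}
params = adjust_biasing_parameters(select_biasing_parameters("navigation"), 0.9)
biased = bias_language_model(baseline, params)
```

In this sketch a confidence score of 0 leaves the baseline model unchanged, while a score of 1 applies the full boost selected for the context. A real recognizer would apply the adjusted parameters to a much richer model than a unigram table.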
17 Claims
1. A computer-implemented method comprising:
receiving audio data encoding an utterance of a user;
receiving context data associated with the received audio data;
determining a likely context associated with a user, based on at least a portion of the context data;
selecting one or more language model biasing parameters based at least on the likely context associated with the user;
determining a context confidence score associated with the likely context based on at least a portion of the context data, and additional context data indicating (i) that the user has switched between applications, (ii) a time difference between a presentation of a search result and a user response to the presentation of the search result, (iii) gaze tracking data, or (iv) a user behavior in response to visible content;
adjusting one or more of the language model biasing parameters based at least on the context confidence score;
biasing a baseline language model based at least on one or more of the adjusted language model biasing parameters;
providing the biased language model for use by an automated speech recognizer (ASR);
generating a transcription of the received audio data using the biased language model; and
transmitting the generated transcription for display on a client computing device.
Dependent claims: 2, 3, 4, 5.
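One way to picture the context confidence score of claim 1 is as a value that starts at 1.0 and is discounted by the additional context signals. The sketch below is an assumption-laden illustration, not the claimed method: the `ContextSignals` fields, the exponential decay with a 30-second half-life, and the penalty factors 0.2 and 0.5 are all hypothetical, and signal (iv), user behavior in response to visible content, is not modeled.

```python
from dataclasses import dataclass


@dataclass
class ContextSignals:
    """Hypothetical container for the additional context data."""
    switched_applications: bool = False      # signal (i)
    seconds_since_result_shown: float = 0.0  # signal (ii)
    gaze_on_content: bool = True             # signal (iii)


def context_confidence(signals, half_life_s=30.0):
    """Return a score in (0, 1]; every constant here is an illustrative assumption."""
    if half_life_s <= 0:
        raise ValueError("half_life_s must be positive")
    # Confidence halves for every half_life_s seconds without a user response.
    elapsed = max(signals.seconds_since_result_shown, 0.0)
    score = 0.5 ** (elapsed / half_life_s)
    # Switching applications suggests the earlier context may no longer apply.
    if signals.switched_applications:
        score *= 0.2
    # Gaze away from the visible content weakens the context as well.
    if not signals.gaze_on_content:
        score *= 0.5
    return score
```

For example, `context_confidence(ContextSignals(seconds_since_result_shown=30.0))` yields half the confidence of an immediate response under these assumed constants.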
6. A non-transitory computer storage device encoded with a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
receiving audio data encoding an utterance of a user;
receiving context data associated with the received audio data;
determining a likely context associated with a user, based on at least a portion of the context data;
selecting one or more language model biasing parameters based at least on the likely context associated with the user;
determining a context confidence score associated with the likely context based on at least a portion of the context data, and additional context data indicating (i) that the user has switched between applications, (ii) a time difference between a presentation of a search result and a user response to the presentation of the search result, (iii) gaze tracking data, or (iv) a user behavior in response to visible content;
adjusting one or more of the language model biasing parameters based at least on the context confidence score;
biasing a baseline language model based at least on one or more of the adjusted language model biasing parameters;
providing the biased language model for use by an automated speech recognizer (ASR);
generating a transcription of the received audio data using the biased language model; and
transmitting the generated transcription for display on a client computing device.
Dependent claims: 7, 8, 9.
10. A system comprising:
one or more processors; and
a non-transitory computer-readable medium coupled to the one or more computers having instructions stored thereon, which, when executed by the one or more computers, cause the one or more computers to perform operations comprising:
receiving audio data encoding an utterance of a user;
receiving context data associated with the received audio data;
determining a likely context associated with a user, based on at least a portion of the context data;
selecting one or more language model biasing parameters based at least on the likely context associated with the user;
determining a context confidence score associated with the likely context based on at least a portion of the context data, and additional context data indicating (i) that the user has switched between applications, (ii) a time difference between a presentation of a search result and a user response to the presentation of the search result, (iii) gaze tracking data, or (iv) a user behavior in response to visible content;
adjusting one or more of the language model biasing parameters based at least on the context confidence score;
biasing a baseline language model based at least on one or more of the adjusted language model biasing parameters;
providing the biased language model for use by an automated speech recognizer (ASR);
generating a transcription of the received audio data using the biased language model; and
transmitting the generated transcription for display on a client computing device.
Dependent claims: 11, 12, 13, 14.
15. A computer-implemented method comprising:
receiving audio data encoding an utterance of a user;
receiving context data associated with the received audio data;
determining a likely context associated with a user, based on at least a portion of the context data;
selecting one or more language model biasing parameters based at least on the likely context associated with the user;
determining a context confidence score associated with the likely context based on at least a portion of the context data;
comparing the context confidence score associated with the likely context to a threshold context confidence score value;
determining, based at least on comparing the context confidence score associated with the likely context to the threshold context confidence score value, that the likely context data does not remain associated with the user;
in response to determining that the likely context data does not remain associated with the user, interpolating one or more selected language model biasing parameters based on at least the likely context;
biasing, by a language module biaser, a baseline language model based at least on one or more of the interpolated language model biasing parameters;
providing the biased language model for use by an automated speech recognizer (ASR);
generating a transcription of the received audio data using the biased language model; and
transmitting the generated transcription to a client computing device.
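The threshold comparison and interpolation steps of claim 15 can be sketched as follows. The claim does not say what the parameters are interpolated toward, so this example assumes linear interpolation between each selected value and a neutral value of 0.0 (no bias), weighted by how far the confidence score falls below the threshold. The function name, the default threshold of 0.6, and the neutral value are hypothetical.

```python
def interpolate_biasing_parameters(selected, confidence, threshold=0.6, neutral=0.0):
    """Threshold-gated interpolation of biasing parameters (illustrative assumption).

    At or above the threshold, the likely context is treated as still associated
    with the user and the selected parameters are returned unchanged. Below it,
    each parameter is interpolated linearly between `neutral` and its selected value.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if confidence >= threshold:
        return dict(selected)
    # Weight runs from 0.0 (no confidence) up to 1.0 (at the threshold).
    weight = max(confidence, 0.0) / threshold
    return {name: neutral + weight * (value - neutral) for name, value in selected.items()}


selected = {"directions": 2.0, "route": 1.0}
kept = interpolate_biasing_parameters(selected, confidence=0.9)
softened = interpolate_biasing_parameters(selected, confidence=0.3)
```

Under these assumptions a confidence of 0.3 against a threshold of 0.6 halves each selected parameter, and a confidence of 0 removes the bias entirely.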
16. A non-transitory computer storage device encoded with a computer program, the program comprising instructions that when executed by one or more computers cause the one or more computers to perform operations comprising:
receiving audio data encoding an utterance of a user;
receiving context data associated with the received audio data;
determining a likely context associated with a user, based on at least a portion of the context data;
selecting one or more language model biasing parameters based at least on the likely context associated with the user;
determining a context confidence score associated with the likely context based on at least a portion of the context data;
comparing the context confidence score associated with the likely context to a threshold context confidence score value;
determining, based at least on comparing the context confidence score associated with the likely context to the threshold context confidence score value, that the likely context data does not remain associated with the user;
in response to determining that the likely context data does not remain associated with the user, interpolating one or more selected language model biasing parameters based on at least the likely context;
biasing, by a language module biaser, a baseline language model based at least on one or more of the interpolated language model biasing parameters;
providing the biased language model for use by an automated speech recognizer (ASR);
generating a transcription of the received audio data using the biased language model; and
transmitting the generated transcription to a client computing device.
17. A system comprising:
one or more processors; and
a non-transitory computer-readable medium coupled to the one or more computers having instructions stored thereon, which, when executed by the one or more computers, cause the one or more computers to perform operations comprising:
receiving audio data encoding an utterance of a user;
receiving context data associated with the received audio data;
determining a likely context associated with a user, based on at least a portion of the context data;
selecting one or more language model biasing parameters based at least on the likely context associated with the user;
determining a context confidence score associated with the likely context based on at least a portion of the context data;
comparing the context confidence score associated with the likely context to a threshold context confidence score value;
determining, based at least on comparing the context confidence score associated with the likely context to the threshold context confidence score value, that the likely context data does not remain associated with the user;
in response to determining that the likely context data does not remain associated with the user, interpolating one or more selected language model biasing parameters based on at least the likely context;
biasing, by a language module biaser, a baseline language model based at least on one or more of the interpolated language model biasing parameters;
providing the biased language model for use by an automated speech recognizer (ASR);
generating a transcription of the received audio data using the biased language model; and
transmitting the generated transcription to a client computing device.
Specification