AUTOMATICALLY DETERMINING LANGUAGE FOR SPEECH RECOGNITION OF SPOKEN UTTERANCE RECEIVED VIA AN AUTOMATED ASSISTANT INTERFACE
First Claim
1. A method implemented by one or more processors, the method comprising:
receiving audio data corresponding to a spoken utterance of a user, the audio data being based on detection of the spoken utterance at a client device that includes an automated assistant interface for interacting with an automated assistant;
processing the audio data using a first speech recognition model corresponding to a first language;
determining, based on processing the audio data using the first speech recognition model, content that is responsive to the spoken utterance of the user;
causing the client device to render the content to the user, wherein the content includes a prompt that solicits further input from the user;
in response to determining the content includes the prompt, monitoring for additional spoken input;
receiving, during the monitoring, additional audio data corresponding to an additional spoken utterance, the additional audio data being based on detection of the additional spoken utterance by the automated assistant interface of the client device;
determining whether to utilize the first speech recognition model for the first language, or an alternative speech recognition model for a second language, in determining further responsive content to provide in response to the additional spoken utterance, wherein determining whether to utilize the first speech recognition model or the alternative speech recognition model in determining the further responsive content is based on a monitoring duration corresponding to a time period for the monitoring for the additional spoken input from the user, wherein as the monitoring duration increases, a probability of utilizing the alternative speech recognition model increases; and
causing the client device to render the further responsive content.
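For illustration only (this is not claim language), the monitoring-duration heuristic recited in the claim could be sketched roughly as below; the function name, the 8-second normalization window, and the model identifiers are hypothetical, and a real system could use any monotonic mapping from monitoring duration to switching probability.

```python
import random

# Illustrative sketch (not claim language): the longer the assistant has been
# monitoring for the follow-up utterance, the more likely it is to process that
# utterance with the alternative-language model instead of the model used for
# the original utterance. All names and constants here are hypothetical.

def select_speech_recognition_model(monitoring_duration_s: float,
                                    max_duration_s: float = 8.0) -> str:
    """Pick a model for the additional spoken utterance.

    The probability of choosing the alternative (second-language) model grows
    with how long the device has been monitoring for additional input.
    """
    # Map elapsed monitoring time to a probability in [0, 1].
    p_alternative = min(monitoring_duration_s / max_duration_s, 1.0)
    if random.random() < p_alternative:
        return "alternative_model"  # speech recognition model for the second language
    return "first_model"            # speech recognition model for the first language


if __name__ == "__main__":
    for elapsed in (0.5, 3.0, 7.5):
        print(f"{elapsed:.1f}s -> {select_speech_recognition_model(elapsed)}")
```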
Abstract
Implementations relate to determining a language for speech recognition of a spoken utterance, received via an automated assistant interface, for interacting with an automated assistant. Implementations can enable multilingual interaction with the automated assistant, without requiring a user to explicitly designate a language to be utilized for each interaction. Selection of a speech recognition model for a particular language can be based on one or more interaction characteristics exhibited during a dialog session between a user and an automated assistant. Such interaction characteristics can include anticipated user input types, anticipated user input durations, a duration for monitoring for a user response, and/or an actual duration of a provided user response.
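As an illustrative sketch only, the duration-related interaction characteristics listed in the abstract could be combined into a single score that favors the second-language model. The weights, the 10-second normalization, and the 1.5x threshold below are invented, and the anticipated-input-type characteristic is illustrated separately under claim 11.

```python
from dataclasses import dataclass

# Minimal sketch, assuming the duration-related interaction characteristics
# named in the abstract are combined into one score per dialog turn.

@dataclass
class InteractionCharacteristics:
    anticipated_input_duration_s: float  # how long a response to the prompt usually is
    monitoring_duration_s: float         # how long the device waited for a response
    response_duration_s: float           # how long the actual response was

def score_alternative_model(c: InteractionCharacteristics) -> float:
    """Return a score in [0, 1]; higher values favor the second-language model."""
    score = 0.0
    # Longer monitoring before the user responds favors switching models.
    score += 0.5 * min(c.monitoring_duration_s / 10.0, 1.0)
    # A response noticeably longer than anticipated also favors re-processing
    # the audio with the alternative speech recognition model.
    if c.response_duration_s > 1.5 * c.anticipated_input_duration_s:
        score += 0.5
    return min(score, 1.0)
```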
21 Claims
1. A method implemented by one or more processors, the method comprising:
receiving audio data corresponding to a spoken utterance of a user, the audio data being based on detection of the spoken utterance at a client device that includes an automated assistant interface for interacting with an automated assistant;
processing the audio data using a first speech recognition model corresponding to a first language;
determining, based on processing the audio data using the first speech recognition model, content that is responsive to the spoken utterance of the user;
causing the client device to render the content to the user, wherein the content includes a prompt that solicits further input from the user;
in response to determining the content includes the prompt, monitoring for additional spoken input;
receiving, during the monitoring, additional audio data corresponding to an additional spoken utterance, the additional audio data being based on detection of the additional spoken utterance by the automated assistant interface of the client device;
determining whether to utilize the first speech recognition model for the first language, or an alternative speech recognition model for a second language, in determining further responsive content to provide in response to the additional spoken utterance, wherein determining whether to utilize the first speech recognition model or the alternative speech recognition model in determining the further responsive content is based on a monitoring duration corresponding to a time period for the monitoring for the additional spoken input from the user, wherein as the monitoring duration increases, a probability of utilizing the alternative speech recognition model increases; and
causing the client device to render the further responsive content.
View Dependent Claims (3, 4, 5, 6, 7, 8, 9, 10)
2. (canceled)
11. A method implemented by one or more processors, the method comprising:
receiving audio data corresponding to a spoken utterance of a user, the audio data being based on detection of the spoken utterance by a client device that includes an automated assistant interface for interacting with an automated assistant;
processing the audio data using a first speech recognition model corresponding to a first language;
determining, based on processing the audio data using the first speech recognition model, content that is responsive to the spoken utterance of the user;
causing the client device to render the content to the user and to monitor for additional spoken input following the rendering;
receiving, during the monitoring, additional audio data corresponding to an additional spoken utterance, the additional audio data being based on detection of the additional spoken utterance by the automated assistant interface of the client device;
determining an anticipated type of input for the additional spoken utterance based on historical interaction data that identifies at least one interaction, between the user and the automated assistant, in which the user provided particular diction or particular terminology to the automated assistant;
determining, based on determining the anticipated type of input for the additional spoken utterance, whether to utilize the first speech recognition model for the first language, or an alternative speech recognition model for a second language, in determining further responsive content to provide in response to the additional spoken utterance, wherein the anticipated type of input includes an anticipated diction or anticipated terminology for the additional spoken utterance, and wherein determining whether to utilize the first speech recognition model or the alternative speech recognition model is based on one or more of: a monitoring duration corresponding to a time period for the monitoring for the additional spoken input from the user, an input duration corresponding to a duration of the additional spoken utterance of the user, and an anticipated type of input for the additional spoken utterance; and
causing the client device to render the further responsive content.
View Dependent Claims (14)
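A rough, hypothetical sketch of the anticipated-input-type signal in claim 11 follows: prior interactions in which the user answered a similar prompt with particular diction or terminology are used to predict the language of the upcoming response. The history record format, prompt categories, language codes, and model names are all assumptions, not taken from the patent.

```python
from collections import Counter

# Sketch of claim 11's anticipated-input-type signal: past interactions in which
# the user answered a similar prompt with particular diction or terminology are
# used to predict which language the follow-up utterance will be in.

def anticipate_response_language(prompt_category: str,
                                 historical_interactions: list,
                                 default_language: str = "en") -> str:
    """Predict the language of the follow-up utterance from interaction history.

    Each history entry is assumed to look like:
        {"prompt_category": "contact_name", "response_language": "es"}
    """
    languages = [h["response_language"]
                 for h in historical_interactions
                 if h["prompt_category"] == prompt_category]
    if not languages:
        return default_language
    # The most frequent response language for this kind of prompt wins.
    return Counter(languages).most_common(1)[0][0]


def pick_model(prompt_category: str, history: list, first_language: str = "en") -> str:
    anticipated = anticipate_response_language(prompt_category, history, first_language)
    return "first_model" if anticipated == first_language else "alternative_model"
```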
12. (canceled)
13. (canceled)
15. A method implemented by one or more processors, the method comprising:
receiving audio data corresponding to a spoken utterance of a user, the audio data being based on detection of the spoken utterance at a client device that includes an automated assistant interface for interacting with an automated assistant;
processing the audio data using a first speech recognition model corresponding to a first language;
determining, based on processing the audio data using the first speech recognition model, content that is responsive to the spoken utterance of the user;
monitoring for an additional spoken input from the user;
receiving, during the monitoring, additional audio data corresponding to an additional spoken utterance, the additional audio data being based on detection of the additional spoken utterance by the automated assistant interface of the client device, wherein the additional spoken utterance is provided by another user;
determining, based on receiving the additional audio data, that the additional spoken utterance is provided by the other user;
accessing, based on the additional spoken utterance being provided by the other user, a user profile corresponding to the other user, wherein the user profile provides a correspondence between the other user and a second language;
determining a selection of one or more speech recognition models to use for processing the additional audio data, the one or more speech recognition models selected from multiple different speech recognition models that include at least the first speech recognition model for the first language and a second speech recognition model for the second language;
processing the additional audio data according to the selection of the speech recognition model; and
causing the client device to render further responsive content based on the processing of the additional audio data according to the selection of the speech recognition model.
View Dependent Claims (17, 18, 19, 20)
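The profile-based selection in claim 15 could be sketched, very roughly, as below. The profile store, language codes, and model names are hypothetical, and the speaker is assumed to have been identified by an upstream speaker-identification step that is outside the scope of this sketch.

```python
from typing import List, Optional

# Sketch of claim 15's profile-based selection: when the additional utterance
# comes from a different speaker, that speaker's profile (which maps the speaker
# to a preferred language) drives which recognition model or models process the
# additional audio data. All identifiers below are invented for illustration.

USER_PROFILES = {
    "user_a": "en",  # original speaker
    "user_b": "es",  # the "other user" of the claim
}

SPEECH_RECOGNITION_MODELS = {
    "en": "first_model",
    "es": "second_model",
}

def select_models_for_other_user(speaker_id: Optional[str],
                                 current_language: str = "en") -> List[str]:
    """Choose the speech recognition model(s) for the additional audio data."""
    profile_language = USER_PROFILES.get(speaker_id) if speaker_id else None
    if profile_language is None:
        # Unknown speaker: keep using the model already in use for the dialog.
        return [SPEECH_RECOGNITION_MODELS[current_language]]
    # Known speaker: select the model tied to the profile language; optionally
    # keep the current model as well so both transcriptions can be compared.
    selected = {SPEECH_RECOGNITION_MODELS[profile_language],
                SPEECH_RECOGNITION_MODELS[current_language]}
    return sorted(selected)


if __name__ == "__main__":
    print(select_models_for_other_user("user_b"))  # ['first_model', 'second_model']
```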
16. (canceled)
21-23. (canceled)
Specification