AUTOMATICALLY DETERMINING LANGUAGE FOR SPEECH RECOGNITION OF SPOKEN UTTERANCE RECEIVED VIA AN AUTOMATED ASSISTANT INTERFACE
Abstract
Determining a language for speech recognition of a spoken utterance received via an automated assistant interface for interacting with an automated assistant. Implementations can enable multilingual interaction with the automated assistant without requiring the user to explicitly designate a language for each interaction. Implementations determine a user profile that corresponds to audio data that captures a spoken utterance, and utilize the language(s), and optionally corresponding probabilities, assigned to the user profile in determining a language for speech recognition of the spoken utterance. Some implementations select only a subset of the languages assigned to the user profile to utilize in speech recognition of a given spoken utterance of the user. Some implementations perform speech recognition in each of multiple languages assigned to the user profile, and utilize criteria to select only one of the speech recognitions as appropriate for generating and providing content that is responsive to the spoken utterance.
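The abstract's overall flow can be sketched as follows. This is a minimal illustration, not the patented implementation: the profile structure, threshold, recognizer stub, and scoring by prior-weighted confidence are all assumptions introduced for the example.

```python
# Hypothetical user profile: candidate languages mapped to prior probabilities.
PROFILE = {"en-US": 0.6, "es-ES": 0.3, "fr-FR": 0.1}

def candidate_languages(profile, threshold=0.2):
    """Select only the subset of profile languages above a prior threshold."""
    return [lang for lang, prob in profile.items() if prob >= threshold]

def recognize(audio, lang):
    """Stand-in for a per-language recognizer: returns (text, confidence)."""
    fake_results = {
        "en-US": ("turn on the lights", 0.82),
        "es-ES": ("enciende las luces", 0.55),
    }
    return fake_results.get(lang, ("", 0.0))

def best_recognition(audio, profile):
    """Recognize in each candidate language; keep the highest combined score."""
    scored = []
    for lang in candidate_languages(profile):
        text, conf = recognize(audio, lang)
        # Weight recognizer confidence by the profile's prior for the language.
        scored.append((conf * profile[lang], lang, text))
    return max(scored)

score, lang, text = best_recognition(b"...", PROFILE)  # lang -> "en-US"
```

Only two of the three profile languages clear the threshold, so only two recognitions run, mirroring the "subset of languages" selection described above.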
20 Claims
1. A method implemented by one or more processors, the method comprising:
determining that a spoken utterance was received at an automated assistant interface of a computing device that is accessible to an automated assistant, wherein the spoken utterance is provided in a first language and the automated assistant is configured to provide a responsive output according to a language selected from at least the first language and a second language;
selecting, in response to determining that the spoken utterance was received at the automated assistant interface, a user-specific language profile corresponding to a user that provided the spoken utterance, wherein the user-specific language profile identifies at least the second language as a candidate language for providing the responsive output;
accessing data that characterizes user activity associated with interactions between the user and one or more applications prior to the user providing the spoken utterance, wherein the data indicates that the user has interacted with the one or more applications using the first language;
selecting, based on the data that characterizes the user activity, the first language over the second language for providing the responsive output;
causing, based on the first language being selected over the second language, responsive audio data to be generated, wherein the responsive audio data characterizes the responsive output as expressed using the first language; and
causing, when the responsive audio data has been at least partially generated, the responsive output to be provided, at the computing device via the automated assistant, using the responsive audio data.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
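The selection step in claim 1 can be illustrated with a short sketch. The names and the activity-log shape are hypothetical, introduced only for the example: the profile nominates the second language, but recent user activity in the first language overrides it for the responsive output.

```python
def select_output_language(profile_language, activity_log):
    """Prefer the language of the user's recent app interactions, if any.

    profile_language: the candidate language from the user-specific profile.
    activity_log: hypothetical list of prior app interactions, each a dict
    recording which language the user interacted in.
    """
    if activity_log:
        # The most recent interaction's language wins over the profile default.
        return activity_log[-1]["language"]
    return profile_language

activity = [
    {"app": "news_reader", "language": "en"},
    {"app": "messaging", "language": "en"},
]
# The profile nominates Spanish, but the activity data indicates English.
chosen = select_output_language("es", activity)  # -> "en"
```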
9. A method implemented by one or more processors, the method comprising:
determining that a spoken utterance was received by a computing device from a user, the computing device comprising an automated assistant that is capable of being invoked in response to the user providing the spoken utterance;
causing audio data, which is based on the spoken utterance, to be processed by at least a first language model and a second language model, wherein the first language model and the second language model are selected according to a user-specific preference of language models for interpreting spoken utterances from the user;
determining, based on processing of the audio data, a first score that characterizes a probability that the spoken utterance was provided in a first language and a second score that characterizes another probability that the spoken utterance was provided in a second language;
determining, based on a user-specific language profile that is accessible to the automated assistant, that the user has intentionally accessed digital content provided in the first language;
determining, based on determining that the user has intentionally accessed the digital content provided in the first language, another first score to reflect an increase in the probability that the spoken utterance was provided in the first language; and
causing, based on the other first score and the second score, additional audio data to be processed according to a language selected from at least the first language and the second language.
Dependent claims: 10, 11, 12, 13, 14, 19, 20.
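The score adjustment in claim 9 can be sketched as follows. The numeric scores and the boost amount are assumptions for illustration: evidence that the user intentionally accessed first-language content raises the first language model's score, which can flip the selection.

```python
def adjusted_scores(first_score, second_score,
                    accessed_first_language_content, boost=0.2):
    """Return (other_first_score, second_score) after an activity-based boost.

    The boost value is a hypothetical constant; it stands in for whatever
    increase the activity evidence justifies.
    """
    if accessed_first_language_content:
        first_score = min(1.0, first_score + boost)
    return first_score, second_score

def select_language(first_score, second_score):
    """Pick the language whose model scored the audio higher."""
    return "first" if first_score >= second_score else "second"

# Without the boost the second language would narrowly win; with evidence
# that the user accessed first-language content, the boost flips the choice.
s1, s2 = adjusted_scores(0.45, 0.50, accessed_first_language_content=True)
chosen = select_language(s1, s2)  # -> "first"
```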
15. A method implemented by one or more processors, the method comprising:
determining that a user has interacted with one or more applications when the one or more applications were providing natural language content in a first language, wherein the first language is different from a second language that is a user-specific speech processing language for an automated assistant that is accessible via a computing device;
causing, based on determining that the user has interacted with the one or more applications, a user-specific language profile, corresponding to the user, to be modified to reference the first language;
receiving, subsequent to the user-specific language profile being modified to reference the first language, audio data corresponding to a spoken utterance that was at least partially received at an automated assistant interface of the computing device;
causing, based on the first language being included in the user-specific language profile and the second language being the user-specific speech processing language, the audio data to be processed by a first language model corresponding to the first language and a second language model corresponding to the second language;
receiving, based on the first language model and the second language model processing the audio data, a first score and a second score, wherein the first score characterizes a probability that the spoken utterance was provided by the user in the first language and the second score characterizes another probability that the spoken utterance was provided by the user in the second language;
selecting, based on at least the first score and the second score, a candidate language, from at least the first language and the second language, for use when processing additional audio data corresponding to the spoken utterance; and
causing, based on selecting the candidate language, the additional audio data corresponding to the spoken utterance to be processed using a particular language model that corresponds to the candidate language.
Dependent claims: 16, 17, 18.
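The pipeline in claim 15 can be sketched end to end under assumed, simplified interfaces. The profile representation, the stubbed per-language model, and its scores are all hypothetical: an app interaction in a new language is folded into the user's profile, and subsequent audio is then scored by one model per profile language.

```python
def update_profile(profile, interaction_language):
    """Add a language the user interacted with to the user-specific profile."""
    if interaction_language not in profile:
        profile.append(interaction_language)
    return profile

def score_audio(audio, language):
    """Stand-in for a per-language model; returns a probability-like score."""
    fake_scores = {"de": 0.7, "en": 0.4}
    return fake_scores.get(language, 0.0)

def select_candidate(audio, profile):
    """Score the audio with each profile language's model; pick the best."""
    return max(profile, key=lambda lang: score_audio(audio, lang))

# The user interacted with German-language content, so "de" joins the profile
# alongside the existing speech processing language "en".
profile = update_profile(["en"], "de")
# Further audio for this utterance is then processed with the "de" model.
candidate = select_candidate(b"...", profile)  # -> "de"
```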
Specification