AUTOMATICALLY DETERMINING LANGUAGE FOR SPEECH RECOGNITION OF SPOKEN UTTERANCE RECEIVED VIA AN AUTOMATED ASSISTANT INTERFACE
Abstract
Determining a language for speech recognition of a spoken utterance received via an automated assistant interface for interacting with an automated assistant. Implementations can enable multilingual interaction with the automated assistant without requiring the user to explicitly designate a language for each interaction. Implementations determine a user profile that corresponds to audio data that captures a spoken utterance, and utilize the language(s), and optionally corresponding probabilities, assigned to the user profile in determining a language for speech recognition of the spoken utterance. Some implementations select only a subset of the languages assigned to the user profile to utilize in speech recognition of a given spoken utterance of the user. Some implementations perform speech recognition in each of multiple languages assigned to the user profile, and utilize criteria to select only one of the speech recognitions as appropriate for generating and providing content that is responsive to the spoken utterance.
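The abstract's overall flow can be sketched as follows. This is a minimal illustration, not the patented implementation: the profile structure, threshold, recognizer stub, and scoring by prior-weighted confidence are all assumptions introduced for the example.

```python
# Hypothetical user profile: candidate languages mapped to prior probabilities.
PROFILE = {"en-US": 0.6, "es-ES": 0.3, "fr-FR": 0.1}

def candidate_languages(profile, threshold=0.2):
    """Select only the subset of profile languages above a prior threshold."""
    return [lang for lang, prob in profile.items() if prob >= threshold]

def recognize(audio, lang):
    """Stand-in for a per-language recognizer: returns (text, confidence)."""
    fake_results = {
        "en-US": ("turn on the lights", 0.82),
        "es-ES": ("enciende las luces", 0.55),
    }
    return fake_results.get(lang, ("", 0.0))

def best_recognition(audio, profile):
    """Recognize in each candidate language; keep the highest combined score."""
    scored = []
    for lang in candidate_languages(profile):
        text, conf = recognize(audio, lang)
        # Weight recognizer confidence by the profile's prior for the language.
        scored.append((conf * profile[lang], lang, text))
    return max(scored)

score, lang, text = best_recognition(b"...", PROFILE)  # lang -> "en-US"
```

Only two of the three profile languages clear the threshold, so only two recognitions run, mirroring the "subset of languages" selection described above.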
20 Claims
1. A method implemented by one or more processors, the method comprising:
determining that a spoken utterance was received at an automated assistant interface of a computing device that is accessible to an automated assistant, wherein the spoken utterance is provided in a first language and the automated assistant is configured to provide a responsive output according to a language selected from at least the first language and a second language;
selecting, in response to determining that the spoken utterance was received at the automated assistant interface, a user-specific language profile corresponding to a user that provided the spoken utterance, wherein the user-specific language profile identifies at least the second language as a candidate language for providing the responsive output;
accessing data that characterizes user activity associated with interactions between the user and one or more applications prior to the user providing the spoken utterance, wherein the data indicates that the user has interacted with the one or more applications using the first language;
selecting, based on the data that characterizes the user activity, the first language over the second language for providing the responsive output;
causing, based on the first language being selected over the second language, responsive audio data to be generated, wherein the responsive audio data characterizes the responsive output as expressed using the first language; and
causing, when the responsive audio data has been at least partially generated, the responsive output to be provided, at the computing device via the automated assistant, using the responsive audio data.
Dependent claims: 2, 3, 4, 5, 6, 7, 8.
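The selection step in claim 1 can be illustrated with a short sketch. The names and the activity-log shape are hypothetical, introduced only for the example: the profile nominates the second language, but recent user activity in the first language overrides it for the responsive output.

```python
def select_output_language(profile_language, activity_log):
    """Prefer the language of the user's recent app interactions, if any.

    profile_language: the candidate language from the user-specific profile.
    activity_log: hypothetical list of prior app interactions, each a dict
    recording which language the user interacted in.
    """
    if activity_log:
        # The most recent interaction's language wins over the profile default.
        return activity_log[-1]["language"]
    return profile_language

activity = [
    {"app": "news_reader", "language": "en"},
    {"app": "messaging", "language": "en"},
]
# The profile nominates Spanish, but the activity data indicates English.
chosen = select_output_language("es", activity)  # -> "en"
```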
9. A method implemented by one or more processors, the method comprising:
determining that a spoken utterance was received by a computing device from a user, the computing device comprising an automated assistant that is capable of being invoked in response to the user providing the spoken utterance;
causing audio data, which is based on the spoken utterance, to be processed by at least a first language model and a second language model, wherein the first language model and the second language model are selected according to a user-specific preference of language models for interpreting spoken utterances from the user;
determining, based on processing of the audio data, a first score that characterizes a probability that the spoken utterance was provided in a first language and a second score that characterizes another probability that the spoken utterance was provided in a second language;
determining, based on a user-specific language profile that is accessible to the automated assistant, that the user has intentionally accessed digital content provided in the first language;
determining, based on determining that the user has intentionally accessed the digital content provided in the first language, another first score to reflect an increase in the probability that the spoken utterance was provided in the first language; and
causing, based on the other first score and the second score, additional audio data to be processed according to a language selected from at least the first language and the second language.
Dependent claims: 10, 11, 12, 13, 14, 19, 20.
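The score adjustment in claim 9 can be sketched as follows. The numeric scores and the boost amount are assumptions for illustration: evidence that the user intentionally accessed first-language content raises the first language model's score, which can flip the selection.

```python
def adjusted_scores(first_score, second_score,
                    accessed_first_language_content, boost=0.2):
    """Return (other_first_score, second_score) after an activity-based boost.

    The boost value is a hypothetical constant; it stands in for whatever
    increase the activity evidence justifies.
    """
    if accessed_first_language_content:
        first_score = min(1.0, first_score + boost)
    return first_score, second_score

def select_language(first_score, second_score):
    """Pick the language whose model scored the audio higher."""
    return "first" if first_score >= second_score else "second"

# Without the boost the second language would narrowly win; with evidence
# that the user accessed first-language content, the boost flips the choice.
s1, s2 = adjusted_scores(0.45, 0.50, accessed_first_language_content=True)
chosen = select_language(s1, s2)  # -> "first"
```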
15. A method implemented by one or more processors, the method comprising:
determining that a user has interacted with one or more applications when the one or more applications were providing natural language content in a first language, wherein the first language is different from a second language that is a user-specific speech processing language for an automated assistant that is accessible via a computing device;
causing, based on determining that the user has interacted with the one or more applications, a user-specific language profile, corresponding to the user, to be modified to reference the first language;
receiving, subsequent to the user-specific language profile being modified to reference the first language, audio data corresponding to a spoken utterance that was at least partially received at an automated assistant interface of the computing device;
causing, based on the first language being included in the user-specific language profile and the second language being the user-specific speech processing language, the audio data to be processed by a first language model corresponding to the first language and a second language model corresponding to the second language;
receiving, based on the first language model and the second language model processing the audio data, a first score and a second score, wherein the first score characterizes a probability that the spoken utterance was provided by the user in the first language and the second score characterizes another probability that the spoken utterance was provided by the user in the second language;
selecting, based on at least the first score and the second score, a candidate language, from at least the first language and the second language, for use when processing additional audio data corresponding to the spoken utterance; and
causing, based on selecting the candidate language, the additional audio data corresponding to the spoken utterance to be processed using a particular language model that corresponds to the candidate language.
Dependent claims: 16, 17, 18.
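The pipeline in claim 15 can be sketched end to end under assumed, simplified interfaces. The profile representation, the stubbed per-language model, and its scores are all hypothetical: an app interaction in a new language is folded into the user's profile, and subsequent audio is then scored by one model per profile language.

```python
def update_profile(profile, interaction_language):
    """Add a language the user interacted with to the user-specific profile."""
    if interaction_language not in profile:
        profile.append(interaction_language)
    return profile

def score_audio(audio, language):
    """Stand-in for a per-language model; returns a probability-like score."""
    fake_scores = {"de": 0.7, "en": 0.4}
    return fake_scores.get(language, 0.0)

def select_candidate(audio, profile):
    """Score the audio with each profile language's model; pick the best."""
    return max(profile, key=lambda lang: score_audio(audio, lang))

# The user interacted with German-language content, so "de" joins the profile
# alongside the existing speech processing language "en".
profile = update_profile(["en"], "de")
# Further audio for this utterance is then processed with the "de" model.
candidate = select_candidate(b"...", profile)  # -> "de"
```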
Specification