AUTOMATICALLY DETERMINING LANGUAGE FOR SPEECH RECOGNITION OF SPOKEN UTTERANCE RECEIVED VIA AN AUTOMATED ASSISTANT INTERFACE
First Claim
1. A method implemented by one or more processors, the method comprising:
receiving audio data corresponding to a spoken utterance of a user, the audio data being based on detection of the spoken utterance at a client device that includes an automated assistant interface for interacting with an automated assistant;
processing the audio data using a first speech recognition model corresponding to a first language;
determining, based on processing the audio data using the first speech recognition model, content that is responsive to the spoken utterance of the user;
causing the client device to render the content to the user, wherein the content includes a prompt that solicits further input from the user;
in response to determining the content includes the prompt, monitoring for additional spoken input;
receiving, during the monitoring, additional audio data corresponding to an additional spoken utterance, the additional audio data being based on detection of the additional spoken utterance by the automated assistant interface of the client device;
determining whether to utilize the first speech recognition model for the first language, or an alternative speech recognition model for a second language, in determining further responsive content to provide in response to the additional spoken utterance, wherein determining whether to utilize the first speech recognition model or the alternative speech recognition model in determining the further responsive content is based on a monitoring duration corresponding to a time period for the monitoring for the additional spoken input from the user, wherein as the monitoring duration increases, a probability of utilizing the alternative speech recognition model increases; and
causing the client device to render the further responsive content.
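For illustration only (this is not claim language), the monitoring-duration heuristic recited in the claim could be sketched roughly as below; the function name, the 8-second normalization window, and the model identifiers are hypothetical, and a real system could use any monotonic mapping from monitoring duration to switching probability.

```python
import random

# Illustrative sketch (not claim language): the longer the assistant has been
# monitoring for the follow-up utterance, the more likely it is to process that
# utterance with the alternative-language model instead of the model used for
# the original utterance. All names and constants here are hypothetical.

def select_speech_recognition_model(monitoring_duration_s: float,
                                    max_duration_s: float = 8.0) -> str:
    """Pick a model for the additional spoken utterance.

    The probability of choosing the alternative (second-language) model grows
    with how long the device has been monitoring for additional input.
    """
    # Map elapsed monitoring time to a probability in [0, 1].
    p_alternative = min(monitoring_duration_s / max_duration_s, 1.0)
    if random.random() < p_alternative:
        return "alternative_model"  # speech recognition model for the second language
    return "first_model"            # speech recognition model for the first language


if __name__ == "__main__":
    for elapsed in (0.5, 3.0, 7.5):
        print(f"{elapsed:.1f}s -> {select_speech_recognition_model(elapsed)}")
```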
Abstract
Implementations relate to determining a language for speech recognition of a spoken utterance, received via an automated assistant interface, for interacting with an automated assistant. Implementations can enable multilingual interaction with the automated assistant, without requiring a user to explicitly designate a language to be utilized for each interaction. Selection of a speech recognition model for a particular language can be based on one or more interaction characteristics exhibited during a dialog session between a user and an automated assistant. Such interaction characteristics can include anticipated user input types, anticipated user input durations, a duration for monitoring for a user response, and/or an actual duration of a provided user response.
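As an illustrative sketch only, the duration-related interaction characteristics listed in the abstract could be combined into a single score that favors the second-language model. The weights, the 10-second normalization, and the 1.5x threshold below are invented, and the anticipated-input-type characteristic is illustrated separately under claim 11.

```python
from dataclasses import dataclass

# Minimal sketch, assuming the duration-related interaction characteristics
# named in the abstract are combined into one score per dialog turn.

@dataclass
class InteractionCharacteristics:
    anticipated_input_duration_s: float  # how long a response to the prompt usually is
    monitoring_duration_s: float         # how long the device waited for a response
    response_duration_s: float           # how long the actual response was

def score_alternative_model(c: InteractionCharacteristics) -> float:
    """Return a score in [0, 1]; higher values favor the second-language model."""
    score = 0.0
    # Longer monitoring before the user responds favors switching models.
    score += 0.5 * min(c.monitoring_duration_s / 10.0, 1.0)
    # A response noticeably longer than anticipated also favors re-processing
    # the audio with the alternative speech recognition model.
    if c.response_duration_s > 1.5 * c.anticipated_input_duration_s:
        score += 0.5
    return min(score, 1.0)
```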
21 Claims
1. A method implemented by one or more processors, the method comprising:
receiving audio data corresponding to a spoken utterance of a user, the audio data being based on detection of the spoken utterance at a client device that includes an automated assistant interface for interacting with an automated assistant;
processing the audio data using a first speech recognition model corresponding to a first language;
determining, based on processing the audio data using the first speech recognition model, content that is responsive to the spoken utterance of the user;
causing the client device to render the content to the user, wherein the content includes a prompt that solicits further input from the user;
in response to determining the content includes the prompt, monitoring for additional spoken input;
receiving, during the monitoring, additional audio data corresponding to an additional spoken utterance, the additional audio data being based on detection of the additional spoken utterance by the automated assistant interface of the client device;
determining whether to utilize the first speech recognition model for the first language, or an alternative speech recognition model for a second language, in determining further responsive content to provide in response to the additional spoken utterance, wherein determining whether to utilize the first speech recognition model or the alternative speech recognition model in determining the further responsive content is based on a monitoring duration corresponding to a time period for the monitoring for the additional spoken input from the user, wherein as the monitoring duration increases, a probability of utilizing the alternative speech recognition model increases; and
causing the client device to render the further responsive content.
View Dependent Claims (3, 4, 5, 6, 7, 8, 9, 10)
2. (canceled)
11. A method implemented by one or more processors, the method comprising:
receiving audio data corresponding to a spoken utterance of a user, the audio data being based on detection of the spoken utterance by a client device that includes an automated assistant interface for interacting with an automated assistant;
processing the audio data using a first speech recognition model corresponding to a first language;
determining, based on processing the audio data using the first speech recognition model, content that is responsive to the spoken utterance of the user;
causing the client device to render the content to the user and to monitor for additional spoken input following the rendering;
receiving, during the monitoring, additional audio data corresponding to an additional spoken utterance, the additional audio data being based on detection of the additional spoken utterance by the automated assistant interface of the client device;
determining an anticipated type of input for the additional spoken utterance based on historical interaction data that identifies at least one interaction, between the user and the automated assistant, in which the user provided particular diction or particular terminology to the automated assistant;
determining, based on determining the anticipated type of input for the additional spoken utterance, whether to utilize the first speech recognition model for the first language, or an alternative speech recognition model for a second language, in determining further responsive content to provide in response to the additional spoken utterance, wherein the anticipated type of input includes an anticipated diction or anticipated terminology for the additional spoken utterance, and wherein determining whether to utilize the first speech recognition model or the alternative speech recognition model is based on one or more of: a monitoring duration corresponding to a time period for the monitoring for the additional spoken input from the user, an input duration corresponding to a duration of the additional spoken utterance of the user, and an anticipated type of input for the additional spoken utterance; and
causing the client device to render the further responsive content.
View Dependent Claims (14)
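A rough, hypothetical sketch of the anticipated-input-type signal in claim 11 follows: prior interactions in which the user answered a similar prompt with particular diction or terminology are used to predict the language of the upcoming response. The history record format, prompt categories, language codes, and model names are all assumptions, not taken from the patent.

```python
from collections import Counter

# Sketch of claim 11's anticipated-input-type signal: past interactions in which
# the user answered a similar prompt with particular diction or terminology are
# used to predict which language the follow-up utterance will be in.

def anticipate_response_language(prompt_category: str,
                                 historical_interactions: list,
                                 default_language: str = "en") -> str:
    """Predict the language of the follow-up utterance from interaction history.

    Each history entry is assumed to look like:
        {"prompt_category": "contact_name", "response_language": "es"}
    """
    languages = [h["response_language"]
                 for h in historical_interactions
                 if h["prompt_category"] == prompt_category]
    if not languages:
        return default_language
    # The most frequent response language for this kind of prompt wins.
    return Counter(languages).most_common(1)[0][0]


def pick_model(prompt_category: str, history: list, first_language: str = "en") -> str:
    anticipated = anticipate_response_language(prompt_category, history, first_language)
    return "first_model" if anticipated == first_language else "alternative_model"
```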
12. (canceled)
13. (canceled)
15. A method implemented by one or more processors, the method comprising:
receiving audio data corresponding to a spoken utterance of a user, the audio data being based on detection of the spoken utterance at a client device that includes an automated assistant interface for interacting with an automated assistant;
processing the audio data using a first speech recognition model corresponding to a first language;
determining, based on processing the audio data using the first speech recognition model, content that is responsive to the spoken utterance of the user;
monitoring for an additional spoken input from the user;
receiving, during the monitoring, additional audio data corresponding to an additional spoken utterance, the additional audio data being based on detection of the additional spoken utterance by the automated assistant interface of the client device, wherein the additional spoken utterance is provided by another user;
determining, based on receiving the additional audio data, that the additional spoken utterance is provided by the other user;
accessing, based on the additional spoken utterance being provided by the other user, a user profile corresponding to the other user, wherein the user profile provides a correspondence between the other user and a second language;
determining a selection of one or more speech recognition models to use for processing the additional audio data, the one or more speech recognition models selected from multiple different speech recognition models that include at least the first speech recognition model for the first language and a second speech recognition model for the second language;
processing the additional audio data according to the selection of the speech recognition model; and
causing the client device to render further responsive content based on the processing of the additional audio data according to the selection of the speech recognition model.
View Dependent Claims (17, 18, 19, 20)
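The profile-based selection in claim 15 could be sketched, very roughly, as below. The profile store, language codes, and model names are hypothetical, and the speaker is assumed to have been identified by an upstream speaker-identification step that is outside the scope of this sketch.

```python
from typing import List, Optional

# Sketch of claim 15's profile-based selection: when the additional utterance
# comes from a different speaker, that speaker's profile (which maps the speaker
# to a preferred language) drives which recognition model or models process the
# additional audio data. All identifiers below are invented for illustration.

USER_PROFILES = {
    "user_a": "en",  # original speaker
    "user_b": "es",  # the "other user" of the claim
}

SPEECH_RECOGNITION_MODELS = {
    "en": "first_model",
    "es": "second_model",
}

def select_models_for_other_user(speaker_id: Optional[str],
                                 current_language: str = "en") -> List[str]:
    """Choose the speech recognition model(s) for the additional audio data."""
    profile_language = USER_PROFILES.get(speaker_id) if speaker_id else None
    if profile_language is None:
        # Unknown speaker: keep using the model already in use for the dialog.
        return [SPEECH_RECOGNITION_MODELS[current_language]]
    # Known speaker: select the model tied to the profile language; optionally
    # keep the current model as well so both transcriptions can be compared.
    selected = {SPEECH_RECOGNITION_MODELS[profile_language],
                SPEECH_RECOGNITION_MODELS[current_language]}
    return sorted(selected)


if __name__ == "__main__":
    print(select_models_for_other_user("user_b"))  # ['first_model', 'second_model']
```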
16. (canceled)
21-23. (canceled)
Specification