Automated assistants that accommodate multiple age groups and/or vocabulary levels

US 10,573,298 B2
Filed: 04/16/2018
Issued: 02/25/2020
Est. Priority Date: 04/16/2018
Status: Active Grant

Abstract

Techniques are described herein for enabling an automated assistant to adjust its behavior depending on a detected age range and/or “vocabulary level” of a user who is engaging with the automated assistant. In various implementations, data indicative of a user's utterance may be used to estimate one or more of the user's age range and/or vocabulary level. The estimated age range/vocabulary level may be used to influence various aspects of a data processing pipeline employed by an automated assistant. In various implementations, aspects of the data processing pipeline that may be influenced by the user's age range/vocabulary level may include one or more of automated assistant invocation, speech-to-text (“STT”) processing, intent matching, intent resolution (or fulfillment), natural language generation, and/or text-to-speech (“TTS”) processing. In some implementations, one or more tolerance thresholds associated with one or more of these aspects, such as grammatical tolerances, vocabularic tolerances, etc., may be adjusted.
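
As a rough illustration of the abstract, the sketch below shows one way an estimated age range could adjust tolerance thresholds for each pipeline stage. The class name `Tolerances`, the age-range labels, the 0-to-1 tolerance scale, and every numeric value are assumptions made for this example; the patent text does not specify them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerances:
    # Hypothetical scale: 0.0 is strict, 1.0 is very lenient.
    grammatical: float
    vocabularic: float


# Assumed age ranges and values, for illustration only.
TOLERANCES_BY_AGE_RANGE = {
    "child": Tolerances(grammatical=0.8, vocabularic=0.8),
    "adult": Tolerances(grammatical=0.3, vocabularic=0.3),
}

# Pipeline aspects listed in the abstract.
PIPELINE_STAGES = (
    "invocation",
    "stt",
    "intent_matching",
    "fulfillment",
    "nlg",
    "tts",
)


def configure_pipeline(age_range: str) -> dict[str, Tolerances]:
    """Return per-stage tolerances for the estimated age range.

    Unknown age ranges fall back to the adult settings (an assumption).
    """
    tolerances = TOLERANCES_BY_AGE_RANGE.get(
        age_range, TOLERANCES_BY_AGE_RANGE["adult"]
    )
    return {stage: tolerances for stage in PIPELINE_STAGES}
```

A real system would likely tune each stage separately; one shared setting per age range keeps the sketch short.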

20 Claims

1. A method implemented using one or more processors, comprising:
- receiving, at one or more input components of one or more client devices, a vocal utterance from a user;
  
  applying data indicative of the vocal utterance across a trained machine learning model to generate output;
  
  determine, based on the output, that the user falls into a predetermined age group;
  
  selecting, from a plurality of candidate query understanding models, a given query understanding model that is associated with the predetermined age group;
  
  determining an intent of the user using the given query understanding model;
  
  determining, based on the predetermined age group, that the intent of the user is resolvable;
  
  resolving the user's intent to generate responsive data; and
  
  outputting, at one or more output components of one or more of the client devices, the responsive data.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The method of claim 1, wherein the plurality of candidate query understanding models include at least one candidate query understanding model with a grammatical tolerance that is different than a grammatical tolerance of the given query understanding model.
  - 3. The method of claim 1, wherein the data indicative of the vocal utterance comprises an audio recording of an utterance by the user, and the trained machine learning model is trained to generate output indicative of an age of the user based on one or more phonemes contained in the audio recording.
  - 4. The method of claim 1, further comprising selecting, from a plurality of candidate natural language generation models, a given natural language generation model that is associated with the predetermined age group, wherein the selected given natural language generation model is used to generate the responsive data.
  - 5. The method of claim 4, wherein the plurality of candidate natural language generation models include at least one candidate natural language generation model that uses a more complex vocabulary than a vocabulary used by the given natural language generation model.
  - 6. The method of claim 1, further comprising selecting, from a plurality of candidate voice synthesis models, a given voice synthesis model that is associated with the predetermined age group, wherein outputting the responsive data is performed using the given voice synthesis model.
  - 7. The method of claim 1, wherein the given query understanding model is applied to perform speech-to-text processing of the vocal utterance.
  - 8. The method of claim 1, wherein the given query understanding model is applied to perform natural language understanding of speech recognition output generated from the vocal utterance.
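
The steps of claim 1 can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the patented implementation: the trained machine learning model is replaced by a given probability that the speaker is a child, the utterance is assumed to be already transcribed, intent matching is a simple keyword-overlap score, and all names (`QueryUnderstandingModel`, `handle_utterance`, `RESOLVABLE_INTENTS`, the intents and thresholds) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical intents and keywords, chosen only for illustration.
INTENT_KEYWORDS = {
    "play_music": {"play", "music"},
    "purchase": {"buy"},
}

# Hypothetical policy: which intents may be resolved for each age group.
RESOLVABLE_INTENTS = {
    "child": {"play_music"},
    "adult": {"play_music", "purchase"},
}


@dataclass(frozen=True)
class QueryUnderstandingModel:
    age_group: str
    # Lower values tolerate more missing or misspoken keywords.
    min_keyword_overlap: float

    def determine_intent(self, text: str) -> Optional[str]:
        tokens = set(text.lower().split())
        best_intent, best_score = None, 0.0
        for intent, keywords in INTENT_KEYWORDS.items():
            score = len(tokens & keywords) / len(keywords)
            if score >= self.min_keyword_overlap and score > best_score:
                best_intent, best_score = intent, score
        return best_intent


# One candidate model per predetermined age group.
CANDIDATE_MODELS = {
    "child": QueryUnderstandingModel("child", min_keyword_overlap=0.5),
    "adult": QueryUnderstandingModel("adult", min_keyword_overlap=1.0),
}


def classify_age_group(child_probability: float) -> str:
    """Stand-in for interpreting the trained model's output."""
    return "child" if child_probability >= 0.5 else "adult"


def handle_utterance(transcript: str, child_probability: float) -> str:
    age_group = classify_age_group(child_probability)
    model = CANDIDATE_MODELS[age_group]          # select the model
    intent = model.determine_intent(transcript)  # determine the intent
    if intent is None:
        return "not understood"
    if intent not in RESOLVABLE_INTENTS[age_group]:
        return "not resolvable"                  # resolvability check
    return f"resolved: {intent}"                 # responsive data
```

For example, `handle_utterance("play moosic", 0.9)` matches `play_music` under the more tolerant child model, while the same transcript with `child_probability=0.1` is not understood by the stricter adult model. The refusal branches are additions for the sketch; the claim itself recites only the case where the intent is resolvable.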

9. A method implemented using one or more processors, comprising:
- receiving, at one or more input components of one or more client devices, a vocal utterance from a user;
  
  applying data indicative of the vocal utterance across a trained machine learning model to generate output;
  
  determining, based on the output, that the user falls into a given vocabulary level of a plurality of predetermined vocabulary levels;
  
  selecting, from a plurality of candidate query understanding models, a given query understanding model that is associated with the given vocabulary level;
  
  determining an intent of the user using the given query understanding model;
  
  determining, based on the given vocabulary level, that the intent of the user is resolvable;
  
  resolving the user's intent to generate responsive data; and
  
  outputting, at one or more output components of one or more of the client devices, the responsive data.
- Dependent claims (10, 11, 12, 13, 14, 15, 16):
  - 10. The method of claim 9, wherein the plurality of candidate query understanding models include at least one candidate query understanding model with a grammatical tolerance that is different than a grammatical tolerance of the given query understanding model.
  - 11. The method of claim 9, wherein the data indicative of the vocal utterance comprises an audio recording of an utterance by the user, and the trained machine learning model is trained to generate output indicative of a vocabulary level of the user based on one or more phonemes contained in the audio recording.
  - 12. The method of claim 9, further comprising selecting, from a plurality of candidate natural language generation models, a given natural language generation model that is associated with the given vocabulary level, wherein the selected given natural language generation model is used to generate the responsive data.
  - 13. The method of claim 12, wherein the plurality of candidate natural language generation models include at least one candidate natural language generation model that uses a more complex vocabulary than a vocabulary used by the given natural language generation model.
  - 14. The method of claim 9, further comprising selecting, from a plurality of candidate voice synthesis models, a given voice synthesis model that is associated with the given vocabulary level, wherein outputting the responsive data is performed using the given voice synthesis model.
  - 15. The method of claim 9, wherein the given query understanding model is applied to perform speech-to-text processing of the vocal utterance.
  - 16. The method of claim 9, wherein the given query understanding model is applied to perform natural language understanding of speech recognition output generated from the vocal utterance.
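
Claims 9, 12 and 13 can be pictured with a small sketch in which a classifier's output selects one of several predetermined vocabulary levels, and that level in turn selects a natural language generation model. The level names, the per-level score list standing in for the trained model's output, and the response templates are all assumptions for illustration; the claims do not prescribe them.

```python
# Hypothetical predetermined vocabulary levels, ordered simple to complex.
VOCABULARY_LEVELS = ("basic", "intermediate", "advanced")

# Hypothetical NLG "models": one response template per vocabulary level.
NLG_TEMPLATES = {
    "basic": "It is {temp} degrees outside. Wear a coat!",
    "intermediate": "It is {temp} degrees right now, so dress warmly.",
    "advanced": "The current temperature is {temp} degrees. "
                "Insulated outerwear is advisable.",
}


def level_from_output(scores: list[float]) -> str:
    """Pick the vocabulary level with the highest score.

    `scores` stands in for the trained model's output and is assumed to
    hold one value per entry in VOCABULARY_LEVELS.
    """
    if len(scores) != len(VOCABULARY_LEVELS):
        raise ValueError("expected one score per vocabulary level")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return VOCABULARY_LEVELS[best_index]


def generate_response(level: str, temp: int) -> str:
    """Generate responsive data with the NLG template for this level."""
    return NLG_TEMPLATES[level].format(temp=temp)
```

Calling `generate_response(level_from_output([0.1, 0.2, 0.7]), 5)` would use the "advanced" template, which uses a more complex vocabulary than the "basic" one, in the spirit of claim 13.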

17. A system comprising one or more processors and memory operably coupled with the one or more processors, wherein the memory stores instructions that, in response to execution of the instructions by one or more processors, cause the one or more processors to perform the following operations:
- receiving, at one or more input components of one or more client devices, a vocal utterance from a user;
  
  applying data indicative of the vocal utterance across a trained machine learning model to generate output;
  
  determine, based on the output, that the user falls into a predetermined age group;
  
  selecting, from a plurality of candidate query understanding models, a given query understanding model that is associated with the predetermined age group;
  
  determining an intent of the user using the given query understanding model;
  
  determining, based on the predetermined age group, that the intent of the user is resolvable;
  
  resolving the user's intent to generate responsive data; and
  
  outputting, at one or more output components of one or more of the client devices, the responsive data.
- Dependent claims (18, 19, 20):
  - 18. The system of claim 17, wherein the plurality of candidate query understanding models include at least one candidate query understanding model with a grammatical tolerance that is different than a grammatical tolerance of the given query understanding model.
  - 19. The system of claim 17, wherein the data indicative of the vocal utterance comprises an audio recording of an utterance by the user, and the trained machine learning model is trained to generate output indicative of an age of the user based on one or more phonemes contained in the audio recording.
  - 20. The system of claim 17, further comprising instructions for selecting, from a plurality of candidate natural language generation models, a given natural language generation model that is associated with the predetermined age group, wherein the selected given natural language generation model is used to generate the responsive data.

Current Assignee
Google LLC (Alphabet Inc.)
Original Assignee
Google LLC (Alphabet Inc.)
Inventors
Anders, Pedro Gonnet; Carbune, Victor; Keysers, Daniel; Deselaers, Thomas; Feuz, Sandro
Primary Examiner(s)
Albertalli, Brian L

Application Number
US15/954,174
Publication Number
US 20190325864A1
Time in Patent Office
680 Days
CPC Class Codes

G06F 16/90332   Natural language query form...

G06F 16/9035   Filtering based on addition...

G06F 3/167   Audio in a user interface, ...

G06F 40/35   Discourse or dialogue repre...

G06F 40/56   Natural language generation

G10L 15/08   Speech classification or se...

G10L 15/18   using natural language mode...

G10L 15/1815   Semantic context, e.g. disa...

G10L 15/19   Grammatical context, e.g. d...

G10L 15/22   Procedures used during a sp...

G10L 2015/223   Execution procedure of a sp...

G10L 2015/227   of the speaker; Human-fact...
