Systems and method to resolve audio-based requests in a networked environment

US 10,679,614 B2
Filed: 04/24/2019
Issued: 06/09/2020
Est. Priority Date: 04/16/2018
Status: Active Grant

Abstract

Techniques are described herein for enabling an automated assistant to adjust its behavior depending on a detected vocabulary level or other vocal characteristics of an input utterance provided to an automated assistant. The estimated vocabulary level or other vocal characteristics may be used to influence various aspects of a data processing pipeline employed by the automated assistant. In some implementations, one or more tolerance thresholds associated with, for example, grammatical tolerances or vocabulary tolerances, may be adjusted based on the estimated vocabulary level or vocal characteristics of the input utterance.
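
The threshold adjustment described in the abstract could be sketched as follows. This is a minimal illustration, not the patented implementation: the `Tolerances` type, the `adjust_tolerances` function, the 0.0 to 1.0 scales, and the linear interpolation rule are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerances:
    """Hypothetical tolerance settings (0.0 = strict, 1.0 = very lenient)."""
    grammatical: float
    vocabulary: float


def adjust_tolerances(base: Tolerances, vocabulary_level: float) -> Tolerances:
    """Loosen tolerances as the estimated vocabulary level drops.

    vocabulary_level is assumed to lie in [0.0, 1.0]; lower values stand for
    a less proficient speaker. The linear rule below is an assumption.
    """
    level = min(max(vocabulary_level, 0.0), 1.0)  # clamp out-of-range input
    slack = 1.0 - level  # share of the remaining headroom to grant
    return Tolerances(
        grammatical=base.grammatical + (1.0 - base.grammatical) * slack,
        vocabulary=base.vocabulary + (1.0 - base.vocabulary) * slack,
    )
```

Under these assumptions, a vocabulary level of 1.0 leaves the base tolerances unchanged, while a level of 0.0 raises both to their maximum.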

17 Citations

20 Claims

1. A system to resolve requests in an audio-based networked system, comprising a computing device comprising one or more processors and a memory, the one or more processors to execute:
- a proficiency detector to:
  - receive a vocal utterance captured at a client device;
  - determine a vocal characteristic of the vocal utterance captured at the client device;
- a speech-to-text module to select a query understanding model from a plurality of candidate query understanding models based on the vocal characteristic;
- an intent matcher to determine an intent of the vocal utterance using the query understanding model;
- a fulfillment module to select a content item based on the intent and one or more keywords parsed from the vocal utterance; and
- an interface to transmit the content item to the client device.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. The system of claim 1, comprising:
    - a content selector component to select a digital component based on the one or more keywords parsed from the vocal utterance; and
    - the interface to transmit the digital component to the client device.
  - 3. The system of claim 1, wherein each of the plurality of candidate query understanding models include a different grammatical tolerance.
  - 4. The system of claim 1, comprising:
    - the proficiency detector to determine a vocabulary level of the vocal utterance; and
    - the fulfillment module to select the content item based on the vocabulary level of the vocal utterance.
  - 5. The system of claim 1, comprising:
    - the proficiency detector to determine a vocabulary level of the vocal utterance; and
    - the speech-to-text module to select the query understanding model from the plurality of candidate query understanding models based on the vocabulary level of the vocal utterance.
  - 6. The system of claim 1, wherein the vocal characteristic comprises at least one of phonemes, a pitch, frequency components, or a cadence of the vocal utterance.
  - 7. The system of claim 1, comprising:
    - the fulfillment module to select the content item based to match a vocabulary level of the vocal utterance.
  - 8. The system of claim 1, comprising:
    - a text-to-speech module to select a voice synthesis model based on the vocal characteristic of the vocal utterance, the voice synthesis model to render the content item at the client device.
  - 9. The system of claim 1, comprising:
    - an invocation module to set an invocation threshold to invoke processing of vocal utterances based on the vocal characteristic.
  - 10. The system of claim 1, comprising:
    - a natural language processor to increase a tolerance to at least one of grammatical, vocabulary, or pronunciation errors based on the vocal characteristic.
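
The component chain recited in claim 1 (proficiency detector, model selection, intent matcher, fulfillment module) could be wired together roughly as below. This is a toy sketch under stated assumptions: every name, the pitch cutoff, the trigger phrases, and the catalogue entry are hypothetical, the transcript is taken as already available, and a summary pitch value stands in for real audio analysis.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class QueryUnderstandingModel:
    """Hypothetical candidate model: trigger phrases mapped to intents."""
    name: str
    grammatical_tolerance: float  # illustrative only; unused in matching here
    triggers: Dict[str, str]


# Assumed candidates: the lenient model also accepts a colloquial phrasing.
CANDIDATE_MODELS: Dict[str, QueryUnderstandingModel] = {
    "high_pitch": QueryUnderstandingModel(
        "lenient", 0.8, {"play": "play_media", "wanna hear": "play_media"}
    ),
    "low_pitch": QueryUnderstandingModel("strict", 0.2, {"play": "play_media"}),
}

# Assumed content catalogue keyed by (intent, keyword).
CATALOGUE: Dict[Tuple[str, str], str] = {
    ("play_media", "lullaby"): "lullaby_track_01",
}


def detect_vocal_characteristic(mean_pitch_hz: float, cutoff_hz: float = 250.0) -> str:
    """Stand-in proficiency detector; the cutoff is an assumption."""
    return "high_pitch" if mean_pitch_hz >= cutoff_hz else "low_pitch"


def match_intent(text: str, model: QueryUnderstandingModel) -> Optional[str]:
    """Return the first intent whose trigger phrase occurs in the text."""
    lowered = text.lower()
    for trigger, intent in model.triggers.items():
        if trigger in lowered:
            return intent
    return None


def parse_keywords(text: str) -> List[str]:
    """Keep only words that appear as keywords in the catalogue."""
    known = {keyword for _, keyword in CATALOGUE}
    return [word for word in text.lower().split() if word in known]


def resolve(text: str, mean_pitch_hz: float) -> Optional[str]:
    """Run the chain: characteristic -> model -> intent -> content item."""
    model = CANDIDATE_MODELS[detect_vocal_characteristic(mean_pitch_hz)]
    intent = match_intent(text, model)
    if intent is None:
        return None
    for keyword in parse_keywords(text):
        item = CATALOGUE.get((intent, keyword))
        if item is not None:
            return item
    return None
```

In this sketch the same utterance can resolve differently depending on the detected characteristic, because the selected model determines which phrasings map to an intent.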

11. A method implemented using one or more processors, comprising:
- receiving, by a proficiency detector executed by one or more processors, a vocal utterance captured at a client device;
- determining, by the proficiency detector executed by the one or more processors, a vocal characteristic based on the vocal utterance captured at the client device;
- selecting, by a speech-to-text module executed by the one or more processors, a query understanding model from a plurality of candidate query understanding models based on the vocal characteristic;
- determining, by an intent matcher executed by the one or more processors, an intent of the vocal utterance using the query understanding model;
- selecting, by a fulfillment module executed by the one or more processors, a content item based on the intent and one or more keywords parsed from the vocal utterance; and
- transmitting, by the one or more processors, the content item to the client device.
- Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19, 20)
  - 12. The method of claim 11, comprising:
    - selecting, by a content selector component executed by the one or more processors, a digital component, based on the one or more keywords parsed from the vocal utterance; and
    - transmitting, by the one or more processors, the digital component to the client device.
  - 13. The method of claim 11, wherein each of the plurality of candidate query understanding models include a different grammatical tolerance.
  - 14. The method of claim 11, comprising:
    - determining a vocabulary level of the vocal utterance; and
    - selecting the content item based on the vocabulary level of the vocal utterance.
  - 15. The method of claim 11, comprising:
    - determining a vocabulary level of the vocal utterance; and
    - selecting the query understanding model from the plurality of candidate query understanding models based on the vocabulary level of the vocal utterance.
  - 16. The method of claim 11, wherein the vocal characteristic comprises at least one of phonemes, a pitch, frequency components, or a cadence of the vocal utterance.
  - 17. The method of claim 11, comprising:
    - selecting, by the fulfillment module executed by the one or more processors, the content item based to match the vocal characteristic.
  - 18. The method of claim 11, comprising:
    - selecting a voice synthesis model based on the vocal characteristic to render the content item at the client device.
  - 19. The method of claim 11, comprising:
    - setting, by an invocation module executed by the one or more processors and based on the vocal characteristic, an invocation threshold to invoke processing of vocal utterances.
  - 20. The method of claim 11, comprising:
    - increasing, by a natural language processor executed by the one or more processors, a tolerance to at least one of grammatical, vocabulary, or pronunciation errors based on the vocal characteristic.
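
Claims 16 and 19 name pitch as one possible vocal characteristic and recite setting an invocation threshold from it. One way this might look is sketched below. The claims do not say how pitch is measured or in which direction the threshold moves, so the autocorrelation estimator, the function names, the frequency bounds, the cutoff, and the threshold values are all assumptions made for illustration.

```python
import math
from typing import Sequence


def estimate_pitch_hz(
    samples: Sequence[float],
    sample_rate: int,
    min_hz: float = 80.0,
    max_hz: float = 500.0,
) -> float:
    """Estimate pitch by picking the lag with the largest autocorrelation."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    if len(samples) <= max_lag:
        raise ValueError("not enough samples for the requested pitch range")
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised autocorrelation of the signal with itself at this lag.
        score = sum(
            samples[i] * samples[i + lag] for i in range(len(samples) - lag)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def invocation_threshold(
    pitch_hz: float,
    base: float = 0.8,
    relaxed: float = 0.6,
    pitch_cutoff_hz: float = 250.0,
) -> float:
    """Hypothetical rule: relax the invocation threshold for higher pitch."""
    return relaxed if pitch_hz >= pitch_cutoff_hz else base


def sine_wave(freq_hz: float, sample_rate: int, seconds: float) -> list:
    """Generate a pure test tone so the estimator can be tried without audio."""
    count = int(sample_rate * seconds)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(count)]
```

A pure tone from `sine_wave` can be passed to `estimate_pitch_hz` to check the estimator, and the result fed to `invocation_threshold`. Real speech would need windowing and voicing detection, which this sketch omits.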

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Google LLC (Alphabet Inc.)
Inventors: Anders, Pedro Gonnet; Carbune, Victor; Keysers, Daniel; Deselaers, Thomas; Feuz, Sandro
Primary Examiner(s): Albertalli, Brian L
Application Number: US16/393,785
Publication Number: US 20190348030A1
Time in Patent Office: 412 Days
CPC Class Codes

G06F 16/90332   Natural language query form...

G06F 16/9035   Filtering based on addition...

G06F 3/167   Audio in a user interface, ...

G06F 40/35   Discourse or dialogue repre...

G06F 40/56   Natural language generation

G10L 15/08   Speech classification or se...

G10L 15/18   using natural language mode...

G10L 15/1815   Semantic context, e.g. disa...

G10L 15/19   Grammatical context, e.g. d...

G10L 15/22   Procedures used during a sp...

G10L 2015/223   Execution procedure of a sp...

G10L 2015/227   of the speaker; Human-fact...
