Contextual voice user interface

US 10,446,147 B1
Filed: 06/27/2017
Issued: 10/15/2019
Est. Priority Date: 06/27/2017
Status: Active Grant

Abstract

Techniques for providing a contextual voice user interface that enables a user to query a speech processing system with respect to the decisions made to answer the user's command are described. The speech processing system may store speech processing pipeline data used to process a command. At some point after the system outputs content deemed responsive to the command, a user may speak an utterance corresponding to an inquiry with respect to the processing performed to respond to the command. For example, the user may state “why did you tell me that?” In response thereto, the speech processing system may determine the stored speech processing pipeline data used to respond to the command, and may generate output audio data that describes the data and computing decisions involved in determining the content deemed responsive to the command.
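
The flow in the abstract can be illustrated with a minimal Python sketch. Everything named here (`PipelineRecord`, `PipelineStore`, `explain`, the template wording, and the sample values such as `GetWeather` and `WeatherApp`) is a hypothetical stand-in chosen for illustration; the patent does not specify a schema, a storage layer, or a response template.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class PipelineRecord:
    """Pipeline data kept for one command (hypothetical schema)."""
    identifier: str
    text: str = ""          # speech recognition result
    intent: str = ""        # intent from natural language processing
    entities: dict = field(default_factory=dict)  # text span -> entity indication
    application: str = ""   # application chosen for the intent


class PipelineStore:
    """In-memory stand-in for wherever the system keeps pipeline data."""

    def __init__(self):
        self._records = {}
        self._latest = None

    def start(self) -> PipelineRecord:
        # Generate an identifier and associate a fresh record with it.
        record = PipelineRecord(identifier=str(uuid.uuid4()))
        self._records[record.identifier] = record
        self._latest = record.identifier
        return record

    def latest(self) -> PipelineRecord:
        return self._records[self._latest]


def explain(record: PipelineRecord) -> str:
    """Fill a fixed template (an assumed output format) with stored results."""
    return (
        f'I heard "{record.text}". '
        f"I understood the intent as {record.intent} "
        f"and answered using {record.application}."
    )


# First command: each stage writes its result under the same identifier.
store = PipelineStore()
rec = store.start()
rec.text = "what is the weather in Seattle"
rec.intent = "GetWeather"
rec.entities = {"Seattle": "city"}
rec.application = "WeatherApp"

# Later the user asks "why did you tell me that?"
print(explain(store.latest()))
```

Keying every stage's result to one identifier is what lets the later inquiry recover the whole decision chain with a single lookup.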

20 Claims

1. A computer-implemented method comprising:
- during a first time period at one or more remote devices:
  
  receiving, from a device, first input audio data corresponding to a first utterance;
  
  generating an identifier;
  
  associating the identifier with the first utterance;
  
  performing speech recognition processing on the first input audio data to generate first text data;
  
  associating the first text data with the identifier;
  
  performing natural language processing on the first text data to determine a first intent corresponding to the first utterance;
  
  associating the first intent with the identifier;
  
  performing natural language processing on the first text data to determine at least a portion of the first text data that potentially corresponds to an entity;
  
  associating, with the identifier, the at least a portion of the first text data and an indication of the entity;
  
  determining an application associated with the first intent;
  
  associating application data representing the application with the identifier;
  
  sending, to a remote device associated with the application, a signal requesting content responsive to the first utterance;
  
  receiving, from the remote device, content data representing the content; and
  
  causing the device to emit the content data; and
  
  during a second time period subsequent to the first time period at the one or more remote devices:
  
  receiving, from the device, second input audio data corresponding to a second utterance;
  
  performing speech recognition processing on the second input audio data to generate second text data;
  
  performing natural language processing on the second text data to determine a second intent corresponding to the second utterance, the second intent being to determine an explanation for processing of the first utterance and to receive previous speech processing results corresponding to the first utterance;
  
  determining the identifier associated with the first utterance;
  
  determining, based on the identifier, at least one of the first text data, the first intent, the at least a portion of the first text data, the indication of the entity, or the application data;
  
  determining an output data format associated with the second intent;
  
  generating output data using the output data format, wherein the output data includes the first text data and at least one of the first intent, the indication of the entity, or the application data with at least a first portion of the output data format; and
  
  sending the output data to the device.
  - 2. The computer-implemented method of claim 1, further comprising:
    - receiving, from the device, third input audio data corresponding to a third utterance;
      
      performing speech recognition processing on the third input audio data to generate third text data;
      
      performing natural language processing on the third text data to determine a third intent corresponding to the third utterance, the third intent being to receive speech processing results corresponding to a fourth utterance;
      
      determining a second identifier associated with the fourth utterance;
      
      determining speech processing data associated with the second identifier;
      
      determining the output data format is associated with the third intent;
      
      generating second output data using the output data format, wherein the second output data includes at least a portion of the speech processing data; and
      
      sending the second output data to the device.
  - 3. The computer-implemented method of claim 2, further comprising:
    - determining the third text data includes an indication of a time period when the fourth utterance was spoken;
      
      determining, in a profile associated with the third utterance, the time period; and
      
      determining the second identifier is associated with the time period in the profile.
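
Claims 2 and 3 describe finding an earlier utterance by a time period mentioned in the inquiry. One possible reading is sketched below in Python. The profile layout (`PROFILE_HISTORY`), the phrase table (`TIME_PERIODS`), and both function names are assumptions made for illustration; simple phrase matching stands in for natural language processing, and the sketch compares time of day only, ignoring the calendar date.

```python
from datetime import datetime, time

# Hypothetical profile history: (when the utterance was spoken, its identifier),
# assumed to be in chronological order.
PROFILE_HISTORY = [
    (datetime(2017, 6, 27, 8, 15), "utt-001"),
    (datetime(2017, 6, 27, 13, 40), "utt-002"),
    (datetime(2017, 6, 27, 20, 5), "utt-003"),
]

# Hypothetical mapping from a spoken phrase to a time-of-day window.
TIME_PERIODS = {
    "this morning": (time(5, 0), time(12, 0)),
    "this afternoon": (time(12, 0), time(18, 0)),
    "tonight": (time(18, 0), time(23, 59)),
}


def find_time_period(text):
    """Return the first known time-period phrase found in the text, if any."""
    lowered = text.lower()
    for phrase in TIME_PERIODS:
        if phrase in lowered:
            return phrase
    return None


def identifier_for(text, history):
    """Return the identifier of the latest utterance inside the named period."""
    phrase = find_time_period(text)
    if phrase is None:
        return None
    start, end = TIME_PERIODS[phrase]
    matches = [ident for stamp, ident in history if start <= stamp.time() < end]
    return matches[-1] if matches else None


print(identifier_for("What did I ask you this afternoon?", PROFILE_HISTORY))
```

Returning the latest match is a design choice of this sketch; the claims do not say how to choose when several utterances fall in the same period.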

4. A system comprising:
- at least one processor; and
  
  at least one memory including instructions that, when executed by the at least one processor, cause the system to:
  
  perform natural language processing on input text data representative of input from a user device to determine an intent of a current user input;
  
  determine the intent is to receive an explanation of a previous output corresponding to a previous user input and receive previous speech processing results corresponding to the previous user input;
  
  determine an identifier associated with the previous user input;
  
  determine previous speech recognition results associated with the identifier;
  
  determine previous natural language processing results associated with the identifier;
  
  determine an output format associated with the intent;
  
  generate output data using the output format, wherein the output data includes a portion of the input text data and at least one of:
  
  at least a portion of the previous speech recognition results or at least a portion of the previous natural language processing results; and
  
  send the output data to a first device associated with the input text data.
  - 5. The system of claim 4, wherein:
    - the previous speech recognition results include text data output based on the previous user input, and the previous natural language processing results include at least one of a previous intent determined based on processing of the previous user input, or application data representing an application associated with the previous intent.
  - 6. The system of claim 4, wherein the instructions, when executed by the at least one processor, further cause the system to:
    - receive input audio data corresponding to the previous user input;
      
      generate the identifier;
      
      perform speech recognition processing on the input audio data to generate second input text data;
      
      associate the second input text data with the identifier;
      
      perform natural language processing on the second input text data to determine a previous intent;
      
      associate the previous intent with the identifier;
      
      determine an application associated with the previous intent;
      
      associate application data representing the application with the identifier;
      
      send, to a remote device associated with the application, a signal requesting content responsive to the previous user input;
      
      receive, from the remote device, content data representing the content; and
      
      cause the first device to emit the content data.
  - 7. The system of claim 6, wherein the instructions, when executed by the at least one processor, further cause the system to:
    - associate, after determining the application, the second input text data with the identifier and the previous intent with the identifier.
  - 8. The system of claim 4, wherein the previous user input is a previously spoken utterance.
  - 9. The system of claim 4, wherein the instructions, when executed by the at least one processor, further cause the system to:
    - determine context data representing at least one of a geographic location of a user or a timestamp corresponding to when audio corresponding to the previous user input was received; and
      
      associate the context data with the identifier, wherein the output data further includes at least a portion of the context data.
  - 10. The system of claim 4, wherein the instructions, when executed by the at least one processor, further cause the system to:
    - determine a user corresponding to the input text data is an application developer; and
      
      determine, based on the user being an application developer, a second output format.
  - 11. The system of claim 4, wherein the instructions, when executed by the at least one processor, further cause the system to:
    - receive input audio data corresponding to a second utterance;
      
      perform speech processing on the input audio data to determine a second intent to determine a new content source for the previous user input; and
      
      associate the previous user input with the new content source.
  - 12. The system of claim 4, wherein the instructions, when executed by the at least one processor, further cause the system to:
    - determine the previous user input was received at a first time;
      
      determine the first time is within a threshold amount of time to a current time; and
      
      determine, based on the first time being within the threshold amount of time, the identifier.
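
Two of the dependent claims lend themselves to a short Python sketch: claim 12 (use the previous input's identifier only if that input is recent enough) and claim 10 (pick a second output format when the user is an application developer). The five-minute threshold, both template strings, and all function names are assumptions for illustration; the claims give no concrete values or formats.

```python
from datetime import datetime, timedelta

# Assumed threshold; the claim only requires "a threshold amount of time".
THRESHOLD = timedelta(minutes=5)

# Hypothetical output formats: plain wording for end users, raw fields for developers.
USER_FORMAT = 'I heard "{text}" and looked it up with {application}.'
DEVELOPER_FORMAT = 'asr_text="{text}" intent={intent} application={application}'


def resolve_identifier(last_input_time, last_identifier, now, threshold=THRESHOLD):
    """Return the previous input's identifier only if it is recent enough."""
    if now - last_input_time <= threshold:
        return last_identifier
    return None


def choose_format(is_developer):
    """Select the second (developer) format when the user is a developer."""
    return DEVELOPER_FORMAT if is_developer else USER_FORMAT


def render(results, is_developer):
    """Fill the chosen format with previously stored processing results."""
    # str.format ignores keyword arguments the template does not use.
    return choose_format(is_developer).format(**results)


previous = {"text": "play jazz", "intent": "PlayMusic", "application": "MusicApp"}
now = datetime(2019, 10, 15, 12, 0)

if resolve_identifier(now - timedelta(minutes=2), "utt-7", now) is not None:
    print(render(previous, is_developer=False))
    print(render(previous, is_developer=True))
```

Gating on recency keeps an inquiry such as "why did you tell me that?" tied to the command the user most plausibly means, while the developer format exposes the same stored data at a lower level of detail.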

13. A computer-implemented method comprising:
- performing natural language processing on input text data representative of input from a user device to determine an intent of a current user input;
  
  determining the intent is to receive an explanation of a previous output corresponding to a previous user input and receive previous speech processing results corresponding to the previous user input;
  
  determining an identifier associated with the previous user input;
  
  determining previous speech recognition results associated with the identifier;
  
  determining previous natural language processing results associated with the identifier;
  
  determining an output format associated with the intent;
  
  generating output data using the output format, wherein the output data includes a portion of the input text data and at least one of:
  
  at least a portion of the previous speech recognition results or at least a portion of the previous natural language processing results; and
  
  sending the output data to a first device associated with the input text data.
  - 14. The computer-implemented method of claim 13, wherein:
    - the previous speech recognition results include text data output based on the previous user input, and the previous natural language processing results include at least one of a previous intent determined based on processing of the previous user input, or application data representing an application associated with the previous intent.
  - 15. The computer-implemented method of claim 13, further comprising:
    - receiving input audio data corresponding to the previous user input;
      
      generating the identifier;
      
      performing speech recognition processing on the input audio data to generate second input text data;
      
      associating the second input text data with the identifier;
      
      performing natural language processing on the second input text data to determine a previous intent;
      
      associating the previous intent with the identifier;
      
      determining an application associated with the previous intent;
      
      associating application data representing the application with the identifier;
      
      sending, to a remote device associated with the application, a signal requesting content responsive to the previous user input;
      
      receiving, from the remote device, content data representing the content; and
      
      causing the first device to emit the content data.
  - 16. The computer-implemented method of claim 15, further comprising:
    - associating, after determining the application, the second input text data with the identifier and the previous intent with the identifier.
  - 17. The computer-implemented method of claim 13, wherein the previous user input is a previously spoken utterance.
  - 18. The computer-implemented method of claim 13, further comprising:
    - determining context data representing at least one of a geographic location of a user or a timestamp corresponding to when audio corresponding to the previous user input was received; and
      
      associating the context data with the identifier, wherein the output data further includes at least a portion of the context data.
  - 19. The computer-implemented method of claim 13, further comprising:
    - determining a user corresponding to the input text data is an application developer; and
      
      determining, based on the user being an application developer, a second output format.
  - 20. The computer-implemented method of claim 13, further comprising:
    - receiving second input audio data corresponding to a second utterance;
      
      performing speech processing on the second input audio data to determine a second intent to determine a new content source for the previous user input; and
      
      associating the previous user input with the new content source.
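
Claims 11 and 20 cover a follow-up utterance that asks for a different content source for the earlier input. A minimal Python sketch of that association follows. `ContentSourceRouter`, its methods, the per-utterance override table, and the sample names `NewsAppA` and `NewsAppB` are all hypothetical; the claims do not say how the association is stored or applied.

```python
class ContentSourceRouter:
    """Maps user inputs to content sources, with per-input overrides (assumed design)."""

    def __init__(self, defaults):
        self._defaults = dict(defaults)  # intent -> default content source
        self._overrides = {}             # normalized input text -> content source

    @staticmethod
    def _key(text):
        # Normalize case and whitespace so repeated inputs map to the same key.
        return " ".join(text.lower().split())

    def source_for(self, text, intent):
        """Prefer a user-chosen source for this input, else the intent's default."""
        return self._overrides.get(self._key(text), self._defaults[intent])

    def reassign(self, previous_text, new_source):
        """Associate the previous user input with a new content source."""
        self._overrides[self._key(previous_text)] = new_source


router = ContentSourceRouter({"GetNews": "NewsAppA"})
print(router.source_for("Tell me the news", "GetNews"))

# Follow-up utterance, e.g. "use NewsAppB for that instead".
router.reassign("Tell me the news", "NewsAppB")
print(router.source_for("tell me the news", "GetNews"))
```

In this sketch the override applies the next time the same input is spoken, which is one plausible effect of the association and not something the claims require.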

Current Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors
Moniz, Michael James; Ravi, Abishek; Aldrich, Ryan Scott; Adams, Michael Bennett
Primary Examiner(s)
Shah, Paras D
Assistant Examiner(s)
Blankenagel, Bryan S

Application Number

US15/634,780
Time in Patent Office

840 Days
CPC Class Codes

G06F 3/167   Audio in a user interface, ...

G06F 40/205   Parsing

G06F 40/30   Semantic analysis

G10L 15/063   Training

G10L 15/22   Procedures used during a sp...

G10L 15/26   Speech to text systems G10L...

G10L 2015/223   Execution procedure of a sp...
