Contextual voice user interface
First Claim
1. A computer-implemented method comprising:
during a first time period at one or more remote devices:
receiving, from a device, first input audio data corresponding to a first utterance;
generating an identifier;
associating the identifier with the first utterance;
performing speech recognition processing on the first input audio data to generate first text data;
associating the first text data with the identifier;
performing natural language processing on the first text data to determine a first intent corresponding to the first utterance;
associating the first intent with the identifier;
performing natural language processing on the first text data to determine at least a portion of the first text data that potentially corresponds to an entity;
associating, with the identifier, the at least a portion of the first text data and an indication of the entity;
determining an application associated with the first intent;
associating application data representing the application with the identifier;
sending, to a remote device associated with the application, a signal requesting content responsive to the first utterance;
receiving, from the remote device, content data representing the content; and
causing the device to emit the content data; and
during a second time period subsequent to the first time period at the one or more remote devices:
receiving, from the device, second input audio data corresponding to a second utterance;
performing speech recognition processing on the second input audio data to generate second text data;
performing natural language processing on the second text data to determine a second intent corresponding to the second utterance, the second intent being to determine an explanation for processing of the first utterance and to receive previous speech processing results corresponding to the first utterance;
determining the identifier associated with the first utterance;
determining, based on the identifier, at least one of the first text data, the first intent, the at least a portion of the first text data, the indication of the entity, or the application data;
determining an output data format associated with the second intent;
generating output data using the output data format, wherein the output data includes the first text data and at least one of the first intent, the indication of the entity, or the application data with at least a first portion of the output data format; and
sending the output data to the device.
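The bookkeeping recited in claim 1 can be illustrated with a minimal sketch: results from each processing stage are associated with one identifier during the first time period, then looked up by that identifier during the second. Everything named here (`PipelineRecord`, `PipelineStore`, `explain`, the sample values, and the choice to treat the most recent utterance as the one being asked about) is a hypothetical assumption for illustration and is not taken from the patent.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PipelineRecord:
    """Hypothetical container for results tied to one utterance identifier."""
    text: Optional[str] = None         # speech recognition output
    intent: Optional[str] = None       # natural language processing intent
    entity_text: Optional[str] = None  # portion of text that may name an entity
    entity: Optional[str] = None       # indication of the entity
    application: Optional[str] = None  # application data


class PipelineStore:
    """Hypothetical in-memory store keyed by utterance identifier."""

    def __init__(self) -> None:
        self._records: Dict[str, PipelineRecord] = {}
        self._latest: Optional[str] = None

    def new_utterance(self) -> str:
        # Generate an identifier and associate it with the new utterance.
        identifier = uuid.uuid4().hex
        self._records[identifier] = PipelineRecord()
        self._latest = identifier
        return identifier

    def associate(self, identifier: str, **fields: str) -> None:
        # Attach one or more stage results to the identifier.
        record = self._records[identifier]
        for name, value in fields.items():
            if not hasattr(record, name):
                raise AttributeError(f"unknown field: {name}")
            setattr(record, name, value)

    def latest_identifier(self) -> Optional[str]:
        # Assumption: the follow-up question refers to the latest utterance.
        return self._latest

    def lookup(self, identifier: str) -> PipelineRecord:
        return self._records[identifier]


def explain(store: PipelineStore, identifier: str) -> str:
    # Always include the recognized text, then whichever results were stored.
    record = store.lookup(identifier)
    parts = [f'I heard "{record.text}"']
    if record.intent:
        parts.append(f"read the intent as {record.intent}")
    if record.entity:
        parts.append(f'took "{record.entity_text}" to mean {record.entity}')
    if record.application:
        parts.append(f"routed the request to {record.application}")
    return ", ".join(parts) + "."


# First time period: each stage result is associated with the identifier.
store = PipelineStore()
uid = store.new_utterance()
store.associate(uid, text="play adele")
store.associate(uid, intent="PlayMusic")
store.associate(uid, entity_text="adele", entity="the artist Adele")
store.associate(uid, application="the music application")

# Second time period: the explanation request resolves the identifier.
explanation = explain(store, store.latest_identifier())
print(explanation)
```

The fixed sentence layout in `explain` stands in for the claimed output data format; a real system would presumably select a format based on the second intent.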
Abstract
Techniques for providing a contextual voice user interface that enables a user to query a speech processing system with respect to the decisions made to answer the user's command are described. The speech processing system may store speech processing pipeline data used to process a command. At some point after the system outputs content deemed responsive to the command, a user may speak an utterance corresponding to an inquiry with respect to the processing performed to respond to the command. For example, the user may state “why did you tell me that?” In response thereto, the speech processing system may determine the stored speech processing pipeline data used to respond to the command, and may generate output audio data that describes the data and computing decisions involved in determining the content deemed responsive to the command.
20 Claims
1. A computer-implemented method (recited in full under First Claim above).
Dependent claims: 2, 3
4. A system comprising:
at least one processor; and
at least one memory including instructions that, when executed by the at least one processor, cause the system to:
perform natural language processing on input text data representative of input from a user device to determine an intent of a current user input;
determine the intent is to receive an explanation of a previous output corresponding to a previous user input and receive previous speech processing results corresponding to the previous user input;
determine an identifier associated with the previous user input;
determine previous speech recognition results associated with the identifier;
determine previous natural language processing results associated with the identifier;
determine an output format associated with the intent;
generate output data using the output format, wherein the output data includes a portion of the input text data and at least one of:
at least a portion of the previous speech recognition results or at least a portion of the previous natural language processing results; and
send the output data to a first device associated with the input text data.
Dependent claims: 5, 6, 7, 8, 9, 10, 11, 12
13. A computer-implemented method comprising:
performing natural language processing on input text data representative of input from a user device to determine an intent of a current user input;
determining the intent is to receive an explanation of a previous output corresponding to a previous user input and receive previous speech processing results corresponding to the previous user input;
determining an identifier associated with the previous user input;
determining previous speech recognition results associated with the identifier;
determining previous natural language processing results associated with the identifier;
determining an output format associated with the intent;
generating output data using the output format, wherein the output data includes a portion of the input text data and at least one of:
at least a portion of the previous speech recognition results or at least a portion of the previous natural language processing results; and
sending the output data to a first device associated with the input text data.
Dependent claims: 14, 15, 16, 17, 18, 19, 20
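The output-generation step shared by claims 4 and 13 can be sketched as a lookup of a format keyed by intent, filled with a portion of the current input text and at least one kind of previous result. The intent name `ExplainPreviousOutput`, the template wording, and the function `generate_output` are hypothetical assumptions chosen for illustration; the claims do not specify them.

```python
from string import Template
from typing import Dict, Optional

# Hypothetical mapping from an intent to its output format.
OUTPUT_FORMATS: Dict[str, Template] = {
    "ExplainPreviousOutput": Template('You asked "$input_portion". Earlier I $details.'),
}


def generate_output(
    intent: str,
    input_text: str,
    previous_asr: Optional[str] = None,
    previous_nlu: Optional[str] = None,
) -> str:
    """Fill the intent's format with the current input and previous results."""
    # The claims call for at least one of the two kinds of previous results.
    if previous_asr is None and previous_nlu is None:
        raise ValueError("at least one previous result is required")

    details = []
    if previous_asr is not None:
        details.append(f'heard "{previous_asr}"')
    if previous_nlu is not None:
        details.append(f"interpreted it as {previous_nlu}")

    template = OUTPUT_FORMATS[intent]  # output format associated with the intent
    return template.substitute(
        input_portion=input_text.strip(),
        details=" and ".join(details),
    )


output = generate_output(
    "ExplainPreviousOutput",
    "why did you tell me that",
    previous_asr="play adele",
    previous_nlu="PlayMusic",
)
print(output)
```

Keeping the format separate from the stored results means the same previous results could be rendered differently for other intents, which is one plausible reading of "an output format associated with the intent".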
Specification