Outcome-oriented dialogs on a speech recognition platform
First Claim
1. A system comprising:
one or more processors; and
one or more computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
receiving an audio signal that includes speech of a user;
analyzing the speech to generate speech-recognition results;
selecting an intent associated with the speech based at least in part on the speech-recognition results, the intent representing multiple possible tasks that the user may have requested in the speech;
identifying the multiple possible tasks represented by the intent;
identifying, for at least one of the multiple possible tasks:
(1) one or more fields that, when filled with respective values, results in the respective possible task being actionable, and (2) which of the one or more fields have values determined from the speech-recognition results;
selecting a target task to perform from the multiple possible tasks;
if a threshold number of the one or more fields have values, performing the target task;
if the threshold number of the one or more fields do not have values, performing an action from a plurality of actions to obtain one or more respective values for one or more fields that do not have values, the action selected with reference to (1) a cost associated with the action, the cost being greater if the action is the first action that involves interacting with the user than if the action is the second action that is free from interacting with the user, and (2) a probability that performing the action will result in obtaining the one or more respective values.
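The decision logic recited in this claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names, the numeric costs, and the scoring rule (probability divided by cost) are all assumptions made for illustration. The claim only requires that cost and probability both inform the selection, and that an action involving user interaction costs more than one that does not.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Action:
    name: str
    fills: Tuple[str, ...]       # fields this action is meant to fill
    interacts_with_user: bool    # True for a prompt, False for a silent lookup
    success_probability: float   # estimated chance the action yields the values

    @property
    def cost(self) -> float:
        # Assumed values; the claim only requires user-facing > non-interactive.
        return 2.0 if self.interacts_with_user else 1.0


@dataclass
class Task:
    name: str
    fields: Dict[str, Optional[str]]  # None marks a field without a value
    threshold: int                    # filled fields needed before performing

    def missing(self) -> List[str]:
        return [k for k, v in self.fields.items() if v is None]

    def is_actionable(self) -> bool:
        return len(self.fields) - len(self.missing()) >= self.threshold


def next_step(task: Task, actions: List[Action]) -> Tuple[str, Optional[str]]:
    """Return ("perform", task name) or ("act", name of the chosen action)."""
    if task.is_actionable():
        return ("perform", task.name)
    missing = set(task.missing())
    candidates = [a for a in actions if missing & set(a.fills)]
    if not candidates:
        return ("act", None)  # nothing available that targets a missing field
    # Assumed scoring rule: probability of success per unit of cost.
    best = max(candidates, key=lambda a: a.success_probability / a.cost)
    return ("act", best.name)


# Hypothetical example data.
task = Task("play_music", {"artist": "example artist", "album": None}, threshold=2)
actions = [
    Action("ask_user_for_album", ("album",), True, 0.9),
    Action("check_listening_history", ("album",), False, 0.6),
]
print(next_step(task, actions))
```

Under these assumed numbers the silent lookup scores 0.6 against 0.45 for the prompt, so it is chosen even though the prompt is more likely to succeed. This shows how a higher interaction cost can outweigh a higher probability.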
Abstract
A speech recognition platform configured to receive an audio signal that includes speech from a user and perform automatic speech recognition (ASR) on the audio signal to identify ASR results. The platform may identify: (i) a domain of a voice command within the speech based on the ASR results and based on context information associated with the speech or the user, and (ii) an intent of the voice command. In response to identifying the intent, the platform may perform multiple actions corresponding to this intent. The platform may select a target action to perform, and may engage in a back-and-forth dialog to obtain information for completing the target action. The action may include streaming audio to the device, setting a reminder for the user, purchasing an item on behalf of the user, making a reservation for the user or launching an application for the user.
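The flow summarized in the abstract (ASR results plus context yield a domain, then an intent, then candidate actions) might be sketched as follows. This is a toy illustration only: the keyword tables, the `recent_domain` context key, and the 0.5 context boost are hypothetical, and a real platform would use trained models rather than keyword lookup.

```python
from typing import Dict, List, Optional, Set

# Hypothetical keyword and intent tables, invented for illustration only.
DOMAIN_KEYWORDS: Dict[str, Set[str]] = {
    "music": {"play", "song", "album"},
    "shopping": {"buy", "order", "purchase"},
}
INTENT_BY_VERB: Dict[str, Dict[str, str]] = {
    "music": {"play": "play_media"},
    "shopping": {"buy": "acquire_item", "order": "acquire_item"},
}
ACTIONS_BY_INTENT: Dict[str, List[str]] = {
    "play_media": ["stream_audio", "launch_application"],
    "acquire_item": ["purchase_item", "set_reminder"],
}


def identify_domain(asr_text: str, context: Dict[str, str]) -> str:
    """Score each domain by keyword overlap, nudged by context."""
    words = set(asr_text.lower().split())
    scores = {d: float(len(words & kw)) for d, kw in DOMAIN_KEYWORDS.items()}
    recent = context.get("recent_domain")
    if recent in scores:
        scores[recent] += 0.5  # assumed boost for the user's recent domain
    return max(scores, key=lambda d: scores[d])


def identify_intent(asr_text: str, domain: str) -> Optional[str]:
    """Return the intent for the first recognized verb, if any."""
    for word in asr_text.lower().split():
        if word in INTENT_BY_VERB[domain]:
            return INTENT_BY_VERB[domain][word]
    return None


def candidate_actions(asr_text: str, context: Dict[str, str]) -> List[str]:
    """Chain domain and intent identification to list candidate actions."""
    domain = identify_domain(asr_text, context)
    intent = identify_intent(asr_text, domain)
    return ACTIONS_BY_INTENT.get(intent, [])


print(candidate_actions("play that song", {"recent_domain": "shopping"}))
```

In this sketch the context only tips the balance when the ASR text is ambiguous; an utterance with clear music keywords is still routed to the music domain despite a recent shopping session.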
297 Citations
23 Claims
1. A system comprising:
one or more processors; and
one or more computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
receiving an audio signal that includes speech of a user;
analyzing the speech to generate speech-recognition results;
selecting an intent associated with the speech based at least in part on the speech-recognition results, the intent representing multiple possible tasks that the user may have requested in the speech;
identifying the multiple possible tasks represented by the intent;
identifying, for at least one of the multiple possible tasks:
(1) one or more fields that, when filled with respective values, results in the respective possible task being actionable, and (2) which of the one or more fields have values determined from the speech-recognition results;
selecting a target task to perform from the multiple possible tasks;
if a threshold number of the one or more fields have values, performing the target task;
if the threshold number of the one or more fields do not have values, performing an action from a plurality of actions to obtain one or more respective values for one or more fields that do not have values, the action selected with reference to (1) a cost associated with the action, the cost being greater if the action is the first action that involves interacting with the user than if the action is the second action that is free from interacting with the user, and (2) a probability that performing the action will result in obtaining the one or more respective values.
Dependent Claims (2, 3, 4, 5)
6. One or more computing devices comprising:
one or more processors;
memory; and
a dialog component, stored in the memory and executable on the one or more processors to perform acts comprising:
receiving an indication of a request made by a user, the request identified from an audio signal including speech of the user;
identifying a possible task to perform in response to receiving the request;
identifying, for the possible task:
(1) one or more fields that, when filled with respective values, results in the possible task being actionable, and (2) which of the one or more fields have values determined from the speech of the user;
identifying multiple possible actions, including a first action that involves to perform to obtain one or more values for fields that do not have values;
selecting an action to perform from the multiple possible actions, the selecting based at least in part on a probability that performing the action will result in obtaining the one or more values for the fields that do not have values; and
performing the action based at least in part on the probability being above a threshold value.
Dependent Claims (7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19)
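The final two acts of claim 6 (select by probability, perform only above a threshold) can be sketched in a few lines of Python. The function name, the action names, and the choice to return `None` when no action clears the threshold are assumptions; the claim does not say what happens in that case.

```python
from typing import Dict, Optional


def select_action(probabilities: Dict[str, float], threshold: float) -> Optional[str]:
    """Pick the most promising action, or None if it does not clear the threshold.

    `probabilities` maps a hypothetical action name to the estimated chance
    that the action obtains values for the fields that lack them.
    """
    if not probabilities:
        return None
    best = max(probabilities, key=lambda name: probabilities[name])
    return best if probabilities[best] > threshold else None


# Hypothetical estimates for two ways of obtaining a missing reminder time.
estimates = {"ask_user_for_time": 0.8, "search_calendar": 0.4}
print(select_action(estimates, threshold=0.5))
print(select_action(estimates, threshold=0.9))
```

With a threshold of 0.5 the user prompt is selected; with 0.9 nothing qualifies, and a caller would need some fallback that this sketch leaves open.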
20. A method comprising:
receiving an audio signal that includes speech of a user;
identifying, from the audio signal, a request from the user;
identifying a task to perform in response to the request, the task being associated with one or more fields that, when filled with respective values, results in the task being actionable;
determining which of the one or more fields is associated with a respective value;
determining, for multiple possible actions to perform, which of the one or more fields each respective possible action is intended to fill;
selecting an action to perform from the multiple possible actions, the selecting based at least in part on a probability that performing the action will result in obtaining values for fields that do not have values; and
performing the action based at least in part on the probability.
Dependent Claims (21, 22, 23)
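Claim 20 adds a step of determining which fields each possible action is intended to fill. One way to picture that mapping is the Python sketch below. The data, the function names, and the scoring rule (probability times the number of still-missing fields the action targets) are illustrative assumptions; the claim only says the selection is based at least in part on a probability.

```python
from typing import Dict, Optional, Set


def fields_without_values(fields: Dict[str, Optional[str]]) -> Set[str]:
    """Return the names of fields that are not yet associated with a value."""
    return {name for name, value in fields.items() if value is None}


def select_by_coverage(
    fields: Dict[str, Optional[str]],
    intended_fields: Dict[str, Set[str]],  # action name -> fields it aims to fill
    probability: Dict[str, float],         # action name -> chance of success
) -> Optional[str]:
    missing = fields_without_values(fields)
    # Keep only actions that target at least one field still lacking a value.
    relevant = {
        name: targets & missing
        for name, targets in intended_fields.items()
        if targets & missing
    }
    if not relevant:
        return None
    # Assumed score: expected number of missing fields the action would fill.
    return max(relevant, key=lambda name: probability[name] * len(relevant[name]))


# Hypothetical reservation task with two unfilled fields.
fields = {"restaurant": "example bistro", "time": None, "party_size": None}
intended = {
    "ask_time_and_size": {"time", "party_size"},
    "read_calendar": {"time"},
    "confirm_restaurant": {"restaurant"},
}
chance = {"ask_time_and_size": 0.7, "read_calendar": 0.9, "confirm_restaurant": 0.95}
print(select_by_coverage(fields, intended, chance))
```

Here `confirm_restaurant` is excluded because its field already has a value, and the combined question wins (0.7 × 2 = 1.4) over the calendar lookup (0.9 × 1) under the assumed scoring.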
Specification