Managing dialogs on a speech recognition platform
First Claim
1. A system comprising:
one or more processors; and
one or more computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
receiving a first audio signal that represents, at least in part, first speech;
analyzing the first audio signal to generate first speech-recognition results;
identifying a first intent and a second intent associated with the first speech based at least in part on the first speech-recognition results, wherein the first intent and the second intent are associated with a first domain;
identifying a third intent associated with the first speech based at least in part on the first speech-recognition results, wherein the third intent is associated with a second domain;
sending a second audio signal that represents a first question associated with the first domain and the second domain;
receiving a third audio signal that represents, at least in part, second speech;
analyzing the third audio signal to generate second speech-recognition results;
selecting the first domain based at least in part on the second speech-recognition results;
sending a fourth audio signal that represents a second question associated with at least the first intent and the second intent;
receiving a fifth audio signal that represents, at least in part, third speech;
analyzing the fifth audio signal to generate third speech-recognition results; and
selecting the first intent based at least in part on the third speech-recognition results.
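The sequence of acts recited above can be illustrated with a small sketch. It is a hypothetical illustration under stated assumptions, not the claimed implementation: it operates on already-recognized text rather than audio signals, it matches replies by simple substring search, and the names `Candidate`, `resolve_intent`, and the `ask` callback are introduced here for illustration only. The sketch shows the two-stage narrowing: a first question that chooses between domains, then a second question that chooses between the intents remaining in the selected domain.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Candidate:
    """A candidate interpretation of the first utterance (hypothetical type)."""
    domain: str
    intent: str


def resolve_intent(candidates: List[Candidate], ask: Callable[[str], str]) -> Candidate:
    """Narrow the candidates to one intent by asking at most two questions.

    `ask` stands in for "send an audio question, receive speech, run ASR":
    it takes the question text and returns the recognized reply text.
    """
    if not candidates:
        raise ValueError("need at least one candidate")

    # Stage 1: if the candidates span several domains, ask which domain is meant.
    domains = sorted({c.domain for c in candidates})
    if len(domains) > 1:
        reply = ask("Did you mean " + " or ".join(domains) + "?").lower()
        chosen = [d for d in domains if d in reply]
        if len(chosen) != 1:
            raise ValueError("could not select a domain from the reply")
        candidates = [c for c in candidates if c.domain == chosen[0]]

    # Stage 2: if several intents remain in that domain, ask which intent is meant.
    if len(candidates) > 1:
        intents = [c.intent for c in candidates]
        reply = ask("Would you like to " + " or ".join(intents) + "?").lower()
        matched = [c for c in candidates if c.intent in reply]
        if len(matched) != 1:
            raise ValueError("could not select an intent from the reply")
        candidates = matched

    return candidates[0]
```

With two candidate intents in a "music" domain and one in a "movies" domain, the function asks one domain question and one intent question before returning a single candidate. A production system would replace the substring matching with the speech-recognition results described in the claim.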
Abstract
A speech recognition platform configured to receive an audio signal that includes speech from a user and perform automatic speech recognition (ASR) on the audio signal to identify ASR results. The platform may identify: (i) a domain of a voice command within the speech based on the ASR results and based on context information associated with the speech or the user, and (ii) an intent of the voice command. In response to identifying the intent, the platform may perform a corresponding action, such as streaming audio to the device, setting a reminder for the user, purchasing an item on behalf of the user, making a reservation for the user or launching an application for the user. In some instances, the speech recognition platform engages in a back-and-forth dialog with the user in order to properly fulfill the user's request.
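The abstract says the domain is identified from both the ASR results and context information, but does not say how the two are combined. The following is a minimal sketch under assumptions made here: the keyword table `DOMAIN_KEYWORDS`, the additive context bonus, and the function names `score_domains` and `pick_domain` are all hypothetical and stand in for whatever models a real platform would use.

```python
from typing import Dict, Set

# Hypothetical keyword sets per domain; a real platform would likely use trained models.
DOMAIN_KEYWORDS: Dict[str, Set[str]] = {
    "music": {"play", "song", "album"},
    "shopping": {"buy", "order", "album"},
}


def score_domains(asr_text: str, context: Dict[str, float]) -> Dict[str, float]:
    """Score each domain from keyword overlap plus a context bonus.

    `context` maps a domain name to a bonus, for example a larger value when
    the user was recently active in that domain. Both the bonus scheme and the
    keyword matching are assumptions made for illustration.
    """
    words = set(asr_text.lower().split())
    return {
        domain: len(words & keywords) + context.get(domain, 0.0)
        for domain, keywords in DOMAIN_KEYWORDS.items()
    }


def pick_domain(asr_text: str, context: Dict[str, float]) -> str:
    """Return the highest-scoring domain; ties go to the alphabetically first name."""
    scores = score_domains(asr_text, context)
    return max(sorted(scores), key=lambda d: scores[d])
```

In this sketch the utterance "the album" is ambiguous between the two domains on keywords alone, and a context bonus for "shopping" tips the choice. When scores remain close, a platform could instead fall back on the back-and-forth dialog the abstract mentions.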
60 Citations
22 Claims
1. A system comprising:
one or more processors; and
one or more computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
receiving a first audio signal that represents, at least in part, first speech;
analyzing the first audio signal to generate first speech-recognition results;
identifying a first intent and a second intent associated with the first speech based at least in part on the first speech-recognition results, wherein the first intent and the second intent are associated with a first domain;
identifying a third intent associated with the first speech based at least in part on the first speech-recognition results, wherein the third intent is associated with a second domain;
sending a second audio signal that represents a first question associated with the first domain and the second domain;
receiving a third audio signal that represents, at least in part, second speech;
analyzing the third audio signal to generate second speech-recognition results;
selecting the first domain based at least in part on the second speech-recognition results;
sending a fourth audio signal that represents a second question associated with at least the first intent and the second intent;
receiving a fifth audio signal that represents, at least in part, third speech;
analyzing the fifth audio signal to generate third speech-recognition results; and
selecting the first intent based at least in part on the third speech-recognition results.
Dependent claims: 2, 3, 4, 5, 6
7. One or more computing devices comprising:
one or more processors; and
memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to:
receive a first audio signal representing first user speech;
identify a first intent associated with a first activity of a first set of activities based at least in part on the first audio signal;
identify a second intent associated with a second activity of the first set of activities based at least in part on the first audio signal;
identify a third intent associated with a third activity of a second set of activities based at least in part on the first audio signal;
send a second audio signal representing a first question associated with the first set of activities and the second set of activities;
receive a third audio signal representing second user speech;
select the first set of activities based at least in part on the third audio signal;
send, based at least in part on the first activity and the second activity, a fourth audio signal representing a second question for at least one additional piece of information;
receive a fifth audio signal representing third user speech; and
select the first activity from the first set of activities based at least in part on the fifth audio signal.
Dependent claims: 8, 9, 10, 11, 12
13. A method comprising:
receiving a first audio signal representing first user speech;
identifying, using the first audio signal, a first activity and a second activity associated with a first domain;
identifying, using the first audio signal, a third activity associated with a second domain;
sending a second audio signal representing a first question associated with the first domain and the second domain;
receiving a third audio signal representing second user speech;
selecting the first domain based at least in part on the third audio signal;
determining, based at least in part on the first activity and the second activity, to request at least one additional piece of information;
based at least in part on determining to request the at least one additional piece of information, sending a fourth audio signal representing a second question for the at least one additional piece of information;
receiving a fifth audio signal representing third user speech; and
selecting the first activity based at least in part on the fifth audio signal.
Dependent claims: 14, 15, 16, 17
18. One or more non-transitory computer-readable media storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform acts comprising:
receiving a first audio signal representing first user speech;
identifying, using the first audio signal, a first task and a second task associated with a first domain;
identifying, using the first audio signal, a third task associated with a second domain;
sending a second audio signal representing a first question associated with the first domain and the second domain;
receiving a third audio signal representing second user speech;
selecting the first domain based at least in part on the third audio signal;
determining, based at least in part on the first task and the second task, at least one additional piece of information to request;
based at least in part on determining the at least one additional piece of information, sending a fourth audio signal representing a second question for the at least one additional piece of information;
receiving a fifth audio signal representing third user speech; and
selecting the first task based at least in part on the fifth audio signal.
Dependent claims: 19, 20, 21, 22
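Claims 13 and 18 recite determining, from the first and second activities (or tasks), an additional piece of information to request, without specifying how that determination is made. One possible reading, sketched below purely as an assumption, is to describe each candidate as a set of named properties and ask about a property on which the two candidates differ. The `Activity` type and the functions `info_to_request` and `select_activity` are hypothetical names introduced for this illustration.

```python
from typing import Dict, Optional

# Hypothetical description of a candidate activity as property name -> value.
Activity = Dict[str, str]


def info_to_request(first: Activity, second: Activity) -> Optional[str]:
    """Return the name of a property on which the two activities differ.

    Asking the user about that property is enough to tell the activities
    apart. Returns None when the two descriptions are indistinguishable.
    """
    for key in sorted(set(first) | set(second)):
        if first.get(key) != second.get(key):
            return key
    return None


def select_activity(first: Activity, second: Activity, key: str, reply: str) -> Optional[Activity]:
    """Pick the activity whose value for `key` appears in the recognized reply.

    Returns None when the reply matches neither activity or both.
    """
    text = reply.lower()
    hits = [a for a in (first, second) if a.get(key) and a[key].lower() in text]
    return hits[0] if len(hits) == 1 else None
```

For two activities that share an action but differ in the item acted on, `info_to_request` returns the differing property name, which a platform could turn into the second question; `select_activity` then applies the recognized reply to choose between them.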
Specification