Outcome-oriented dialogs on a speech recognition platform
First Claim
1. A system comprising:
one or more processors; and
one or more computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
receiving first audio data representing a request;
determining, based at least in part on the first audio data, first intent data representing a first intent associated with the request, the first intent associated with one or more slots;
determining, based at least in part on the first audio data, second intent data representing a second intent associated with the request;
determining a first number of values associated with the first intent;
determining a second number of values associated with the second intent;
selecting the first intent based at least in part on the first number of values being more favorable than the second number of values;
determining that a slot of the one or more slots is unfilled;
generating, based at least in part on the slot being unfilled, second audio data representing a query for additional information;
sending the second audio data to a device to output audio corresponding to the second audio data;
receiving third audio data representing a response to the query; and
associating a value with the slot based at least in part on the third audio data.
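The operations recited above can be illustrated with a minimal sketch. It is not the claimed implementation. It assumes that the "number of values" for an intent is the count of its slots that already hold a value, that a higher count is "more favorable", and that ties go to the first intent. Audio data is stood in for by plain text, and the names `Intent`, `select_intent`, `query_for`, and `fill_slot` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Intent:
    """A candidate intent with named slots; None marks an unfilled slot."""
    name: str
    slots: Dict[str, Optional[str]] = field(default_factory=dict)

    def filled_count(self) -> int:
        # Assumption: "number of values" means the count of filled slots.
        return sum(1 for value in self.slots.values() if value is not None)

    def first_unfilled(self) -> Optional[str]:
        # Return the name of the first slot without a value, or None if all are filled.
        for slot, value in self.slots.items():
            if value is None:
                return slot
        return None


def select_intent(first: Intent, second: Intent) -> Intent:
    # Assumption: a higher filled-slot count is "more favorable"; ties go to the first intent.
    return first if first.filled_count() >= second.filled_count() else second


def query_for(slot: str) -> str:
    # Stand-in for generating the second audio data (text instead of synthesized speech).
    return f"What {slot} would you like?"


def fill_slot(intent: Intent, slot: str, response: str) -> None:
    # Stand-in for associating a value derived from the third audio data with the slot.
    intent.slots[slot] = response.strip()


# Two candidate intents derived from the same (hypothetical) request.
play_music = Intent("play_music", {"artist": "Miles Davis", "album": None})
buy_music = Intent("buy_music", {"artist": None, "album": None})

chosen = select_intent(play_music, buy_music)
slot = chosen.first_unfilled()
if slot is not None:
    prompt = query_for(slot)                   # query sent to the device
    fill_slot(chosen, slot, " Kind of Blue ")  # user's response to the query
```

In this sketch the intent with more filled slots wins, the single unfilled slot triggers one query, and the response fills that slot. Other readings of "more favorable" (for example, fewer missing values) would change only `select_intent`.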
Abstract
A speech recognition platform configured to receive an audio signal that includes speech from a user and perform automatic speech recognition (ASR) on the audio signal to identify ASR results. The platform may identify: (i) a domain of a voice command within the speech based on the ASR results and based on context information associated with the speech or the user, and (ii) an intent of the voice command. In response to identifying the intent, the platform may perform multiple actions corresponding to this intent. The platform may select a target action to perform, and may engage in a back-and-forth dialog to obtain information for completing the target action. The action may include streaming audio to the device, setting a reminder for the user, purchasing an item on behalf of the user, making a reservation for the user or launching an application for the user.
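As a rough illustration of how context information might influence domain identification, the sketch below scores each domain by keyword matches in the ASR text and adds a bonus when the context names that domain. The keyword table, the `recent_domain` context key, and the scoring rule are all assumptions made for illustration; the abstract does not specify how the platform combines ASR results with context.

```python
from typing import Dict, List

# Hypothetical keyword lists per domain; a real platform would likely use trained models.
DOMAIN_KEYWORDS: Dict[str, List[str]] = {
    "music": ["play", "song", "album"],
    "shopping": ["buy", "order", "purchase"],
    "reminders": ["remind", "reminder"],
}


def identify_domain(asr_text: str, context: Dict[str, str]) -> str:
    """Pick the domain with the most keyword hits, with a bonus for the context's recent domain."""
    words = asr_text.lower().split()
    scores: Dict[str, int] = {}
    for domain, keywords in DOMAIN_KEYWORDS.items():
        score = sum(1 for word in words if word in keywords)
        if context.get("recent_domain") == domain:
            score += 1  # assumed rule: context adds one point to its domain
        scores[domain] = score
    # On a tie, max() returns the domain listed first in DOMAIN_KEYWORDS.
    return max(scores, key=scores.get)


explicit = identify_domain("play that song", {})
from_context = identify_domain("that one again", {"recent_domain": "shopping"})
```

Here the first request is resolved from the words alone, while the second contains no domain keywords and is resolved only through the context.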
20 Claims
1. A system comprising:
one or more processors; and
one or more computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
receiving first audio data representing a request;
determining, based at least in part on the first audio data, first intent data representing a first intent associated with the request, the first intent associated with one or more slots;
determining, based at least in part on the first audio data, second intent data representing a second intent associated with the request;
determining a first number of values associated with the first intent;
determining a second number of values associated with the second intent;
selecting the first intent based at least in part on the first number of values being more favorable than the second number of values;
determining that a slot of the one or more slots is unfilled;
generating, based at least in part on the slot being unfilled, second audio data representing a query for additional information;
sending the second audio data to a device to output audio corresponding to the second audio data;
receiving third audio data representing a response to the query; and
associating a value with the slot based at least in part on the third audio data.
Dependent claims: 2, 3, 4, 5
6. A method comprising:
receiving first audio data representing a request;
determining, based at least in part on the first audio data, first intent data representing a first intent associated with the request, the first intent associated with one or more slots;
determining, based at least in part on the first audio data, second intent data representing a second intent associated with the request;
determining a first number of values associated with the first intent;
determining a second number of values associated with the second intent;
selecting the first intent based at least in part on the first number of values being more favorable than the second number of values;
determining that a slot of the one or more slots is unfilled;
generating, based at least in part on the slot being unfilled, second audio data representing a query for additional information;
sending the second audio data to a device to output audio corresponding to the second audio data;
receiving third audio data representing a response to the query; and
associating a value with the slot based at least in part on the third audio data.
Dependent claims: 7, 8, 9, 10, 11, 12, 13, 14
15. A system comprising:
one or more processors; and
one or more computer-readable media including instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
receiving first audio data representing a request;
determining, based at least in part on the first audio data and a context associated with the request, first intent data representing a first intent associated with the request, the first intent associated with one or more slots;
determining, based at least in part on the first audio data and the context associated with the request, second intent data representing a second intent associated with the request;
determining a first number of values associated with the first intent;
determining a second number of values associated with the second intent;
selecting the first intent based at least in part on the first number of values being more favorable than the second number of values;
determining that a slot of the one or more slots is unfilled;
generating, based at least in part on the slot being unfilled, second audio data representing a query for additional information;
sending the second audio data to a device to output audio corresponding to the second audio data;
receiving third audio data representing a response to the query; and
associating a value with the slot based at least in part on the third audio data.
Dependent claims: 16, 17, 18, 19, 20
Specification