Voice application architecture
4 Assignments
0 Petitions
Abstract
A voice-based system may comprise a local speech interface device and a remote control service. A user may interact with the system using speech to obtain services and perform functions. The system may allow a user to install applications to provide enhanced or customized functionality. Such applications may be installed on either the speech interface device or the control service. The control service receives user speech and determines user intent based on the speech. If an application installed on the control service can respond to the intent, that application is called. Otherwise, the intent is provided to the speech interface device which responds by invoking one of its applications to respond to the intent.
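The routing rule in the abstract can be sketched in Python: call a control-service application if one can respond to the intent, otherwise hand the intent to the speech interface device. This is an illustrative sketch under stated assumptions, not the patented implementation. `ControlService`, `install`, and `route` are hypothetical names. Intents are assumed to arrive as plain strings after speech recognition and natural language understanding, both of which are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical handler type: takes an intent name, returns a response string.
# Assumption: ASR/NLU has already run, so intents are plain strings.
Handler = Callable[[str], str]


@dataclass
class ControlService:
    """Hypothetical remote control service holding user-installed server apps."""
    server_apps: Dict[str, Handler] = field(default_factory=dict)
    # Intents handed off to the speech interface device, kept for inspection.
    forwarded_to_device: List[str] = field(default_factory=list)

    def install(self, intent: str, handler: Handler) -> None:
        """Register a user-selected server application for an intent."""
        self.server_apps[intent] = handler

    def route(self, intent: str) -> Tuple[str, str]:
        """Call a server app if one handles the intent; otherwise defer to the device."""
        handler = self.server_apps.get(intent)
        if handler is not None:
            return ("server", handler(intent))
        # No server-side application can respond, so the device must.
        self.forwarded_to_device.append(intent)
        return ("device", intent)
```

For example, after installing a handler for `"play_music"`, routing that intent returns a `("server", ...)` result. An intent with no server application, such as `"set_timer"`, is forwarded to the device instead.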
181 Citations
25 Claims
1. A system comprising:
one or more server computers;
one or more server applications that have been selected by a user for execution on the one or more server computers, wherein the one or more server applications operate in conjunction with a speech interface device located in premises of the user to provide services for the user;
a speech processing component configured to receive, from the speech interface device, an audio signal that represents user speech, wherein the user speech expresses a user intent, the speech processing component being further configured to perform automatic speech recognition on the audio signal to identify the user speech and to perform natural language understanding on the user speech to determine the user intent; and
an intent router configured to perform acts comprising:
identifying a first server application of the one or more server applications corresponding to the user intent;
providing a first indication to the first server application to invoke an action corresponding to the user intent;
providing a second indication of the user intent to the speech interface device, wherein the speech interface device is responsive to the user intent to perform the action corresponding to the user intent;
receiving, at the one or more server computers, a confirmation from the speech interface device that at least one of (i) the speech interface device will perform the action in response to the user intent or (ii) the speech interface device has performed the action in response to the user intent; and
providing a third indication, based at least in part on receiving the confirmation, to the first server application to cancel responding to the user intent.
Dependent claims: 2, 3, 4, 5.
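The intent router acts of claim 1 can be read as a small message protocol. The router invokes the server application, notifies the device, and cancels the server application once the device confirms. Below is a minimal Python sketch under those assumptions. `ServerApp`, `SpeechInterfaceDevice`, `IntentRouter`, and their methods are hypothetical. The three indications are modeled as direct method calls rather than network messages.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ServerApp:
    """Hypothetical server application that records the indications it receives."""
    name: str
    intents: List[str]
    log: List[str] = field(default_factory=list)

    def invoke(self, intent: str) -> None:
        self.log.append(f"invoke:{intent}")  # first indication

    def cancel(self, intent: str) -> None:
        self.log.append(f"cancel:{intent}")  # third indication


@dataclass
class SpeechInterfaceDevice:
    """Hypothetical stand-in for the in-home speech interface device."""
    received: List[str] = field(default_factory=list)

    def notify(self, intent: str) -> None:
        self.received.append(intent)  # second indication


class IntentRouter:
    def __init__(self, apps: List[ServerApp], device: SpeechInterfaceDevice) -> None:
        self.apps = apps
        self.device = device
        # Server app invoked per intent, awaiting a possible device confirmation.
        self.pending: Dict[str, ServerApp] = {}

    def identify(self, intent: str) -> Optional[ServerApp]:
        """Return a server app registered for the intent (the claim's 'first server application')."""
        return next((app for app in self.apps if intent in app.intents), None)

    def route(self, intent: str) -> None:
        app = self.identify(intent)
        if app is not None:
            app.invoke(intent)
            self.pending[intent] = app
        self.device.notify(intent)

    def on_device_confirmation(self, intent: str) -> None:
        """The device will perform (or has performed) the action, so cancel the server app."""
        app = self.pending.pop(intent, None)
        if app is not None:
            app.cancel(intent)
```

Keeping the invoked application in `pending` lets the router match a later confirmation to the application it must cancel. A real system would need some correlation identifier for this purpose; the intent string serves as that identifier here, which is a simplifying assumption.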
6. A method, comprising:
under control of one or more computing systems configured with executable instructions,
receiving a first selection to install a first application on one or more server computers;
receiving a second selection to install a second application on a speech interface device;
receiving an audio signal that represents user speech from the speech interface device, wherein the user speech indicates an intent;
performing natural language understanding on the user speech to determine the intent;
invoking the first application to respond to the intent;
invoking the second application to respond to the intent;
receiving a confirmation that at least one of (i) the first application will perform an action based at least in part on the intent or (ii) the first application has performed the action based at least in part on the intent; and
providing a first indication, based at least in part on the receiving the confirmation, to the second application to cancel responding to the intent.
Dependent claims: 7, 8, 9, 10, 11, 22, 23.
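Claim 6 adds an installation step. One application is selected for the server computers and another for the speech interface device. Both are invoked for the same intent, and a confirmation from the first application cancels the second. The Python sketch below illustrates that flow under stated assumptions. `Installation`, `Dispatcher`, and `toy_nlu` are hypothetical names. The input is assumed to be an already-recognized transcript, and `toy_nlu` is a keyword placeholder that stands in for real natural language understanding.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Installation:
    """Hypothetical record of user install selections: app name -> location."""
    locations: Dict[str, str] = field(default_factory=dict)

    def select(self, app: str, location: str) -> None:
        if location not in ("server", "device"):
            raise ValueError(f"unknown location: {location}")
        self.locations[app] = location


def toy_nlu(transcript: str) -> str:
    """Placeholder for natural language understanding (keyword lookup only)."""
    return "play_music" if "play" in transcript.lower() else "unknown"


@dataclass
class Dispatcher:
    installs: Installation
    handles: Dict[str, List[str]]  # hypothetical: app name -> intents it responds to
    events: List[str] = field(default_factory=list)

    def dispatch(self, transcript: str) -> Tuple[str, List[str]]:
        """Determine the intent, then invoke every installed app that handles it."""
        intent = toy_nlu(transcript)
        invoked = [app for app, intents in self.handles.items()
                   if app in self.installs.locations and intent in intents]
        for app in invoked:
            where = self.installs.locations[app]
            self.events.append(f"invoke {app}@{where}:{intent}")
        return intent, invoked

    def confirm(self, confirming_app: str, intent: str, invoked: List[str]) -> None:
        """A confirmation from one app cancels every other app invoked for the intent."""
        for app in invoked:
            if app != confirming_app:
                self.events.append(f"cancel {app}:{intent}")
```

With a server-side `cloud_music` and a device-side `local_music` both handling `play_music`, dispatching "Play some jazz" invokes both apps. A confirmation from `cloud_music` then produces a single cancel event for `local_music`.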
12. A processor-implemented method comprising:
receiving a first selection to install a first application on one or more server computers;
receiving a second selection to install a second application on a device;
receiving, from the device, an audio signal that represents user speech;
determining an intent corresponding to the user speech;
identifying the first application corresponding to the intent;
providing a first indication of the intent to the one or more server computers for invocation of the first application to respond to the intent;
providing a second indication of the intent to the device for invocation of the second application to respond to the intent;
receiving, at the one or more server computers, a confirmation from the device that at least one of (i) the device will perform an action in response to the intent or (ii) the device has performed the action in response to the intent; and
providing a third indication, based at least in part on receiving the confirmation, to the first application to cancel responding to the intent.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 24.
25. A processor-implemented method comprising:
receiving a first selection to install a first application on one or more server computers;
receiving a second selection to install a second application on a device;
receiving, from the device, an audio signal representing user speech;
determining an intent corresponding to the user speech;
identifying the first application corresponding to the intent;
providing a first indication of the intent to the one or more server computers for invocation of the first application to respond to the first intent;
providing a second indication of the intent to the device for invocation of a second application to respond to the first intent;
receiving a confirmation from the first application that at least one of (i) the first application will perform an action in response to the intent or (ii) the first application has performed the action in response to the intent; and
providing a third indication, based at least in part on receiving the confirmation, to the device to cancel responding to the intent.
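Claims 12 and 25 mirror each other. In claim 12 the device's confirmation cancels the server application. In claim 25 the server application's confirmation cancels the device. A race can illustrate both: whichever responder confirms first wins, and the other is cancelled. This framing is an assumption-laden generalization, not something either claim requires. The Python sketch uses `asyncio.wait` with `return_when=asyncio.FIRST_COMPLETED`. `respond` and `arbitrate` are hypothetical names, and confirmation is modeled simply as a task finishing after a simulated delay.

```python
import asyncio
from typing import Dict


async def respond(name: str, delay: float) -> str:
    """Hypothetical responder: waits `delay` seconds, then confirms by returning its name."""
    await asyncio.sleep(delay)
    return name


async def arbitrate(delays: Dict[str, float]) -> str:
    """Start every responder; the first to confirm wins and the rest are cancelled."""
    tasks = {asyncio.create_task(respond(name, d)) for name, d in delays.items()}
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # models the "cancel responding to the intent" indication
    # Let cancelled tasks finish unwinding; CancelledError is collected, not raised.
    await asyncio.gather(*pending, return_exceptions=True)
    return next(iter(done)).result()
```

When the server application confirms first, as in claim 25, `arbitrate({"server_app": 0.01, "device": 0.2})` cancels the device task. Swapping the delays reproduces the claim 12 ordering. Treating task completion as the confirmation is a simplification, because the claims also cover a confirmation that the action *will* be performed. A fuller sketch could signal that before the work finishes.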
Specification