Voice application architecture

US 9,548,066 B2
Filed: 08/11/2014
Issued: 01/17/2017
Est. Priority Date: 08/11/2014
Status: Active Grant

4 Assignments

0 Petitions

Abstract

A voice-based system may comprise a local speech interface device and a remote control service. A user may interact with the system using speech to obtain services and perform functions. The system may allow a user to install applications to provide enhanced or customized functionality. Such applications may be installed on either the speech interface device or the control service. The control service receives user speech and determines user intent based on the speech. If an application installed on the control service can respond to the intent, that application is called. Otherwise, the intent is provided to the speech interface device which responds by invoking one of its applications to respond to the intent.
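The fallback routing described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the class names `ControlService` and `SpeechInterfaceDevice`, the `route` and `handle_intent` methods, and the use of plain callables as "applications" are all assumptions introduced here for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical application type: a callable that takes intent parameters
# and returns a text response.
Handler = Callable[[dict], str]


@dataclass
class SpeechInterfaceDevice:
    """Local device holding the applications installed on it (assumed model)."""
    apps: Dict[str, Handler] = field(default_factory=dict)

    def handle_intent(self, intent: str, params: dict) -> str:
        handler = self.apps.get(intent)
        if handler is None:
            return f"device: no application for '{intent}'"
        return handler(params)


@dataclass
class ControlService:
    """Remote service holding the applications installed on it (assumed model)."""
    device: SpeechInterfaceDevice
    apps: Dict[str, Handler] = field(default_factory=dict)

    def route(self, intent: str, params: dict) -> str:
        # If an application installed on the control service can respond
        # to the intent, that application is called.
        handler = self.apps.get(intent)
        if handler is not None:
            return handler(params)
        # Otherwise the intent is provided to the speech interface device,
        # which invokes one of its own applications.
        return self.device.handle_intent(intent, params)
```

In this sketch the intent is assumed to have already been determined from the user speech; speech recognition and natural language understanding are outside its scope.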

181 Citations

25 Claims

1. A system comprising:
- one or more server computers;
  
  one or more server applications that have been selected by a user for execution on the one or more server computers, wherein the one or more server applications operate in conjunction with a speech interface device located in premises of the user to provide services for the user;
  
  a speech processing component configured to receive, from the speech interface device, an audio signal that represents user speech, wherein the user speech expresses a user intent, the speech processing component being further configured to perform automatic speech recognition on the audio signal to identify the user speech and to perform natural language understanding on the user speech to determine the user intent; and
  
  an intent router configured to perform acts comprising:
  
  identifying a first server application of the one or more server applications corresponding to the user intent;
  
  providing a first indication to the first server application to invoke an action corresponding to the user intent;
  
  providing a second indication of the user intent to the speech interface device, wherein the speech interface device is responsive to the user intent to perform the action corresponding to the user intent;
  
  receiving, at the one or more server computers, a confirmation from the speech interface device that at least one of (i) the speech interface device will perform the action in response to the user intent or (ii) the speech interface device has performed the action in response to the user intent; and
  
  providing a third indication, based at least in part on receiving the confirmation, to the first server application to cancel responding to the user intent.
  - 2. The system of claim 1, wherein:
    - the speech interface device has one or more device applications that are selected by the user for execution on the speech interface device; and
      
      the speech interface device is configured to (a) identify a device application of the one or more device applications corresponding to the user intent and (b) invoke the device application to perform the action.
  - 3. The system of claim 1, wherein:
    - the speech interface device has one or more device applications that are selected by the user for execution on the speech interface device; and
      
      the acts further comprise (a) identifying a device application of the one or more device applications corresponding to the user intent and (b) causing the device application to be invoked to perform the action.
  - 4. The system of claim 1, wherein:
    - the user intent is a first user intent;
      
      the speech interface device has one or more device applications that are selected by the user for execution on the speech interface device; and
      
      the acts further comprise (a) determining that at least one of the one or more device applications corresponds to a second user intent and (b) providing a fourth indication of the second user intent to the speech interface device.
  - 5. The system of claim 1, wherein:
    - the speech interface device has one or more device applications that are selected by the user for execution on the speech interface device; and
      
      the one or more server applications and the one or more device applications are obtained from a collection of available applications in response to one or more user requests.
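The sequence of indications recited in claim 1 can be sketched as below. This is a simplified, synchronous illustration under stated assumptions: `ServerApp`, `Device`, `IntentRouter`, and the `"will_perform"` confirmation string are hypothetical names chosen for the example, and a real system would exchange these messages asynchronously over a network.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ServerApp:
    """Hypothetical server application that records what it is told to do."""
    name: str
    intents: List[str]
    log: List[str] = field(default_factory=list)

    def invoke(self, intent: str) -> None:
        self.log.append(f"invoke:{intent}")

    def cancel(self, intent: str) -> None:
        self.log.append(f"cancel:{intent}")


@dataclass
class Device:
    """Hypothetical speech interface device."""
    supported: List[str]

    def receive_intent(self, intent: str) -> Optional[str]:
        # Returns a confirmation if the device will perform the action,
        # or None if it cannot respond to the intent.
        return "will_perform" if intent in self.supported else None


class IntentRouter:
    def __init__(self, apps: List[ServerApp], device: Device) -> None:
        self.apps = apps
        self.device = device

    def route(self, intent: str) -> List[str]:
        events: List[str] = []
        # Identify the first server application corresponding to the intent.
        app = next((a for a in self.apps if intent in a.intents), None)
        if app is not None:
            app.invoke(intent)  # first indication
            events.append(f"first indication -> {app.name}")
        # Second indication: the intent is also provided to the device.
        confirmation = self.device.receive_intent(intent)
        events.append("second indication -> device")
        # Third indication: on confirmation from the device, tell the
        # server application to cancel responding to the intent.
        if app is not None and confirmation is not None:
            app.cancel(intent)
            events.append(f"third indication (cancel) -> {app.name}")
        return events
```

If the device returns no confirmation, the server application in this sketch is never cancelled and remains responsible for the response.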

6. A method, comprising:
- under control of one or more computing systems configured with executable instructions,
  
  receiving a first selection to install a first application on one or more server computers;
  
  receiving a second selection to install a second application on a speech interface device;
  
  receiving an audio signal that represents user speech from the speech interface device, wherein the user speech indicates an intent;
  
  performing natural language understanding on the user speech to determine the intent;
  
  invoking the first application to respond to the intent;
  
  invoking the second application to respond to the intent;
  
  receiving a confirmation that at least one of (i) the first application will perform an action based at least in part on the intent or (ii) the first application has performed the action based at least in part on the intent; and
  
  providing a first indication, based at least in part on the receiving the confirmation, to the second application to cancel responding to the intent.
  - 7. The method of claim 6, further comprising providing a second indication of the intent to the speech interface device, wherein the speech interface device is responsive to the second indication of the intent to perform the action corresponding to the intent.
  - 8. The method of claim 6, further comprising providing a second indication of the intent to the speech interface device, wherein the speech interface device is responsive to the second indication of the intent to:
    - identify the second application corresponding to the intent, wherein the second application has been selected by the user for execution on the speech interface device; and
      
      invoke the second application to perform the action corresponding to the intent.
  - 9. The method of claim 6, further comprising:
    - identifying the second application corresponding to the intent, wherein the second application has been selected by the user for execution on the speech interface device; and
      
      causing the second application to respond to the intent.
  - 10. The method of claim 6, further comprising conducting natural language dialogs with the user to receive the user speech.
  - 11. The method of claim 6, further comprising:
    - determining that the second application is available on the speech interface device for responding to the intent; and
      
      providing a second indication of the intent to the speech interface device.
  - 22. The method of claim 6, wherein the audio signal is a first audio signal, wherein the user speech is first user speech, and wherein the intent is a first intent, the method further comprising:
    - receiving a second audio signal that represents second user speech from the speech interface device, wherein the second user speech indicates a second intent;
      
      performing natural language understanding on the second audio signal to determine the second intent; and
      
      providing a second indication of the second intent to the speech interface device.
  - 23. The method of claim 22, further comprising determining that no application for responding to the second intent has been selected by the user for execution on the one or more server computers.

12. A processor-implemented method comprising:
- receiving a first selection to install a first application on one or more server computers;
  
  receiving a second selection to install a second application on a device;
  
  receiving, from the device, an audio signal that represents user speech;
  
  determining an intent corresponding to the user speech;
  
  identifying the first application corresponding to the intent;
  
  providing a first indication of the intent to the one or more server computers for invocation of the first application to respond to the intent;
  
  providing a second indication of the intent to the device for invocation of the second application to respond to the intent;
  
  receiving, at the one or more server computers, a confirmation from the device that at least one of (i) the device will perform an action in response to the intent or (ii) the device has performed the action in response to the intent; and
  
  providing a third indication, based at least in part on receiving the confirmation, to the first application to cancel responding to the intent.
  - 13. The processor-implemented method of claim 12, wherein the intent is a first intent, the processor-implemented method further comprising providing a fourth indication of a second intent to the device for invocation of a third application to respond to the second intent.
  - 14. The processor-implemented method of claim 12, wherein the device comprises a speech interface located in a home of the user.
  - 15. The processor-implemented method of claim 12, further comprising conducting natural language dialogs with the user through the device to determine the intent.
  - 16. The processor-implemented method of claim 12, wherein the first application is configured to respond to the intent by sending one or more instructions to the device.
  - 17. The processor-implemented method of claim 12, further comprising identifying the first application from multiple applications that have been selected by the user for execution on the one or more server computers.
  - 18. The processor-implemented method of claim 12, wherein the intent is a first intent, the processor-implemented method further comprising:
    - receiving a third selection to install a third application on the device;
      
      determining a second intent expressed by the user of the device;
      
      providing a fourth indication of the second intent to the device; and
      
      causing the device to invoke the third application to respond to the second intent.
  - 19. The processor-implemented method of claim 12, wherein the intent is a first intent, and wherein the device is responsive to a fourth indication of a second intent to identify a third application from multiple applications that have been installed by the user for execution on the device.
  - 20. The processor-implemented method of claim 12, further comprising determining that the second application is installed on the device for responding to the intent.
  - 21. The processor-implemented method of claim 12, wherein the user speech is first user speech, and wherein the intent is a first intent, the processor-implemented method further comprising:
    - determining that second user speech corresponds to a second intent;
      
      determining that no server application for responding to the second intent has been selected by the user for execution on the one or more server computers; and
      
      providing a fourth indication of the second intent to the device.
  - 24. The processor-implemented method of claim 12, wherein the user speech is first user speech, and wherein the intent is a first intent, the method further comprising:
    - receiving second user speech expressed by the user of the device;
      
      determining a second intent expressed by the user of the device;
      
      providing a fourth indication of the second intent to the device; and
      
      causing the device to invoke a third application to respond to the second intent, wherein the third application has been selected by the user for execution on the device.

25. A processor-implemented method comprising:
- receiving a first selection to install a first application on one or more server computers;
  
  receiving a second selection to install a second application on a device;
  
  receiving, from the device, an audio signal representing user speech;
  
  determining an intent corresponding to the user speech;
  
  identifying the first application corresponding to the intent;
  
  providing a first indication of the intent to the one or more server computers for invocation of the first application to respond to the first intent;
  
  providing a second indication of the intent to the device for invocation of a second application to respond to the first intent;
  
  receiving a confirmation from the first application that at least one of (i) the first application will perform an action in response to the intent or (ii) the first application has performed the action in response to the intent; and
  
  providing a third indication, based at least in part on receiving the confirmation, to the device to cancel responding to the intent.
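Claims 12 and 25 recite the same dual dispatch but differ in which side confirms: in claim 12 the device confirms and the first (server) application is told to cancel, while in claim 25 the first application confirms and the device is told to cancel. A minimal sketch of that arbitration rule follows; the `Responder` enum and the `cancel_target` function are hypothetical names introduced for illustration only.

```python
from enum import Enum
from typing import Optional


class Responder(Enum):
    """The two parties that receive an indication of the same intent."""
    SERVER_APP = "server_app"
    DEVICE = "device"


def cancel_target(confirmed_by: Optional[Responder]) -> Optional[Responder]:
    """Return which responder should be told to cancel, if any.

    confirmed_by is the party whose confirmation (that it will perform,
    or has performed, the action) was received.
    """
    if confirmed_by is Responder.SERVER_APP:
        # Claim 25 direction: the first application confirmed,
        # so the device is told to cancel responding.
        return Responder.DEVICE
    if confirmed_by is Responder.DEVICE:
        # Claim 12 direction: the device confirmed,
        # so the first application is told to cancel responding.
        return Responder.SERVER_APP
    # No confirmation received yet: nothing to cancel.
    return None
```

The sketch models a single confirmation; how a real system would treat confirmations arriving from both sides is not addressed by it.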

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Jain, Vikas; Mutagi, Rohan; Carbon, Peter Paul Henri
Primary Examiner(s): Hang, Vu B
Application Number: US14/456,620
Publication Number: US 20160042748A1
Time in Patent Office: 890 Days
Field of Search: 709/9, 704/9, 704/231, 704/251, 704/257, 704/270, 704/275
US Class Current: 1/1

CPC Class Codes
- G06F 40/253   Grammatical analysis; Style...
- G06F 40/30   Semantic analysis
- G06F 40/40   Processing or translation o...
- G10L 15/1822   Parsing for meaning underst...
- G10L 15/22   Procedures used during a sp...
- G10L 15/26   Speech to text systems G10L...
- G10L 15/30   Distributed recognition, e....
- G10L 2015/223   Execution procedure of a sp...
- G10L 25/48   specially adapted for parti...
