Managing dialogs on a speech recognition platform

US 10,026,394 B1
Filed: 03/15/2013
Issued: 07/17/2018
Est. Priority Date: 08/31/2012
Status: Active Grant

Abstract

A speech recognition platform configured to receive an audio signal that includes speech from a user and perform automatic speech recognition (ASR) on the audio signal to identify ASR results. The platform may identify: (i) a domain of a voice command within the speech based on the ASR results and based on context information associated with the speech or the user, and (ii) an intent of the voice command. In response to identifying the intent, the platform may perform a corresponding action, such as streaming audio to the device, setting a reminder for the user, purchasing an item on behalf of the user, making a reservation for the user or launching an application for the user. In some instances, the speech recognition platform engages in a back-and-forth dialog with the user in order to properly fulfill the user's request.
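
The flow the abstract describes, from an identified intent to a performed action with a follow-up question when information is missing, might be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the `Intent` class, its `fields` dictionary, the `next_step` function, and the `set_reminder` example are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Intent:
    """Hypothetical intent record: a name plus named fields (slots)."""
    name: str
    # A field whose value is None has not been supplied by the user yet.
    fields: Dict[str, Optional[str]] = field(default_factory=dict)

    def missing_fields(self) -> List[str]:
        return [key for key, value in self.fields.items() if value is None]

    def is_actionable(self) -> bool:
        # Assumption: an intent is actionable once every field has a value.
        return not self.missing_fields()


def next_step(intent: Intent) -> str:
    """Return a follow-up question, or a confirmation once actionable."""
    missing = intent.missing_fields()
    if missing:
        # Ask for the first missing field; one question per dialog turn.
        return f"What {missing[0]} would you like for {intent.name}?"
    return f"OK, performing {intent.name}."


reminder = Intent("set_reminder", {"time": None, "subject": "call mom"})
print(next_step(reminder))        # asks for the missing "time" field
reminder.fields["time"] = "5 pm"  # value taken from the user's reply
print(next_step(reminder))        # the intent is now actionable
```

In a real platform the question and the confirmation would be synthesized into audio signals and the reply would pass through ASR; plain strings stand in for both here.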

22 Claims

1. A system comprising:
- one or more processors; and
  
  one or more computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
  
  receiving a first audio signal that represents, at least in part, first speech;
  
  analyzing the first audio signal to generate first speech-recognition results;
  
  identifying a first intent and a second intent associated with the first speech based at least in part on the first speech-recognition results, wherein the first intent and the second intent are associated with a first domain;
  
  identifying a third intent associated with the first speech based at least in part on the first speech-recognition results, wherein the third intent is associated with a second domain;
  
  sending a second audio signal that represents a first question associated with the first domain and the second domain;
  
  receiving a third audio signal that represents, at least in part, second speech;
  
  analyzing the third audio signal to generate second speech-recognition results;
  
  selecting the first domain based at least in part on the second speech-recognition results;
  
  sending a fourth audio signal that represents a second question associated with at least the first intent and the second intent;
  
  receiving a fifth audio signal that represents, at least in part, third speech;
  
  analyzing the fifth audio signal to generate third speech-recognition results; and
  
  selecting the first intent based at least in part on the third speech-recognition results.
  - 2. A system as recited in claim 1, the acts further comprising:
    - identifying a value of a field of the first intent based at least in part on the third speech-recognition results; and
      
      associating the value with the field of the first intent.
  - 3. A system as recited in claim 2, the acts further comprising:
    - at least partly in response to determining the value of the field, determining that the first intent is actionable and performing a corresponding action associated with the first intent.
  - 4. A system as recited in claim 3, wherein the action comprises one or more of setting a reminder, playing a media file, adding an item to a list, providing a recommendation, ordering or purchasing an item, or placing a telephone call.
  - 5. A system as recited in claim 1, the acts further comprising:
    - determining that at least a field associated with the first intent has a corresponding value;
      
      determining that the first intent is actionable;
      
      generating a sixth audio signal for output, the sixth audio signal including an indication that the action will be performed; and
      
      performing the action.
  - 6. A system as recited in claim 1, the acts further comprising:
    - determining that a first confidence associated with the first domain and a second confidence associated with the second domain are below a first threshold, wherein sending the second audio signal is based at least in part on the first confidence and the second confidence being below the first threshold; and
      
      determining that a third confidence associated with the first intent and a fourth confidence associated with the second intent are below a second threshold, wherein sending the fourth audio signal is based at least in part on the third confidence and the fourth confidence being below the second threshold.

7. One or more computing devices comprising:
- one or more processors; and
  
  memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to:
  
  receive a first audio signal representing first user speech;
  
  identify a first intent associated with a first activity of a first set of activities based at least in part on the first audio signal;
  
  identify a second intent associated with a second activity of the first set of activities based at least in part on the first audio signal;
  
  identify a third intent associated with a third activity of a second set of activities based at least in part on the first audio signal;
  
  send a second audio signal representing a first question associated with the first set of activities and the second set of activities;
  
  receive a third audio signal representing second user speech;
  
  select the first set of activities based at least in part on the third audio signal;
  
  send, based at least in part on the first activity and the second activity, a fourth audio signal representing a second question for at least one additional piece of information;
  
  receive a fifth audio signal representing third user speech; and
  
  select the first activity from the first set of activities based at least in part on the fifth audio signal.
  - 8. One or more computing devices as recited in claim 7, wherein the dialog component is further executable to:
    - determine, based at least in part on the first set of activities, that the first activity cannot be performed without sending the fourth audio signal.
  - 9. One or more computing devices as recited in claim 7, wherein the one or more computing devices form a portion of a remote network-accessible computing platform.
  - 10. One or more computing devices as recited in claim 7, wherein sending the second audio signal is based at least in part on determining that the first intent does not include information sufficient for performing the first activity from the first set of activities.
  - 11. One or more computing devices as recited in claim 7, wherein the first activity comprises one or more of a request to set a reminder, a request to play a media file, a request to add an item to a list, a request to provide a recommendation, a request to purchase an item, or a request to place a telephone call.
  - 12. One or more computing devices as recited in claim 7, the memory further storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to:
    - identify a dialog associated with the first activity; and
      
      analyze the dialog to identify an additional piece of information needed to complete the first activity.

13. A method comprising:
- receiving a first audio signal representing first user speech;
  
  identifying, using the first audio signal, a first activity and a second activity associated with a first domain;
  
  identifying, using the first audio signal, a third activity associated with a second domain;
  
  sending a second audio signal representing a first question associated with the first domain and the second domain;
  
  receiving a third audio signal representing second user speech;
  
  selecting the first domain based at least in part on the third audio signal;
  
  determining, based at least in part on the first activity and the second activity, to request at least one additional piece of information;
  
  based at least in part on determining to request the at least one additional piece of information, sending a fourth audio signal representing a second question for the at least one additional piece of information;
  
  receiving a fifth audio signal representing third user speech; and
  
  selecting the first activity based at least in part on the fifth audio signal.
  - 14. A method as recited in claim 13, further comprising:
    - determining, based at least in part on the first domain, that the first activity cannot be performed without sending the second audio signal.
  - 15. A method as recited in claim 13, wherein sending the second audio signal is based at least in part on determining that the first audio signal does not include information sufficient for performing the first activity.
  - 16. A method as recited in claim 13, wherein the first audio signal comprises one or more of a request to set a reminder, a request to play a media file, a request to add an item to a list, a request to provide a recommendation, a request to purchase an item, or a request to place a telephone call.
  - 17. A method as recited in claim 13, further comprising:
    - determining that a first confidence associated with the first domain and a second confidence associated with the second domain are below a first threshold, wherein sending the second audio signal is based at least in part on the first confidence and the second confidence being below the first threshold; and
      
      determining that a third confidence associated with the first activity and a fourth confidence associated with the second activity are below a second threshold, wherein sending the fourth audio signal is based at least in part on the third confidence and the fourth confidence being below the second threshold.

18. One or more non-transitory computer-readable media storing computer-executable instructions that, when executed by one or more processors, cause the one or more processors to perform acts comprising:
- receiving a first audio signal representing first user speech;
  
  identifying, using the first audio signal, a first task and a second task associated with a first domain;
  
  identifying, using the first audio signal, a third task associated with a second domain;
  
  sending a second audio signal representing a first question associated with the first domain and the second domain;
  
  receiving a third audio signal representing second user speech;
  
  selecting the first domain based at least in part on the third audio signal;
  
  determining, based at least in part on the first task and the second task, at least one additional piece of information to request;
  
  based at least in part on determining the at least one additional piece of information, sending a fourth audio signal representing a second question for the at least one additional piece of information;
  
  receiving a fifth audio signal representing third user speech; and
  
  selecting the first task based at least in part on the fifth audio signal.
  - 19. One or more non-transitory computer-readable media as recited in claim 18, further comprising:
    - determining, based at least in part on the first domain, that the first task cannot be performed without requesting the at least one additional piece of information.
  - 20. One or more non-transitory computer-readable media as recited in claim 18, wherein the second audio signal is configured to be output by a speaker of an electronic device.
  - 21. One or more non-transitory computer-readable media as recited in claim 18, wherein sending the second audio signal comprises determining that the first audio signal does not include information sufficient for performing at least the first task.
  - 22. One or more non-transitory computer-readable media as recited in claim 18, wherein the first audio signal comprises one or more of a request to set a reminder, a request to play a media file, a request to add an item to a list, a request to provide a recommendation, a request to purchase an item, or a request to place a telephone call.
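
The four independent claims share a two-stage dialog: a first question narrows the candidates to one domain, and a second question narrows them to one intent (or activity, or task) within that domain. The dependent claims on confidence thresholds suggest that each question is sent only when the competing candidates all score below a threshold. A minimal sketch of that idea follows, under stated assumptions: the `Candidate` tuple, the `disambiguate` and `needs_question` functions, the `ask` callback, the default threshold of 0.7, and the choice to score a domain by its best intent are all hypothetical and are not taken from the patent.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical candidate: (domain, intent, confidence in [0, 1]).
Candidate = Tuple[str, str, float]
# Hypothetical callback: poses a question with options, returns the chosen
# option. It stands in for sending an audio question and recognizing the reply.
Ask = Callable[[str, List[str]], str]


def needs_question(scores: List[float], threshold: float) -> bool:
    """Ask only if several options remain and none reaches the threshold."""
    return len(scores) > 1 and all(s < threshold for s in scores)


def disambiguate(candidates: List[Candidate], ask: Ask,
                 domain_threshold: float = 0.7,
                 intent_threshold: float = 0.7) -> Candidate:
    # Stage 1: choose a domain. Assumption: a domain's confidence is the
    # confidence of its best-scoring intent.
    domain_conf: Dict[str, float] = {}
    for domain, _, conf in candidates:
        domain_conf[domain] = max(domain_conf.get(domain, 0.0), conf)
    if needs_question(list(domain_conf.values()), domain_threshold):
        domain = ask("Which area did you mean?", list(domain_conf))
    else:
        domain = max(domain_conf, key=lambda d: domain_conf[d])

    # Stage 2: choose an intent within the selected domain.
    in_domain = [c for c in candidates if c[0] == domain]
    if needs_question([c[2] for c in in_domain], intent_threshold):
        intent = ask("What would you like to do?", [c[1] for c in in_domain])
        # Assumes the reply names one of the offered options.
        return next(c for c in in_domain if c[1] == intent)
    return max(in_domain, key=lambda c: c[2])


candidates = [
    ("music", "play_song", 0.40),
    ("music", "play_album", 0.35),
    ("shopping", "buy_album", 0.30),
]
replies = iter(["music", "play_song"])  # scripted user answers for the demo
chosen = disambiguate(candidates, lambda question, options: next(replies))
print(chosen)
```

With one clearly confident candidate, neither question is asked and the top-scoring candidate is returned directly, which keeps the dialog short when the first utterance is unambiguous.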

Current Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors
Carbon, Peter Paul Henri; Gundeti, Vikram Kumar; Deramat, Frederic Johan Georges; Gopalakrishnan, Ajay; Thimsen, John Daniel
Primary Examiner(s)
Lerner, Martin

Application Number

US13/843,392
Time in Patent Office

1,950 Days
Field of Search

704/257, 704/270, 704/270.1, 704/275
US Class Current
CPC Class Codes

G06F 3/167   Audio in a user interface, ...

G10L 15/00   Speech recognition G10L17/0...

G10L 15/1815   Semantic context, e.g. disa...

G10L 15/22   Procedures used during a sp...

G10L 2015/223   Execution procedure of a sp...

G10L 2015/225   Feedback of the input speech

G10L 21/06   Transformation of speech in...
