Speech recognition platforms

US 9,424,840 B1
Filed: 03/15/2013
Issued: 08/23/2016
Est. Priority Date: 08/31/2012
Status: Active Grant

Abstract

A speech recognition platform configured to receive an audio signal that includes speech from a user and perform automatic speech recognition (ASR) on the audio signal to identify ASR results. The platform may identify: (i) a domain of a voice command within the speech based on the ASR results and based on context information associated with the speech or the user, and (ii) an intent of the voice command. In response to identifying the intent, the platform may perform a corresponding action, such as streaming audio to the device, setting a reminder for the user, purchasing an item on behalf of the user, making a reservation for the user or launching an application for the user. The speech recognition platform, in combination with the device, may therefore facilitate efficient interactions between the user and a voice-controlled device.
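The final step in the abstract, performing an action once an intent has been identified, can be pictured as a lookup from intent to handler. The following is only a minimal sketch: the intent names, handler functions, and slot keys are hypothetical, since the abstract lists the kinds of actions but not how they are named or implemented.

```python
from typing import Callable, Dict

# Hypothetical handlers for two of the action types the abstract mentions.
def stream_audio(slots: Dict[str, str]) -> str:
    return f"Streaming {slots.get('title', 'audio')} to the device"

def set_reminder(slots: Dict[str, str]) -> str:
    return f"Reminder set: {slots.get('text', '(no text)')}"

# Hypothetical intent names mapped to the handlers above.
ACTIONS: Dict[str, Callable[[Dict[str, str]], str]] = {
    "play_music": stream_audio,
    "set_reminder": set_reminder,
}

def perform_action(intent: str, slots: Dict[str, str]) -> str:
    """Look up the handler for an identified intent and run it."""
    handler = ACTIONS.get(intent)
    if handler is None:
        return "Sorry, I cannot do that yet"
    return handler(slots)
```

Calling `perform_action("play_music", {"title": "jazz"})` in this sketch returns a spoken-style confirmation string; a real platform would presumably stream audio rather than return text.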

278 Citations

23 Claims

1. A system comprising:
- a coordination component configured to receive an audio signal generated based at least in part on sound from an environment, the sound captured by a device that includes a microphone unit and a speaker and the audio signal representing speech of a user in the environment;
  
  a speech recognition component configured to perform speech recognition on the audio signal to generate a speech-recognition result;
  
  a natural language understanding (NLU) component configured to:
  
  provide the speech-recognition result to a first domain and a second domain;
  
  identify one or more named entities within the speech-recognition result for each of the first domain and the second domain;
  
  fill one or more slots within the speech-recognition result for each of the first domain and the second domain based at least in part on context information associated with the speech of the user;
  
  identify, for the first domain, multiple first intents associated with the speech of the user based at least in part on the one or more named entities and the one or more slots filled for the first domain;
  
  identify, for the second domain, multiple second intents associated with the speech of the user based at least in part on the one or more named entities and the one or more slots filled for the second domain; and
  
  rank the multiple first intents and the multiple second intents relative to one another;
  
  a dialog component configured to:
  
  select one of the first domain or the second domain based at least in part on the ranking; and
  
  select one of the multiple first intents or one of the multiple second intents that is associated with the selected domain; and
  
  a response component configured to receive an indication of the selected domain and an indication of the selected intent and cause performance of an action corresponding to the selected intent, the action including providing an audio signal for output on the speaker of the device.
- - 2. A system as recited in claim 1, wherein at least one of the coordination component or the speech recognition component is further configured to retrieve the context information associated with the speech and provide the context information to the NLU component to fill one or more slots within the speech-recognition result for individual ones of the first domain and the second domain.
  - 3. A system as recited in claim 1, wherein each of the first domain and the second domain specify a set of related activities that the user may request the device to perform, the device configured to perform activities of the set of related activities based at least in part on receiving a command identified from speech of the user.
  - 4. A system as recited in claim 1, wherein each of the multiple first intents and the multiple second intents are associated with one or more fields that, when associated with respective values, specify an action requested by the user, and the NLU component is configured to associate respective values to the one or more fields based at least in part on the context information associated with the speech of the user.
  - 5. A system as recited in claim 1, wherein:
    - the dialog component is further configured to engage in a dialog with the user by causing the speaker to output a first set of one or more questions, and wherein the dialog component is further configured to select the first domain or the second domain based at least in part on a response from the user to the first set of one or more questions; and
      
      the response component is further configured to engage in another dialog with the user by causing the speaker to output a second set of one or more questions, and wherein the dialog component is further configured to select one of the multiple first intents or one of the multiple second intents based at least in part on a response from the user to the set of one or more questions.
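
The NLU and dialog steps recited in claim 1 can be sketched roughly as follows. This is an illustrative toy, not the patented implementation: the domain vocabularies, intent names, base scores, and the "favor the last-used domain" heuristic are all assumptions introduced here, because the claim does not say how entities are recognized or how intents are scored.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Intent:
    domain: str
    name: str
    slots: Dict[str, str]
    score: float  # hypothetical confidence; the claim defines no scoring scheme

# Hypothetical per-domain entity vocabularies (a toy named-entity recognizer).
DOMAIN_ENTITIES = {
    "music": {"thriller": "album"},
    "shopping": {"thriller": "product"},
}
# Hypothetical candidate intents per domain, each with a made-up base score.
DOMAIN_INTENTS = {
    "music": [("play_album", 0.6), ("add_to_playlist", 0.3)],
    "shopping": [("buy_item", 0.5), ("add_to_cart", 0.4)],
}

def understand(asr_result: str, context: Dict[str, str]) -> List[Intent]:
    """Run every domain over the same ASR result and rank all intents together."""
    tokens = asr_result.lower().split()
    candidates: List[Intent] = []
    for domain, vocab in DOMAIN_ENTITIES.items():
        # Named entities as this particular domain interprets them.
        slots = {vocab[tok]: tok for tok in tokens if tok in vocab}
        # Fill remaining slots from context without overwriting recognized ones.
        for key, value in context.items():
            slots.setdefault(key, value)
        # Assumed heuristic: favor the domain the user interacted with last.
        bonus = 0.2 if context.get("last_domain") == domain else 0.0
        for name, base in DOMAIN_INTENTS[domain]:
            candidates.append(Intent(domain, name, dict(slots), base + bonus))
    return sorted(candidates, key=lambda i: i.score, reverse=True)

def select(ranked: List[Intent]) -> Intent:
    """Dialog step: pick a domain from the ranking, then an intent within it."""
    domain = ranked[0].domain
    return next(i for i in ranked if i.domain == domain)
```

In this sketch the word "thriller" is an album to the music domain and a product to the shopping domain, so the same utterance yields four competing intents; the context decides which domain wins. A dialog component could equally resolve the tie by asking the user, as claim 5 describes.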

6. A system comprising:
- one or more processors; and
  
  computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
  
  receiving an audio signal that represents speech from a user, the speech of the user received at a device that includes a speaker;
  
  retrieving context information associated with the speech, the context information based at least in part on a previous interaction between the user and the device;
  
  performing speech recognition on the audio signal to generate speech-recognition results;
  
  providing the speech-recognition results to a first domain and a second domain;
  
  identifying one or more named entities within the speech-recognition results for each of the first domain and the second domain;
  
  filling one or more slots within the speech-recognition results for each of the first domain and the second domain based at least in part on the context information;
  
  identifying multiple first intents, for the first domain, associated with the speech based at least in part on the one or more named entities and one or more slots filled for the first domain;
  
  identifying, for the second domain, multiple second intents associated with the speech of the user based at least in part on the one or more named entities and the one or more slots filled for the second domain;
  
  selecting one of the first domain or the second domain based at least in part on the multiple first intents and the multiple second intents;
  
  selecting, from the multiple first intents or multiple second intents, an intent that is associated with the selected domain; and
  
  providing an audio signal for output on the speaker of the device based at least in part on the selected intent and the selected domain.
- - 7. A system as recited in claim 6, wherein each of the first domain and the second domain specify a set of related activities that the user may request the device to perform, the device configured to perform activities of the set of related activities based at least in part on receiving a command identified from speech of the user.
  - 8. A system as recited in claim 7, wherein the first domain is associated with multiple first intents and the second domain is associated with multiple second intents, and wherein each of the intents of a respective domain is associated with a particular activity of the set of activities.
  - 9. A system as recited in claim 6, wherein the identifying comprises, for each of the domains:
    - parsing the speech-recognition results to recognize one or more named entities associated with the domain;
      
      associating a value with a field associated with the domain using the context information; and
      
      identifying a respective intent from the domain based at least in part on the one or more recognized named entities and the value of the field.
  - 10. A system as recited in claim 6, the acts further comprising ranking the multiple first intents and the multiple second intents prior to selecting one of the first domain or the second domain.
  - 11. A system as recited in claim 6, wherein selecting one of the first domain or the second domain comprises causing the speaker to output a question and receiving a response from the user to the question.
  - 12. A system as recited in claim 6, wherein selecting the intent comprises causing the speaker to output a question and receiving a response from the user to the question.
  - 13. A system as recited in claim 6, the acts further comprising at least one of:
    - streaming audio to the device, setting a reminder for the user, ordering or purchasing an item on behalf of the user, making a reservation for the user, or launching an application for a user.
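
Claims 11 and 12 describe selecting a domain or an intent by outputting a question and receiving the user's response. A minimal sketch of that disambiguation step is shown below, assuming a hypothetical `ask` callback that stands in for the speaker output and the recognized reply; the question wording and the fall-back to the first option are illustrative choices, not taken from the claims.

```python
from typing import Callable, List

def choose_by_question(options: List[str], ask: Callable[[str], str]) -> str:
    """Ask the user to pick among candidate domains or intents.

    `ask` is a hypothetical callback: it outputs the question on the device
    speaker and returns the recognized text of the user's reply.
    """
    if len(options) == 1:
        return options[0]  # nothing to disambiguate, so no question is asked
    question = "Did you mean " + " or ".join(options) + "?"
    reply = ask(question).lower()
    for option in options:
        if option.lower() in reply:
            return option
    # Assumed fall-back: keep the first (for example, top-ranked) option.
    return options[0]
```

The same helper can serve both claims: pass the candidate domains first, then the intents of the chosen domain.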

14. A method comprising:
- under control of one or more computing systems configured with executable instructions, receiving an audio signal generated by a device, the audio signal representing speech of a user;
  
  performing speech recognition on the speech to generate speech-recognition results;
  
  providing the speech-recognition results to a first domain and a second domain;
  
  identifying one or more named entities within the speech-recognition results for each of the first domain and the second domain;
  
  filling one or more slots within the speech-recognition results for each of the first domain and the second domain based at least in part on context information associated with the speech of the user;
  
  identifying, based at least in part on the one or more named entities and one or more slots filled for the first domain, a first potential intent of the speech, the first potential intent being associated with the first domain;
  
  identifying, based at least in part on the one or more named entities and one or more slots filled for the second domain, a second potential intent of the speech, the second potential intent being associated with a second domain;
  
  selecting the first potential intent or the second potential intent; and
  
  providing an audio signal for output on the device based at least in part on the selecting.
- - 15. A method as recited in claim 14, wherein the one or more computing systems form a portion of a network-accessible computing platform that is remote from an environment in which the device resides.
  - 16. A method as recited in claim 14, wherein the first or second potential intents are selected based at least in part on a dialog with the user that occurs at least partly subsequent to receiving the audio signal.
  - 17. A method as recited in claim 14, further comprising obtaining context information associated with the speech or with the user at least partly in response to receiving the audio signal, and wherein the first and second potential intents are identified based at least in part on the context information.
  - 18. A method as recited in claim 17, wherein the context information is based at least in part on previous speech of the user.
  - 19. A method as recited in claim 17, wherein the context information is based at least in part on a location of the user, preferences of the user, or information from an application called by the speech of the user.
  - 20. A method as recited in claim 14, further comprising selecting the first domain or the second domain prior to selecting the first potential intent or the second potential intent.
  - 21. A method as recited in claim 20, wherein the first domain or the second domain is selected based at least in part on a dialog with the user that occurs at least partly subsequent to receiving the audio signal.
  - 22. A method as recited in claim 20, wherein the first domain specifies a first set of related activities that the user may request the device to perform, the device configured to perform activities of the set of related activities in response to receiving a command identified from speech of the user, and the second domain specifies a second set of related activities that the user may request the device to perform, the device configured to perform activities of the set of related activities in response to receiving a command identified from speech of the user.
  - 23. A method as recited in claim 20, wherein the first potential intent comprises an activity of the first set of related activities and the second potential intent comprises an activity of the second set of related activities.
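
Claims 17 through 19 name several sources of context information: previous speech, the user's location, and user preferences. One way to picture how such context could feed the slot-filling step of claim 14 is sketched below. The function names, dictionary keys, and the rule that recognized values take priority over context are assumptions made for illustration only.

```python
from typing import Dict, List, Optional

def build_context(previous_speech: List[str],
                  location: Optional[str],
                  preferences: Dict[str, str]) -> Dict[str, str]:
    """Merge the context sources named in claims 18 and 19 into one mapping."""
    context = dict(preferences)
    if location is not None:
        context["location"] = location
    if previous_speech:
        # Keep only the most recent utterance in this toy example.
        context["previous_utterance"] = previous_speech[-1]
    return context

def fill_slots(required: List[str],
               recognized: Dict[str, str],
               context: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Fill each required slot from the utterance first, then from context."""
    return {name: recognized.get(name, context.get(name)) for name in required}
```

For a request such as "book a Thai restaurant", the recognized slots might hold only the cuisine, with the location supplied from context; a slot that neither source can fill stays `None`, which could prompt a follow-up question.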

Current Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee: Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors: Hart, Gregory Michael; Deramat, Frederic Johan Georges; Gundeti, Vikram Kumar; Thimsen, John Daniel; Lindsay, Allan Timothy; Carbon, Peter Paul Henri; Blanksteen, Scott Ian
Primary Examiner(s): Godbold, Douglas
Application Number: US13/842,804
Time in Patent Office: 1,257 Days
Field of Search: 704270-270
US Class Current: 1/1

CPC Class Codes

G06F 3/167   Audio in a user interface, ...
G10L 15/00   Speech recognition G10L17/0...
G10L 15/1815   Semantic context, e.g. disa...
G10L 15/22   Procedures used during a sp...
G10L 2015/223   Execution procedure of a sp...
G10L 2015/225   Feedback of the input speech
G10L 21/06   Transformation of speech in...
