Speech recognition services

US 10,580,408 B1
Filed: 02/14/2018
Issued: 03/03/2020
Est. Priority Date: 08/31/2012
Status: Active Grant

2 Assignments

Abstract

A speech recognition platform is configured to receive an audio signal that includes speech from a user and to perform automatic speech recognition (ASR) on the audio signal to identify ASR results. The platform may identify: (i) a domain of a voice command within the speech based on the ASR results and based on context information associated with the speech or the user, and (ii) an intent of the voice command. In response to identifying the intent, the platform may perform a corresponding action, such as streaming audio to the device, setting a reminder for the user, purchasing an item on behalf of the user, making a reservation for the user, or launching an application for the user. The speech recognition platform, in combination with the device, may therefore facilitate efficient interactions between the user and a voice-controlled device.
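
The two-stage flow summarized in the abstract (identify a domain from the ASR results and context, then an intent within that domain, then perform a corresponding action) might be sketched as follows. This is an illustrative sketch only and not the patented implementation: the domain names, trigger words, the `active_app` context key, and the functions `identify_domain`, `identify_intent`, and `handle` are all hypothetical.

```python
# Hypothetical domains: each maps intent names to trigger words (assumed values).
DOMAINS = {
    "music": {"play_song": {"play", "song", "music"}, "stop_audio": {"stop", "pause"}},
    "shopping": {"buy_item": {"buy", "order", "purchase"}},
    "reminders": {"set_reminder": {"remind", "reminder"}},
}

# Hypothetical actions, one per intent; real actions would call device services.
ACTIONS = {
    "play_song": lambda text: f"streaming audio for: {text}",
    "stop_audio": lambda text: "stopping audio",
    "buy_item": lambda text: f"purchasing an item for: {text}",
    "set_reminder": lambda text: f"setting a reminder for: {text}",
}


def identify_domain(words, context):
    """Score each domain by word overlap, plus an assumed context boost."""
    scores = {}
    for domain, intents in DOMAINS.items():
        vocabulary = set().union(*intents.values())
        boost = 1 if context.get("active_app") == domain else 0
        scores[domain] = len(words & vocabulary) + boost
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def identify_intent(words, domain):
    """Pick the intent in the chosen domain with the most overlapping words."""
    intents = DOMAINS[domain]
    best = max(intents, key=lambda name: len(words & intents[name]))
    return best if words & intents[best] else None


def handle(asr_text, context=None):
    """Resolve domain, then intent, then run the matching action."""
    words = set(asr_text.lower().split())
    domain = identify_domain(words, context or {})
    if domain is None:
        return None
    intent = identify_intent(words, domain)
    if intent is None:
        return None
    return ACTIONS[intent](asr_text)
```

For example, `handle("order it", {"active_app": "shopping"})` resolves the shopping domain and then the `buy_item` intent, while an utterance that matches no trigger word yields `None`.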

59 Citations

20 Claims

1. A system comprising:
- one or more processors;
  
  computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
  
  receiving an audio signal that represents speech of a user;
  
  performing speech recognition on the audio signal to generate speech-recognition results;
  
  comparing one or more words of the speech-recognition results with one or more words in a first set of related activities;
  
  comparing the one or more words of the speech-recognition results with one or more words in a second set of related activities;
  
  determining context information associated with the speech-recognition results;
  
  identifying a first number of first activities associated with the speech of the user from the first set of related activities based at least in part on the comparing and the context information;
  
  identifying a second number of second activities associated with the speech of the user from the second set of related activities based at least in part on the comparing and the context information;
  
  selecting the first set of related activities based at least in part on the first number being greater than the second number;
  
  selecting a particular first activity from the first set of related activities; and
  
  causing performance of one or more actions corresponding to the speech of the user based at least on the particular first activity.
  - 2. The system as recited in claim 1, wherein the context information associated with the speech-recognition results comprises an identity of the user;
    - and the acts further comprise:
      
      analyzing one or more acoustic characteristics of the audio signal;
      
      determining a voice print associated with the audio signal based at least in part on the acoustic characteristics;
      
      comparing the voice print to reference voice prints;
      
      determining the identity of the user based at least in part on the comparing; and
      
      wherein:
      
      identifying the first number of first activities associated with the speech of the user from the first set of related activities is further based at least in part on the identity of the user;
      
      identifying the second number of second activities associated with the speech of the user from the second set of related activities is further based at least in part on the identity of the user.
  - 3. The system as recited in claim 1, wherein the context information associated with the speech-recognition results comprises a location of the user;
    - and wherein:
      
      identifying the first number of first activities associated with the speech of the user from the first set of related activities is further based at least in part on the location of the user;
      
      identifying the second number of second activities associated with the speech of the user from the second set of related activities is further based at least in part on the location of the user.
  - 4. The system as recited in claim 1, wherein determining context information associated with the speech-recognition results comprises accessing data associated with a stored context representing at least previous speech-recognition results from a previous interaction between the user and a device;
    - comparing one or more words of the previous speech-recognition results with one or more words in the first set of related activities; and
      
      comparing the one or more words of the previous speech-recognition results with one or more words in the second set of related activities; and
      
      wherein:
      
      identifying the first number of first activities associated with the speech of the user from the first set of related activities is further based at least in part on the previous speech-recognition results;
      
      identifying the second number of second activities associated with the speech of the user from the second set of related activities is further based at least in part on the previous speech-recognition results.
  - 5. The system as recited in claim 1, wherein determining context information associated with the speech-recognition results comprises determining an application in use on a device associated with the user;
    - and wherein:
      
      identifying the first number of first activities associated with the speech of the user from the first set of related activities is further based at least in part on the application;
      
      identifying the second number of second activities associated with the speech of the user from the second set of related activities is further based at least in part on the application.
  - 6. The system as recited in claim 1, wherein determining context information associated with the speech-recognition results comprises determining a time of day that the audio signal is received;
    - and wherein:
      
      identifying the first number of first activities associated with the speech of the user from the first set of related activities is further based at least in part on the time of day;
      
      identifying the second number of second activities associated with the speech of the user from the second set of related activities is further based at least in part on the time of day.
  - 7. The system as recited in claim 1, the acts further comprising:
    - determining a device that generated the audio signal; and
      
      wherein:
      
      identifying the first number of first activities associated with the speech of the user from the first set of related activities is further based at least in part on the device;
      
      identifying the second number of second activities associated with the speech of the user from the second set of related activities is further based at least in part on the device.
  - 8. The system as recited in claim 1, the acts further comprising:
    - determining that at least one word for causing performance of the one or more actions corresponding to the speech of the user is missing from the speech-recognition results; and
      
      identifying the at least one word based at least in part on the context information.
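
The count-based selection recited in claim 1 (compare the recognized words with the words of two sets of related activities, count the activities in each set that match given the context, select the set with the greater count, then select a particular activity from it) could be sketched roughly as below. This is a minimal sketch under stated assumptions, not the claimed implementation: the `MUSIC` and `VIDEO` sets, the `has_screen` context key, the `permitted` rule, and the word-overlap tie-breaker are all hypothetical.

```python
# Hypothetical sets of related activities: activity name -> associated words.
MUSIC = {
    "play_music": {"play", "music", "song"},
    "next_track": {"next", "skip", "song"},
    "set_volume": {"volume", "louder"},
}
VIDEO = {
    "play_video": {"play", "video", "movie"},
    "pause_video": {"pause", "movie"},
}


def permitted(activity, context):
    """Assumed context rule: video activities need a device with a screen."""
    return context.get("has_screen", True) or "video" not in activity


def matching_activities(words, activity_set, context):
    """Activities whose words overlap the recognized words and fit the context."""
    return [
        name
        for name, vocabulary in activity_set.items()
        if words & vocabulary and permitted(name, context)
    ]


def select_activity(words, first_set, second_set, context):
    """Pick the set with more matching activities, then one activity from it."""
    first = matching_activities(words, first_set, context)
    second = matching_activities(words, second_set, context)
    if not first and not second:
        return None
    # The claim only covers the first count being greater; falling back to the
    # second set on a tie is an assumption made for this sketch.
    if len(first) > len(second):
        chosen_set, candidates = first_set, first
    else:
        chosen_set, candidates = second_set, second
    # Assumed tie-breaker: the activity sharing the most words with the speech.
    return max(candidates, key=lambda name: len(words & chosen_set[name]))
```

With the words of "play the movie", a device with a screen yields `play_video`, while an audio-only device (`{"has_screen": False}`) yields `play_music`, which shows the context changing both counts and therefore the selected set.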

9. A system comprising:
- one or more processors; and
  
  computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
  
  receiving an audio signal that represents speech of a user;
  
  performing speech recognition on the audio signal to generate speech-recognition results;
  
  comparing the speech-recognition results to multiple sets of related activities;
  
  identifying first potential activities represented in the speech-recognition results from a first set of related activities;
  
  identifying second potential activities represented in the speech-recognition results from a second set of related activities;
  
  ranking the first potential activities and the second potential activities;
  
  selecting the first set of related activities based at least in part on the ranking;
  
  determining a highest ranked first potential activity from the first potential activities; and
  
  providing an output audio signal for audible output based at least in part on the highest ranked first potential activity.
  - 10. The system as recited in claim 9, wherein each of the first set of related activities and the second set of related activities specify different types of activities that the user may request the system to perform, the system configured to perform activities of the different types of activities based at least in part on receiving a command identified from speech of the user.
  - 11. The system as recited in claim 9, wherein comparing the speech-recognition results to multiple sets of related activities comprises:
    - identifying one or more words of the speech-recognition results; and
      
      comparing the one or more words of the speech-recognition results with one or more words of the first set of related activities and the second set of related activities.
  - 12. The system as recited in claim 9, wherein comparing the speech-recognition results to multiple sets of related activities comprises:
    - identifying that at least one word of the speech-recognition results is specifically associated with the first potential activities; and
      
      wherein selecting the first set of related activities is further based at least in part on the at least one word.
  - 13. The system as recited in claim 9, wherein comparing the speech-recognition results to multiple sets of related activities comprises:
    - determining context information associated with the speech-recognition results for each of the first set of related activities and the second set of related activities; and
      
      wherein identifying first potential activities and second potential activities is based at least in part on the context information.
  - 14. The system as recited in claim 13, wherein the context information is based at least in part on previous speech of the user.
  - 15. The system as recited in claim 13, wherein the context information is based at least in part on a location of the user, preferences of the user, information from an application identified by the speech of the user, a device that receives the audio signal, or a time of day that the audio signal is received.
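
The ranking approach of claim 9 (identify potential activities from each set, rank them, select a set based on the ranking, then take the highest-ranked activity in that set and produce an audible response) might look like the sketch below. The scoring formula, the use of summed scores to select a set, the example sets, and the reply text standing in for the output audio signal are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Candidate(NamedTuple):
    set_name: str
    activity: str
    score: float


def rank_candidates(words, activity_sets):
    """Score every activity that shares a word with the speech, best first."""
    candidates = []
    for set_name, activities in activity_sets.items():
        for activity, vocabulary in activities.items():
            overlap = len(words & vocabulary)
            if overlap:
                # Assumed score: fraction of the activity's words that were spoken.
                candidates.append(Candidate(set_name, activity, overlap / len(vocabulary)))
    return sorted(candidates, key=lambda c: c.score, reverse=True)


def respond(asr_text, activity_sets):
    """Return reply text that a text-to-speech step could turn into audio."""
    ranked = rank_candidates(set(asr_text.lower().split()), activity_sets)
    if not ranked:
        return "Sorry, I did not understand that."
    # Assumed selection rule: the set whose candidates have the highest total score.
    totals = defaultdict(float)
    for candidate in ranked:
        totals[candidate.set_name] += candidate.score
    selected_set = max(totals, key=totals.get)
    # Highest-ranked potential activity within the selected set.
    top = next(c for c in ranked if c.set_name == selected_set)
    return f"Okay: {top.activity}"


# Hypothetical sets of related activities used only for this example.
SETS = {
    "reminders": {
        "set_reminder": {"remind", "reminder"},
        "set_alarm": {"alarm", "wake", "up"},
    },
    "shopping": {"buy_item": {"buy", "order"}},
}
```

Here `respond("remind me with an alarm", SETS)` selects the reminders set, because both of its activities match, and replies for `set_reminder`, the higher-scoring of the two.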

16. A method comprising:
- receiving an audio signal generated by a device, the audio signal representing at least speech of a user;
  
  performing speech recognition on the speech to generate speech-recognition results, the speech-recognition results including one or more words from the speech of the user;
  
  identifying a first set of related activities based at least in part on the one or more words of the speech-recognition results;
  
  identifying a second set of related activities based at least in part on the one or more words of the speech-recognition results;
  
  determining first potential activities represented in the speech-recognition results from the first set of related activities;
  
  determining second potential activities represented in the speech-recognition results from the second set of related activities;
  
  ranking the first potential activities and the second potential activities;
  
  selecting the first set of related activities or the second set of related activities based at least in part on the ranking;
  
  determining a highest ranked one of the first potential activities or of the second potential activities based at least in part on the selecting the first set of related activities or the second set of related activities; and
  
  providing an output audio signal for audible output on the device based at least in part on the highest ranked first potential activities or second potential activities.
  - 17. The method as recited in claim 16, further comprising:
    - streaming audio to the device, setting a reminder for the user, ordering or purchasing an item on behalf of the user, making a reservation for the user, or launching an application for a user.
  - 18. The method as recited in claim 16, wherein the first set of related activities or second set of related activities are selected based at least in part on a speech of the user that occurs at least partly subsequent to receiving the audio signal.
  - 19. The method as recited in claim 16, further comprising obtaining context information associated with the speech or with the user at least partly in response to receiving the audio signal, and wherein the first set of related activities or second set of related activities are identified based at least in part on the context information.
  - 20. The method as recited in claim 16, wherein determining first potential activities comprises determining a first matching number of words of the one or more words of the speech-recognition results and one or more of the first set of related activities;
    - and wherein determining second potential activities comprises determining a second matching number of words of the one or more words of the speech-recognition results and one or more of the second set of related activities.

Current Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Original Assignee
Amazon Technologies, Inc. (Amazon.com, Inc.)
Inventors
Hart, Gregory Michael; Carbon, Peter Paul Henri; Thimsen, John Daniel; Gundeti, Vikram Kumar; Blanksteen, Scott Ian; Lindsay, Allan Timothy; Deramat, Frederic Johan Georges
Primary Examiner(s)
Godbold, Douglas

Application Number

US15/896,495
Time in Patent Office

748 Days
CPC Class Codes

G06F 3/167   Audio in a user interface, ...

G10L 15/00   Speech recognition G10L17/0...

G10L 15/1815   Semantic context, e.g. disa...

G10L 15/22   Procedures used during a sp...

G10L 2015/223   Execution procedure of a sp...

G10L 2015/225   Feedback of the input speech

G10L 21/06   Transformation of speech in...
