Speech recognition services
First Claim
1. A system comprising:
one or more processors;
computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
receiving an audio signal that represents speech of a user;
performing speech recognition on the audio signal to generate speech-recognition results;
comparing one or more words of the speech-recognition results with one or more words in a first set of related activities;
comparing the one or more words of the speech-recognition results with one or more words in a second set of related activities;
determining context information associated with the speech-recognition results;
identifying a first number of first activities associated with the speech of the user from the first set of related activities based at least in part on the comparing and the context information;
identifying a second number of second activities associated with the speech of the user from the second set of related activities based at least in part on the comparing and the context information;
selecting the first set of related activities based at least in part on the first number being greater than the second number;
selecting a particular first activity from the first set of related activities; and
causing performance of one or more actions corresponding to the speech of the user based at least on the particular first activity.
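The count-based selection recited in this claim can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the patented implementation: the `Activity` type, the keyword sets, the `DOMAINS` table, and the treatment of context information as extra tokens are all hypothetical. Each set of related activities is modeled as a list of activities with keywords; the set with the larger number of matching activities is selected, and the best-matching activity within it is returned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """Hypothetical activity: a name plus the words that suggest it."""
    name: str
    keywords: frozenset


def matching_activities(tokens, activities):
    """Return the activities sharing at least one word with the tokens."""
    return [a for a in activities if a.keywords & tokens]


def select_activity(words, domains, context=()):
    """Pick the set with the most matching activities, then one activity in it.

    Assumption: context information is modeled as additional tokens.
    """
    tokens = frozenset(words) | frozenset(context)
    matches = {d: matching_activities(tokens, acts) for d, acts in domains.items()}
    # Select the set of related activities with the greater number of matches.
    best_domain = max(matches, key=lambda d: len(matches[d]))
    candidates = matches[best_domain]
    if not candidates:
        return None, None
    # Select a particular activity: here, the one with the largest word overlap.
    best_activity = max(candidates, key=lambda a: len(a.keywords & tokens))
    return best_domain, best_activity


# Hypothetical example data.
DOMAINS = {
    "music": [
        Activity("play_song", frozenset({"play", "song"})),
        Activity("pause_music", frozenset({"pause", "music"})),
        Activity("next_track", frozenset({"next", "skip", "song"})),
    ],
    "shopping": [
        Activity("buy_item", frozenset({"buy", "purchase"})),
        Activity("track_order", frozenset({"track", "order"})),
    ],
}

domain, activity = select_activity(["skip", "to", "next", "song"], DOMAINS)
print(domain, activity.name)
```

In this example two music activities match and no shopping activities match, so the music set is selected and `next_track` is chosen because it shares the most words with the utterance.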
2 Assignments
0 Petitions
Abstract
A speech recognition platform configured to receive an audio signal that includes speech from a user and perform automatic speech recognition (ASR) on the audio signal to identify ASR results. The platform may identify: (i) a domain of a voice command within the speech based on the ASR results and based on context information associated with the speech or the user, and (ii) an intent of the voice command. In response to identifying the intent, the platform may perform a corresponding action, such as streaming audio to the device, setting a reminder for the user, purchasing an item on behalf of the user, making a reservation for the user or launching an application for the user. The speech recognition platform, in combination with the device, may therefore facilitate efficient interactions between the user and a voice-controlled device.
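The final step described in the abstract, performing an action once a domain and an intent have been identified, might be sketched as a simple dispatch table. Everything named here is an assumption for illustration: the `ACTIONS` table, the handler functions, the domain and intent labels, and the `slots` dictionary are hypothetical and do not come from the patent.

```python
def stream_audio(slots):
    """Hypothetical handler standing in for streaming audio to the device."""
    return f"streaming {slots.get('title', 'audio')}"


def set_reminder(slots):
    """Hypothetical handler standing in for setting a reminder for the user."""
    return f"reminder set: {slots.get('text', '')}"


# Hypothetical mapping from (domain, intent) to the corresponding action.
ACTIONS = {
    ("music", "play"): stream_audio,
    ("productivity", "remind"): set_reminder,
}


def perform_action(domain, intent, slots):
    """Look up and run the action registered for an identified intent."""
    handler = ACTIONS.get((domain, intent))
    if handler is None:
        return "no action registered"
    return handler(slots)


print(perform_action("music", "play", {"title": "jazz"}))
```

Keying the table on both domain and intent reflects the abstract's two-stage identification: the same intent label could map to different actions in different domains.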
59 Citations
20 Claims
1. A system comprising:
one or more processors;
computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
receiving an audio signal that represents speech of a user;
performing speech recognition on the audio signal to generate speech-recognition results;
comparing one or more words of the speech-recognition results with one or more words in a first set of related activities;
comparing the one or more words of the speech-recognition results with one or more words in a second set of related activities;
determining context information associated with the speech-recognition results;
identifying a first number of first activities associated with the speech of the user from the first set of related activities based at least in part on the comparing and the context information;
identifying a second number of second activities associated with the speech of the user from the second set of related activities based at least in part on the comparing and the context information;
selecting the first set of related activities based at least in part on the first number being greater than the second number;
selecting a particular first activity from the first set of related activities; and
causing performance of one or more actions corresponding to the speech of the user based at least on the particular first activity.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system comprising:
one or more processors; and
computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
receiving an audio signal that represents speech of a user;
performing speech recognition on the audio signal to generate speech-recognition results;
comparing the speech-recognition results to multiple sets of related activities;
identifying first potential activities represented in the speech-recognition results from a first set of related activities;
identifying second potential activities represented in the speech-recognition results from a second set of related activities;
ranking the first potential activities and the second potential activities;
selecting the first set of related activities based at least in part on the ranking;
determining a highest ranked first potential activity from the first potential activities; and
providing an output audio signal for audible output based at least in part on the highest ranked first potential activity.
Dependent claims: 10, 11, 12, 13, 14, 15
16. A method comprising:
receiving an audio signal generated by a device, the audio signal representing at least speech of a user;
performing speech recognition on the speech to generate speech-recognition results, the speech-recognition results including one or more words from the speech of the user;
identifying a first set of related activities based at least in part on the one or more words of the speech-recognition results;
identifying a second set of related activities based at least in part on the one or more words of the speech-recognition results;
determining first potential activities represented in the speech-recognition results from the first set of related activities;
determining second potential activities represented in the speech-recognition results from the second set of related activities;
ranking the first potential activities and the second potential activities;
selecting the first set of related activities or the second set of related activities based at least in part on the ranking;
determining a highest ranked one of the first potential activities or of the second potential activities based at least in part on the selecting the first set of related activities or the second set of related activities; and
providing an output audio signal for audible output on the device based at least in part on the highest ranked first potential activities or second potential activities.
Dependent claims: 17, 18, 19, 20
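The ranking-based selection recited in claims 9 and 16 can be illustrated with a short sketch. This is one possible reading under labeled assumptions, not the claimed implementation: the scoring rule (fraction of an activity's keywords found in the utterance), the choice of the set with the highest total score, the `RANKING_DOMAINS` table, and the returned text (a stand-in for the output audio signal, which a real system would synthesize as speech) are all hypothetical.

```python
def rank_candidates(words, domains):
    """Score every potential activity in every set and rank them together."""
    tokens = set(words)
    scored = []
    for domain, activities in domains.items():
        for name, keywords in activities.items():
            overlap = len(tokens & keywords)
            if overlap:
                # Assumed score: fraction of the activity's keywords matched.
                scored.append((overlap / len(keywords), domain, name))
    return sorted(scored, key=lambda item: item[0], reverse=True)


def respond(words, domains):
    """Select a set from the ranking, then its highest ranked activity."""
    ranking = rank_candidates(words, domains)
    if not ranking:
        return "Sorry, I did not understand that."
    # Assumed selection rule: the set whose activities have the highest total score.
    totals = {}
    for score, domain, _ in ranking:
        totals[domain] = totals.get(domain, 0.0) + score
    selected = max(totals, key=totals.get)
    # The ranking is sorted, so the first entry in the selected set ranks highest.
    top_activity = next(name for _, d, name in ranking if d == selected)
    # Text stand-in for the output audio signal.
    return f"OK: {top_activity} ({selected})"


# Hypothetical example data.
RANKING_DOMAINS = {
    "music": {
        "play_song": {"play", "song"},
        "next_track": {"next", "song"},
    },
    "shopping": {
        "buy_item": {"buy", "song"},
    },
}

print(respond(["play", "a", "song"], RANKING_DOMAINS))
```

Here `play_song` scores 1.0 while `next_track` and `buy_item` each score 0.5, so the music set is selected with a total of 1.5 and `play_song` is its highest ranked activity.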
Specification