Speech recognition platforms
First Claim
1. A system comprising:
a coordination component configured to receive an audio signal generated based at least in part on sound from an environment, the sound captured by a device that includes a microphone unit and a speaker and the audio signal representing speech of a user in the environment;
a speech recognition component configured to perform speech recognition on the audio signal to generate a speech-recognition result;
a natural language understanding (NLU) component configured to:
provide the speech-recognition result to a first domain and a second domain;
identify one or more named entities within the speech-recognition result for each of the first domain and the second domain;
fill one or more slots within the speech-recognition result for each of the first domain and the second domain based at least in part on context information associated with the speech of the user;
identify, for the first domain, multiple first intents associated with the speech of the user based at least in part on the one or more named entities and the one or more slots filled for the first domain;
identifying, for the second domain, multiple second intents associated with the speech of the user based at least in part on the one or more named entities and the one or more slots filled for the second domain; and
rank the multiple first intents and the multiple second intents relative to one another;
a dialog component configured to:
select one of the first domain or the second domain based at least in part on the ranking; and
select one of the multiple first intents or one of the multiple second intents that is associated with the selected domain; and
a response component configured to receive an indication of the selected domain and an indication of the selected intent and cause performance of an action corresponding to the selected intent, the action including providing an audio signal for output on the speaker of the device.
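The ranking and selection steps recited for the NLU and dialog components can be illustrated with a minimal sketch. It is not taken from the patent: the `IntentHypothesis` type, the numeric `score` field, the domain and intent names, and the rule of picking the domain of the top-ranked hypothesis are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class IntentHypothesis:
    """One candidate intent produced by a domain (hypothetical structure)."""
    domain: str
    intent: str
    slots: dict = field(default_factory=dict)
    score: float = 0.0  # assumed confidence value; higher is better


def rank_intents(first, second):
    """Rank the intents of both domains relative to one another."""
    return sorted(first + second, key=lambda h: h.score, reverse=True)


def select_domain_and_intent(ranked):
    """Dialog step: select a domain from the ranking, then an intent in it."""
    if not ranked:
        raise ValueError("no intent hypotheses to select from")
    # Assumed rule: the domain of the best-ranked hypothesis wins.
    domain = ranked[0].domain
    # ranked is ordered best-first, so the first match is that domain's
    # highest-scoring intent.
    intent = next(h for h in ranked if h.domain == domain)
    return domain, intent


# Made-up hypotheses for an utterance such as "play thriller".
music = [
    IntentHypothesis("music", "PlayAlbum", {"album": "thriller"}, 0.81),
    IntentHypothesis("music", "PlaySong", {"song": "thriller"}, 0.74),
]
video = [
    IntentHypothesis("video", "PlayMovie", {"title": "thriller"}, 0.42),
    IntentHypothesis("video", "PlayTrailer", {"title": "thriller"}, 0.18),
]

ranked = rank_intents(music, video)
domain, chosen = select_domain_and_intent(ranked)
print(domain, chosen.intent)
```

The claim only requires that the domain be selected "based at least in part on the ranking", so other selection rules (for example, summing scores per domain) would fit the same outline.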
Abstract
A speech recognition platform is configured to receive an audio signal that includes speech from a user and to perform automatic speech recognition (ASR) on the audio signal to identify ASR results. The platform may identify: (i) a domain of a voice command within the speech, based on the ASR results and on context information associated with the speech or the user, and (ii) an intent of the voice command. In response to identifying the intent, the platform may perform a corresponding action, such as streaming audio to the device, setting a reminder for the user, purchasing an item on behalf of the user, making a reservation for the user, or launching an application for the user. The speech recognition platform, in combination with the device, may therefore facilitate efficient interactions between the user and a voice-controlled device.
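As a rough illustration of the last step in the abstract, an identified intent can be mapped to a corresponding action through a lookup table. The sketch below is an assumption, not the platform's implementation: the intent names, handler functions, and reply strings are hypothetical, and the handlers only return text instead of actually streaming audio or placing orders.

```python
# Hypothetical handlers, one per example action named in the abstract.
def stream_audio(slots):
    return f"Streaming {slots['title']}."


def set_reminder(slots):
    return f"Reminder set for {slots['time']}."


def purchase_item(slots):
    return f"Ordered {slots['item']}."


# Assumed mapping from intent name to the action that fulfils it.
ACTIONS = {
    "StreamAudio": stream_audio,
    "SetReminder": set_reminder,
    "PurchaseItem": purchase_item,
}


def perform_action(intent, slots):
    """Look up the handler for an intent and run it with the filled slots."""
    handler = ACTIONS.get(intent)
    if handler is None:
        # Unknown intents fall back to a spoken apology instead of failing.
        return "Sorry, I can't help with that yet."
    return handler(slots)


reply = perform_action("SetReminder", {"time": "7 am"})
print(reply)
```

In a real device the returned text would be synthesized into an audio signal for output on the speaker; that step is omitted here.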
23 Claims
1. A system comprising: (independent claim 1, reproduced in full under First Claim above)
Dependent claims: 2, 3, 4, 5
6. A system comprising:
one or more processors; and
computer-readable media storing computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform acts comprising:
receiving an audio signal that represents speech from a user, the speech of the user received at a device that includes a speaker;
retrieving context information associated with the speech, the context information based at least in part on a previous interaction between the user and the device;
performing speech recognition on the audio signal to generate speech-recognition results;
provide the speech-recognition results to a first domain and a second domain;
identify one or more named entities within the speech-recognition results for each of the first domain and the second domain;
fill one or more slots within the speech-recognition results for each of the first domain and the second domain based at least in part on the context information;
identifying multiple first intents, for the first domain, associated with the speech based at least in part on the one or more named entities and one or more slots filled for the first domain;
identifying, for the second domain, multiple second intents associated with the speech of the user based at least in part on the one or more named entities and the one or more slots filled for the second domain;
selecting one of the first domain or the second domain based at least in part on the multiple first intents and the multiple second intents;
selecting, from the multiple first intents or multiple second intents, an intent that is associated with the selected domain; and
providing an audio signal for output on the speaker of the device based at least in part on the selected intent and the selected domain.
Dependent claims: 7, 8, 9, 10, 11, 12, 13
14. A method comprising:
under control of one or more computing systems configured with executable instructions,
receiving an audio signal generated by a device, the audio signal representing speech of a user;
performing speech recognition on the speech to generate speech-recognition results;
provide the speech-recognition results to a first domain and a second domain;
identify one or more named entities within the speech-recognition results for each of the first domain and the second domain;
fill one or more slots within the speech-recognition results for each of the first domain and the second domain based at least in part on context information associated with the speech of the user;
identifying, based at least in part on the one or more named entities and one or more slots filled for the first domain, a first potential intent of the speech, the first potential intent being associated with the first domain;
identifying, based at least in part on the one or more named entities and one or more slots filled for the second domain, a second potential intent of the speech, the second potential intent being associated with a second domain;
selecting the first potential intent or the second potential intent; and
providing an audio signal for output on the device based at least in part on the selecting.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22, 23
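Claims 6 and 14 both fill slots "based at least in part on" context information, which claim 6 ties to a previous interaction between the user and the device. A minimal sketch of that idea follows. The `fill_slots` function, the slot names, and the rule that entities from the current utterance take priority over context are assumptions for illustration only; the claims do not specify them.

```python
def fill_slots(required, entities, context):
    """Fill each required slot from the utterance, falling back to context.

    required: slot names a domain expects (hypothetical schema)
    entities: named entities found in the current speech-recognition result
    context:  values remembered from a previous interaction (assumed store)
    """
    filled = {}
    for name in required:
        if name in entities:
            # Prefer what the user just said.
            filled[name] = entities[name]
        elif name in context:
            # Otherwise reuse what an earlier interaction established.
            filled[name] = context[name]
        else:
            # Leave the slot empty so a later step can ask the user.
            filled[name] = None
    return filled


# Previous interaction: the user asked about a particular artist.
context = {"artist": "the beatles"}
# Current utterance: "play yesterday" names a song but no artist.
entities = {"song": "yesterday"}

slots = fill_slots(["song", "artist", "album"], entities, context)
print(slots)
```

Here the `artist` slot is taken from context, `song` from the current utterance, and `album` stays unfilled.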
Specification