Intent re-ranker
Abstract
Methods and systems for determining an intent of an utterance using contextual information associated with a requesting device are described herein. Voice activated electronic devices may, in some embodiments, be capable of displaying content using a display screen. Entity data representing the content rendered by the display screen may describe entities with attributes similar to those of an intent identified by natural language understanding processing. Natural language understanding processing may attempt to resolve one or more declared slots for a particular intent and may generate an initial list of intent hypotheses, ranked to indicate which are most likely to correspond to the utterance. The entity data may be compared with the declared slots for the intent hypotheses, and the list of intent hypotheses may be re-ranked to account for slots that match the contextual metadata. The top-ranked intent hypothesis after re-ranking may then be selected as the utterance's intent.
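The re-ranking step described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class `IntentHypothesis`, the function `rerank`, the additive `boost` value, and the intent and slot names are all hypothetical, chosen only to show how slots that match on-screen entity data might lift a hypothesis above one that initially ranked higher.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class IntentHypothesis:
    name: str                  # hypothetical intent label
    score: float               # initial NLU confidence (assumed to be in [0, 1])
    slots: Set[str] = field(default_factory=set)  # declared slot types


def rerank(hypotheses: List[IntentHypothesis],
           entity_slots: Set[str],
           boost: float = 0.2) -> List[IntentHypothesis]:
    """Re-sort hypotheses, adding `boost` per declared slot that matches
    a slot type present in the entity data (an assumed scoring rule)."""
    def adjusted(h: IntentHypothesis) -> float:
        return h.score + boost * len(h.slots & entity_slots)
    return sorted(hypotheses, key=adjusted, reverse=True)


# Initial list: the music intent outranks the video intent on text alone.
initial = [
    IntentHypothesis("PlayMusicIntent", 0.55, {"SongName", "ArtistName"}),
    IntentHypothesis("PlayVideoIntent", 0.45, {"VideoName"}),
]

# Entity data for the content on screen exposes a video title.
on_screen = {"VideoName"}

reranked = rerank(initial, on_screen)
top_intent = reranked[0].name
print(top_intent)
```

With no entity data (an empty set), the order is unchanged, so the function falls back to the initial ranking.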
23 Claims
1. A computer-implemented method, comprising:
receiving, from a first electronic device and at a speech-processing system, first audio data representing a first utterance, the first utterance including a request to perform a first action;
receiving, at the speech-processing system, first notification data indicating that content is being presented on a display screen associated with the first electronic device;
generating first text data representing the first audio data;
determining, using the first text data, that the first utterance corresponds to:
  a first intent hypothesis associated with first functionality, the first intent hypothesis associated with first slot data, and
  a second intent hypothesis associated with second functionality, the second intent hypothesis associated with second slot data;
determining that a first domain provided the content, the first domain being associated with the first functionality;
requesting, from the first domain, entity data representing one or more entities associated with the content;
receiving, from at least one system component associated with the first domain, the entity data, the at least one system component being different than the first electronic device and the entity data corresponding to the first slot data;
selecting the first intent hypothesis as being representative of the first utterance instead of selecting the second intent hypothesis based, at least in part, on the entity data corresponding to the first slot data instead of corresponding to the second slot data; and
based at least in part on selecting the first intent hypothesis, causing the first domain to perform the first action in accordance with the first functionality.
Dependent claims: 2, 3, 4, 5, 6
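As a rough illustration of the flow recited in claim 1 (identify the domain that provided the displayed content, request entity data from it, then prefer the hypothesis whose slot data the entity data corresponds to), here is a small Python sketch. Everything in it is an assumption made for illustration: the `DOMAIN_ENTITY_PROVIDERS` registry, the shape of the notification dictionary, the intent and slot names, and the rule that "corresponds to" means a shared slot type.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical registry: each domain exposes a callable that returns entity
# data (slot type -> value) for the content it is currently presenting.
DOMAIN_ENTITY_PROVIDERS: Dict[str, Callable[[], Dict[str, str]]] = {
    "video": lambda: {"VideoName": "Nature Documentary"},
    "music": lambda: {"SongName": "Example Song"},
}

# A hypothesis is (intent name, slot data); unresolved slots hold None.
Hypothesis = Tuple[str, Dict[str, Optional[str]]]


def select_intent(hypotheses: List[Hypothesis], notification: dict) -> str:
    """Pick the first hypothesis whose slot types appear in the entity data
    supplied by the domain named in the notification; otherwise keep the
    top hypothesis from the initial ordering."""
    provider = DOMAIN_ENTITY_PROVIDERS.get(notification.get("domain"))
    if provider is None:
        return hypotheses[0][0]
    entity_data = provider()  # stands in for the request to the domain
    for intent, slots in hypotheses:
        if any(slot in entity_data for slot in slots):
            return intent
    return hypotheses[0][0]


hypotheses = [
    ("PlayMusicIntent", {"SongName": None}),
    ("PlayVideoIntent", {"VideoName": None}),
]
selected = select_intent(hypotheses, {"domain": "video", "displaying": True})
print(selected)
```

When the notification names no known domain, the sketch simply returns the first hypothesis, which mirrors processing without contextual data.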
7. A computer-implemented method, comprising:
receiving, from a first device, first audio data representing a first utterance;
receiving first indication data indicating that first media content was being output by a media output component when at least a portion of the first utterance was spoken, the media output component being associated with the first device;
determining, based at least in part on the first indication data, a first domain associated with the first media content;
generating first text data representing the first audio data;
determining, using the first text data, a first plurality of intent hypotheses associated with the first utterance, the first plurality of intent hypotheses including at least a first intent hypothesis and a second intent hypothesis;
receiving, from at least one system component associated with the first domain, first entity data representing one or more entities associated with the first media content, the at least one system component being different than the first device;
determining that the first entity data corresponds to at least first slot data;
determining that the first intent hypothesis is associated with at least the first slot data;
selecting, from the first plurality of intent hypotheses, the first intent hypothesis as being associated with the first utterance based, at least in part, on the first intent hypothesis being associated with the first slot data and the first entity data corresponding to the first slot data;
generating first output data representing the first intent hypothesis and the first entity data; and
based at least in part on selecting the first intent hypothesis, causing the first domain to perform at least one first function associated with the first intent hypothesis, wherein the causing comprises sending the output data to the first domain.
Dependent claims: 8, 9, 10, 11, 12, 13, 14
15. A computing system, comprising:
at least one processor; and
at least one computer-readable medium encoded with instructions which, when executed by the at least one processor, cause the computing system to:
  receive, from a first device, first audio data representing a first utterance,
  receive first indication data indicating that first media content was being output by a media output component when at least a portion of the first utterance was spoken, the media output component being associated with the first device,
  determine, based at least in part on the first indication data, a first domain associated with the first media content,
  generate first text data representing the first audio data,
  determine, using the first text data, a first plurality of intent hypotheses associated with the first utterance, the first plurality of intent hypotheses including at least a first intent hypothesis and a second intent hypothesis,
  receive, from at least one system component associated with the first domain, first entity data representing one or more entities associated with the first media content, the at least one system component being different than the first device,
  generate a first score associated with the first intent hypothesis based, at least in part, on the first text data and the first entity data,
  generate a second score associated with the second intent hypothesis based, at least in part, on the first text data and the first entity data,
  select, from among the first plurality of intent hypotheses, the first intent hypothesis as being associated with the first utterance based, at least in part, on the first score being greater than the second score, and
  based at least in part on selecting the first intent hypothesis, cause the first domain to perform at least one first function associated with the first intent hypothesis.
Dependent claims: 16, 17, 18, 19
20. A computing system comprising:
at least one processor; and
at least one computer-readable medium encoded with instructions which, when executed by the at least one processor, cause the computing system to:
  receive, from a first device, first audio data representing a first utterance,
  receive first indication data indicating that first media content was being output by a media output component when at least a portion of the first utterance was spoken, the media output component being associated with the first device,
  determine, based at least in part on the first indication data, a first domain associated with the first media content,
  generate first text data representing the first audio data,
  determine, using the first text data, a first plurality of intent hypotheses associated with the first utterance, the first plurality of intent hypotheses including at least a first intent hypothesis and a second intent hypothesis,
  receive, from at least one system component associated with the first domain, first entity data representing one or more entities associated with the first media content, the at least one system component being different than the first device,
  generate a first hypotheses list comprising the first plurality of intent hypotheses ranked in a first order,
  generate, based at least in part on the first entity data, a second hypotheses list comprising at least a portion of the first plurality of intent hypotheses ranked in a second order,
  select, from the second hypotheses list, the first intent hypothesis as being associated with the first utterance based at least in part on the second order, and
  based at least in part on selecting the first intent hypothesis, cause the first domain to perform at least one first function associated with the first intent hypothesis.
Dependent claims: 21, 22
23. A computer-implemented method comprising:
receiving, from a first device, first audio data representing a first utterance;
receiving first indication data indicating that first media content was being output by a media output component when at least a portion of the first utterance was spoken, the media output component being associated with the first device;
determining, based at least in part on the first indication data, a first domain associated with the first media content;
generating first text data representing the first audio data;
determining, using the first text data, a first plurality of intent hypotheses associated with the first utterance;
receiving, from at least one system component associated with the first domain, first entity data representing one or more entities associated with the first media content, the at least one system component being different than the first device;
selecting, from the first plurality of intent hypotheses, a first intent hypothesis as being associated with the first utterance based, at least in part, on the first entity data;
based at least in part on selecting the first intent hypothesis, causing the first domain to perform at least one first function associated with the first intent hypothesis;
receiving, from the first device, second audio data representing a second utterance;
generating second text data representing the second audio data;
determining, using the second text data, a second plurality of intent hypotheses associated with the second utterance;
receiving second indication data indicating a device type associated with the first device;
selecting, from among the second plurality of intent hypotheses, a second intent hypothesis as being associated with the second utterance based, at least in part, on the device type; and
based at least in part on selecting the second intent hypothesis, causing at least one second function associated with the second intent hypothesis to be performed.
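Claim 23 adds a second selection path in which the device type, rather than entity data, informs the choice. The claim does not say how the device type is used, so the Python sketch below assumes one plausible reading: hypotheses are filtered by what the device can fulfil. The `SUPPORTED_INTENTS` table, the device type names, and `select_by_device_type` are hypothetical.

```python
from typing import Dict, List, Optional, Set

# Hypothetical capability table: device type -> intents it can fulfil.
SUPPORTED_INTENTS: Dict[str, Set[str]] = {
    "headless_speaker": {"PlayMusicIntent"},
    "smart_display": {"PlayMusicIntent", "PlayVideoIntent"},
}


def select_by_device_type(ranked_intents: List[str],
                          device_type: str) -> Optional[str]:
    """Return the highest-ranked intent the device type supports.

    Falls back to the top-ranked intent when the device type is unknown or
    supports none of the candidates; returns None for an empty list.
    """
    supported = SUPPORTED_INTENTS.get(device_type, set())
    for intent in ranked_intents:
        if intent in supported:
            return intent
    return ranked_intents[0] if ranked_intents else None


ranked = ["PlayVideoIntent", "PlayMusicIntent"]
on_speaker = select_by_device_type(ranked, "headless_speaker")
on_display = select_by_device_type(ranked, "smart_display")
print(on_speaker, on_display)
```

A device without a screen skips the video intent even though it ranks first, while a device with a display keeps the original top choice.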
Specification