Systems and methods for enhancing responsiveness to utterances having detectable emotion
First Claim
1. A method comprising:
receiving, via a media playback device, a first natural utterance from a user of the media playback device;
extracting a first emotion feature from the first natural utterance;
mapping the first emotion feature to a first emotion;
providing a first verbalized acknowledgement of the first natural utterance, the first verbalized acknowledgement identifying a first media item;
receiving, via the media playback device, a second natural utterance from the user of the media playback device in response to the first verbalized acknowledgement;
extracting a second emotion feature from the second natural utterance;
mapping the second emotion feature to a second emotion;
comparing the first emotion and the second emotion;
identifying, based on the comparing, a pivot from the first emotion to the second emotion, wherein the pivot is a direction of emotion change from the first emotion to the second emotion;
selecting the first media item or a second media item to provide a selected media item to be played based on the pivot;
playing the selected media item using the media playback device;
classifying the pivot with a classification selected from one of positive, negative, and neutral;
associating the pivot with a combination of the first natural utterance and the first media item;
receiving, via the media playback device and subsequent to the second natural utterance, a third natural utterance; and
in response to the third natural utterance and based at least in part on the classification: (i) performing an action, (ii) providing a second verbalized acknowledgement of the third natural utterance, or a combination of (i) and (ii).
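The pivot logic recited in claim 1 can be illustrated with a minimal sketch. This is not the patent's implementation: the claim specifies no data structures or scoring, so the valence table and the function names (`identify_pivot`, `classify_pivot`, `select_media_item`) are hypothetical stand-ins for the comparing, classifying, and selecting steps.

```python
# Assumed coarse valence ordering over detected emotion labels.
VALENCE = {"angry": -2, "sad": -1, "neutral": 0, "calm": 1, "happy": 2}

def identify_pivot(first_emotion: str, second_emotion: str) -> int:
    """Direction of emotion change from the first to the second utterance."""
    return VALENCE[second_emotion] - VALENCE[first_emotion]

def classify_pivot(pivot: int) -> str:
    """Classify the pivot as positive, negative, or neutral."""
    if pivot > 0:
        return "positive"
    if pivot < 0:
        return "negative"
    return "neutral"

def select_media_item(pivot: int, first_item: str, second_item: str) -> str:
    """Keep the acknowledged item on a non-negative pivot, else switch."""
    return first_item if pivot >= 0 else second_item
```

Under these assumptions, a sad-to-happy sequence yields a positive pivot and the first media item continues playing; the reverse yields a negative pivot and the second item is selected instead.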
Abstract
Methods, systems, and related products that provide emotion-sensitive responses to a user's commands and other utterances received at an utterance-based user interface. Acknowledgements of a user's utterances are adapted to the user and/or the user device, and to emotions detected in the user's utterance that have been mapped from one or more emotion features extracted from the utterance. In some examples, detection of a user's changing emotion during a sequence of interactions is used to generate a response to a user's uttered command. In some examples, emotion processing and command processing of natural utterances are performed asynchronously.
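The asynchronous split mentioned at the end of the abstract can be sketched with `asyncio`. This is an assumption-laden illustration, not the patented architecture: command processing and emotion processing run concurrently over the same utterance, and the response combines both results. All function names are illustrative.

```python
import asyncio

async def process_command(utterance: str) -> str:
    # Stand-in for speech-to-text plus intent parsing.
    await asyncio.sleep(0)
    return f"play request: {utterance}"

async def process_emotion(utterance: str) -> str:
    # Stand-in for emotion feature extraction and mapping.
    await asyncio.sleep(0)
    return "happy" if "!" in utterance else "neutral"

async def handle(utterance: str) -> str:
    # The two pipelines run concurrently; neither blocks the other.
    command, emotion = await asyncio.gather(
        process_command(utterance), process_emotion(utterance)
    )
    return f"{command} (detected emotion: {emotion})"

print(asyncio.run(handle("play my workout mix!")))
```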
48 Citations
24 Claims
1. (Set out above under "First Claim".) Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
11. A non-transitory computer readable medium, comprising:
one or more processors configured to execute one or more sequences of emotion processor instructions, causing the one or more processors to generate an output adapted to a detected emotion in a natural utterance; and
one or more sequences of instructions which, when executed by the one or more processors, cause the one or more processors to:
receive, via a media playback device, a first natural utterance from a user of the media playback device;
extract a first emotion feature from the first natural utterance;
map the first emotion feature to a first emotion;
provide a first verbalized acknowledgement of the first natural utterance, the first verbalized acknowledgement identifying a first media item;
receive, via the media playback device, a second natural utterance from the user of the media playback device in response to the first verbalized acknowledgement;
extract a second emotion feature from the second natural utterance;
map the second emotion feature to a second emotion;
compare the first emotion and the second emotion;
identify, based on the comparing, a pivot from the first emotion to the second emotion, wherein the pivot is a direction of emotion change from the first emotion to the second emotion;
select the first media item or a second media item to provide a selected media item to be played based on the pivot;
play the selected media item using the media playback device;
classify the pivot with a classification selected from one of positive, negative, and neutral;
associate the pivot with a combination of the first natural utterance and the first media item;
receive, via the media playback device and subsequent to the second natural utterance, a third natural utterance; and
in response to the third natural utterance and based at least in part on the classification: (i) perform an action, (ii) provide a second verbalized acknowledgement of the third natural utterance, or a combination of (i) and (ii).
12. A method comprising:
receiving, via a media playback device, a first natural utterance from a user of the media playback device;
extracting a first emotion feature from the first natural utterance;
mapping the first emotion feature to a first emotion;
providing a first verbalized acknowledgement of the first natural utterance, the first verbalized acknowledgement identifying a first media item;
receiving, via the media playback device, a second natural utterance from the user of the media playback device in response to the first verbalized acknowledgement;
extracting a second emotion feature from the second natural utterance;
mapping the second emotion feature to a second emotion, wherein extracting the first emotion feature, mapping the first emotion feature to the first emotion, extracting the second emotion feature, and mapping the second emotion feature to the second emotion are performed using machine learning models;
comparing the first emotion and the second emotion;
identifying, based on the comparing, a pivot from the first emotion to the second emotion, wherein the pivot is a direction of emotion change from the first emotion to the second emotion;
selecting the first media item or a second media item to provide a selected media item to be played based on the pivot; and
playing the selected media item using the media playback device.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23.
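Claim 12 recites a two-stage pipeline: an emotion feature is extracted from the utterance, then mapped to an emotion by a machine learning model. A minimal sketch of that separation follows; the acoustic features, the threshold, and every function name here are hypothetical stand-ins, since real implementations would use trained models over richer features.

```python
def extract_emotion_feature(samples: list[float]) -> dict:
    """Stand-in acoustic features: mean level and rough energy."""
    n = len(samples)
    mean = sum(samples) / n
    energy = sum(s * s for s in samples) / n
    return {"mean": mean, "energy": energy}

def map_feature_to_emotion(feature: dict) -> str:
    """Stand-in for a learned classifier over the feature vector."""
    return "excited" if feature["energy"] > 0.5 else "calm"

def compare_emotions(first: str, second: str) -> bool:
    """The comparing step: did the detected emotion change?"""
    return first != second
```

Keeping extraction and mapping as separate stages, as the claim does, lets either stage be replaced (e.g. swapping in a different trained model) without touching the other.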
24. A non-transitory computer readable medium, comprising:
one or more processors configured to execute one or more sequences of emotion processor instructions, causing the one or more processors to generate an output adapted to a detected emotion in a natural utterance; and
one or more sequences of instructions which, when executed by the one or more processors, cause the one or more processors to:
receive, via a media playback device, a first natural utterance from a user of the media playback device;
extract a first emotion feature from the first natural utterance;
map the first emotion feature to a first emotion;
provide a first verbalized acknowledgement of the first natural utterance, the first verbalized acknowledgement identifying a first media item;
receive, via the media playback device, a second natural utterance from the user of the media playback device in response to the first verbalized acknowledgement;
extract a second emotion feature from the second natural utterance;
map the second emotion feature to a second emotion, wherein extracting the first emotion feature, mapping the first emotion feature to the first emotion, extracting the second emotion feature, and mapping the second emotion feature to the second emotion are performed using machine learning models;
compare the first emotion and the second emotion;
identify, based on the comparing, a pivot from the first emotion to the second emotion, wherein the pivot is a direction of emotion change from the first emotion to the second emotion;
select the first media item or a second media item to provide a selected media item to be played based on the pivot; and
play the selected media item using the media playback device.
Specification