Systems and methods for enhancing responsiveness to utterances having detectable emotion
First Claim
1. A method comprising:
receiving, via a media playback device, a first natural utterance from a user of the media playback device;
extracting a first emotion feature from the first natural utterance;
mapping the first emotion feature to a first emotion;
providing a first verbalized acknowledgement of the first natural utterance, the first verbalized acknowledgement identifying a first media item;
receiving, via the media playback device, a second natural utterance from the user of the media playback device in response to the first verbalized acknowledgement;
extracting a second emotion feature from the second natural utterance;
mapping the second emotion feature to a second emotion;
comparing the first emotion and the second emotion;
identifying, based on the comparing, a pivot from the first emotion to the second emotion, wherein the pivot is a direction of emotion change from the first emotion to the second emotion;
selecting the first media item or a second media item to provide a selected media item to be played based on the pivot;
playing the selected media item using the media playback device;
classifying the pivot with a classification selected from one of positive, negative, and neutral;
associating the pivot with a combination of the first natural utterance and the first media item;
receiving, via the media playback device and subsequent to the second natural utterance, a third natural utterance; and
in response to the third natural utterance and based at least in part on the classification: (i) performing an action, (ii) providing a second verbalized acknowledgement of the third natural utterance, or a combination of (i) and (ii).
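The pivot logic recited in claim 1 can be illustrated with a minimal sketch. This is not the patent's implementation: the claim specifies no data structures or scoring, so the valence table and the function names (`identify_pivot`, `classify_pivot`, `select_media_item`) are hypothetical stand-ins for the comparing, classifying, and selecting steps.

```python
# Assumed coarse valence ordering over detected emotion labels.
VALENCE = {"angry": -2, "sad": -1, "neutral": 0, "calm": 1, "happy": 2}

def identify_pivot(first_emotion: str, second_emotion: str) -> int:
    """Direction of emotion change from the first to the second utterance."""
    return VALENCE[second_emotion] - VALENCE[first_emotion]

def classify_pivot(pivot: int) -> str:
    """Classify the pivot as positive, negative, or neutral."""
    if pivot > 0:
        return "positive"
    if pivot < 0:
        return "negative"
    return "neutral"

def select_media_item(pivot: int, first_item: str, second_item: str) -> str:
    """Keep the acknowledged item on a non-negative pivot, else switch."""
    return first_item if pivot >= 0 else second_item
```

Under these assumptions, a sad-to-happy sequence yields a positive pivot and the first media item continues playing; the reverse yields a negative pivot and the second item is selected instead.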
Abstract
Methods, systems, and related products that provide emotion-sensitive responses to a user's commands and other utterances received at an utterance-based user interface. Acknowledgements of a user's utterances are adapted to the user and/or the user device, and to emotions detected in the user's utterance that have been mapped from one or more emotion features extracted from the utterance. In some examples, detection of a user's changing emotion during a sequence of interactions is used to generate a response to a user's uttered command. In some examples, emotion processing and command processing of natural utterances are performed asynchronously.
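The asynchronous split mentioned at the end of the abstract can be sketched with `asyncio`. This is an assumption-laden illustration, not the patented architecture: command processing and emotion processing run concurrently over the same utterance, and the response combines both results. All function names are illustrative.

```python
import asyncio

async def process_command(utterance: str) -> str:
    # Stand-in for speech-to-text plus intent parsing.
    await asyncio.sleep(0)
    return f"play request: {utterance}"

async def process_emotion(utterance: str) -> str:
    # Stand-in for emotion feature extraction and mapping.
    await asyncio.sleep(0)
    return "happy" if "!" in utterance else "neutral"

async def handle(utterance: str) -> str:
    # The two pipelines run concurrently; neither blocks the other.
    command, emotion = await asyncio.gather(
        process_command(utterance), process_emotion(utterance)
    )
    return f"{command} (detected emotion: {emotion})"

print(asyncio.run(handle("play my workout mix!")))
```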
48 Citations
24 Claims
1. (Set out above under "First Claim".) Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
11. A non-transitory computer readable medium, comprising:
one or more processors configured to execute one or more sequences of emotion processor instructions, causing the one or more processors to generate an output adapted to a detected emotion in a natural utterance; and
one or more sequences of instructions which, when executed by the one or more processors, cause the one or more processors to:
receive, via a media playback device, a first natural utterance from a user of the media playback device;
extract a first emotion feature from the first natural utterance;
map the first emotion feature to a first emotion;
provide a first verbalized acknowledgement of the first natural utterance, the first verbalized acknowledgement identifying a first media item;
receive, via the media playback device, a second natural utterance from the user of the media playback device in response to the first verbalized acknowledgement;
extract a second emotion feature from the second natural utterance;
map the second emotion feature to a second emotion;
compare the first emotion and the second emotion;
identify, based on the comparing, a pivot from the first emotion to the second emotion, wherein the pivot is a direction of emotion change from the first emotion to the second emotion;
select the first media item or a second media item to provide a selected media item to be played based on the pivot;
play the selected media item using the media playback device;
classify the pivot with a classification selected from one of positive, negative, and neutral;
associate the pivot with a combination of the first natural utterance and the first media item;
receive, via the media playback device and subsequent to the second natural utterance, a third natural utterance; and
in response to the third natural utterance and based at least in part on the classification: (i) perform an action, (ii) provide a second verbalized acknowledgement of the third natural utterance, or a combination of (i) and (ii).
12. A method comprising:
receiving, via a media playback device, a first natural utterance from a user of the media playback device;
extracting a first emotion feature from the first natural utterance;
mapping the first emotion feature to a first emotion;
providing a first verbalized acknowledgement of the first natural utterance, the first verbalized acknowledgement identifying a first media item;
receiving, via the media playback device, a second natural utterance from the user of the media playback device in response to the first verbalized acknowledgement;
extracting a second emotion feature from the second natural utterance;
mapping the second emotion feature to a second emotion, wherein extracting the first emotion feature, mapping the first emotion feature to the first emotion, extracting the second emotion feature, and mapping the second emotion feature to the second emotion are performed using machine learning models;
comparing the first emotion and the second emotion;
identifying, based on the comparing, a pivot from the first emotion to the second emotion, wherein the pivot is a direction of emotion change from the first emotion to the second emotion;
selecting the first media item or a second media item to provide a selected media item to be played based on the pivot; and
playing the selected media item using the media playback device.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23.
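Claim 12 recites a two-stage pipeline: an emotion feature is extracted from the utterance, then mapped to an emotion by a machine learning model. A minimal sketch of that separation follows; the acoustic features, the threshold, and every function name here are hypothetical stand-ins, since real implementations would use trained models over richer features.

```python
def extract_emotion_feature(samples: list[float]) -> dict:
    """Stand-in acoustic features: mean level and rough energy."""
    n = len(samples)
    mean = sum(samples) / n
    energy = sum(s * s for s in samples) / n
    return {"mean": mean, "energy": energy}

def map_feature_to_emotion(feature: dict) -> str:
    """Stand-in for a learned classifier over the feature vector."""
    return "excited" if feature["energy"] > 0.5 else "calm"

def compare_emotions(first: str, second: str) -> bool:
    """The comparing step: did the detected emotion change?"""
    return first != second
```

Keeping extraction and mapping as separate stages, as the claim does, lets either stage be replaced (e.g. swapping in a different trained model) without touching the other.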
24. A non-transitory computer readable medium, comprising:
one or more processors configured to execute one or more sequences of emotion processor instructions, causing the one or more processors to generate an output adapted to a detected emotion in a natural utterance; and
one or more sequences of instructions which, when executed by the one or more processors, cause the one or more processors to:
receive, via a media playback device, a first natural utterance from a user of the media playback device;
extract a first emotion feature from the first natural utterance;
map the first emotion feature to a first emotion;
provide a first verbalized acknowledgement of the first natural utterance, the first verbalized acknowledgement identifying a first media item;
receive, via the media playback device, a second natural utterance from the user of the media playback device in response to the first verbalized acknowledgement;
extract a second emotion feature from the second natural utterance;
map the second emotion feature to a second emotion, wherein extracting the first emotion feature, mapping the first emotion feature to the first emotion, extracting the second emotion feature, and mapping the second emotion feature to the second emotion are performed using machine learning models;
compare the first emotion and the second emotion;
identify, based on the comparing, a pivot from the first emotion to the second emotion, wherein the pivot is a direction of emotion change from the first emotion to the second emotion;
select the first media item or a second media item to provide a selected media item to be played based on the pivot; and
play the selected media item using the media playback device.
Specification