METHOD AND APPARATUS FOR REAL TIME EMOTION DETECTION IN AUDIO INTERACTIONS
First Claim
1. A computerized method for real time emotion detection in audio interactions comprising:
receiving at a computer server a portion of an audio interaction between a customer and an organization representative, the portion of the audio interaction comprises a speech signal;
extracting feature vectors from the speech signal;
obtaining a statistical model;
producing adapted statistical data by adapting the statistical model according to the speech signal using the feature vectors extracted from the speech signal;
obtaining an emotion classification model; and
producing an emotion score based on the adapted statistical data and the emotion classification model, said emotion score represents the probability that the speaker that produced the speech signal is in an emotional state.
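The claimed pipeline (extract feature vectors, adapt a statistical model toward the incoming speech, then score with an emotion classification model) can be sketched as follows. The specific choices here are illustrative assumptions, not details taken from the claim: the per-frame features (log short-time energy and zero-crossing rate), the MAP-style mean adaptation with a relevance factor, and the logistic scoring model all stand in for whatever concrete models an implementation of the claim would use.

```python
import numpy as np

def extract_features(signal, frame_len=160, hop=80):
    """Frame the speech signal and compute a small per-frame feature
    vector: log short-time energy and zero-crossing rate.
    (Illustrative features; the claim does not specify which.)"""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    feats = []
    for f in frames:
        energy = np.log(np.sum(f ** 2) + 1e-10)          # log short-time energy
        zcr = np.mean(np.abs(np.diff(np.sign(f)))) / 2.0  # zero-crossing rate
        feats.append([energy, zcr])
    return np.array(feats)

def map_adapt(prior_mean, feats, relevance=16.0):
    """MAP-style adaptation: pull the statistical model's mean toward the
    statistics of the observed speech; the relevance factor balances the
    prior model against the new data. (One common way to 'produce adapted
    statistical data'; assumed, not mandated by the claim.)"""
    n = len(feats)
    return (n * feats.mean(axis=0) + relevance * prior_mean) / (n + relevance)

def emotion_score(adapted_mean, weights, bias):
    """Logistic scoring as a stand-in emotion classification model:
    returns the probability that the speaker is in an emotional state."""
    return 1.0 / (1.0 + np.exp(-(adapted_mean @ weights + bias)))
```

A call chain mirroring the claim would then be: `feats = extract_features(speech)`, `adapted = map_adapt(prior_mean, feats)`, `score = emotion_score(adapted, weights, bias)`, with `score` in (0, 1) serving as the claimed probability.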
Abstract
The subject matter discloses a computerized method for real time emotion detection in audio interactions comprising: receiving at a computer server a portion of an audio interaction between a customer and an organization representative, the portion of the audio interaction comprises a speech signal; extracting feature vectors from the speech signal; obtaining a statistical model; producing adapted statistical data by adapting the statistical model according to the speech signal using the feature vectors extracted from the speech signal; obtaining an emotion classification model; and producing an emotion score based on the adapted statistical data and the emotion classification model, said emotion score represents the probability that the speaker that produced the speech signal is in an emotional state.
14 Claims
1. A computerized method for real time emotion detection in audio interactions comprising:
receiving at a computer server a portion of an audio interaction between a customer and an organization representative, the portion of the audio interaction comprises a speech signal;
extracting feature vectors from the speech signal;
obtaining a statistical model;
producing adapted statistical data by adapting the statistical model according to the speech signal using the feature vectors extracted from the speech signal;
obtaining an emotion classification model; and
producing an emotion score based on the adapted statistical data and the emotion classification model, said emotion score represents the probability that the speaker that produced the speech signal is in an emotional state.
(Dependent claims: 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14)
8. The method according to claim wherein said statistical model is a statistical representation of a plurality of feature vectors extracted from a plurality of audio interactions.
Specification