Apparatus and Methods for the Detection of Emotions in Audio Interactions
Abstract
An apparatus and method for detecting an emotional state of a speaker participating in an audio signal. The apparatus and method are based on the distance in voice features between a person being in an emotional state and the same person being in a neutral state. The apparatus and method comprise a training phase in which a training feature vector is determined, and an ongoing stage in which the training feature vector is used to determine emotional states in a working environment. Multiple types of emotions can be detected, and the method and apparatus are speaker-independent, i.e., no prior voice sample or information about the speaker is required.
163 Citations
18 Claims
1. A method for detecting an at least one emotional state of an at least one speaker speaking in an at least one tested audio signal having a quality, the method comprising an emotion detection phase, the emotion detection phase comprising:
- a feature extraction step for extracting at least two feature vectors, each feature vector extracted from an at least one frame within the at least one tested audio signal;
- a first model construction step for constructing a reference voice model from at least two first feature vectors, said model representing the speaker's voice in neutral emotional state of the at least one speaker;
- a second model construction step for constructing an at least one section voice model from at least two second feature vectors;
- a distance determination step for determining an at least one distance between the reference voice model and the at least one section voice model; and
- a section emotion score determination step for determining, by using the at least one distance, an at least one emotion score.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
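The claim above can be illustrated with a minimal sketch. Everything here is a hypothetical reading of the claim, not the patent's actual implementation: the per-frame features, the mean/variance voice model, the variance-normalized Euclidean distance, and the bounded score mapping are all assumptions chosen for concreteness, since the claim does not specify them.

```python
import numpy as np

def extract_features(frames):
    # Hypothetical per-frame feature vector: sample mean, mean energy,
    # and a zero-crossing-rate proxy. The patent does not name its features.
    return np.stack([
        [f.mean(), (f ** 2).mean(), np.abs(np.diff(np.sign(f))).mean()]
        for f in frames
    ])

def build_model(feature_vectors):
    # Model a voice as the (mean, variance) of its feature vectors,
    # standing in for both the reference and the section voice models.
    return feature_vectors.mean(axis=0), feature_vectors.var(axis=0) + 1e-8

def model_distance(reference, section):
    # Distance between models: Euclidean distance between model means,
    # normalized by the reference model's per-feature variance.
    (m1, v1), (m2, _) = reference, section
    return float(np.sqrt((((m1 - m2) ** 2) / v1).sum()))

def emotion_score(distance, scale=1.0):
    # Map the distance to a bounded score in [0, 1):
    # a larger distance from the neutral reference yields a higher score.
    return distance / (distance + scale)
```

A section whose features diverge from the speaker's neutral reference model (for example, higher frame energy) yields a larger distance and hence a higher emotion score; identical models yield a score of zero.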
10. An apparatus for detecting an emotional state of an at least one speaker speaking in an at least one audio signal, the apparatus comprising:
- a feature extraction component for extracting at least two feature vectors, each feature vector extracted from an at least one frame within the at least one audio signal;
- a model construction component for constructing a model from at least two feature vectors;
- a distance determination component for determining a distance between the two models; and
- an emotion score determination component for determining, using said distance, an at least one emotion score for the at least one speaker within the at least one audio signal to be in an emotional state.
- View Dependent Claims (11, 12, 13, 14, 15, 16, 17)
18. A computer readable storage medium containing a set of instructions for a general purpose computer, the set of instructions comprising:
- a feature extraction component for extracting at least two feature vectors, each feature vector extracted from an at least one frame within an at least one audio signal in which an at least one speaker is speaking;
- a model construction component for constructing a model from at least two feature vectors;
- a distance determination component for determining a distance between the two models; and
- an emotion score determination component for determining, using said distance, an at least one emotion score for the at least one speaker within the at least one audio signal to be in an emotional state.
Specification