Detecting emotions using voice signal analysis

US 7,627,475 B2
Filed: 03/08/2007
Issued: 12/01/2009
Est. Priority Date: 08/31/1999
Status: Expired due to Fees

Abstract

A system and method are provided for detecting emotional states using statistics. First, a speech signal is received. At least one acoustic parameter is extracted from the speech signal. Then statistics or features from samples of the voice are calculated from extracted speech parameters. The features serve as inputs to a classifier, which can be a computer program, a device or both. The classifier assigns at least one emotional state from a finite number of possible emotional states to the speech signal. The classifier also estimates the confidence of its decision. Features that are calculated may include a maximum value of a fundamental frequency, a standard deviation of the fundamental frequency, a range of the fundamental frequency, a mean of the fundamental frequency, and a variety of other statistics.
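
The fundamental-frequency statistics named in the abstract can be illustrated with a minimal sketch. It assumes a pitch contour has already been extracted as one value in Hz per frame and that a value of 0 marks an unvoiced frame; that convention, the function name `f0_statistics`, and the dictionary keys are assumptions made for illustration and do not come from the patent.

```python
import statistics


def f0_statistics(f0_hz):
    """Summarize a fundamental-frequency contour given in Hz per frame."""
    # Assumption: 0 marks an unvoiced frame and is excluded from the statistics.
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    return {
        "f0_max": max(voiced),
        "f0_mean": statistics.fmean(voiced),
        "f0_std": statistics.pstdev(voiced),  # population standard deviation
        "f0_range": max(voiced) - min(voiced),
    }


# Hypothetical contour: two unvoiced frames and four voiced frames.
features = f0_statistics([0.0, 110.0, 120.0, 130.0, 0.0, 140.0])
```

A dictionary of this kind could then serve as the input feature vector for a classifier.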

137 Citations

25 Claims

1. A method of detecting emotional states of telephone callers, the method comprising:
- providing speech signals from telephone callers;
  
  dividing the speech signals into at least one of segments, frames, and subframes;
  
  extracting acoustic features from the speech signals;
  
  calculating statistics from the acoustic features;
  
  classifying the speech with at least one neural network classifier as belonging to at least one emotional state;
  
  storing the speech signals and the emotional states in a storage medium, in a manner to allow later retrieval of the stored speech signals and emotional states;
  
  outputting in a human-recognizable format an indication of the at least one emotional state;
  
  wherein the speech is classified by a classifier taught to recognize at least one emotional state from a finite number of emotional states;
  
  wherein the speech is classified as emotional or non-emotional; and
  
  further comprising routing calls containing said speech signals to at least one predetermined location according to the at least one classified emotional state.
- Dependent claims (2, 3, 4, 5, 6, 7)
  - 2. The method of claim 1, wherein the speech is classified as at least one of angry, sad, happy, afraid and neutral.
  - 3. The method of claim 1, wherein the at least one neural network is taught to recognize an emotional state by dividing speech samples into training and testing segments, and wherein an algorithm for recognizing an emotional state is adjusted by comparing a classification from the neural network to a classification by at least one person.
  - 4. The method of claim 1, wherein the calls containing the speech signals are routed to at least one location selected from the group consisting of a voice-mail center, a call center, an e-mail destination, a customer service center, a manager, and emergency response personnel.
  - 5. The method of claim 1, wherein the storage medium is random access memory.
  - 6. The method of claim 1, wherein the storage medium is a hard drive.
  - 7. The method of claim 1, further comprising annotating and organizing the speech signals and emotional states based on the emotional content.
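
The dividing, extracting, and calculating steps of claim 1 can be sketched as follows. The claims do not fix a frame length, an overlap, or a particular acoustic feature, so the frame size, the hop, and the choice of mean energy per frame are all assumptions made for illustration, as are the function names.

```python
def split_into_frames(samples, frame_len, hop):
    """Return overlapping frames; a trailing partial frame is dropped."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, hop)]


def frame_energy(frame):
    """Mean squared amplitude, used here as a stand-in acoustic feature."""
    return sum(x * x for x in frame) / len(frame)


# Hypothetical eight-sample signal, four-sample frames, 50% overlap.
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 2.0, 0.0, -2.0]
frames = split_into_frames(signal, frame_len=4, hop=2)
energies = [frame_energy(f) for f in frames]
```

Statistics such as the mean or range of `energies` would then be computed over the whole utterance before classification.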

8. A system for classifying speech contained in telephone calls, the system comprising:
- a computer system comprising a central processing unit, an input device, at least one random access memory for storing data indicative of speech signals, and an output device;
  
  logic for receiving and analyzing speech signals of telephone callers;
  
  logic for dividing the speech signals of the telephone callers;
  
  logic for extracting at least one feature from the speech signals of the telephone callers;
  
  logic for calculating statistics of the speech of the telephone callers;
  
  logic for at least one neural network for classifying the speech of the telephone callers as belonging to at least one of a finite number of emotional states; and
  
  logic for storing the speech signals and the emotional states in a storage medium, in a manner to allow later retrieval of the stored speech signals and emotional states;
  
  logic for outputting an indication of the at least one emotional state of the telephone callers;
  
  wherein the logic for at least one neural network comprises at least one three-layer neural network; and
  
  further comprising logic for routing a call containing said speech signals to at least one predetermined location according to the at least one classified emotional state.
- Dependent claims (9, 10, 11, 12, 13, 14)
  - 9. The system of claim 8, wherein the logic classifies the speech as one of emotional and non-emotional.
  - 10. The system of claim 8, wherein the logic classifies the speech as at least one of angry, sad, happy, afraid and normal.
  - 11. The system of claim 8, wherein the logic routes the calls containing the speech signals to at least one location selected from the group consisting of a voice-mail center, a call center, an e-mail destination, a customer service center, a manager, and emergency response personnel.
  - 12. The system of claim 8, wherein the storage medium is random access memory.
  - 13. The system of claim 8, wherein the storage medium is a hard drive.
  - 14. The system of claim 8, further comprising logic for annotating and organizing the speech signals and emotional states based on the emotional content.
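
As a rough illustration of the three-layer neural network recited in claim 8, the sketch below runs a forward pass through an input layer, one hidden layer, and an output layer over the five states listed in claim 10. The layer sizes, the sigmoid and softmax activations, and the random untrained weights are assumptions; the claims do not specify them, so the predicted label here carries no meaning beyond showing the data flow. The returned probability plays the role of the confidence estimate mentioned in the abstract.

```python
import math
import random

EMOTIONS = ["angry", "sad", "happy", "afraid", "normal"]


def make_layer(n_in, n_out, rng):
    """One weight row per output unit; the last entry of each row is the bias."""
    return [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_out)]


def layer_forward(layer, inputs):
    return [sum(w * x for w, x in zip(row[:-1], inputs)) + row[-1] for row in layer]


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, hidden, output):
    """Input layer -> sigmoid hidden layer -> softmax output layer."""
    h = [1.0 / (1.0 + math.exp(-z)) for z in layer_forward(hidden, features)]
    probs = softmax(layer_forward(output, h))
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs[best]


rng = random.Random(0)  # untrained, randomly initialized weights (assumption)
hidden = make_layer(4, 6, rng)
output = make_layer(6, len(EMOTIONS), rng)
label, confidence = classify([0.2, -0.5, 1.0, 0.1], hidden, output)
```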

15. A method of recognizing emotional states in a voice of a telephone caller, the method comprising:
- providing a first plurality of voice samples;
  
  obtaining a second plurality of voice samples of a telephone caller, from a telephone call;
  
  identifying each sample of said pluralities of samples as belonging to a predominant emotional state;
  
  dividing each sample into at least one of frames, subframes, and segments;
  
  extracting at least one acoustic feature for each sample of the pluralities of samples;
  
  calculating statistics of the speech samples from the at least one feature;
  
  classifying an emotional state in the first plurality of samples with at least one neural network;
  
  training the at least one neural network to recognize an emotional state from the statistics by comparing the results of identifying and classifying for the first plurality of samples;
  
  classifying an emotion in the second plurality of voice samples obtained from a telephone call with the at least one trained neural network;
  
  storing the voice samples and the emotional states in a storage medium, in a manner to allow later retrieval of the stored voice samples and emotional states;
  
  outputting in a human-recognizable format an indication of the emotional state of the telephone caller; and
  
  routing the call containing said voice samples to at least one predetermined location according to the at least one classified emotional state.
- Dependent claims (16, 17, 18, 19)
  - 16. The method of claim 15, wherein the calls containing the voice samples are routed to at least one location selected from the group consisting of a voice-mail center, a call center, an e-mail destination, a customer service center, a manager, and emergency response personnel.
  - 17. The method of claim 15, wherein the storage medium is random access memory.
  - 18. The method of claim 15, wherein the storage medium is a hard drive.
  - 19. The method of claim 15, further comprising annotating and organizing the voice samples and emotional states based on the emotional content.
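
Claim 15 trains the network by comparing its classifications with the emotional states identified for the same samples, and claim 3 describes dividing samples into training and testing segments. A minimal sketch of those two bookkeeping steps follows; the split fraction, the fixed seed, the sample names, and the use of a plain agreement rate as the comparison measure are assumptions, not details taken from the patent.

```python
import random


def split_samples(samples, test_fraction, seed=0):
    """Shuffle labelled samples and split them into training and testing sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def agreement_rate(predicted, human):
    """Fraction of samples where the classifier matches the human label."""
    if len(predicted) != len(human) or not human:
        raise ValueError("label lists must be non-empty and equal in length")
    return sum(p == h for p, h in zip(predicted, human)) / len(human)


# Hypothetical labelled samples: (sample id, emotional state identified by a person).
labelled = [(f"sample_{i}", "angry" if i % 2 else "normal") for i in range(10)]
train, test = split_samples(labelled, test_fraction=0.3)
rate = agreement_rate(
    ["angry", "normal", "sad", "happy"],   # classifier output (made up)
    ["angry", "normal", "sad", "angry"],   # human labels (made up)
)
```

A low agreement rate on the testing set would be the signal to adjust the classifier before it is applied to samples from live calls.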

20. A system for detecting emotional states of telephone callers from speech signals of telephone calls, the system comprising:
- a speech reception device;
  
  at least one computer connected to the speech reception device;
  
  at least one random access memory operably connected to the at least one computer;
  
  a computer program including at least one neural network for dividing the speech signals into a plurality of segments, and for analyzing the segments according to features of the segments to detect the emotional state in the speech signals, and storing the speech signals and the emotional states in a storage medium, in a manner to allow later retrieval of the stored speech signals and emotional states and outputting in a human-recognizable format an indication of the emotional state of the speech signals;
  
  an output device coupled to the computer for notifying a user of the emotional states of the telephone callers detected in the speech signals;
  
  wherein the at least one neural network comprises at least one three-layer neural network; and
  
  further comprising logic for routing a call containing said speech signals to at least one predetermined location according to the at least one classified emotional state.
- Dependent claims (21, 22, 23, 24, 25)
  - 21. The system of claim 20, further comprising a set of predetermined responses to persons voicing a speech signal displaying a particular emotional state.
  - 22. The system of claim 20, wherein the logic routes the calls containing the speech signals to at least one location selected from the group consisting of a voice-mail center, a call center, an e-mail destination, a customer service center, a manager, and emergency response personnel.
  - 23. The system of claim 20, wherein the storage medium is random access memory.
  - 24. The system of claim 20, wherein the storage medium is a hard drive.
  - 25. The system of claim 20, further comprising logic for annotating and organizing the speech signals and emotional states based on the emotional content.
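
The storing, organizing, and routing elements of claims 20 to 25 can be sketched with an in-memory archive keyed by emotional state. The destination names come from the list in claim 22, but the mapping from a given state to a given destination, the default route, and the names `CallArchive` and `route_call` are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical routing table; the claims list destinations but not this mapping.
ROUTES = {"angry": "manager", "afraid": "emergency response personnel"}
DEFAULT_ROUTE = "customer service center"


@dataclass
class CallArchive:
    """Keeps speech signals grouped by classified emotional state."""
    by_emotion: dict = field(default_factory=lambda: defaultdict(list))

    def store(self, call_id, signal, emotion):
        self.by_emotion[emotion].append((call_id, signal))

    def retrieve(self, emotion):
        # Return a copy so callers cannot mutate the archive by accident.
        return list(self.by_emotion.get(emotion, []))


def route_call(emotion):
    """Pick a predetermined destination for a classified emotional state."""
    return ROUTES.get(emotion, DEFAULT_ROUTE)


archive = CallArchive()
archive.store("call-001", [0.1, -0.2], "angry")
archive.store("call-002", [0.0, 0.05], "normal")
destination = route_call("angry")
```

Swapping the dictionary for a file or database would correspond to the hard-drive variant of the storage medium in claim 24.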

Current Assignee: Accenture Global Services Limited (Accenture PLC)
Original Assignee: Accenture LLP (Accenture PLC)
Inventors: Petrushin, Valery A.
Primary Examiner(s): Sked; Matthew J
Application Number: US11/716,240
Publication Number: US 20070162283A1
Time in Patent Office: 999 Days
Field of Search: None
US Class Current: 704/270
CPC Class Codes:
- G10L 17/26   Recognition of special voic...
- G10L 25/30   using neural networks
- H04M 3/436   Arrangements for screening ...
- H04M 3/533   Voice mail systems
