Detecting emotions using voice signal analysis
Abstract
A system and method are provided for detecting emotional states using statistics. First, a speech signal is received. At least one acoustic parameter is extracted from the speech signal. Then statistics or features from samples of the voice are calculated from extracted speech parameters. The features serve as inputs to a classifier, which can be a computer program, a device or both. The classifier assigns at least one emotional state from a finite number of possible emotional states to the speech signal. The classifier also estimates the confidence of its decision. Features that are calculated may include a maximum value of a fundamental frequency, a standard deviation of the fundamental frequency, a range of the fundamental frequency, a mean of the fundamental frequency, and a variety of other statistics.
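The fundamental-frequency statistics named in the abstract can be illustrated with a minimal sketch. It assumes a pitch contour is already available as a list of per-frame values in Hz and that non-positive values mark unvoiced frames; the function name `f0_statistics`, the dictionary keys, and the choice of population standard deviation are illustrative assumptions, not details taken from the patent.

```python
import statistics

def f0_statistics(f0_hz):
    """Summary statistics of a fundamental-frequency (F0) contour.

    f0_hz: per-frame F0 estimates in Hz. Values <= 0 are treated as
    unvoiced frames and ignored (an assumption of this sketch).
    """
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    return {
        "f0_max": max(voiced),                   # maximum value of F0
        "f0_mean": statistics.fmean(voiced),     # mean of F0
        "f0_std": statistics.pstdev(voiced),     # standard deviation of F0
        "f0_range": max(voiced) - min(voiced),   # range of F0
    }

# Hypothetical contour: two unvoiced frames (0.0) and four voiced frames.
contour = [0.0, 180.0, 200.0, 220.0, 0.0, 200.0]
features = f0_statistics(contour)
print(features)
```

A vector built from such values is the kind of input the abstract describes being passed to the classifier.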
18 Claims
1. A method of detecting an emotional state of a telephone caller, the method comprising:
providing a speech signal from a telephone caller;
dividing the speech signal into at least one of segments, frames, and subframes;
extracting at least one acoustic feature from the speech signal;
calculating statistics from the at least one acoustic feature;
classifying the speech with at least one neural network classifier as belonging to at least one emotional state; and
storing in memory and outputting in a human-recognizable format an indication of the at least one emotional state, wherein the speech is classified by a classifier taught to recognize at least one emotional state from a finite number of emotional states; and
wherein the speech is classified as emotional or non-emotional.
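The dividing and feature-extraction steps of claim 1 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the claim does not fix a frame length, a sample rate, or a pitch algorithm, so the 8 kHz rate, the 40 ms frames with 20 ms hop, the 75 to 400 Hz search band, and the plain autocorrelation estimator are all choices made here. The names `split_into_frames` and `estimate_f0` are hypothetical.

```python
import math

def split_into_frames(samples, frame_len, hop):
    """Divide a signal into fixed-length, possibly overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def estimate_f0(frame, sample_rate, f0_min=75.0, f0_max=400.0):
    """Crude autocorrelation pitch estimate for one frame, in Hz.

    Returns 0.0 when no lag in the search band has positive correlation.
    """
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(frame[n] * frame[n + lag]
                    for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0

# Synthetic stand-in for caller speech: a 200 Hz tone sampled at 8 kHz.
SAMPLE_RATE = 8000
signal = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(800)]
frames = split_into_frames(signal, frame_len=320, hop=160)
f0_track = [estimate_f0(frame, SAMPLE_RATE) for frame in frames]
print(len(frames), f0_track)
```

The resulting per-frame track is the acoustic feature from which statistics would then be calculated in the next step of the claim.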
6. A system for classifying speech contained in a telephone call, the system comprising:
a computer system comprising a central processing unit, an input device, at least one memory for storing data indicative of a speech signal, and an output device;
logic for receiving and analyzing a speech signal of a telephone caller;
logic for dividing the speech signal of the telephone caller;
logic for extracting at least one feature from the speech signal of the telephone caller;
logic for calculating statistics of the speech of the telephone caller;
logic for at least one neural network for classifying the speech of the telephone caller as belonging to at least one of a finite number of emotional states; and
logic for storing in memory and outputting an indication of the at least one emotional state of the telephone caller;
wherein the logic for at least one neural network comprises at least one three-layer neural network.
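A three-layer network of the kind recited in claim 6 is read here as an input layer, one hidden layer, and an output layer; that reading is conventional but is an assumption of this sketch. The forward pass below uses a tanh hidden layer and a softmax output, and reports the winning probability as a confidence value. The activation functions, the hand-picked weights, the two-feature input, and the state labels `calm` and `agitated` are all hypothetical and do not come from the patent.

```python
import math

def softmax(z):
    """Convert raw output scores into probabilities that sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Input layer -> tanh hidden layer -> softmax output layer."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w_out, b_out)]
    return softmax(logits)

STATES = ["calm", "agitated"]  # hypothetical finite set of emotional states

# Hand-picked illustrative weights, not trained values.
W_HIDDEN = [[1.0, -1.0], [0.5, 0.5]]
B_HIDDEN = [0.0, 0.0]
W_OUT = [[1.0, 0.0], [0.0, 1.0]]
B_OUT = [0.0, 0.0]

x = [1.0, 0.5]  # hypothetical normalized statistics of one utterance
probs = forward(x, W_HIDDEN, B_HIDDEN, W_OUT, B_OUT)
best = max(range(len(STATES)), key=lambda i: probs[i])
label, confidence = STATES[best], probs[best]
print(label, confidence)
```

Restricting the output layer to a fixed list of states mirrors the claim's "finite number of emotional states".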
12. A method of recognizing emotional states in a voice of a telephone caller, the method comprising:
providing a first plurality of voice samples;
obtaining a second plurality of voice samples of a telephone caller, from a telephone call;
identifying each sample of said pluralities of samples as belonging to a predominant emotional state;
dividing each sample into at least one of frames, subframes, and segments;
extracting at least one acoustic feature for each sample of the pluralities of samples;
calculating statistics of the speech samples from the at least one feature;
classifying an emotional state in the first plurality of samples with at least one neural network;
training the at least one neural network to recognize an emotional state from the statistics by comparing the results of identifying and classifying for the first plurality of samples;
classifying an emotion in the second plurality of voice samples obtained from a telephone call with the at least one trained neural network;
storing in memory and outputting in a human-recognizable format an indication of the emotional state of the telephone caller; and
routing a call containing said speech signal to a predetermined location according to the at least one classified emotional state.
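Two steps of claim 12 lend themselves to a short illustration: comparing the identified labels with the network's classifications for the first plurality of samples, and routing a call by the classified state. The sketch below assumes the comparison is summarized as a simple disagreement rate, which is one possible training signal rather than the one the patent necessarily uses. The state labels, the queue names, and the functions `disagreement` and `route_call` are hypothetical.

```python
def disagreement(identified, classified):
    """Fraction of samples where the classifier differs from the label."""
    if not identified or len(identified) != len(classified):
        raise ValueError("label lists must be non-empty and equal in length")
    wrong = sum(1 for a, b in zip(identified, classified) if a != b)
    return wrong / len(identified)

# Hypothetical mapping from emotional state to a predetermined location.
ROUTES = {
    "agitated": "senior_agent_queue",
    "calm": "standard_queue",
}

def route_call(emotional_state, routes=ROUTES, default="standard_queue"):
    """Pick a destination for the call; unknown states use the default."""
    return routes.get(emotional_state, default)

# Labels assigned when identifying the samples vs. the network's output.
identified = ["calm", "agitated", "calm", "calm"]
classified = ["calm", "calm", "calm", "agitated"]
error_rate = disagreement(identified, classified)
destination = route_call("agitated")
print(error_rate, destination)
```

In a training loop, a measure such as `error_rate` would be driven down by adjusting the network before it is applied to samples from a live call.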
14. A system for detecting an emotional state of a telephone caller from a voice signal of a telephone call, the system comprising:
a speech reception device;
at least one computer connected to the speech reception device;
at least one memory operably connected to the at least one computer;
a computer program including at least one neural network for dividing the voice signal into a plurality of segments, and for analyzing the segments according to features of the segments to detect the emotional state in the voice signal; and
an output device coupled to the computer for notifying a user of the emotional state of the telephone caller detected in the voice signal; and
wherein the at least one neural network comprises at least one three-layer neural network.
Specification