System, method and article of manufacture for detecting emotion in voice signals through analysis of a plurality of voice signal parameters
Abstract
A method and system are provided for monitoring a conversation between a pair of speakers and detecting an emotion of at least one of the speakers. First, a voice signal is received, after which a particular feature is extracted from the voice signal. Next, an emotion associated with the voice signal is determined based on the extracted feature. The emotion is screened, and feedback is provided only if it is determined to be a negative emotion selected from the group consisting of anger, sadness, and fear. The determined negative emotion is then output to a third party during the conversation.
20 Claims
1. A method for monitoring a conversation between a pair of speakers for detecting an emotion of at least one of the speakers using voice analysis, comprising the steps of:

(a) receiving a voice signal representing voices of speakers in a conversation;

(b) extracting at least one feature of the voice signal selected from a group of features consisting of a maximum value of a fundamental frequency, a standard deviation of the fundamental frequency, a range of the fundamental frequency, a mean of the fundamental frequency, a mean of a bandwidth of a first formant, a mean of a bandwidth of a second formant, a standard deviation of energy, a speaking rate, a slope of the fundamental frequency, a maximum value of the first formant, a maximum value of the energy, a range of the energy, a range of the second formant, and a range of the first formant;

(c) determining an emotion associated with the voice signal based on the extracted feature;

(d) determining whether the emotion matches a negative emotion selected from a predefined group of negative emotions consisting of anger, sadness and fear; and

(e) outputting the determined emotion to a third party during the conversation if the emotion matches one of the negative emotions.

Dependent claims: 2, 3, 4, 5, 6, 7, 8
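Step (b)'s fundamental-frequency and energy statistics can be sketched as follows, assuming pitch tracking and energy estimation have already produced per-frame contours upstream (the function name, `frame_rate`, and the input contours are illustrative assumptions, not part of the claim; formant-based features would require a separate formant tracker and are omitted):

```python
import numpy as np

def extract_features(f0, energy, frame_rate=100.0):
    """Compute several of the claimed statistics from a voiced-frame
    fundamental-frequency contour (Hz) and a per-frame energy envelope."""
    f0 = np.asarray(f0, dtype=float)
    energy = np.asarray(energy, dtype=float)
    t = np.arange(len(f0)) / frame_rate  # frame times in seconds
    return {
        "f0_max":       f0.max(),
        "f0_std":       f0.std(),
        "f0_range":     f0.max() - f0.min(),
        "f0_mean":      f0.mean(),
        "f0_slope":     np.polyfit(t, f0, 1)[0],  # linear trend, Hz/s
        "energy_std":   energy.std(),
        "energy_max":   energy.max(),
        "energy_range": energy.max() - energy.min(),
    }
```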
9. A computer program embodied on a computer readable medium for monitoring a conversation between a pair of speakers for detecting an emotion of at least one of the speakers using voice analysis comprising:
(a) a code segment that receives a voice signal representing voices of speakers in a conversation;

(b) a code segment that extracts at least one feature of the voice signal selected from a group of features consisting of a maximum value of a fundamental frequency, a standard deviation of the fundamental frequency, a range of the fundamental frequency, a mean of the fundamental frequency, a mean of a bandwidth of a first formant, a mean of a bandwidth of a second formant, a standard deviation of energy, a speaking rate, a slope of the fundamental frequency, a maximum value of the first formant, a maximum value of the energy, a range of the energy, a range of the second formant, and a range of the first formant;

(c) a code segment that determines an emotion associated with the voice signal based on the extracted feature;

(d) a code segment that determines whether the emotion matches a negative emotion selected from a predefined group of negative emotions consisting of anger, sadness and fear; and

(e) a code segment that outputs the determined emotion to a third party during the conversation if the emotion matches one of the negative emotions.

Dependent claims: 10, 11, 12, 13, 14
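The claims do not fix a particular classifier for the determining step (c). As one minimal sketch, the extracted feature vector can be matched to the nearest per-emotion centroid; the centroid values below are invented placeholders for illustration, not trained values from the patent:

```python
import math

# Placeholder centroids over (f0_mean in Hz, energy_std); invented
# for illustration, not values from the patent.
CENTROIDS = {
    "neutral":   (120.0, 0.5),
    "happiness": (200.0, 2.0),
    "anger":     (230.0, 3.0),
    "sadness":   (100.0, 0.3),
    "fear":      (250.0, 1.5),
}

def determine_emotion(feature_vector):
    """Step (c): label the utterance with the emotion whose centroid
    lies nearest (Euclidean distance) to the extracted features."""
    return min(CENTROIDS,
               key=lambda label: math.dist(feature_vector, CENTROIDS[label]))
```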
15. A system for monitoring a conversation between a pair of speakers for detecting an emotion of at least one of the speakers using voice analysis comprising:
(a) logic that receives a voice signal representing voices of speakers in a conversation;

(b) logic that extracts at least one feature of the voice signal selected from a group of features consisting of a maximum value of a fundamental frequency, a standard deviation of the fundamental frequency, a range of the fundamental frequency, a mean of the fundamental frequency, a mean of a bandwidth of a first formant, a mean of a bandwidth of a second formant, a standard deviation of energy, a speaking rate, a slope of the fundamental frequency, a maximum value of the first formant, a maximum value of the energy, a range of the energy, a range of the second formant, and a range of the first formant;

(c) logic that determines an emotion associated with the voice signal based on the extracted feature;

(d) logic that determines whether the emotion matches a negative emotion selected from a predefined group of negative emotions consisting of anger, sadness and fear; and

(e) logic that outputs the determined emotion to a third party during the conversation if the emotion matches one of the negative emotions.

Dependent claims: 16, 17, 18, 19, 20
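Steps (d) and (e) reduce to a membership test against the closed group of negative emotions, with output suppressed otherwise. A minimal sketch, where the `notify` callback standing in for the third-party channel is an assumed interface:

```python
# The claimed closed group of negative emotions.
NEGATIVE_EMOTIONS = frozenset({"anger", "sadness", "fear"})

def screen_and_output(emotion, notify):
    """Steps (d)-(e): forward the determined emotion to a third party
    (via the caller-supplied notify callback) only when it matches one
    of the negative emotions; otherwise no feedback is given."""
    if emotion in NEGATIVE_EMOTIONS:
        notify(emotion)
        return True
    return False
```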
Specification