System and method for automatic classification of speech based upon affective content

US 6,173,260 B1
Filed: 03/31/1998
Issued: 01/09/2001
Est. Priority Date: 10/29/1997
Status: Expired due to Term

Abstract

The classification of speech according to emotional content employs acoustic measures in addition to pitch as classification input. In one embodiment, two different kinds of features in a speech signal are analyzed for classification purposes. One set of features is based on pitch information obtained from the speech signal, and the other set is based on changes in the spectral shape of the speech signal over time. This latter set of features is used to distinguish long, smoothly varying sounds from quickly changing sounds, which may indicate the emotional state of the speaker. These changes are determined by means of a low-dimensional representation of the speech signal, such as mel-frequency cepstral coefficients (MFCC) or linear predictive coding (LPC). Additional features of the speech signal, such as energy, can also be employed for classification purposes. Different variations of the pitch and spectral shape features can be measured and analyzed to assist in the classification of individual utterances. In one implementation, the features are measured individually for each of the first, middle and last thirds of an utterance, as well as for the utterance as a whole, to generate multiple sets of data for each utterance.
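The per-segment measurement described in the abstract can be illustrated with a minimal Python sketch. It assumes a pitch track (one pitch value per analysis frame) has already been extracted; the function names and the choice of statistics (mean, standard deviation, minimum, maximum) are hypothetical and are not taken from the patent.

```python
from statistics import mean, pstdev


def split_into_thirds(frames):
    """Return the first, middle and last thirds of a per-frame sequence."""
    n = len(frames)
    a, b = n // 3, (2 * n) // 3
    return frames[:a], frames[a:b], frames[b:]


def summarize(values):
    """Basic statistics for one segment (hypothetical feature choice)."""
    if not values:
        return {"mean": 0.0, "std": 0.0, "min": 0.0, "max": 0.0}
    return {
        "mean": mean(values),
        "std": pstdev(values),
        "min": min(values),
        "max": max(values),
    }


def utterance_feature_sets(pitch_track):
    """One feature set per third of the utterance, plus one for the whole."""
    first, middle, last = split_into_thirds(pitch_track)
    return {
        "first": summarize(first),
        "middle": summarize(middle),
        "last": summarize(last),
        "whole": summarize(pitch_track),
    }


# Made-up pitch values in Hz, one per analysis frame.
pitch_track = [200.0, 210.0, 220.0, 230.0, 240.0, 250.0, 260.0, 270.0, 280.0]
feature_sets = utterance_feature_sets(pitch_track)
```

Each utterance yields four sets of statistics here, so a rising pitch contour shows up as an increasing mean from the first third to the last even though the whole-utterance mean alone would hide it.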

289 Citations

20 Claims

1. A method for classifying the affective content of speech, comprising the steps of:
- analyzing a portion of a speech signal to determine values for features of said portion of the speech signal which are based upon the pitch of the speech signal;
  
  analyzing said portion of the speech signal to determine the mel-frequency cepstral coefficients for the speech signal, and measuring the difference in said coefficients from frame to frame of said portion of the speech signal to determine changes in the spectral envelope of the speech signal over time;
  
  providing said pitch-based feature values and said changes in the spectral envelope to a classifier; and
  
  labelling said portion of the speech signal as belonging to one of a predetermined set of classes of affective content for which said classifier has been trained.
  - 2. The method of claim 1 wherein said changes in the spectral envelope include frame-to-frame change in spectral shape over said portion of the speech signal.
  - 3. The method of claim 1 wherein said pitch-based feature values include statistics relating to the magnitude of pitch in said portion of the speech signal.
  - 4. The method of claim 1 wherein said pitch-based feature values include both the frame-to-frame change in pitch over said portion of the speech signal and statistics relating to the magnitude of pitch in said portion of the speech signal.
  - 5. The method of claim 1 wherein said portion of the speech signal comprises a single utterance.
  - 6. The method of claim 1 wherein said portion of the speech signal comprises a single utterance, and further including the steps of dividing each utterance into multiple segments, and performing said analyzing steps for individual ones of said segments.
  - 7. The method of claim 6 wherein said analyzing steps are also performed globally over the entirety of each utterance.
  - 8. The method of claim 6 wherein said segments comprise the first, middle and last thirds of an utterance.
  - 9. The method of claim 1 further including the steps of analyzing said portion of the speech signal to determine values for a third feature of said portion of the speech signal which is based upon the characteristics of the speech signal other than its pitch and its spectral envelope, and providing said third feature values to said classifier in addition to said pitch-based feature values and said changes in the spectral envelope for labelling the affective content of said portion of the speech signal.
  - 10. The method of claim 9 wherein said third feature is the energy of the speech signal.
  - 11. The method of claim 1 wherein said spectral envelope is measured by determining a low-dimensional representation of the speech signal.
  - 12. The method of claim 11 wherein said low-dimensional representation comprises the mel-frequency cepstral coefficients for the speech signal.
  - 13. The method of claim 11 wherein said low-dimensional representation is determined from a linear predictive analysis of the speech signal.
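The frame-to-frame coefficient difference recited in claim 1 can be sketched as follows. This is an illustrative sketch only: it assumes the mel-frequency cepstral coefficients have already been computed, one vector per frame, and it uses the Euclidean length of each difference vector as the measure of spectral change, which is an assumption rather than something the claims specify. The function names are hypothetical.

```python
import math


def frame_deltas(coeff_frames):
    """Difference between each frame's coefficient vector and the previous one."""
    return [
        [cur - prev for cur, prev in zip(cur_frame, prev_frame)]
        for prev_frame, cur_frame in zip(coeff_frames, coeff_frames[1:])
    ]


def delta_magnitudes(coeff_frames):
    """Euclidean length of each frame-to-frame difference vector."""
    return [
        math.sqrt(sum(d * d for d in delta))
        for delta in frame_deltas(coeff_frames)
    ]


# Made-up 3-coefficient frames; real frames would come from an MFCC front end.
steady = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
changing = [[1.0, 2.0, 3.0], [4.0, 6.0, 3.0], [1.0, 2.0, 3.0]]

steady_change = delta_magnitudes(steady)      # no movement between frames
changing_change = delta_magnitudes(changing)  # large movement between frames
```

A smoothly sustained sound produces magnitudes near zero, while a quickly changing sound produces large ones, which is the distinction the abstract attributes to the spectral-shape features.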

14. A method for classifying the affective content of speech, comprising the steps of:
- analyzing a portion of a speech signal to determine the mel-frequency cepstral coefficients for the speech signal, and measuring the difference in said coefficients from frame to frame of said portion of the speech signal to determine changes in the spectral envelope of the speech signal over time;
  
  analyzing said portion of the speech signal to determine values for other features of said portion of the speech signal which are based upon the characteristics of the speech signal other than its spectral envelope;
  
  providing said changes in the spectral envelope and said values of the other features to a classifier; and
  
  labelling said portion of the speech signal as belonging to one of a predetermined set of classes of affective content for which said classifier has been trained.
  - 15. The method of claim 14 wherein said changes in the spectral envelope include frame-to-frame change in spectral shape over said portion of the speech signal.

16. A method for classifying the affective content of speech, comprising the steps of:
- analyzing a portion of a speech signal to determine values for multiple features of said portion of the speech signal which are based upon the pitch of the speech signal;
  
  analyzing said portion of the speech signal to determine the mel-frequency cepstral coefficients for the speech signal, and measuring the difference in said coefficients from frame to frame of said portion of the speech signal to determine changes in the spectral envelope of the speech signal over time;
  
  providing said values of the pitch-based features and said envelope changes to a classifier; and
  
  labelling said portion of the speech signal as belonging to one of a predetermined set of classes of affective content for which said classifier has been trained.

17. A system for classifying speech according to its affective content, comprising:
- a pitch analyzer which determines values for features of a portion of a speech signal which are based upon the pitch of the speech signal;
  
  a spectral shape analyzer which determines the mel-frequency cepstral coefficients for the speech signal, and measures the difference in said coefficients from frame to frame of said portion of the speech signal to determine changes in the spectral envelope of the speech signal over time; and
  
  a classifier which receives said pitch-based feature values and said changes in the spectral envelope, and labels said portion of the speech signal as belonging to one of a predetermined set of classes of affective content for which said classifier has been trained.
  - 18. The system of claim 17 wherein said changes in the spectral envelope include frame-to-frame change in spectral shape over said portion of the speech signal.
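The three components of claim 17 (pitch analyzer, spectral shape analyzer, classifier) might be wired together as in the sketch below. The claims only require "a classifier" trained on a predetermined set of classes; the nearest-centroid classifier used here is a stand-in chosen for brevity, and the class labels, feature choices, function names and training values are all hypothetical.

```python
import math
from statistics import mean


def pitch_features(pitch_track):
    """Pitch analyzer: mean pitch and mean absolute frame-to-frame pitch change."""
    changes = [abs(b - a) for a, b in zip(pitch_track, pitch_track[1:])]
    return [mean(pitch_track), mean(changes) if changes else 0.0]


def spectral_change_feature(mfcc_frames):
    """Spectral shape analyzer: mean size of frame-to-frame coefficient differences."""
    sizes = [
        math.sqrt(sum((c - p) ** 2 for p, c in zip(prev, cur)))
        for prev, cur in zip(mfcc_frames, mfcc_frames[1:])
    ]
    return [mean(sizes) if sizes else 0.0]


def feature_vector(pitch_track, mfcc_frames):
    """Concatenate the pitch-based and spectral-change features."""
    return pitch_features(pitch_track) + spectral_change_feature(mfcc_frames)


def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


class NearestCentroidClassifier:
    """Stand-in classifier (an assumption): label by the closest class centroid."""

    def fit(self, vectors, labels):
        grouped = {}
        for vec, label in zip(vectors, labels):
            grouped.setdefault(label, []).append(vec)
        self.centroids = {
            label: [mean(col) for col in zip(*vecs)]
            for label, vecs in grouped.items()
        }
        return self

    def predict(self, vector):
        return min(self.centroids, key=lambda lab: distance(vector, self.centroids[lab]))


# Made-up training utterances: (pitch track in Hz, coefficient frames, label).
flat = [[1.0, 1.0]] * 4
moving = [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0], [3.0, 4.0]]
training = [
    ([120.0, 121.0, 122.0, 121.0], flat, "calm"),
    ([110.0, 111.0, 110.0, 111.0], flat, "calm"),
    ([220.0, 260.0, 230.0, 270.0], moving, "excited"),
    ([240.0, 280.0, 250.0, 290.0], moving, "excited"),
]
classifier = NearestCentroidClassifier().fit(
    [feature_vector(p, m) for p, m, _ in training],
    [label for _, _, label in training],
)

label = classifier.predict(feature_vector([210.0, 250.0, 215.0, 255.0], moving))
```

The features are left unscaled to keep the sketch short, so mean pitch dominates the distance; a practical system would likely normalize each feature before classification.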

19. A system for classifying speech according to its affective content, comprising:
- a spectral shape analyzer which determines the mel-frequency cepstral coefficients for the speech signal, and measures the difference in said coefficients from frame to frame of the speech signal to determine changes in the spectral envelope of a speech signal over time;
  
  a second analyzer which determines values for other features of the speech signal which are based upon the characteristics of the speech signal other than its spectral envelope; and
  
  a classifier which labels the speech signal as belonging to one of a predetermined set of classes of affective content in accordance with said changes in spectral envelope and said other feature values.

20. A system for classifying speech according to its affective content, comprising:
- a pitch analyzer which determines values for multiple features of a speech signal which are based upon the pitch of the speech signal;
  
  a spectral shape analyzer which determines the mel-frequency cepstral coefficients for the speech signal, and measures the difference in said coefficients from frame to frame of the speech signal to determine changes in the spectral envelope of the speech signal over time; and
  
  a classifier which labels the speech signal as belonging to one of a predetermined set of classes of affective content in accordance with said pitch-based feature values and said spectral envelope changes.

Current Assignee: Vulcan Patents LLC
Original Assignee: Interval Research Corporation
Inventors: Slaney, Malcolm
Primary Examiner(s): Hudspeth, David R.
Assistant Examiner(s): Storm, Donald L.
Application Number: US09/050,896
Time in Patent Office: 1,015 Days
Field of Search: 704/231, 704/236, 704/243, 704/246, 704/250, 704/255, 704/276
US Class Current: 704/250
CPC Class Codes:
G10L 15/1807   using prosody or stress
G10L 17/26   Recognition of special voic...
G10L 2015/227   of the speaker; Human-fact...
