System and method for automatic classification of speech based upon affective content
Abstract
The classification of speech according to emotional content employs acoustic measures in addition to pitch as classification input. In one embodiment, two different kinds of features in a speech signal are analyzed for classification purposes. One set of features is based on pitch information that is obtained from a speech signal, and the other set of features is based on changes in the spectral shape of the speech signal over time. The latter set of features is used to distinguish long, smoothly varying sounds from quickly changing sounds, a distinction that may indicate the emotional state of the speaker. These changes are determined by means of a low-dimensional representation of the speech signal, such as mel-frequency cepstral coefficients (MFCC) or linear predictive coding (LPC). Additional features of the speech signal, such as energy, can also be employed for classification purposes. Different variations of pitch and spectral shape features can be measured and analyzed to assist in the classification of individual utterances. In one implementation, the features are measured individually for each of the first, middle and last thirds of an utterance, as well as for the utterance as a whole, to generate multiple sets of data for each utterance.
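The per-thirds measurement described in the abstract can be illustrated with a minimal sketch. It assumes a per-frame feature track (for example, pitch values in Hz) has already been extracted. The function names and the choice of mean and variance as the summary statistics are hypothetical and are not taken from the patent text.

```python
from statistics import fmean, pvariance


def split_thirds(frames):
    """Return the first, middle, and last thirds of a per-frame sequence."""
    n = len(frames)
    a, b = n // 3, (2 * n) // 3
    return frames[:a], frames[a:b], frames[b:]


def segment_stats(frames):
    """Summarize one segment. Mean and variance are assumed statistics."""
    return {"mean": fmean(frames), "var": pvariance(frames)}


def utterance_feature_sets(frames):
    """Build four sets of data per utterance: each third plus the whole."""
    if len(frames) < 3:
        raise ValueError("need at least three frames to form thirds")
    first, middle, last = split_thirds(frames)
    return {
        "first": segment_stats(first),
        "middle": segment_stats(middle),
        "last": segment_stats(last),
        "whole": segment_stats(frames),
    }
```

Calling `utterance_feature_sets` on a rising pitch track yields a higher mean in the last third than in the first, which is the kind of within-utterance contrast that the multiple data sets make available to a classifier.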
20 Claims
1. A method for classifying the affective content of speech, comprising the steps of:
analyzing a portion of a speech signal to determine values for features of said portion of the speech signal which are based upon the pitch of the speech signal;
analyzing said portion of the speech signal to determine the mel-frequency cepstral coefficients for the speech signal, and measuring the difference in said coefficients from frame to frame of said portion of the speech signal to determine changes in the spectral envelope of the speech signal over time;
providing said pitch-based feature values and said changes in the spectral envelope to a classifier; and
labelling said portion of the speech signal as belonging to one of a predetermined set of classes of affective content for which said classifier has been trained.
14. A method for classifying the affective content of speech, comprising the steps of:
analyzing a portion of a speech signal to determine the mel-frequency cepstral coefficients for the speech signal, and measuring the difference in said coefficients from frame to frame of said portion of the speech signal to determine changes in the spectral envelope of the speech signal over time;
analyzing said portion of the speech signal to determine values for other features of said portion of the speech signal which are based upon the characteristics of the speech signal other than its spectral envelope;
providing said changes in the spectral envelope and said values of the other features to a classifier; and
labelling said portion of the speech signal as belonging to one of a predetermined set of classes of affective content for which said classifier has been trained.
16. A method for classifying the affective content of speech, comprising the steps of:
analyzing a portion of a speech signal to determine values for multiple features of said portion of the speech signal which are based upon the pitch of the speech signal;
analyzing said portion of the speech signal to determine the mel-frequency cepstral coefficients for the speech signal, and measuring the difference in said coefficients from frame to frame of said portion of the speech signal to determine changes in the spectral envelope of the speech signal over time;
providing said values of the pitch-based features and said envelope changes to a classifier; and
labelling said portion of the speech signal as belonging to one of a predetermined set of classes of affective content for which said classifier has been trained.
17. A system for classifying speech according to its affective content, comprising:
a pitch analyzer which determines values for features of a portion of a speech signal which are based upon the pitch of the speech signal;
a spectral shape analyzer which determines the mel-frequency cepstral coefficients for the speech signal, and measures the difference in said coefficients from frame to frame of said portion of the speech signal to determine changes in the spectral envelope of the speech signal over time; and
a classifier which receives said pitch-based feature values and said changes in the spectral envelope, and labels said portion of the speech signal as belonging to one of a predetermined set of classes of affective content for which said classifier has been trained.
19. A system for classifying speech according to its affective content, comprising:
a spectral shape analyzer which determines the mel-frequency cepstral coefficients for the speech signal, and measures the difference in said coefficients from frame to frame of the speech signal to determine changes in the spectral envelope of a speech signal over time;
a second analyzer which determines values for other features of the speech signal which are based upon the characteristics of the speech signal other than its spectral envelope; and
a classifier which labels the speech signal as belonging to one of a predetermined set of classes of affective content in accordance with said changes in spectral envelope and said other feature values.
20. A system for classifying speech according to its affective content, comprising:
a pitch analyzer which determines values for multiple features of a speech signal which are based upon the pitch of the speech signal;
a spectral shape analyzer which determines the mel-frequency cepstral coefficients for the speech signal, and measures the difference in said coefficients from frame to frame of the speech signal to determine changes in the spectral envelope of the speech signal over time; and
a classifier which labels the speech signal as belonging to one of a predetermined set of classes of affective content in accordance with said pitch-based feature values and said spectral envelope changes.
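The three elements recited in the claims (pitch-based features, frame-to-frame MFCC differences, and a trained classifier) can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation. It assumes the pitch track and the MFCC frames have already been computed upstream. The Euclidean distance as the frame-to-frame difference measure, the three summary features, the nearest-centroid classifier, and the class labels "calm" and "excited" are all hypothetical choices made for the example.

```python
import math
from statistics import fmean


def mfcc_deltas(mfcc_frames):
    """Distance between each pair of consecutive MFCC frames.

    Euclidean distance is an assumed difference measure.
    """
    return [math.dist(prev, cur)
            for prev, cur in zip(mfcc_frames, mfcc_frames[1:])]


def feature_vector(pitch_track, mfcc_frames):
    """Combine pitch-based values with a spectral-change value (assumed set)."""
    deltas = mfcc_deltas(mfcc_frames)
    return [
        fmean(pitch_track),                    # mean pitch
        max(pitch_track) - min(pitch_track),   # pitch range
        fmean(deltas),                         # average spectral change
    ]


class NearestCentroidClassifier:
    """Stand-in classifier: labels a vector with the closest class centroid."""

    def __init__(self):
        self.centroids = {}

    def train(self, examples):
        """examples: iterable of (feature_vector, label) pairs."""
        grouped = {}
        for vec, label in examples:
            grouped.setdefault(label, []).append(vec)
        self.centroids = {
            label: [fmean(col) for col in zip(*vecs)]
            for label, vecs in grouped.items()
        }

    def label(self, vec):
        """Return one of the classes seen during training."""
        return min(self.centroids,
                   key=lambda lab: math.dist(self.centroids[lab], vec))


# Usage with made-up data: slowly varying frames vs. quickly changing frames.
calm = feature_vector([100, 102, 101], [[0.0, 0.0], [0.1, 0.0], [0.2, 0.0]])
excited = feature_vector([180, 240, 200], [[0.0, 0.0], [3.0, 4.0], [0.0, 0.0]])
clf = NearestCentroidClassifier()
clf.train([(calm, "calm"), (excited, "excited")])
```

A practical system would normalize the features before measuring distances, since the pitch values here dominate the spectral-change value by magnitude; the sketch omits this to keep the three claimed stages visible.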
Specification