Bimodal emotion recognition method and system utilizing a support vector machine

US 8,965,762 B2
Filed: 02/07/2011
Issued: 02/24/2015
Est. Priority Date: 02/16/2007
Status: Active Grant



Abstract

A method is disclosed for recognizing emotion by assigning different weights to at least two kinds of unknown information, such as image and audio information, according to the recognition reliability of each. The weights are determined from the distance between the test data and the hyperplane and the standard deviation of the training data, normalized by the mean distance between the training data and the hyperplane, and thus represent the classification reliability of each kind of information. When the at least two kinds of unidentified information are classified differently by their hyperplanes, the method recognizes the emotion according to the information having the higher weight, correcting the wrong classification result of the other information and thereby raising the accuracy of emotion recognition. The present disclosure also provides a learning step that uses an iterative algorithm to achieve a higher learning speed.
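The weighting and selection described in the abstract can be sketched as follows. This is a minimal illustration, not the patented formula: the text here does not reproduce the exact expression, so the sketch assumes a weight of the form (distance minus standard deviation) divided by mean distance. The function names `reliability_weight` and `fuse`, the use of the population standard deviation, and the tie-break in favor of the facial result are all assumptions made for the example.

```python
from statistics import mean, pstdev

def reliability_weight(distance, train_distances):
    """Assumed weight: (distance - std) / mean.

    distance        -- distance of the unknown sample to its hyperplane
    train_distances -- distances of the training samples to the same hyperplane
    """
    mu = mean(train_distances)       # mean distance of the training samples
    sigma = pstdev(train_distances)  # population standard deviation (assumption)
    return (distance - sigma) / mu

def fuse(face, voice):
    """Each argument is a (label, weight) pair from one modality."""
    face_label, face_weight = face
    voice_label, voice_weight = voice
    if face_label == voice_label:
        # Both classifiers agree, so no weighting is needed.
        return face_label
    # They disagree: trust the modality with the higher weight.
    # Ties go to the facial result (an arbitrary choice for this sketch).
    return face_label if face_weight >= voice_weight else voice_label

# Hypothetical distances, chosen only to exercise the functions.
face_train = [0.8, 1.0, 1.2, 1.0]
voice_train = [0.4, 0.6, 0.5, 0.5]
z_face = reliability_weight(1.5, face_train)
z_voice = reliability_weight(0.3, voice_train)
result = fuse(("happiness", z_face), ("sadness", z_voice))
print(result)
```

With these made-up numbers the facial sample lies far from its hyperplane relative to the training data, so its weight is larger and its label is selected.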


28 Claims

1. A method used for emotion recognition comprising the steps of:
- (a) establishing hyperplanes, further comprising the steps of:
  
  (a1) establishing a plurality of training samples; and
  
  (a2) using a means of support vector machine (SVM) to establish the hyperplanes basing upon the plurality of training samples;
  
  (b) inputting at least two unknown data to be identified while enabling each unknown data to correspond to one of the hyperplanes whereas there are two emotion categories being defined in the one of the hyperplanes, and each unknown data being a data selected from an image data and a vocal data;
  
  (c) respectively performing a calculation process, using a computer, upon the at least two unknown data for assigning each with a weight, the calculation process further comprising the steps of:
  
  (c1) basing upon the plurality of training samples used for establishing the one of the hyperplanes to acquire a standard deviation and a mean distance between the plurality of training samples and the one of the hyperplanes;
  
  (c2) respectively calculating feature distances between the one of the hyperplanes and the at least two unknown data to be identified; and
  
  (c3) obtaining the weights of the at least two unknown data by performing a mathematic operation upon the feature distances, the plurality of training samples, the mean distance and the standard deviation, the mathematic operation further comprising the steps of:
  
  obtaining differences between the feature distances and the standard deviation; and
  
  normalizing the differences for obtaining the weights, wherein weights of facial image Z_Fi and weights of vocal data Z_Ai are obtained wherein
  - 2. The method of claim 1, wherein each of the emotion categories is an emotion selected from a group consisting of happiness, sadness, surprise, neutral and anger.
  - 3. The method of claim 1, wherein the establishing of the plurality of training samples further comprises the steps of:
    - (a11) selecting one emotion category out of the two emotion categories;
      
      (a12) acquiring a plurality of feature values according to the selected emotion category so as to form one of the plurality of training samples;
      
      (a13) selecting another emotion category;
      
      (a14) acquiring a plurality of feature values according to the newly selected emotion category so as to form another one of the plurality of training samples; and
      
      (a15) repeating steps (a13) to (a14) and thus forming the plurality of training samples.
  - 4. The method of claim 1, wherein the image data is an image selected from the group consisting of a facial image and a gesture image.
  - 5. The method of claim 1, wherein the image data is comprised of a plurality of feature values, each being defined as a distance between two specific features detected in the image data.
  - 6. The method of claim 1, wherein the vocal data is comprised of a plurality of feature values, each being defined as a combination of pitch and energy.
  - 7. The method of claim 1, wherein the acquiring of weights of step (c) further comprises the steps of:
    - (c1) basing on the hyperplanes corresponding to the two unknown data to determine whether the two unknown data are capable of being labeled to a same emotion category; and
      
      (c2) respectively performing the calculation process upon the two unknown data for assigning each with a weight while the two unknown data are not of the same emotion category.
  - 8. The method of claim 1, further comprises a step of:
    - (e) performing a learning process with respect to a new unknown data for updating the hyperplanes, and the step (e) further comprises the steps of:
      
      (e1) acquiring a parameter of the hyperplane to be updated; and
      
      (e2) using feature values detected from the unknown data and the parameter to update the hyperplanes through an algorithm of iteration.
  - 9. The method of claim 1, further comprising the steps of:
    - (a′) providing at least two of the plurality of training samples, each being defined in a specified characteristic space established by performing a transformation process upon each training sample with respect to its original space;
      
      (b′) establishing at least two of the hyperplanes in the specified characteristic spaces of the at least two of the plurality of training samples, each of the at least two of the hyperplanes capable of defining two emotion categories;
      
      (c′) inputting at least two unknown data to be identified in correspondence to the at least two of the hyperplanes, and transforming each unknown data to its corresponding characteristic space by the use of the transformation process while enabling each unknown data to correspond to one emotion category selected from the two emotion categories of the hyperplane corresponding thereto, and each unknown data being a data selected from an image data and a vocal data;
      
      (d′) respectively performing a calculation process upon the two unknown data for assigning each with a weight; and
      
      (e′) comparing the assigned weight of the two unknown data while using the comparison as base for selecting one emotion category out of those emotion categories as an emotion recognition result.
  - 10. The method of claim 1, further comprises a step of:
    - (f′) performing a learning process with respect to a new unknown data for updating the hyperplanes, and the step (f′) further comprises the steps of:
      
      (f1′) acquiring a parameter of the hyperplane to be updated;
      
      (f2′) transforming the new unknown data into its corresponding characteristic space by the use of the transformation process; and
      
      (f3′) using feature values detected from the unknown data and the parameter to update the hyperplanes through an algorithm of iteration.
      
      (f4′) when updating the hyperplane, a critical set is determined by using a fixed number of samples close to the hyperplane, and the critical set is defined by X_i = arg min |w · X_i + b|, wherein the X_i is a number of the samples;
      
      the w represents a normal vector of the hyperplane; and
      
      the b represents an intercept.
  - 11. The method of claim 9, wherein the transformation process is a Gaussian Kernel transformation.
  - 12. The method of claim 9, wherein each of the emotion categories is an emotion selected from a group consisting of happiness, sadness, surprise, neutral and anger.
  - 13. The method of claim 9, wherein the image data is an image selected from a group consisting of a facial image and a gesture image.
  - 14. The method of claim 9, wherein the image data is comprised of a plurality of feature values, each being defined as a distance between two specific features detected in the image data.
  - 15. The method of claim 9, wherein the vocal data is comprised of a plurality of feature values, each being defined as a combination of pitch and energy.
  - 16. The method of claim 9, wherein the calculation process is comprised of the steps of:
    - basing upon the plurality of training samples used for establishing the corresponding hyperplane to acquire the standard deviation and the mean distance between the plurality of training samples and the hyperplane;
      
      respectively calculating feature distances between the hyperplane and the at least two unknown data to be identified; and
      
      obtaining the weights of the at least two unknown data by normalizing the feature distances, the plurality of training samples, the mean distance and the standard deviation.
  - 17. The method of claim 9, wherein the acquiring of weights of step (d′) further comprises the steps of:
    - (d1′) basing on the hyperplanes corresponding to the two unknown data to determine whether the two unknown data are capable of being labeled to a same emotion category; and
      
      (d2′) respectively performing the calculation process upon the two unknown data for assigning each with a weight while the two unknown data are not of the same emotion category.
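Claims 9, 11 and 17 can be illustrated together: each modality is classified by its own two-category hyperplane in a Gaussian-kernel feature space, and weights are computed only when the two labels differ. The sketch below is a hedged illustration under stated assumptions, not the patented implementation. It uses the standard kernel SVM decision function f(x) = Σ c_i · K(s_i, x) + b; the model dictionaries, their coefficients, the `gamma` value, and the use of |f(x)| as a stand-in for the feature distance are all hypothetical.

```python
import math

def gaussian_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def decision_value(x, model):
    """Kernel SVM decision function; coef holds alpha_i * y_i per support vector."""
    total = sum(c * gaussian_kernel(sv, x, model["gamma"])
                for sv, c in zip(model["sv"], model["coef"]))
    return total + model["bias"]

def classify(x, model, categories):
    """Return (label, |f(x)|); the sign of f(x) picks one of two categories."""
    value = decision_value(x, model)
    label = categories[0] if value >= 0 else categories[1]
    return label, abs(value)

# Hypothetical two-support-vector models for the face and voice modalities.
face_model = {"sv": [[0.0, 0.0], [2.0, 2.0]], "coef": [1.0, -1.0],
              "bias": 0.0, "gamma": 0.5}
voice_model = {"sv": [[0.0, 0.0], [2.0, 2.0]], "coef": [1.0, -1.0],
               "bias": 0.0, "gamma": 0.5}
categories = ("happiness", "sadness")

face_label, face_score = classify([0.1, 0.2], face_model, categories)
voice_label, voice_score = classify([1.9, 2.1], voice_model, categories)

# Weights only need to be computed when the modalities disagree.
needs_weighting = face_label != voice_label
print(face_label, voice_label, needs_weighting)
```

Skipping the weight calculation when both modalities agree avoids unnecessary work, since the comparison of weights can only change the result when the labels differ.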

18. A method used for emotion recognition, comprising the steps of:
- (a) providing at least two training samples, each of the at least two training samples being defined in a specified characteristic space established by performing a transformation process upon the each of the at least two training samples with respect to its original space;
  
  (b) establishing at least two corresponding hyperplanes in the specified characteristic spaces of the at least two training samples, each of the at least two hyperplanes capable of defining two emotion categories;
  
  (c) inputting at least two unknown data to be identified in correspondence to the at least two hyperplanes, and transforming each unknown data to its corresponding characteristic space by the use of the transformation process while enabling each unknown data to correspond to one emotion category selected from the two emotion categories of the each of the at least two hyperplanes corresponding thereto, and each unknown data being a data selected from an image data and a vocal data;
  
  (d) respectively performing a calculation process, using a computer, upon the two unknown data for assigning each with a weight;
  
  (e) comparing the assigned weight of the two unknown data while using the comparison as base for selecting one emotion category out of a plurality of emotion categories as an emotion recognition result; and
  
  (f) performing a learning process with respect to a new unknown data for updating the each of the at least two hyperplanes, and further comprising the steps of:
  
  (f1) acquiring a parameter of the each of the at least two hyperplanes to be updated;
  
  (f2) transforming the new unknown data into its corresponding characteristic space by the use of the transformation process; and
  
  (f3) using feature values detected from the unknown data and the parameter to update the each of the at least two hyperplanes through an algorithm of iteration.
  
  (f4) when updating the each of the at least two hyperplanes, a critical set is determined by using a fixed number of samples close to the each of the at least two hyperplanes, and the critical set is defined by X_i = arg min |w · X_i + b|, wherein the X_i is a number of the samples, the w represents a normal vector of the each of the at least two hyperplanes, and the b represents an intercept.
  - 19. The method of claim 18, wherein each of the emotion categories is an emotion selected from a group consisting of happiness, sadness, surprise, neutral and anger.
  - 20. The method of claim 18, wherein step (b) further comprises a step of using a means of support vector machine (SVM) to establish the at least two hyperplanes basing upon the at least two training samples.
  - 21. The method of claim 18, wherein the step (a) further comprises the steps of:
    - (a1) selecting one emotion category out of the two emotion categories;
      
      (a2) acquiring a plurality of feature values according to the selected emotion category so as to form a training sample;
      
      (a3) selecting another emotion category;
      
      (a4) acquiring a plurality of feature values according to the newly selected emotion category so as to form another training sample; and
      
      (a5) repeating steps (a3) and (a4) and thus forming the at least two training samples.
  - 22. The method of claim 18, wherein the image data is an image selected from a group consisting of a facial image and a gesture image.
  - 23. The method of claim 18, wherein the image data is comprised of a plurality of feature values, each of the plurality of feature values being defined as a distance between two specific features detected in the image data.
  - 24. The method of claim 18, wherein the vocal data is comprised of a plurality of feature values, each of the plurality of feature values being defined as a combination of pitch and energy.
  - 25. The method of claim 18, wherein the calculation process of the step (d) further includes the steps of:
    - basing upon the at least two training samples used for establishing the each of the at least two hyperplanes to acquire a standard deviation and a mean distance between the at least two training samples and the each of the at least two hyperplanes;
      
      respectively calculating feature distances between the at least two hyperplanes and the at least two unknown data to be identified; and
      
      obtaining the weights of the at least two unknown data by performing a mathematic operation upon the feature distances, the at least two training samples, the mean distance and the standard deviation.
  - 26. The method of claim 25, wherein the mathematic operation further comprises the steps of:
    - obtaining differences between the feature distances and the standard deviation; and
      
      normalizing the differences for obtaining the weights, wherein weights of facial image Z_Fi and weights of vocal data Z_Ai are obtained wherein
  - 27. The method of claim 18, wherein the step (d) further includes the steps of:
    - (d1) basing on the at least two hyperplanes corresponding to the two unknown data to determine whether the two unknown data are capable of being labeled to a same emotion category; and
      
      (d2) respectively performing the calculation process upon the two unknown data for assigning each with a weight while the two unknown data are not of the same emotion category.
  - 28. The method of claim 18, wherein the transformation process is a Gaussian Kernel transformation.
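The critical-set selection in step (f4) of claim 18 can be sketched as picking a fixed number of samples with the smallest |w · X_i + b|. The example below covers only that selection, under the assumption of a linear hyperplane given explicitly by `w` and `b`. The iterative update of the hyperplane is not specified in the claims and is not attempted. The function name `critical_set`, the parameter `k`, and the sample values are hypothetical.

```python
def critical_set(samples, w, b, k):
    """Return the k samples closest to the hyperplane w . x + b = 0.

    Closeness is measured by |w . x + b|, following the expression in the claim.
    """
    def margin(x):
        return abs(sum(wi * xi for wi, xi in zip(w, x)) + b)

    # Sort by margin and keep a fixed number of the nearest samples.
    return sorted(samples, key=margin)[:k]

# Hypothetical 2-D samples and a hyperplane x_1 = 0 (w = [1, 0], b = 0).
samples = [[3.0, 0.0], [0.5, 0.0], [-0.2, 1.0], [-4.0, 2.0], [1.0, 1.0]]
w, b = [1.0, 0.0], 0.0

nearest = critical_set(samples, w, b, k=2)
print(nearest)
```

Keeping only a fixed number of near-boundary samples bounds the amount of data revisited on each update, which is consistent with the abstract's stated aim of a faster learning step.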


Current Assignee: Industrial Technology Research Institute
Original Assignee: Industrial Technology Research Institute
Inventors: Song, Kai-Tai; Han, Meng-Ju; Hsu, Jing-Huai; Chang, Fuh-Yu; Hong, Jung-Wei
Primary Examiner(s): WOZNIAK, JAMES S

Application Number: US13/022,418
Publication Number: US 20110141258A1
Time in Patent Office: 1,478 Days
Field of Search: 704/231, 704/236, 704/270, 382/118
US Class Current: 704/236
CPC Class Codes:
G06F 18/2411   based on the proximity to a...
G06V 10/764   using classification, e.g. ...
G06V 40/168   Feature extraction; Face re...
G06V 40/171   Local features and componen...
G06V 40/175   Static expression
G10L 17/26   Recognition of special voic...
G10L 25/63   for estimating an emotional...
