Emotion recognition in video conferencing
First Claim
1. A method comprising:
receiving a video including a sequence of images and an audio stream from a video conference between a first user interface and a second user interface;
detecting a face of an individual in one or more of the images;
recognizing a speech emotion in the audio stream;
generating a communication bearing data associated with the speech emotion;
transmitting the communication bearing data over a communications network; and
switching the video conference from between the first user interface and the second user interface to between the first user interface and a third user interface responsive to the communication bearing data.
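The claimed steps form a linear pipeline. A minimal sketch is shown below; the function and parameter names (`run_claimed_steps`, `detect_face`, `recognize_speech_emotion`, the 0.8 threshold, and the set of negative emotions) are illustrative assumptions, not the patent's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class EmotionSignal:
    label: str         # e.g. "angry", "neutral"
    confidence: float  # 0.0 .. 1.0

# Hypothetical: the patent names angry, annoyed, and distressed customers.
NEGATIVE_EMOTIONS = {"angry", "annoyed", "distressed"}

def run_claimed_steps(
    images: Sequence[object],
    audio: bytes,
    detect_face: Callable[[object], object],
    recognize_speech_emotion: Callable[[bytes], EmotionSignal],
    transmit: Callable[[dict], None],
    switch_to_third_ui: Callable[[], None],
    threshold: float = 0.8,
) -> dict:
    """One pass over the claimed method steps, with pluggable components."""
    # Detect a face of an individual in one or more of the images.
    faces = [detect_face(img) for img in images]
    # Recognize a speech emotion in the audio stream.
    emotion = recognize_speech_emotion(audio)
    # Generate a communication bearing data associated with the speech emotion.
    message = {
        "emotion": emotion.label,
        "confidence": emotion.confidence,
        "faces_detected": sum(f is not None for f in faces),
    }
    # Transmit the communication over the communications network.
    transmit(message)
    # Switch the conference responsive to the communication bearing data.
    if emotion.label in NEGATIVE_EMOTIONS and emotion.confidence >= threshold:
        switch_to_third_ui()
    return message
```

In practice the detector, recognizer, transmit, and switch callables would wrap real vision, speech, and signaling components; the sketch only fixes the order of the claimed steps.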
Abstract
Methods and systems for videoconferencing include recognition of emotions related to one videoconference participant, such as a customer. This ultimately enables another videoconference participant, such as a service provider or supervisor, to handle angry, annoyed, or distressed customers. One example method includes the steps of receiving a video that includes a sequence of images, detecting at least one object of interest (e.g., a face), locating feature reference points of the at least one object of interest, aligning a virtual face mesh to the at least one object of interest based on the feature reference points, finding over the sequence of images at least one deformation of the virtual face mesh that reflects face mimics, determining that the at least one deformation refers to a facial emotion selected from a plurality of reference facial emotions, and generating a communication bearing data associated with the facial emotion.
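The abstract's last two steps, matching an observed mesh deformation against a plurality of reference facial emotions, can be sketched as a nearest-reference lookup. The 4-value deformation vectors and the reference table below are invented for illustration; a real face mesh has hundreds of vertices and the patent does not disclose this particular matching rule.

```python
import math

# Hypothetical reference deformations, one per reference facial emotion.
# Each vector stands in for a full mesh deformation (e.g. brow lowering,
# cheek raise, lip-corner movement).
REFERENCE_DEFORMATIONS = {
    "neutral": [0.0, 0.0, 0.0, 0.0],
    "angry":   [0.6, -0.4, 0.5, 0.1],
    "happy":   [-0.2, 0.5, 0.0, 0.6],
}

def classify_deformation(deformation: list[float]) -> str:
    """Return the reference facial emotion whose deformation is closest
    (Euclidean distance) to the observed mesh deformation."""
    return min(
        REFERENCE_DEFORMATIONS,
        key=lambda label: math.dist(deformation, REFERENCE_DEFORMATIONS[label]),
    )
```

The chosen label would then populate the "communication bearing data associated with the facial emotion" from the abstract's final step.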
20 Claims
1. A method comprising:
receiving a video including a sequence of images and an audio stream from a video conference between a first user interface and a second user interface;
detecting a face of an individual in one or more of the images;
recognizing a speech emotion in the audio stream;
generating a communication bearing data associated with the speech emotion;
transmitting the communication bearing data over a communications network; and
switching the video conference from between the first user interface and the second user interface to between the first user interface and a third user interface responsive to the communication bearing data.
Dependent claims: 2-12.
13. A computing device comprising:
at least one processor; and
a memory storing processor-executable codes, which, when implemented by the at least one processor, cause the computing device to:
receive a video including a sequence of images and an audio stream from a video conference between a first user interface and a second user interface;
detect a face of an individual in one or more of the images;
recognize a speech emotion in the audio stream;
generate a communication bearing data associated with the speech emotion;
transmit the communication bearing data over a communications network; and
switch the video conference from between the first user interface and the second user interface to between the first user interface and a third user interface responsive to the communication bearing data.
Dependent claims: 14-20.
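The switching limitation shared by claims 1 and 13 amounts to a routing decision made from the communication bearing data. A minimal sketch follows; the endpoint names, emotion set, and 0.8 confidence threshold are assumptions for illustration, not values from the patent.

```python
# Hypothetical: the "third user interface" could belong to a supervisor
# who takes over when a customer sounds angry, annoyed, or distressed.
NEGATIVE_EMOTIONS = {"angry", "annoyed", "distressed"}

def select_endpoint(
    emotion_data: dict,
    default: str = "second_user_interface",
    escalation: str = "third_user_interface",
    threshold: float = 0.8,
) -> str:
    """Pick which interface the first user stays connected to,
    responsive to the received communication bearing emotion data."""
    if (emotion_data.get("emotion") in NEGATIVE_EMOTIONS
            and emotion_data.get("confidence", 0.0) >= threshold):
        return escalation
    return default
```

A conferencing server would call this on each received emotion message and re-signal the call only when the returned endpoint changes.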