Emotion recognition in video conferencing
First Claim
1. A computer-implemented method for video conferencing, the method comprising:
receiving a video including a sequence of images;
detecting at least one object of interest in one or more of the images;
locating feature reference points of the at least one object of interest;
aligning a virtual face mesh to the at least one object of interest in one or more of the images based at least in part on the feature reference points;
finding over the sequence of images at least one deformation of the virtual face mesh, wherein the at least one deformation is associated with at least one face mimic;
determining that the at least one deformation refers to a facial emotion selected from a plurality of reference facial emotions;
generating an emotional status of an individual based on the facial emotion selected from the plurality of reference facial emotions;
determining that the facial emotion is a negative facial emotion; and
generating a communication bearing data associated with the negative facial emotion only where the facial emotion is determined to be a negative facial emotion.
Abstract
Methods and systems for videoconferencing include recognition of emotions of one videoconference participant, such as a customer. This ultimately enables another videoconference participant, such as a service provider or supervisor, to handle angry, annoyed, or distressed customers. One example method includes the steps of receiving a video that includes a sequence of images, detecting at least one object of interest (e.g., a face), locating feature reference points of the at least one object of interest, aligning a virtual face mesh to the at least one object of interest based on the feature reference points, finding over the sequence of images at least one deformation of the virtual face mesh that reflects face mimics, determining that the at least one deformation refers to a facial emotion selected from a plurality of reference facial emotions, and generating a communication bearing data associated with the facial emotion.
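The pipeline in the abstract can be sketched end to end. This is a minimal illustrative sketch, not the patented implementation: the landmark-list "mesh", the reference deformation magnitudes, and all function names below are assumptions; a real system would fit a dense virtual face mesh and learn the reference facial emotions from labelled data.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]

# Hypothetical reference emotions keyed by a scalar deformation magnitude.
REFERENCE_EMOTIONS = {"neutral": 0.0, "happy": 2.0, "angry": 5.0, "distressed": 8.0}
NEGATIVE_EMOTIONS = {"angry", "distressed"}

def align_mesh(landmarks: List[Point]) -> List[Point]:
    """Stand-in for aligning a virtual face mesh: here the 'mesh' is
    simply the detected feature reference points themselves."""
    return list(landmarks)

def deformation(mesh_a: List[Point], mesh_b: List[Point]) -> float:
    """Total vertex displacement between two aligned meshes."""
    return sum(abs(ax - bx) + abs(ay - by)
               for (ax, ay), (bx, by) in zip(mesh_a, mesh_b))

def classify(deform: float) -> str:
    """Pick the reference emotion whose magnitude is nearest."""
    return min(REFERENCE_EMOTIONS, key=lambda e: abs(REFERENCE_EMOTIONS[e] - deform))

def process(frames: List[List[Point]]) -> Optional[str]:
    """Find the largest frame-to-frame mesh deformation over the sequence,
    classify it, and generate a communication only for a negative emotion."""
    meshes = [align_mesh(f) for f in frames]
    worst = max(deformation(a, b) for a, b in zip(meshes, meshes[1:]))
    emotion = classify(worst)
    if emotion in NEGATIVE_EMOTIONS:
        return f"alert: participant appears {emotion}"
    return None
```

Note how the final step mirrors the claim's gating: nothing is emitted unless the classified emotion is negative.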
22 Claims
1. A computer-implemented method for video conferencing, the method comprising:
receiving a video including a sequence of images;
detecting at least one object of interest in one or more of the images;
locating feature reference points of the at least one object of interest;
aligning a virtual face mesh to the at least one object of interest in one or more of the images based at least in part on the feature reference points;
finding over the sequence of images at least one deformation of the virtual face mesh, wherein the at least one deformation is associated with at least one face mimic;
determining that the at least one deformation refers to a facial emotion selected from a plurality of reference facial emotions;
generating an emotional status of an individual based on the facial emotion selected from the plurality of reference facial emotions;
determining that the facial emotion is a negative facial emotion; and
generating a communication bearing data associated with the negative facial emotion only where the facial emotion is determined to be a negative facial emotion.
Dependent claims: 2–20.
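The "aligning a virtual face mesh ... based at least in part on the feature reference points" step amounts to fitting a canonical mesh onto the detected landmarks. A minimal sketch, assuming a 2D scale-and-translation fit (rotation omitted for brevity; all names are illustrative, not from the patent — a full implementation would use a Procrustes fit over a 3D mesh):

```python
from typing import List, Tuple

Point = Tuple[float, float]

def fit_similarity(src: List[Point], dst: List[Point]) -> Tuple[float, float, float]:
    """Estimate scale s and translation (tx, ty) mapping canonical mesh
    points (src) onto detected feature reference points (dst), by matching
    centroids and average spread about the centroid."""
    n = len(src)
    cx_s = sum(p[0] for p in src) / n
    cy_s = sum(p[1] for p in src) / n
    cx_d = sum(p[0] for p in dst) / n
    cy_d = sum(p[1] for p in dst) / n
    spread_s = sum(abs(p[0] - cx_s) + abs(p[1] - cy_s) for p in src) or 1.0
    spread_d = sum(abs(p[0] - cx_d) + abs(p[1] - cy_d) for p in dst)
    s = spread_d / spread_s
    return s, cx_d - s * cx_s, cy_d - s * cy_s

def align(mesh: List[Point], landmarks: List[Point]) -> List[Point]:
    """Apply the fitted transform so the mesh overlays the landmarks."""
    s, tx, ty = fit_similarity(mesh, landmarks)
    return [(s * x + tx, s * y + ty) for x, y in mesh]
```

Once the mesh is aligned per frame, frame-to-frame vertex displacement gives the deformation signal the claims track.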
21. A system, comprising:
a computing device including at least one processor and a memory storing processor-executable codes which, when implemented by the at least one processor, cause the at least one processor to perform the steps of:
receiving a video including a sequence of images;
detecting at least one object of interest in one or more of the images;
locating feature reference points of the at least one object of interest;
aligning a virtual face mesh to the at least one object of interest in one or more of the images based at least in part on the feature reference points;
finding over the sequence of images at least one deformation of the virtual face mesh, wherein the at least one deformation is associated with at least one face mimic;
determining that the at least one deformation refers to a facial emotion selected from a plurality of reference facial emotions;
generating an emotional status of an individual based on the facial emotion selected from the plurality of reference facial emotions;
determining that the facial emotion is a negative facial emotion; and
generating a communication bearing data associated with the negative facial emotion only where the facial emotion is determined to be a negative facial emotion.
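The claims' distinctive final limitation — generating a communication "only where the facial emotion is determined to be a negative facial emotion" — can be isolated as a simple gate. A sketch with hypothetical labels (the negative set follows the abstract's "angry, annoyed, or distressed" examples; the function name and record fields are assumptions):

```python
from typing import Optional

# Negative emotions per the abstract's examples; illustrative only.
NEGATIVE = {"angry", "annoyed", "distressed"}

def communication_for(emotion: str, participant: str) -> Optional[dict]:
    """Return a communication bearing data associated with a negative
    facial emotion; return None (no communication) otherwise."""
    if emotion not in NEGATIVE:
        return None
    return {
        "participant": participant,
        "emotion": emotion,
        "action": "notify supervisor",
    }
```

The gate means neutral or positive states produce no traffic at all, which is why the claim phrases generation as conditional rather than always emitting a status.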
22. A non-transitory processor-readable medium having instructions stored thereon, which, when executed by one or more processors, cause the one or more processors to implement a method, comprising:
receiving a video including a sequence of images;
detecting at least one object of interest in one or more of the images;
locating feature reference points of the at least one object of interest;
aligning a virtual face mesh to the at least one object of interest in one or more of the images based at least in part on the feature reference points;
finding over the sequence of images at least one deformation of the virtual face mesh, wherein the at least one deformation is associated with at least one face mimic;
determining that the at least one deformation refers to a facial emotion selected from a plurality of reference facial emotions;
generating an emotional status of an individual based on the facial emotion selected from the plurality of reference facial emotions;
determining that the facial emotion is a negative facial emotion; and
generating a communication bearing data associated with the negative facial emotion only where the facial emotion is determined to be a negative facial emotion.
Specification