COMPUTING SYSTEM FOR EXPRESSIVE THREE-DIMENSIONAL FACIAL ANIMATION
First Claim
1. A computing device comprising:
a processor;
memory storing instructions, wherein the instructions, when executed by the processor, cause the processor to perform acts comprising:
receiving an audio sequence comprising content features reflective of spoken words uttered by a speaker;
generating latent content variables based upon the content features of the audio sequence, wherein the latent content variables are to be used to synchronize movement of lips on a visual representation of a face to the spoken words uttered by the speaker;
generating latent style variables based upon the audio sequence, wherein the latent style variables are derived from an expected appearance of facial features of the speaker as the speaker utters the spoken words, wherein the latent style variables are to be used to synchronize movement of full facial features of the visual representation of the face to the spoken words uttered by the speaker; and
causing the visual representation of the face to be animated on a display based upon the latent content variables and the latent style variables.
Abstract
A computer-implemented technique for animating a visual representation of a face based on spoken words of a speaker is described herein. A computing device receives an audio sequence comprising content features reflective of spoken words uttered by a speaker. The computing device generates latent content variables and latent style variables based upon the audio sequence. The latent content variables are used to synchronize movement of lips on the visual representation to the spoken words uttered by the speaker. The latent style variables are derived from an expected appearance of facial features of the speaker as the speaker utters the spoken words and are used to synchronize movement of full facial features of the visual representation to the spoken words uttered by the speaker. The computing device causes the visual representation of the face to be animated on a display based upon the latent content variables and the latent style variables.
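The abstract describes a two-branch pipeline: one set of latent variables is derived from the audio's content features to drive lip movement, a second set captures the speaker's style to drive the full face, and both feed the animation of the visual representation. The sketch below is a minimal illustration of that structure under assumed details: the PyTorch module names (ContentEncoder, StyleEncoder, FaceDecoder), the 80-dimensional audio feature input, the latent sizes, and the blendshape-style output are all assumptions made for illustration, not elements disclosed by the patent.

```python
# Minimal sketch of a two-branch audio-to-face pipeline (assumed PyTorch modules,
# not the patented implementation).
import torch
import torch.nn as nn

class ContentEncoder(nn.Module):
    """Maps per-frame audio content features to latent content variables (lip sync)."""
    def __init__(self, n_feats=80, latent_dim=32):
        super().__init__()
        self.rnn = nn.GRU(n_feats, 128, batch_first=True)
        self.proj = nn.Linear(128, latent_dim)

    def forward(self, audio_feats):           # (batch, frames, n_feats)
        h, _ = self.rnn(audio_feats)
        return self.proj(h)                   # (batch, frames, latent_dim)

class StyleEncoder(nn.Module):
    """Summarizes the whole audio sequence into latent style variables (full-face expression)."""
    def __init__(self, n_feats=80, latent_dim=16):
        super().__init__()
        self.rnn = nn.GRU(n_feats, 128, batch_first=True)
        self.proj = nn.Linear(128, latent_dim)

    def forward(self, audio_feats):
        _, h_last = self.rnn(audio_feats)     # final hidden state summarizes style
        return self.proj(h_last[-1])          # (batch, latent_dim)

class FaceDecoder(nn.Module):
    """Combines content and style latents into per-frame animation parameters
    (e.g. blendshape weights) that drive the visual representation of the face."""
    def __init__(self, content_dim=32, style_dim=16, n_blendshapes=52):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(content_dim + style_dim, 128), nn.ReLU(),
            nn.Linear(128, n_blendshapes),
        )

    def forward(self, content, style):        # content: (B, T, Dc), style: (B, Ds)
        style_t = style.unsqueeze(1).expand(-1, content.shape[1], -1)
        return self.mlp(torch.cat([content, style_t], dim=-1))   # (B, T, n_blendshapes)

# Example: 3 seconds of audio features at 100 frames/s -> per-frame face parameters.
audio_feats = torch.randn(1, 300, 80)
content = ContentEncoder()(audio_feats)
style = StyleEncoder()(audio_feats)
face_params = FaceDecoder()(content, style)
```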
Claims (20)
1. A computing device comprising:
a processor;
memory storing instructions, wherein the instructions, when executed by the processor, cause the processor to perform acts comprising:
receiving an audio sequence comprising content features reflective of spoken words uttered by a speaker;
generating latent content variables based upon the content features of the audio sequence, wherein the latent content variables are to be used to synchronize movement of lips on a visual representation of a face to the spoken words uttered by the speaker;
generating latent style variables based upon the audio sequence, wherein the latent style variables are derived from an expected appearance of facial features of the speaker as the speaker utters the spoken words, wherein the latent style variables are to be used to synchronize movement of full facial features of the visual representation of the face to the spoken words uttered by the speaker; and
causing the visual representation of the face to be animated on a display based upon the latent content variables and the latent style variables.
(Dependent claims: 2-11)
12. A method executed by a processor of a computing device, the method comprising:
receiving an audio sequence comprising content features reflective of spoken words uttered by a speaker;
generating latent content variables based upon the content features of the audio sequence, wherein the latent content variables are to be used to synchronize movement of lips on a visual representation of a face to the spoken words uttered by the speaker;
generating latent style variables based upon the audio sequence, wherein the latent style variables are derived from an expected appearance of facial features of the speaker as the speaker utters the spoken words, wherein the latent style variables are to be used to synchronize movement of full facial features of the visual representation of the face to the spoken words uttered by the speaker;
segmenting the visual representation of the face into segments, wherein each segment in the segments is assigned to a different facial feature of the visual representation of the face; and
causing the visual representation of the face to be animated on a display based upon the latent content variables, the latent style variables, and the segments.
(Dependent claims: 13-16)
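Claim 12 adds a segmentation step: the visual representation of the face is partitioned into segments, each assigned to a different facial feature, and the animation is driven from the latent variables per segment. The sketch below shows one way such bookkeeping could look; the feature names, the blendshape index ranges, and the rule that the lip segment follows the content latents while the remaining segments follow the style latents are illustrative assumptions consistent with, but not specified by, the claim language.

```python
# Illustrative segmentation of a blendshape rig into named facial-feature segments.
# The feature names and index groups below are assumptions, not from the patent.
FACE_SEGMENTS = {
    "lips":      list(range(0, 16)),    # mouth/lip blendshapes -> driven by content latents
    "eyes":      list(range(16, 28)),   # eye and eyelid blendshapes
    "brows":     list(range(28, 38)),   # eyebrow blendshapes
    "cheeks":    list(range(38, 46)),   # cheek blendshapes
    "jaw_other": list(range(46, 52)),   # jaw and remaining blendshapes
}

def animate_frame(content_params, style_params):
    """Blend one frame of animation per segment: lip movement follows the
    content-latent-driven parameters (sync to the spoken words), while the
    remaining segments follow the style-latent-driven parameters (expected
    full-face appearance of the speaker)."""
    frame = [0.0] * 52
    for name, idxs in FACE_SEGMENTS.items():
        source = content_params if name == "lips" else style_params
        for i in idxs:
            frame[i] = source[i]
    return frame
```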
17. A computer-readable storage medium comprising instructions that, when executed by one or more processors of a computing device, perform acts comprising:
receiving an audio sequence comprising content features reflective of spoken words uttered by a speaker by way of a microphone;
generating latent content variables based upon the content features of the audio sequence, wherein the latent content variables are to be used to synchronize movement of lips on a visual representation of a face to the spoken words uttered by the speaker;
generating latent style variables based upon the audio sequence, wherein the latent style variables are derived from an expected appearance of facial features of the speaker as the speaker utters the spoken words, wherein the latent style variables are to be used to synchronize movement of full facial features of the visual representation of the face to the spoken words uttered by the speaker;
causing the visual representation of the face to be animated on a display of a second computing device that is in network communication with the computing device based upon the latent content variables and the latent style variables; and
causing the audio sequence to be played on a speaker of the second computing device concurrently with causing the visual representation of the face to be animated such that movements of the visual representation of the face are synchronized with the spoken words of the audio sequence.
(Dependent claims: 18-20)
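Claim 17 has the computing device drive the display and loudspeaker of a second, networked device, with the facial animation synchronized to the audio playback. One simple way to express that synchronization is to attach a presentation timestamp (time into the audio sequence) to each animation frame before it crosses the network, so the receiving device can schedule frames against its local audio playback clock. The packet layout sketched below (JSON per frame) is a hypothetical illustration, not the protocol described by the patent.

```python
# Hypothetical wire format: each packet carries a presentation timestamp (seconds
# into the audio sequence) so the receiving device can align the animation frame
# with its audio playback clock.
import json

def make_packets(face_params_per_frame, frame_rate=30.0):
    """Pair every animation frame with the audio-clock time at which to show it."""
    packets = []
    for i, params in enumerate(face_params_per_frame):
        packets.append(json.dumps({
            "pts": i / frame_rate,      # presentation timestamp relative to audio start
            "blendshapes": params,      # per-frame face parameters
        }))
    return packets

# Example: three frames of a 52-parameter rig.
frames = [[0.0] * 52 for _ in range(3)]
for pkt in make_packets(frames):
    print(pkt[:60], "...")
```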
Specification