COMPUTING SYSTEM FOR EXPRESSIVE THREE-DIMENSIONAL FACIAL ANIMATION
First Claim
1. A computing device comprising:
a processor;
memory storing instructions, wherein the instructions, when executed by the processor, cause the processor to perform acts comprising:
receiving an audio sequence comprising content features reflective of spoken words uttered by a speaker;
generating latent content variables based upon the content features of the audio sequence, wherein the latent content variables are to be used to synchronize movement of lips on a visual representation of a face to the spoken words uttered by the speaker;
generating latent style variables based upon the audio sequence, wherein the latent style variables are derived from an expected appearance of facial features of the speaker as the speaker utters the spoken words, wherein the latent style variables are to be used to synchronize movement of full facial features of the visual representation of the face to the spoken words uttered by the speaker; and
causing the visual representation of the face to be animated on a display based upon the latent content variables and the latent style variables.
Abstract
A computer-implemented technique for animating a visual representation of a face based on spoken words of a speaker is described herein. A computing device receives an audio sequence comprising content features reflective of spoken words uttered by a speaker. The computing device generates latent content variables and latent style variables based upon the audio sequence. The latent content variables are used to synchronize movement of lips on the visual representation to the spoken words uttered by the speaker. The latent style variables are derived from an expected appearance of facial features of the speaker as the speaker utters the spoken words and are used to synchronize movement of full facial features of the visual representation to the spoken words uttered by the speaker. The computing device causes the visual representation of the face to be animated on a display based upon the latent content variables and the latent style variables.
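The abstract describes a two-branch pipeline: one set of latent variables is derived from the audio's content features to drive lip movement, a second set captures the speaker's style to drive the full face, and both feed the animation of the visual representation. The sketch below is a minimal illustration of that structure under assumed details: the PyTorch module names (ContentEncoder, StyleEncoder, FaceDecoder), the 80-dimensional audio feature input, the latent sizes, and the blendshape-style output are all assumptions made for illustration, not elements disclosed by the patent.

```python
# Minimal sketch of a two-branch audio-to-face pipeline (assumed PyTorch modules,
# not the patented implementation).
import torch
import torch.nn as nn

class ContentEncoder(nn.Module):
    """Maps per-frame audio content features to latent content variables (lip sync)."""
    def __init__(self, n_feats=80, latent_dim=32):
        super().__init__()
        self.rnn = nn.GRU(n_feats, 128, batch_first=True)
        self.proj = nn.Linear(128, latent_dim)

    def forward(self, audio_feats):           # (batch, frames, n_feats)
        h, _ = self.rnn(audio_feats)
        return self.proj(h)                   # (batch, frames, latent_dim)

class StyleEncoder(nn.Module):
    """Summarizes the whole audio sequence into latent style variables (full-face expression)."""
    def __init__(self, n_feats=80, latent_dim=16):
        super().__init__()
        self.rnn = nn.GRU(n_feats, 128, batch_first=True)
        self.proj = nn.Linear(128, latent_dim)

    def forward(self, audio_feats):
        _, h_last = self.rnn(audio_feats)     # final hidden state summarizes style
        return self.proj(h_last[-1])          # (batch, latent_dim)

class FaceDecoder(nn.Module):
    """Combines content and style latents into per-frame animation parameters
    (e.g. blendshape weights) that drive the visual representation of the face."""
    def __init__(self, content_dim=32, style_dim=16, n_blendshapes=52):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(content_dim + style_dim, 128), nn.ReLU(),
            nn.Linear(128, n_blendshapes),
        )

    def forward(self, content, style):        # content: (B, T, Dc), style: (B, Ds)
        style_t = style.unsqueeze(1).expand(-1, content.shape[1], -1)
        return self.mlp(torch.cat([content, style_t], dim=-1))   # (B, T, n_blendshapes)

# Example: 3 seconds of audio features at 100 frames/s -> per-frame face parameters.
audio_feats = torch.randn(1, 300, 80)
content = ContentEncoder()(audio_feats)
style = StyleEncoder()(audio_feats)
face_params = FaceDecoder()(content, style)
```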
Claims (20)
1. A computing device comprising:
a processor;
memory storing instructions, wherein the instructions, when executed by the processor, cause the processor to perform acts comprising:
receiving an audio sequence comprising content features reflective of spoken words uttered by a speaker;
generating latent content variables based upon the content features of the audio sequence, wherein the latent content variables are to be used to synchronize movement of lips on a visual representation of a face to the spoken words uttered by the speaker;
generating latent style variables based upon the audio sequence, wherein the latent style variables are derived from an expected appearance of facial features of the speaker as the speaker utters the spoken words, wherein the latent style variables are to be used to synchronize movement of full facial features of the visual representation of the face to the spoken words uttered by the speaker; and
causing the visual representation of the face to be animated on a display based upon the latent content variables and the latent style variables.
(Dependent claims: 2-11)
12. A method executed by a processor of a computing device, the method comprising:
receiving an audio sequence comprising content features reflective of spoken words uttered by a speaker;
generating latent content variables based upon the content features of the audio sequence, wherein the latent content variables are to be used to synchronize movement of lips on a visual representation of a face to the spoken words uttered by the speaker;
generating latent style variables based upon the audio sequence, wherein the latent style variables are derived from an expected appearance of facial features of the speaker as the speaker utters the spoken words, wherein the latent style variables are to be used to synchronize movement of full facial features of the visual representation of the face to the spoken words uttered by the speaker;
segmenting the visual representation of the face into segments, wherein each segment in the segments is assigned to a different facial feature of the visual representation of the face; and
causing the visual representation of the face to be animated on a display based upon the latent content variables, the latent style variables, and the segments.
(Dependent claims: 13-16)
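Claim 12 adds a segmentation step: the visual representation of the face is partitioned into segments, each assigned to a different facial feature, and the animation is driven from the latent variables per segment. The sketch below shows one way such bookkeeping could look; the feature names, the blendshape index ranges, and the rule that the lip segment follows the content latents while the remaining segments follow the style latents are illustrative assumptions consistent with, but not specified by, the claim language.

```python
# Illustrative segmentation of a blendshape rig into named facial-feature segments.
# The feature names and index groups below are assumptions, not from the patent.
FACE_SEGMENTS = {
    "lips":      list(range(0, 16)),    # mouth/lip blendshapes -> driven by content latents
    "eyes":      list(range(16, 28)),   # eye and eyelid blendshapes
    "brows":     list(range(28, 38)),   # eyebrow blendshapes
    "cheeks":    list(range(38, 46)),   # cheek blendshapes
    "jaw_other": list(range(46, 52)),   # jaw and remaining blendshapes
}

def animate_frame(content_params, style_params):
    """Blend one frame of animation per segment: lip movement follows the
    content-latent-driven parameters (sync to the spoken words), while the
    remaining segments follow the style-latent-driven parameters (expected
    full-face appearance of the speaker)."""
    frame = [0.0] * 52
    for name, idxs in FACE_SEGMENTS.items():
        source = content_params if name == "lips" else style_params
        for i in idxs:
            frame[i] = source[i]
    return frame
```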
17. A computer-readable storage medium comprising instructions that, when executed by one or more processors of a computing device, perform acts comprising:
receiving an audio sequence comprising content features reflective of spoken words uttered by a speaker by way of a microphone;
generating latent content variables based upon the content features of the audio sequence, wherein the latent content variables are to be used to synchronize movement of lips on a visual representation of a face to the spoken words uttered by the speaker;
generating latent style variables based upon the audio sequence, wherein the latent style variables are derived from an expected appearance of facial features of the speaker as the speaker utters the spoken words, wherein the latent style variables are to be used to synchronize movement of full facial features of the visual representation of the face to the spoken words uttered by the speaker;
causing the visual representation of the face to be animated on a display of a second computing device that is in network communication with the computing device based upon the latent content variables and the latent style variables; and
causing the audio sequence to be played on a speaker of the second computing device concurrently with causing the visual representation of the face to be animated such that movements of the visual representation of the face are synchronized with the spoken words of the audio sequence.
(Dependent claims: 18-20)
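Claim 17 has the computing device drive the display and loudspeaker of a second, networked device, with the facial animation synchronized to the audio playback. One simple way to express that synchronization is to attach a presentation timestamp (time into the audio sequence) to each animation frame before it crosses the network, so the receiving device can schedule frames against its local audio playback clock. The packet layout sketched below (JSON per frame) is a hypothetical illustration, not the protocol described by the patent.

```python
# Hypothetical wire format: each packet carries a presentation timestamp (seconds
# into the audio sequence) so the receiving device can align the animation frame
# with its audio playback clock.
import json

def make_packets(face_params_per_frame, frame_rate=30.0):
    """Pair every animation frame with the audio-clock time at which to show it."""
    packets = []
    for i, params in enumerate(face_params_per_frame):
        packets.append(json.dumps({
            "pts": i / frame_rate,      # presentation timestamp relative to audio start
            "blendshapes": params,      # per-frame face parameters
        }))
    return packets

# Example: three frames of a 52-parameter rig.
frames = [[0.0] * 52 for _ in range(3)]
for pkt in make_packets(frames):
    print(pkt[:60], "...")
```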
Specification