PHOTO-REALISTIC SYNTHESIS OF IMAGE SEQUENCES WITH LIP MOVEMENTS SYNCHRONIZED WITH SPEECH
First Claim
1. A computer-implemented method for generating photo-realistic facial animation with speech, comprising:
generating in a computer storage medium a statistical model of audiovisual data over time, based on acoustic feature vectors and visual feature vectors from audiovisual data of an individual's articulators during speech;
generating using a computer processor a visual feature vector sequence using the statistical model corresponding to an input set of acoustic feature vectors for speech with which the facial animation is to be synchronized;
creating using a computer processor an image sample sequence from an image library using the generated visual feature vector sequence; and
processing the image sample sequence to provide the photo-realistic facial animation synchronized with the speech.
Abstract
Audiovisual data of an individual reading a known script is obtained and stored in an audio library and an image library. The audiovisual data is processed to extract feature vectors used to train a statistical model. An input audio feature vector corresponding to desired speech with which a synthesized image sequence will be synchronized is provided. The statistical model is used to generate a trajectory of visual feature vectors that corresponds to the input audio feature vector. These visual feature vectors are used to identify a matching image sequence from the image library. The resulting sequence of images, concatenated from the image library, provides a photorealistic image sequence with lip movements synchronized with the desired speech.
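As a rough illustration of the "train a statistical model, then generate a trajectory of visual feature vectors" step described above, the sketch below fits a joint Gaussian to paired scalar features and predicts each visual value as the conditional mean given the acoustic value. This is a deliberately simplified stand-in, not the model the patent describes: it treats each frame independently and ignores temporal dynamics, the features are one-dimensional, and all names (`fit_joint_gaussian`, `generate_visual_trajectory`) and data values are hypothetical.

```python
from statistics import mean


def fit_joint_gaussian(acoustic, visual):
    """Fit a joint Gaussian to paired scalar features.

    Returns the two means and the regression slope cov(a, v) / var(a),
    which is all that the conditional mean E[v | a] requires.
    """
    mu_a, mu_v = mean(acoustic), mean(visual)
    var_a = sum((a - mu_a) ** 2 for a in acoustic)
    cov_av = sum((a - mu_a) * (v - mu_v) for a, v in zip(acoustic, visual))
    if var_a == 0:
        raise ValueError("acoustic training features must not all be equal")
    return {"mu_a": mu_a, "mu_v": mu_v, "slope": cov_av / var_a}


def generate_visual_trajectory(model, acoustic_input):
    """Map each input acoustic frame to its conditional-mean visual feature."""
    return [
        model["mu_v"] + model["slope"] * (a - model["mu_a"])
        for a in acoustic_input
    ]


# Hypothetical training pairs (e.g. acoustic energy vs. mouth opening).
train_acoustic = [0.0, 1.0, 2.0, 3.0]
train_visual = [1.0, 3.0, 5.0, 7.0]

model = fit_joint_gaussian(train_acoustic, train_visual)
trajectory = generate_visual_trajectory(model, [0.5, 1.5, 2.5])
print(trajectory)
```

A model that captures audiovisual data "over time", as the claims require, would also need to represent dependencies between successive frames; the frame-wise mapping here only shows the direction of data flow from acoustic input to visual output.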
20 Claims
1. A computer-implemented method for generating photo-realistic facial animation with speech, comprising:
generating in a computer storage medium a statistical model of audiovisual data over time, based on acoustic feature vectors and visual feature vectors from audiovisual data of an individual's articulators during speech;
generating using a computer processor a visual feature vector sequence using the statistical model corresponding to an input set of acoustic feature vectors for speech with which the facial animation is to be synchronized;
creating using a computer processor an image sample sequence from an image library using the generated visual feature vector sequence; and
processing the image sample sequence to provide the photo-realistic facial animation synchronized with the speech.
Dependent claims: 2, 3, 4, 5, 6, 7, 9, 10
8. A computer system for generating photo-realistic facial animation with speech, comprising:
a computer storage medium storing a statistical model of audiovisual data over time, based on acoustic feature vectors and visual feature vectors from audiovisual data of an individual's articulators during a set of utterances;
a synthesis module having an input for receiving an input set of feature vectors for speech with which the facial animation is to be synchronized, and providing as an output a visual feature vector sequence corresponding to the input set of feature vectors according to the statistical model;
an image selection module having an input for receiving the visual feature vector sequence and an output providing an image sample sequence from an image library corresponding to the visual feature vector sequence.
Dependent claims: 11, 12, 13, 14, 15
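The image selection module recited in claim 8 takes a visual feature vector sequence and returns an image sample sequence from an image library. One way to picture that interface is a nearest-neighbour lookup, sketched below. The claim does not specify the matching criterion, so the squared Euclidean distance, the library layout, and every identifier here (`select_image_samples`, `image_id`, `features`) are assumptions made only for illustration.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def select_image_samples(visual_sequence, image_library):
    """Return, for each target vector, the id of the closest library image."""
    samples = []
    for target in visual_sequence:
        best = min(
            image_library,
            key=lambda entry: squared_distance(entry["features"], target),
        )
        samples.append(best["image_id"])
    return samples


# Hypothetical image library: each entry pairs an image with the visual
# feature vector extracted from it.
image_library = [
    {"image_id": "frame_closed", "features": (0.0, 0.0)},
    {"image_id": "frame_half", "features": (0.5, 0.2)},
    {"image_id": "frame_open", "features": (1.0, 0.4)},
]

# Hypothetical output of the synthesis module.
visual_sequence = [(0.1, 0.0), (0.6, 0.25), (0.9, 0.5), (0.4, 0.1)]

samples = select_image_samples(visual_sequence, image_library)
print(samples)
```

Choosing each frame independently keeps the sketch short; it makes no attempt to keep consecutive selected images visually continuous.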
16. A computer program product comprising:
a computer storage medium;
computer program instructions stored on the computer storage medium that, when processed by a computing device, instruct the computing device to perform a method for generating photo-realistic facial animation with speech, comprising:
generating in a computer storage medium a statistical model of audiovisual data over time, based on acoustic feature vectors and visual feature vectors from audiovisual data of an individual's articulators during speech;
generating using a computer processor a visual feature vector sequence using the statistical model corresponding to an input set of acoustic feature vectors for speech with which the facial animation is to be synchronized;
creating using a computer processor an image sample sequence from an image library using the generated visual feature vector sequence; and
processing the image sample sequence to provide the photo-realistic facial animation synchronized with the speech.
Dependent claims: 17, 18, 19, 20
Specification