Coarticulation method for audio-visual text-to-speech synthesis
First Claim
1. A method for generating a noise-producing entity, comprising:
reading first data comprising one or more parameters associated with noise-producing orifice images of sequences of at least three concatenated phonemes which correspond to an input stimulus;
reading, based on the first data, corresponding second data comprising images of a noise-producing entity; and
generating, using the second data, an animated sequence of the noise-producing entity tracking the input stimulus.
4 Assignments
0 Petitions
Abstract
A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. Representative parameters are extracted from the image samples and stored in an animation library. The processor also samples a plurality of multiphones comprising images together with their associated sounds. From these images the processor extracts parameters comprising data characterizing mouth shapes, maps, rules, or equations, and stores the resulting parameters and sound information in a coarticulation library. The animated sequence begins with the processor considering an input phoneme sequence, recalling from the coarticulation library the parameters associated with that sequence, and selecting appropriate image samples from the animation library based on those parameters. The image samples are concatenated, and the corresponding sound is output, to form the synthesized animation.
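The lookup-and-concatenate pipeline the abstract describes can be sketched as follows. This is a minimal illustration only, assuming simple dictionary-backed libraries; all names here (`CoarticulationLibrary`, `AnimationLibrary`, `synthesize`) are hypothetical and do not appear in the patent.

```python
def triphones(phonemes):
    """Slide a window of three concatenated phonemes over the input
    sequence, matching the claim's 'at least three concatenated phonemes'."""
    return [tuple(phonemes[i:i + 3]) for i in range(len(phonemes) - 2)]

class CoarticulationLibrary:
    """Hypothetical store mapping a triphone to mouth-shape parameters
    (the 'first data' of claim 1)."""
    def __init__(self, entries):
        self.entries = entries  # {(p1, p2, p3): parameter_key}

    def parameters_for(self, triphone):
        return self.entries[triphone]

class AnimationLibrary:
    """Hypothetical store mapping mouth-shape parameters to stored image
    samples (the 'second data' of claim 1)."""
    def __init__(self, samples):
        self.samples = samples  # {parameter_key: image_id}

    def image_for(self, params):
        return self.samples[params]

def synthesize(phonemes, coart_lib, anim_lib):
    """Concatenate one image sample per triphone of the input,
    yielding the frames of the animated sequence."""
    frames = []
    for tri in triphones(phonemes):
        params = coart_lib.parameters_for(tri)     # recall parameters
        frames.append(anim_lib.image_for(params))  # select image sample
    return frames
```

For example, with two triphone entries and two image samples, `synthesize(["h", "e", "l", "o"], coart, anim)` walks the windows `("h","e","l")` and `("e","l","o")` and returns the two corresponding frames in order. Audio output and parameter extraction are omitted; the sketch shows only the library lookup and concatenation steps.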
22 Claims
1. A method for generating a noise-producing entity, comprising:
reading first data comprising one or more parameters associated with noise-producing orifice images of sequences of at least three concatenated phonemes which correspond to an input stimulus;
reading, based on the first data, corresponding second data comprising images of a noise-producing entity; and
generating, using the second data, an animated sequence of the noise-producing entity tracking the input stimulus.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A noise-producing animated entity generated by a method comprising:
reading first data comprising one or more parameters associated with noise-producing orifice images of sequences of at least three concatenated phonemes which correspond to an input stimulus;
reading, based on the first data, corresponding second data comprising images of a noise-producing entity; and
generating, using the second data, an animated sequence of the noise-producing entity tracking the input stimulus.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
Specification