Method and system for aligning natural and synthetic video to speech synthesis
Abstract
According to MPEG-4's TTS architecture, facial animation can be driven by two streams simultaneously: text and Facial Animation Parameters. In this architecture, text input is sent to a Text-to-Speech converter at a decoder that drives the mouth shapes of the face. Facial Animation Parameters are sent from an encoder to the face over the communication channel. The present invention includes codes (known as bookmarks) in the text string transmitted to the Text-to-Speech converter, which bookmarks are placed between words as well as inside them. According to the present invention, the bookmarks carry an encoder time stamp. Due to the nature of text-to-speech conversion, the encoder time stamp does not relate to real-world time, and should be interpreted as a counter. In addition, the Facial Animation Parameter stream carries the same encoder time stamp found in the bookmark of the text. The system of the present invention reads the bookmark and provides the encoder time stamp as well as a real-time time stamp to the facial animation system. Finally, the facial animation system associates the correct facial animation parameter with the real-time time stamp using the encoder time stamp of the bookmark as a reference.
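The alignment described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the bookmark syntax `\!M{n}`, the function names `read_bookmarks` and `schedule_faps`, and the fixed per-character speech duration are all assumptions made for illustration. The sketch strips bookmarks from the text, pairs each encoder time stamp (treated as a counter) with a real-time time stamp taken from a toy speech clock, and then looks up the Facial Animation Parameter that carries the same encoder time stamp.

```python
import re

# Hypothetical bookmark syntax for this sketch: \!M{<encoder time stamp>}.
BOOKMARK = re.compile(r"\\!M\{(\d+)\}")


def read_bookmarks(text, ms_per_char=50):
    """Strip bookmarks and pair each encoder time stamp with a real-time stamp.

    The real-time stamp is approximated as (characters spoken so far) times
    ms_per_char, a stand-in for the clock of a real text-to-speech converter.
    Returns (spoken_text, [(encoder_time_stamp, real_time_ms), ...]).
    """
    spoken_parts = []
    pairs = []
    pos = 0
    chars_spoken = 0
    for match in BOOKMARK.finditer(text):
        chunk = text[pos:match.start()]
        spoken_parts.append(chunk)
        chars_spoken += len(chunk)
        # The encoder time stamp is only a counter; real time comes from playback.
        pairs.append((int(match.group(1)), chars_spoken * ms_per_char))
        pos = match.end()
    spoken_parts.append(text[pos:])
    return "".join(spoken_parts), pairs


def schedule_faps(pairs, fap_stream):
    """Associate each facial animation parameter with a real-time stamp.

    fap_stream maps encoder time stamps to parameter payloads (plain strings
    here). The shared encoder time stamp is the only link between the streams.
    """
    return sorted((rts, fap_stream[ets]) for ets, rts in pairs if ets in fap_stream)


# One bookmark sits inside a word and one between words.
text = r"Hel\!M{7}lo \!M{8}world"
spoken, pairs = read_bookmarks(text)
schedule = schedule_faps(pairs, {7: "smile", 8: "raise_eyebrows"})
print(spoken)
print(schedule)
```

In this toy run the encoder time stamps 7 and 8 carry no timing meaning of their own; the playback clock alone decides when each parameter is applied.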
20 Claims
1. A method for encoding an animation comprising at least one animation mimic within an animation mimics stream and speech associated with a text stream, the method comprising:
assigning a predetermined code that points to an animation mimic within an animation mimics stream; and
synchronizing a text stream with the animation mimics stream by placing the predetermined code within the text stream.
Dependent claims: 2, 3, 4, 5, 6
7. A method for decoding an animation including speech and at least one animation mimic, the method comprising:
monitoring a text stream for a predetermined code that points to an animation mimic within an animation mimic stream thereby indicating a synchronization relationship between the text stream and the animation mimic stream; and
sending a signal to a visual decoder to start the animation mimic that is pointed to by the predetermined code.
Dependent claims: 8, 9, 10
11. A system for decoding an encoded animation, the system comprising:
means for monitoring a text stream for a predetermined code that points to an animation mimic within an animation mimic stream thereby indicating a synchronization relationship between the text stream and the animation mimic stream; and
means for sending a signal to a visual decoder to start the animation mimic that is pointed to by the predetermined code.
Dependent claims: 12, 13, 14, 15, 16
17. A system for decoding an encoded animation, the system comprising:
a module that monitors a text stream for a predetermined code that points to an animation mimic within an animation mimic stream thereby indicating a synchronization relationship between the text stream and the animation mimic stream; and
a module that sends a signal to a visual decoder to start the animation mimic that is pointed to by the predetermined code.
Dependent claims: 18, 19, 20
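The encoding steps of claim 1 and the decoding steps of claim 7 can be pictured as a round trip. The following is a rough sketch under stated assumptions, not the claimed method itself: the code syntax `\!M{n}`, the use of a list index as the "predetermined code", and the names `encode`, `decode`, and `visual_decoder` are all hypothetical. The encoder places a code that points to a mimic into the text stream; the decoder monitors the text stream for such codes and signals a visual decoder, modeled here as a plain callback.

```python
import re

# Hypothetical code syntax for this sketch: \!M{<index into the mimics stream>}.
CODE = re.compile(r"\\!M\{(\d+)\}")


def encode(words, mimics, placements):
    """Place predetermined codes into the text stream.

    words:      words of the text stream.
    mimics:     the animation mimics stream, modeled as a list of names.
    placements: maps a word index to the index of the mimic that should start
                just before that word. That index is the predetermined code.
    """
    out = []
    for i, word in enumerate(words):
        if i in placements:
            code = placements[i]
            if not 0 <= code < len(mimics):
                raise IndexError("code does not point to a mimic in the stream")
            out.append("\\!M{%d}" % code)
        out.append(word)
    return " ".join(out)


def decode(text_stream, mimics, visual_decoder):
    """Monitor the text stream for codes and signal the visual decoder.

    re.split with one capturing group alternates plain text and captured
    codes, so odd positions hold codes and even positions hold speech text.
    Returns the text with the codes removed.
    """
    spoken = []
    for i, part in enumerate(CODE.split(text_stream)):
        if i % 2 == 1:
            visual_decoder(mimics[int(part)])  # start the mimic the code points to
        else:
            spoken.append(part)
    # Collapse the whitespace left behind where codes were removed.
    return " ".join("".join(spoken).split())


mimics = ["smile", "nod"]
stream = encode(["Hello", "world"], mimics, {0: 1, 1: 0})
started = []
speech_text = decode(stream, mimics, started.append)
print(stream)
print(speech_text, started)
```

Modeling the visual decoder as a callback keeps the sketch independent of any particular rendering system; a real decoder would also need the timing information that the text-to-speech converter supplies.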
Specification