Method and system for aligning natural and synthetic video to speech synthesis
Abstract
Facial animation in MPEG-4 can be driven by a text stream and a Facial Animation Parameters (FAP) stream. Text input is sent to a text-to-speech (TTS) converter that drives the mouth shapes of the face. FAPs are sent from an encoder to the face over the communication channel. Disclosed are codes, called bookmarks, placed in the text string transmitted to the TTS converter. Bookmarks are placed between and inside words, and each carries an encoder time stamp. The encoder time stamp does not relate to real-world time. The FAP stream carries the same encoder time stamp found in the bookmark of the text. The system reads the bookmark and provides the encoder time stamp, together with a real-time time stamp, to the facial animation system. The facial animation system associates the correct facial animation parameter with the real-time time stamp, using the encoder time stamp of the bookmark as a reference.
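The association described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the bookmark syntax `<bm:N>`, the function names `tts_timeline` and `schedule_faps`, and the fixed speaking rate `ms_per_char` are all assumptions made for illustration. The sketch shows only the idea that the encoder time stamp (ETS) acts as a lookup key, while the real-time time stamp comes from the position the TTS has reached when it meets the bookmark.

```python
import re

# Hypothetical bookmark syntax for this sketch: <bm:ETS>
BOOKMARK = re.compile(r"<bm:(\d+)>")

def tts_timeline(text, ms_per_char=60):
    """Strip bookmarks from `text` and give each one a real-time stamp.

    Assumes (for illustration only) that speech advances at a fixed
    number of milliseconds per character. Returns the plain text to be
    spoken and a dict mapping encoder time stamp -> real time in ms.
    """
    spoken = []
    ets_to_rts = {}
    pos = 0
    chars_spoken = 0
    for m in BOOKMARK.finditer(text):
        chunk = text[pos:m.start()]
        spoken.append(chunk)
        chars_spoken += len(chunk)
        # The bookmark may sit between words or inside a word.
        ets_to_rts[int(m.group(1))] = chars_spoken * ms_per_char
        pos = m.end()
    spoken.append(text[pos:])
    return "".join(spoken), ets_to_rts

def schedule_faps(fap_stream, ets_to_rts):
    """Attach a real-time stamp to each FAP via its encoder time stamp.

    `fap_stream` is a list of (ets, fap) pairs; the result is a list of
    (real_time_ms, fap) pairs in playback order.
    """
    return sorted(
        (ets_to_rts[ets], fap) for ets, fap in fap_stream if ets in ets_to_rts
    )

spoken, ets_to_rts = tts_timeline("Hello <bm:7>wor<bm:3>ld")
timeline = schedule_faps([(3, "smile"), (7, "raise_eyebrows")], ets_to_rts)
```

In this example the bookmark with ETS 7 occurs before the one with ETS 3, and the FAPs are still scheduled correctly, because the ETS serves only as a reference and carries no real-world timing of its own.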
17 Claims
1. A computer-readable medium storing instructions for controlling a computing device to encode an animation comprising at least one animation mimic within a first stream and speech associated with a second stream, the instructions comprising:
- assigning a predetermined code that points to an animation mimic within a first stream; and
- synchronizing a second stream with the animation mimics stream by placing the predetermined code within the second stream.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer-readable medium storing instructions for controlling a computing device to decode an animation including speech and at least one animation mimic, the instructions comprising:
- monitoring a first stream for a predetermined code that points to an animation mimic within a second stream thereby indicating a synchronization relationship between the first stream and the second stream; and
- sending a signal to a visual decoder to start the animation mimic that is pointed to by the predetermined code.
Dependent claims: 9, 10, 11, 12
13. A system for decoding at least one stream of data, the system comprising:
- a module that monitors a first stream for a predetermined code that points to an animation mimic within a second stream thereby indicating a synchronization relationship between the first stream and the second stream; and
- a module that sends a signal to a visual decoder to start the animation mimic that is pointed to by the predetermined code.
Dependent claims: 14, 15, 16, 17
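The encode and decode steps recited in the independent claims can be sketched as a round trip in Python. This is an illustrative reading, not the claimed implementation: the code syntax `<bm:N>`, the helper names `place_codes` and `monitor`, and the `start_mimic` callback standing in for the signal to a visual decoder are hypothetical. The sketch assumes the text stream arrives in chunks, so a code may be split across a chunk boundary.

```python
import re
from typing import Callable, Dict, Iterable, Iterator

# Hypothetical code syntax for this sketch: <bm:ID>
CODE = re.compile(r"<bm:(\d+)>")
# An unfinished code at the very end of the buffered text, e.g. "<bm:1"
PARTIAL = re.compile(r"<(?:b(?:m(?::\d*)?)?)?$")

def place_codes(text: str, marks: Dict[int, int]) -> str:
    """Encoder side: insert a code at each character offset in `marks`.

    `marks` maps an offset in `text` to the id of the animation mimic
    that the code points to.
    """
    out, prev = [], 0
    for offset in sorted(marks):
        out.append(text[prev:offset])
        out.append(f"<bm:{marks[offset]}>")
        prev = offset
    out.append(text[prev:])
    return "".join(out)

def monitor(chunks: Iterable[str],
            start_mimic: Callable[[int], None]) -> Iterator[str]:
    """Decoder side: watch the text stream for codes.

    Calls `start_mimic(id)` when a code is found (the stand-in for the
    signal to the visual decoder) and yields the remaining plain text.
    """
    buf = ""
    for chunk in chunks:
        buf += chunk
        # Hold back a trailing fragment that may be the start of a code.
        tail = PARTIAL.search(buf)
        if tail:
            ready, buf = buf[:tail.start()], buf[tail.start():]
        else:
            ready, buf = buf, ""
        pos = 0
        for m in CODE.finditer(ready):
            if m.start() > pos:
                yield ready[pos:m.start()]
            start_mimic(int(m.group(1)))
            pos = m.end()
        if pos < len(ready):
            yield ready[pos:]
    if buf:
        yield buf  # fragment never completed: pass it through as text

encoded = place_codes("Hello world", {6: 7, 9: 3})
started = []
plain = "".join(monitor(["Hello <bm:7>wor<bm", ":3>ld"], started.append))
```

Because `monitor` is a generator, each call to `start_mimic` happens at the point in the text where its code appears, which is what ties the animation mimic to the speech position.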
Specification