Method and system for aligning natural and synthetic video to speech synthesis
Abstract
According to MPEG-4's TTS architecture, facial animation can be driven by two streams simultaneously: text and Facial Animation Parameters. A Text-to-Speech converter drives the mouth shapes of the face. An encoder sends Facial Animation Parameters to the face. The text input can include codes, or bookmarks, transmitted to the Text-to-Speech converter, which are placed between and inside words. The bookmarks carry an encoder time stamp. Due to the nature of text-to-speech conversion, the encoder time stamp does not relate to real-world time and should be interpreted as a counter. The Facial Animation Parameter stream carries the same encoder time stamp found in the bookmark of the text. The system reads the bookmark and provides the encoder time stamp and a real-time time stamp. The facial animation system associates the correct facial animation parameter with the real-time time stamp using the encoder time stamp of the bookmark as a reference.
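The association described above can be illustrated with a minimal sketch. It is not the patented implementation: the bookmark syntax `\!M{<integer>}`, the names `FapFrame`, `fake_tts`, and `schedule`, and the per-character timing model are all assumptions made for illustration only. The sketch shows a stand-in converter that strips bookmarks from the text and reports, for each one, its encoder time stamp (ETS) together with a real-time time stamp (RTS), and a scheduler that uses the ETS as the key to attach an RTS to each facial animation parameter frame.

```python
import re
from dataclasses import dataclass

# Hypothetical bookmark syntax, assumed for this sketch only: \!M{<integer>}
BOOKMARK = re.compile(r"\\!M\{(\d+)\}")


@dataclass
class FapFrame:
    ets: int      # encoder time stamp: a counter, not wall-clock time
    params: str   # placeholder for the actual facial animation parameters


def fake_tts(text, ms_per_char=50):
    """Stand-in for a text-to-speech converter (assumed timing model).

    Removes bookmarks from the text and returns the spoken text plus a
    mapping from each bookmark's ETS to the real-time time stamp (in ms)
    at which the synthesized speech reaches that position.
    """
    marks = {}
    spoken = []
    clock_ms = 0
    pos = 0
    for match in BOOKMARK.finditer(text):
        chunk = text[pos:match.start()]
        spoken.append(chunk)
        clock_ms += len(chunk) * ms_per_char
        marks[int(match.group(1))] = clock_ms
        pos = match.end()
    spoken.append(text[pos:])
    return "".join(spoken), marks


def schedule(fap_stream, marks):
    """Pair each FAP frame with the RTS of the bookmark carrying the same ETS.

    Frames whose ETS has no matching bookmark are skipped in this sketch.
    """
    return [(marks[f.ets], f.params) for f in fap_stream if f.ets in marks]


text = r"Hello \!M{1}wor\!M{2}ld"
faps = [FapFrame(1, "smile"), FapFrame(2, "raise_eyebrows"), FapFrame(3, "blink")]

spoken, marks = fake_tts(text)
timeline = schedule(faps, marks)
print(spoken)    # Hello world
print(timeline)  # [(300, 'smile'), (450, 'raise_eyebrows')]
```

Note that the second bookmark sits inside a word, as the abstract permits, and that the ETS values 1 and 2 act purely as matching keys: the playback times come only from the converter's own clock.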
15 Claims
1. A method of aligning video with audio, the method comprising:
identifying a predetermined code associated with an animation mimic in a first stream, wherein the predetermined code comprises an escape sequence followed by a plurality of bits, which define one of a set of possible animation mimics; and
transmitting the predetermined code within a second stream to thereby synchronize the second stream with the first stream.
Dependent claims: 2, 3, 4, 5.
6. A system for aligning video with audio, the system comprising:
a processor;
a module configured to control the processor to identify a predetermined code associated with an animation mimic in a first stream, wherein the predetermined code comprises an escape sequence followed by a plurality of bits, which define one of a set of possible animation mimics; and
a module configured to control the processor to transmit the predetermined code within a second stream to thereby synchronize the second stream with the first stream.
Dependent claims: 7, 8, 9, 10.
11. A computer-readable medium storing instructions for controlling a computing device to align a video with audio, the instructions comprising:
identifying a predetermined code associated with an animation mimic in a first stream, wherein the predetermined code comprises an escape sequence followed by a plurality of bits, which define one of a set of possible animation mimics; and
transmitting the predetermined code within a second stream to thereby synchronize the second stream with the first stream.
Dependent claims: 12, 13, 14, 15.
Specification