METHOD AND SYSTEM FOR ALIGNING NATURAL AND SYNTHETIC VIDEO TO SPEECH SYNTHESIS
Abstract
According to MPEG-4's Text-To-Speech (TTS) architecture, facial animation can be driven by two streams simultaneously: text and Facial Animation Parameters (FAPs). In this architecture, text input is sent to a Text-To-Speech converter at the decoder, which drives the mouth shapes of the face, while Facial Animation Parameters are sent from the encoder to the face over the communication channel. The present invention embeds codes (known as bookmarks) in the text string transmitted to the Text-To-Speech converter; these bookmarks may be placed between words as well as inside them. According to the present invention, each bookmark carries an encoder time stamp. Due to the nature of text-to-speech conversion, the encoder time stamp does not relate to real-world time and should be interpreted as a counter. In addition, the Facial Animation Parameter stream carries the same encoder time stamps found in the bookmarks of the text. The system of the present invention reads each bookmark and provides its encoder time stamp, together with a real-time time stamp, to the facial animation system. Finally, the facial animation system associates the correct Facial Animation Parameter with the real-time time stamp, using the encoder time stamp of the bookmark as a reference.
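The bookmark mechanism in the abstract can be sketched in Python. The `\bm{N}` escape syntax, the function names, and the data shapes below are assumptions made for illustration only; MPEG-4 defines its own bookmark encoding.

```python
import re

# Hypothetical bookmark syntax: "\bm{N}" embedded in the TTS input text,
# where N is the encoder time stamp (a counter, not wall-clock time).
BOOKMARK = re.compile(r"\\bm\{(\d+)\}")

def parse_tts_text(text):
    """Split a TTS input string into plain text and bookmark time stamps."""
    stamps = [int(m.group(1)) for m in BOOKMARK.finditer(text)]
    plain = BOOKMARK.sub("", text)
    return plain, stamps

def align_faps(bookmark_stamps, fap_frames, real_time_of_stamp):
    """Associate each FAP frame with a real-time stamp via its encoder stamp.

    fap_frames: list of (encoder_stamp, fap_payload) pairs from the FAP stream.
    real_time_of_stamp: mapping from encoder stamp to real-time stamp, filled
    in as the TTS converter reaches each bookmark in the text.
    """
    aligned = []
    for stamp, fap in fap_frames:
        if stamp in bookmark_stamps and stamp in real_time_of_stamp:
            aligned.append((real_time_of_stamp[stamp], fap))
    return aligned
```

Because the encoder stamp is only a counter, the decoder supplies the real-time stamp at the moment the TTS converter reaches each bookmark; the counter value is what ties a FAP frame back to that moment.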
18 Claims
1. A method of aligning video with audio, the method comprising:

identifying a predetermined code associated with an animation mimic in a first stream; and
transmitting the predetermined code within the second stream to thereby synchronize the second stream with the first stream. (Dependent claims: 2-6)
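A minimal encoder-side sketch of claim 1's two steps, assuming the predetermined codes have already been identified in the first (text) stream; the function name and data shapes are hypothetical:

```python
def transmit_second_stream(codes, fap_payloads):
    """Pair each predetermined code identified in the first (text) stream
    with a Facial Animation Parameter payload, so that the code travels
    within the second stream and the decoder can synchronize the two
    streams. Names and data shapes are assumptions for illustration."""
    # zip stops at the shorter sequence; a real encoder would need to
    # handle a mismatch between codes and payloads explicitly.
    return list(zip(codes, fap_payloads))
```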
7. A method of decoding data, the method comprising:

monitoring a first stream of data for a predetermined code that is associated with an animation mimic transmittal within a second stream; and
transmitting a signal to a decoder to start an animation using at least the animation mimic from the second stream and associated with the predetermined code. (Dependent claims: 8-12)
13. A decoder comprising:

a module that decodes a predetermined code from a first stream that is associated with an animation mimic within a second stream; and
a module that starts an animation using the animation mimic based on the decoded predetermined code. (Dependent claims: 14-18)
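Claim 13's two-module decoder might be sketched as follows. The `BM:` token format, the class, and all names are assumptions for illustration, not the patent's encoding:

```python
class BookmarkDecoder:
    """Sketch of claim 13's two-module decoder (all names hypothetical)."""

    def __init__(self, mimics_by_code):
        # Lookup from predetermined code to the animation mimic carried
        # in the second stream.
        self.mimics_by_code = mimics_by_code
        self.started = []

    def decode_code(self, token):
        """Module 1: decode a predetermined code from a first-stream token.

        Assumes bookmarks appear as 'BM:<n>' tokens; ordinary text tokens
        yield None.
        """
        if token.startswith("BM:"):
            return int(token[3:])
        return None

    def start_animation(self, code):
        """Module 2: start the animation mimic associated with the
        decoded predetermined code."""
        mimic = self.mimics_by_code.get(code)
        if mimic is not None:
            self.started.append(mimic)
        return mimic
```

Splitting decoding from playback mirrors the claim's structure: the first module only recognizes codes in the text stream, while the second resolves each code against the FAP stream and triggers the animation.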
Specification