Method and system for aligning natural and synthetic video to speech synthesis
Abstract
According to MPEG-4's TTS architecture, facial animation can be driven by two streams simultaneously: text, and Facial Animation Parameters. In this architecture, text input is sent to a Text-To-Speech converter at a decoder that drives the mouth shapes of the face. Facial Animation Parameters are sent from an encoder to the face over the communication channel. The present invention includes codes (known as bookmarks) in the text string transmitted to the Text-to-Speech converter, which bookmarks are placed between words as well as inside them. According to the present invention, the bookmarks carry an encoder time stamp. Due to the nature of text-to-speech conversion, the encoder time stamp does not relate to real-world time, and should be interpreted as a counter. In addition, the Facial Animation Parameter stream carries the same encoder time stamp found in the bookmark of the text. The system of the present invention reads the bookmark and provides the encoder time stamp as well as a real-time time stamp to the facial animation system. Finally, the facial animation system associates the correct facial animation parameter with the real-time time stamp using the encoder time stamp of the bookmark as a reference.
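The bookmark mechanism described in the abstract can be sketched in Python. The `<FAP n>` bookmark syntax, the function name, and the returned tuple layout are illustrative assumptions only; the patent does not prescribe a concrete wire format. The sketch separates the plain text destined for the TTS converter from the embedded bookmarks, recording each bookmark's encoder time stamp together with its character position in the clean text:

```python
import re

# Hypothetical bookmark syntax: "<FAP n>" carries encoder time stamp n (a counter).
BOOKMARK = re.compile(r"<FAP (\d+)>")

def split_bookmarks(text):
    """Separate plain text for the TTS converter from embedded bookmarks.

    Returns (clean_text, bookmarks), where each bookmark is a tuple
    (encoder_time_stamp, character_offset_in_clean_text).
    """
    clean, marks = [], []
    pos = 0     # running offset within the clean (bookmark-free) text
    last = 0    # end of the previous match in the raw text
    for m in BOOKMARK.finditer(text):
        clean.append(text[last:m.start()])
        pos += m.start() - last
        marks.append((int(m.group(1)), pos))
        last = m.end()
    clean.append(text[last:])
    return "".join(clean), marks
```

A decoder along these lines would hand the clean text to the TTS converter and, when synthesis reaches a recorded position, pair that bookmark's encoder time stamp with a real-time time stamp for the facial animation system.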
12 Claims
1. A method for encoding a facial animation comprising the steps of:
a) creating a data stream;
b) creating a facial mimic stream including a plurality of facial animation parameters;
c) inserting a plurality of time stamps in the data stream pointing to said plurality of facial animation parameters, wherein said plurality of time stamps establishes a synchronization relationship with said data stream and said facial mimic stream; and
d) encoding said data stream and said facial mimic stream. (Dependent claims: 2, 3, 4, 5, 6, 7, 8.)
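Steps a) through d) of claim 1 can be illustrated with a minimal sketch. The data structures and field names below are hypothetical; the claim does not specify any particular representation:

```python
from dataclasses import dataclass, field

@dataclass
class FacialAnimationParameter:
    encoder_time_stamp: int  # a counter, not real-world time
    values: dict             # e.g. {"mouth_open": 0.4}

@dataclass
class EncodedStreams:
    data_stream: list = field(default_factory=list)   # text tokens plus time stamps (step a)
    mimic_stream: list = field(default_factory=list)  # facial animation parameters (step b)

def encode(words, mimics):
    """Interleave time stamps into the data stream so they point at
    entries of the facial mimic stream (claim 1, steps a-d)."""
    out = EncodedStreams()
    counter = 0
    for word, fap_values in zip(words, mimics):
        out.data_stream.append(("word", word))
        if fap_values is not None:
            counter += 1
            out.data_stream.append(("time_stamp", counter))                 # step c)
            out.mimic_stream.append(FacialAnimationParameter(counter, fap_values))
    return out  # step d) would serialize both streams for transmission
```

The inserted counters are the synchronization relationship the claim recites: each time stamp in the data stream points at the mimic-stream entry carrying the same value.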
9. A method for encoding a facial animation including at least one facial mimic and speech in the form of a text stream, comprising the steps of:
a) assigning a predetermined code to the at least one facial mimic;
b) placing the predetermined code within the text stream, wherein said predetermined code indicates a presence of a particular facial mimic and wherein said predetermined code points to a stream of facial mimics, thereby indicating a synchronization relationship between the text stream and the facial mimic stream;
c) encoding said text stream; and
d) placing the predetermined code in between letters in the text stream.
10. A method for encoding a facial animation including at least one facial mimic and speech in the form of a text stream, comprising the steps of:
a) assigning a predetermined code to the at least one facial mimic;
b) placing the predetermined code within the text stream, wherein said predetermined code indicates a presence of a particular facial mimic and wherein said predetermined code points to a stream of facial mimics, thereby indicating a synchronization relationship between the text stream and the facial mimic stream;
c) encoding said text stream; and
d) placing the predetermined code inside words in the text stream.
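Claims 9 and 10 differ only in where the predetermined code is placed: between letters of the text stream (claim 9) or inside words (claim 10). Both placements reduce to inserting the code at a character offset, as in this sketch; the `<SMILE>` code is a made-up example, not a code defined by the patent:

```python
def place_code(text, code, offset):
    """Place a predetermined mimic code within the text stream at a
    character offset: between letters (claim 9) or inside a word
    (claim 10), depending on where the offset falls."""
    return text[:offset] + code + text[offset:]
```

For example, `place_code("hello", "<SMILE>", 2)` places the code inside the word, yielding `he<SMILE>llo`.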
11. A method for decoding a facial animation including speech and at least one facial mimic, comprising the steps of:
a) monitoring a text stream for a set of predetermined codes corresponding to a set of facial mimics, wherein the predetermined code points to a stream of facial mimics established during an encoding process of said text stream, thereby indicating a synchronization relationship between the text stream and the facial mimic stream;
b) sending a signal to a visual decoder to start a particular facial mimic upon detecting the presence of the set of predetermined codes; and
c) placing the predetermined code in between phonemes in the text stream.
12. A method for decoding a facial animation including speech and at least one facial mimic, comprising the steps of:
a) monitoring a text stream for a set of predetermined codes corresponding to a set of facial mimics, wherein the predetermined code points to a stream of facial mimics established during an encoding process of said text stream, thereby indicating a synchronization relationship between the text stream and the facial mimic stream;
b) sending a signal to a visual decoder to start a particular facial mimic upon detecting the presence of the set of predetermined codes; and
c) placing the predetermined code inside words in the text stream.
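Claims 11 and 12 describe the decoder side: monitor the incoming text stream for the predetermined codes and, on each hit, signal the visual decoder to start the matching facial mimic. A minimal sketch, assuming a hypothetical two-code vocabulary and a callback standing in for the visual decoder:

```python
import re

# Hypothetical predetermined codes; the patent does not define a concrete set.
CODES = {"<SMILE>": "smile", "<FROWN>": "frown"}
PATTERN = re.compile("|".join(re.escape(c) for c in CODES))

def decode(text_stream, visual_decoder):
    """Scan the text stream for predetermined codes (step a) and signal
    the visual decoder to start the matching mimic (step b)."""
    clean = []
    last = 0
    for m in PATTERN.finditer(text_stream):
        clean.append(text_stream[last:m.start()])
        visual_decoder(CODES[m.group(0)])  # step b): start the facial mimic
        last = m.end()
    clean.append(text_stream[last:])
    return "".join(clean)  # code-free text to forward to the TTS converter
```

Stripping the codes before synthesis keeps them out of the audio path while preserving their positions as trigger points for the facial animation.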
Specification