Method and system for aligning natural and synthetic video to speech synthesis

US 20050119877A1
Filed: 01/07/2005
Published: 06/02/2005
Est. Priority Date: 08/05/1997
Status: Active Grant

First Claim

1. A method for encoding an animation comprising at least one animation mimic within an animation mimics stream and speech associated with a text stream, the method comprising:

assigning a predetermined code that points to an animation mimic within an animation mimics stream; and

synchronizing a text stream with the animation mimics stream by placing the predetermined code within the text stream.

4 Assignments

0 Petitions

Abstract

According to MPEG-4's TTS architecture, facial animation can be driven by two streams simultaneously: text and Facial Animation Parameters. In this architecture, text input is sent to a Text-to-Speech converter at a decoder that drives the mouth shapes of the face. Facial Animation Parameters are sent from an encoder to the face over the communication channel. The present invention includes codes (known as bookmarks) in the text string transmitted to the Text-to-Speech converter, which bookmarks are placed between words as well as inside them. According to the present invention, the bookmarks carry an encoder time stamp. Due to the nature of text-to-speech conversion, the encoder time stamp does not relate to real-world time, and should be interpreted as a counter. In addition, the Facial Animation Parameter stream carries the same encoder time stamp found in the bookmark of the text. The system of the present invention reads the bookmark and provides the encoder time stamp as well as a real-time time stamp to the facial animation system. Finally, the facial animation system associates the correct facial animation parameter with the real-time time stamp using the encoder time stamp of the bookmark as a reference.
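
The association described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the bookmark syntax `\!M{n}`, the `Fap` record, and the fixed per-character speaking rate that stands in for a real Text-to-Speech converter are all assumptions made for illustration. The sketch strips bookmarks from the text, derives a real-time stamp for each encoder time stamp from the position the synthesizer has reached, and then looks up each facial animation parameter by its encoder time stamp.

```python
import re
from dataclasses import dataclass

# Hypothetical bookmark syntax: a backslash, "!M", then the encoder time stamp in braces.
BOOKMARK = re.compile(r"\\!M\{(\d+)\}")


@dataclass
class Fap:
    ets: int   # encoder time stamp: a counter, not wall-clock time
    name: str  # illustrative label for a facial animation parameter set


def strip_bookmarks(text):
    """Return the speakable text and a list of (ets, character offset) pairs."""
    parts, marks, pos, offset = [], [], 0, 0
    for match in BOOKMARK.finditer(text):
        chunk = text[pos:match.start()]
        parts.append(chunk)
        offset += len(chunk)
        marks.append((int(match.group(1)), offset))
        pos = match.end()
    parts.append(text[pos:])
    return "".join(parts), marks


def schedule(text, faps, ms_per_char=80):
    """Pair each FAP with a real-time stamp (in ms) via the shared encoder time stamp.

    ms_per_char is an assumed constant speaking rate; a real synthesizer
    would report the actual time at which it reaches each bookmark.
    """
    plain, marks = strip_bookmarks(text)
    real_time_by_ets = {ets: offset * ms_per_char for ets, offset in marks}
    timed = [(real_time_by_ets[f.ets], f.name) for f in faps if f.ets in real_time_by_ets]
    return plain, timed


plain, timed = schedule(
    r"Hello \!M{1}world, sm\!M{2}ile",
    [Fap(1, "raise_eyebrows"), Fap(2, "smile")],
)
print(plain)   # Hello world, smile
print(timed)   # [(480, 'raise_eyebrows'), (1200, 'smile')]
```

The second bookmark sits inside a word, which matches the abstract's statement that bookmarks may be placed between words as well as inside them.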

20 Claims

1. A method for encoding an animation comprising at least one animation mimic within an animation mimics stream and speech associated with a text stream, the method comprising:
- assigning a predetermined code that points to an animation mimic within an animation mimics stream; and
- synchronizing a text stream with the animation mimics stream by placing the predetermined code within the text stream.
- Dependent claims (2, 3, 4, 5, 6):
  - 2. The method of claim 1, wherein the animation mimic is a facial mimic.
  - 3. The method according to claim 1, wherein the predetermined code comprises an escape sequence followed by a plurality of bits, which define one of a set of possible animation mimics.
  - 4. The method according to claim 1, further comprising encoding the animation mimic stream and the text stream containing the predetermined code.
  - 5. The method of claim 1, further comprising placing the predetermined code in between words in the text stream.
  - 6. The method according to claim 1, further comprising placing the predetermined code in between letters in the text stream.
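
As an illustration of the encoding steps in claims 1, 3, 5 and 6, the following Python sketch places a code into a text stream. The escape byte `0x1B`, the one-byte index that follows it, and the `MIMICS` table are hypothetical choices; the claims only require an escape sequence followed by a plurality of bits that select one of a set of possible animation mimics.

```python
ESCAPE = b"\x1b"                              # hypothetical escape sequence
MIMICS = ["smile", "frown", "blink", "nod"]   # hypothetical set of animation mimics


def mimic_code(name):
    """Build the code: escape sequence plus 8 bits selecting one mimic."""
    return ESCAPE + bytes([MIMICS.index(name)])


def place_code(text, char_index, name):
    """Insert the code for `name` into the text stream before `char_index`."""
    head = text[:char_index].encode("ascii")
    tail = text[char_index:].encode("ascii")
    return head + mimic_code(name) + tail


# Between words (claim 5) and between letters (claim 6).
print(place_code("hello world", 6, "smile"))  # b'hello \x1b\x00world'
print(place_code("hello", 2, "blink"))        # b'he\x1b\x02llo'
```

Because the code only points to a mimic, the animation mimics stream itself stays separate from the text stream; the text carries nothing more than the position at which the mimic should start.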

7. A method for decoding an animation including speech and at least one animation mimic, the method comprising:
- monitoring a text stream for a predetermined code that points to an animation mimic within an animation mimic stream thereby indicating a synchronization relationship between the text stream and the animation mimic stream; and
- sending a signal to a visual decoder to start the animation mimic that is pointed to by the predetermined code.
- Dependent claims (8, 9, 10):
  - 8. The method of claim 7, wherein the correspondence between the predetermined code and the animation mimic is established during an encoding process of the text stream.
  - 9. The method of claim 7, wherein the animation mimic is a facial mimic.
  - 10. The method of claim 7, wherein the animation stream and the text stream are distinctly maintained streams.

11. A system for decoding an encoded animation, the system comprising:
- means for monitoring a text stream for a predetermined code that points to an animation mimic within an animation mimic stream thereby indicating a synchronization relationship between the text stream and the animation mimic stream; and
- means for sending a signal to a visual decoder to start the animation mimic that is pointed to by the predetermined code.
- Dependent claims (12, 13, 14, 15, 16):
  - 12. The system of claim 11, wherein the correspondence between the predetermined code and the animation mimic is established during an encoding process of the text stream.
  - 13. The system of claim 11, wherein the animation mimic is a facial mimic.
  - 14. The system of claim 11, wherein the animation stream and the text stream are distinctly maintained streams.
  - 15. The system of claim 11, wherein the predetermined code comprises at least an escape sequence.
  - 16. The system of claim 11, wherein the predetermined code is placed according to one of: in between phonemes in the text stream, in between words in the text stream, or inside words in the text stream.

17. A system for decoding an encoded animation, the system comprising:
- a module that monitors a text stream for a predetermined code that points to an animation mimic within an animation mimic stream thereby indicating a synchronization relationship between the text stream and the animation mimic stream; and
- a module that sends a signal to a visual decoder to start the animation mimic that is pointed to by the predetermined code.
- Dependent claims (18, 19, 20):
  - 18. The system of claim 17, wherein the animation mimic is a facial mimic.
  - 19. The system of claim 17, wherein the animation stream and the text stream are distinctly maintained streams.
  - 20. The system of claim 17, wherein the predetermined code is placed according to one of: between words in the text stream, inside words in the text stream, or between phonemes within the text stream.

Current Assignee: Nuance Communications, Inc. (Microsoft Corporation)
Original Assignee: AT&T Corporation (AT&T, Inc.)
Inventors: Basso, Andrea; Ostermann, Joern; Beutnagel, Mark Charles

Granted Patent: US 7,110,950 B2
Field of Search
US Class Current: 704/200.100
CPC Class Codes
- G10L 13/00: Speech synthesis; Text to s...
- G10L 15/24: Speech recognition using no...
- G10L 2021/105: Synthesis of the lips movem...
- H04N 19/20: using video object coding
- H04N 19/46: Embedding additional inform...
- H04N 19/61: in combination with predict...
