Film language
Abstract
This invention comprises analyzing a speaker's speech in an audio visual recording to convert it into triphones and/or phonemes and then using a time coded phoneme stream to identify corresponding visual facial motions, to create single frame snapshots or multi-frame clips of facial motion corresponding to speech phoneme utterance states and transformations, which are stored in a database, and which are subsequently used to animate the original speaker's face, synchronized to a new voice track that has been converted into a time-coded, image frame-indexed phoneme stream.
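The abstract describes a pipeline: convert the original track into a time-coded phoneme stream, store frames of matching facial motion in a database keyed by utterance state, then replay those frames in sync with the dub track's phoneme stream. A minimal sketch in Python, assuming a toy phoneme-to-viseme table and a naive first-match frame-selection policy (both hypothetical illustrations, not the patent's actual tables or method):

```python
from dataclasses import dataclass

# Hypothetical grouping of phonemes into viseme classes; a real system
# would use a full phoneme inventory and a richer viseme taxonomy.
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "aa": "open", "ae": "open",
    "iy": "spread", "s": "spread",
    "uw": "rounded", "ow": "rounded",
}

@dataclass
class PhonemeEvent:
    """One entry of a time-coded phoneme stream."""
    phoneme: str   # e.g. "aa"
    start: float   # seconds into the audio track
    end: float

def build_frame_database(original_stream, fps=24):
    """Map each viseme class to frame indices of the original recording
    where the original speaker uttered a phoneme of that class."""
    db = {}
    for ev in original_stream:
        viseme = PHONEME_TO_VISEME.get(ev.phoneme)
        if viseme is None:
            continue
        # Index the frame at the midpoint of the phoneme utterance.
        mid_frame = int(((ev.start + ev.end) / 2) * fps)
        db.setdefault(viseme, []).append(mid_frame)
    return db

def animate_from_dub(dub_stream, frame_db, fps=24):
    """For each dub-track phoneme, pick a stored frame of the original
    speaker's face with the matching viseme, time-coded to the dub.
    Returns (dub_frame, source_frame) pairs."""
    timeline = []
    for ev in dub_stream:
        viseme = PHONEME_TO_VISEME.get(ev.phoneme)
        frames = frame_db.get(viseme)
        if frames:
            timeline.append((int(ev.start * fps), frames[0]))
    return timeline
```

For example, if the original speaker utters "p" then "aa", a dub phoneme "b" (which shares the bilabial viseme with "p") retrieves the original speaker's bilabial frame at the dub's own time code.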
35 Claims
- 1. A method for modifying an audio visual recording originally produced with an original audio track of an original speaker, using a second audio dub track of a second speaker, to produce a new audio visual recording with synchronized audio to facial expressive speech of the second audio dub track spoken by the original speaker, comprising analyzing the original audio track to convert it into phonemes as a time-coded phoneme stream to identify corresponding visual facial motions of the original speaker to create frames of facial motion corresponding to speech phoneme utterance states and transformations, storing these frames in a database, analyzing the second audio dub track to convert it to phonemes as a time-coded phoneme stream, using the second audio dub track time-coded phoneme stream to animate the original speaker's face, synchronized to the second audio dub track to create natural continuous facial speech expression by the original speaker of the second dub audio track.
- 17. A method for modifying an audio visual recording originally produced with an original audio track of an original screen actor, using a second audio dub track of a second screen actor, to produce a new audio visual recording with synchronized audio to facial expressive speech of the second audio dub track spoken by the original screen actor, comprising analyzing the original audio track to convert it into phonemes as a time-coded phoneme stream to identify corresponding visual facial motions of the original speaker to create frames of facial motion corresponding to speech phoneme utterance states and transformations, storing these frames in a database, analyzing the second audio dub track to convert it to phonemes as a time-coded phoneme stream, using the second audio dub track time-coded phoneme stream to animate the original screen actor's face, synchronized to the second audio dub track to create natural continuous facial speech expression by the original screen actor of the second dub audio track.
- 32. A method for modifying an audio visual recording originally produced with an original audio track of an original screen actor, using a second audio dub track of a second screen actor, to produce a new audio visual recording with synchronized audio to facial expressive speech of the second audio dub track spoken by the original screen actor, comprising analyzing the original audio track to convert it into phonemes as a time-coded phoneme stream, identifying corresponding visemes of the original screen actor, using radar to measure a set of facial reference points corresponding to speech phoneme utterance states and transformations, storing the data obtained in a database, analyzing the second audio dub track to convert it to phonemes as a time-coded phoneme stream, identifying corresponding visemes of the second screen actor, using radar to measure a set of facial reference points corresponding to speech phoneme utterance states and transformations, storing the data obtained in a database, using the second audio dub track time-coded phoneme stream and the actors' visemes to animate the original screen actor's face, synchronized to the second audio dub track to create natural continuous facial speech expression by the original screen actor of the second dub audio track.
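Claim 32 adds radar measurement of facial reference points stored per utterance state, with transformations between states. A minimal sketch of the storage and transformation steps, assuming a hypothetical four-point reference set and simple linear blending between stored states (the claim specifies neither the point set nor the blending method; the radar measurement itself is represented here only as stored (x, y) data):

```python
# Hypothetical set of measured facial reference points; claim 32 leaves
# the actual point set unspecified.
REFERENCE_POINTS = ("upper_lip", "lower_lip", "left_corner", "right_corner")

def store_measurement(db, viseme, points):
    """Record one measurement of the reference points for a viseme
    utterance state; `points` maps point name -> (x, y)."""
    assert set(points) == set(REFERENCE_POINTS)
    db[viseme] = points

def interpolate_states(a, b, t):
    """Linear blend between two stored utterance states, giving an
    in-between mouth shape for a phoneme transformation (0 <= t <= 1)."""
    return {
        name: tuple((1 - t) * ax + t * bx for ax, bx in zip(a[name], b[name]))
        for name in REFERENCE_POINTS
    }
```

For example, blending a "closed" and an "open" state at t=0.5 yields a half-open mouth shape, which could drive the intermediate animation frames between two dub-track phonemes.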
Specification