Method and system for generating facial animation values based on a combination of visual and audio information

US 6,940,454 B2
Filed: 08/13/2001
Issued: 09/06/2005
Est. Priority Date: 04/13/1998
Status: Expired due to Term

First Claim

1. Method for generating facial animation values using a sequence of facial image frames and synchronously captured audio data of a speaking actor, comprising the steps for:

providing a plurality of visual-facial-animation values based on tracking of facial features in the sequence of facial image frames of the speaking actor;

providing a plurality of audio-facial-animation values based on visemes detected using the synchronously captured audio voice data of the speaking actor; and

combining the plurality of visual facial animation values and the plurality of audio facial animation values to generate output facial animation values for use in facial animation.

Abstract

Facial animation values are generated using a sequence of facial image frames and synchronously captured audio data of a speaking actor. In the technique, a plurality of visual-facial-animation values are provided based on tracking of facial features in the sequence of facial image frames of the speaking actor, and a plurality of audio-facial-animation values are provided based on visemes detected using the synchronously captured audio voice data of the speaking actor. The plurality of visual facial animation values and the plurality of audio facial animation values are combined to generate output facial animation values for use in facial animation.
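The three steps in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the viseme table, the value names (`brow_raise`, `jaw_open`, `lip_width`), the numbers, and the vector layout are all assumptions made for the example. The combining rule shown here takes mouth values from the audio stream only, which is one of the variants recited in the dependent claims.

```python
from typing import Dict, List, Sequence

# Hypothetical viseme table: each viseme maps to illustrative mouth values
# [jaw_open, lip_width]. Names and numbers are assumptions, not from the patent.
VISEME_TO_MOUTH: Dict[str, List[float]] = {
    "sil": [0.0, 0.5],
    "aa": [0.9, 0.6],
    "oo": [0.4, 0.1],
    "mm": [0.0, 0.4],
}

# Assumed layout of one frame's value vector: [brow_raise, jaw_open, lip_width].
MOUTH_INDICES = (1, 2)


def audio_values_from_visemes(visemes: Sequence[str]) -> List[List[float]]:
    """Map one detected viseme per frame to audio-facial-animation values."""
    return [list(VISEME_TO_MOUTH[v]) for v in visemes]


def combine_frames(
    visual: Sequence[Sequence[float]],
    audio: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Combine per-frame values: mouth entries from audio, the rest from tracking."""
    if len(visual) != len(audio):
        raise ValueError("visual and audio streams must have one entry per frame")
    output = []
    for v_frame, a_frame in zip(visual, audio):
        frame = list(v_frame)  # start from the visually tracked values
        for idx, value in zip(MOUTH_INDICES, a_frame):
            frame[idx] = value  # overwrite mouth-associated entries
        output.append(frame)
    return output
```

Here the visual values are assumed to arrive from a separate feature tracker, one vector per image frame, already aligned in time with the viseme sequence.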

20 Claims

1. Method for generating facial animation values using a sequence of facial image frames and synchronously captured audio data of a speaking actor, comprising the steps for:
- providing a plurality of visual-facial-animation values based on tracking of facial features in the sequence of facial image frames of the speaking actor;
  
  providing a plurality of audio-facial-animation values based on visemes detected using the synchronously captured audio voice data of the speaking actor; and
  
  combining the plurality of visual facial animation values and the plurality of audio facial animation values to generate output facial animation values for use in facial animation.
- - 2. Method for generating facial animation values as defined in claim 1, wherein the output facial animation values associated with a mouth for a facial animation are based only on the respective mouth-associated values of the plurality of audio facial animation values.
  - 3. Method for generating facial animation values as defined in claim 1, wherein the output facial animation values associated with a mouth for a facial animation are based on a weighted average of the respective mouth-associated values of the plurality of visual facial animation values and the respective mouth-associated values of the plurality of audio facial animation values.
  - 4. Method for generating facial animation values as defined in claim 3, wherein the output facial animation values are calculated using the following equation:
    - $\left(\underline{f}_{n} = \frac{\underline{\sigma}_{n}^{a}}{\underline{\sigma}_{n}^{a} + \underline{\sigma}_{n}^{v}} \cdot \underline{a}_{n} + \frac{\underline{\sigma}_{n}^{v}}{\underline{\sigma}_{n}^{v} + \underline{\sigma}_{n}^{a}} \cdot \underline{v}_{n}\right)_{i}$ where:

      $\underline{f}_{n}$ are the output facial animation values;

      $\underline{v}_{n}$ are the visual facial animation values;

      $\underline{a}_{n}$ are the respective mouth-associated values of the audio facial animation values;

      $\underline{\sigma}_{n}^{a}$ are the weights for the audio facial animation values; and

      $\underline{\sigma}_{n}^{v}$ are the weights for the visual facial animation values.
  - 5. Method for generating facial animation values as defined in claim 1, wherein the output facial animation values associated with a mouth for a facial animation are based on Kalman filtering of the respective mouth-associated values of the plurality of visual facial animation values and the respective mouth-associated values of the plurality of audio facial animation values.
  - 6. Method for generating facial animation values as defined in claim 1, wherein the step of combining the plurality of visual facial animation values and the plurality of audio facial animation values to generate output facial animation values includes detecting whether speech is occurring in the synchronously captured audio voice data of the speaking actor and, while speech is detected as occurring, generating the output facial animation values associated with a mouth based only on the respective mouth-associated values of the plurality of audio facial animation values and, while speech is not detected as occurring, generating the output facial animation values associated with a mouth based only on the respective mouth-associated values of the plurality of visual facial animation values.
  - 7. Method for generating facial animation values as defined in claim 1, wherein the tracking of facial features in the sequence of facial image frames of the speaking actor is performed using bunch graph matching.
  - 8. Method for generating facial animation values as defined in claim 1, wherein the tracking of facial features in the sequence of facial image frames of the speaking actor is performed using transformed facial image frames generated based on wavelet transformations.
  - 9. Method for generating facial animation values as defined in claim 1, wherein the tracking of facial features in the sequence of facial image frames of the speaking actor is performed using transformed facial image frames generated based on Gabor wavelet transformations.
  - 10. Method for generating facial animation values as defined in claim 1, wherein the tracking of facial features in the sequence of facial image frames of the speaking actor is performed without using markers attached to the speaking actor's face.
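
The weighted average of claims 3 and 4 and the speech-gated selection of claim 6 can be illustrated with a short sketch. The function names and the list-based representation are assumptions for this example; how the weights are chosen and how speech is detected are outside the claim text and are simply taken as inputs here.

```python
from typing import List, Sequence


def weighted_mouth_values(
    audio: Sequence[float],
    visual: Sequence[float],
    w_audio: Sequence[float],
    w_visual: Sequence[float],
) -> List[float]:
    """Component-wise weighted average in the form of the claim 4 equation.

    For each component i:
        f_i = w_a / (w_a + w_v) * a_i + w_v / (w_v + w_a) * v_i
    """
    out = []
    for a, v, wa, wv in zip(audio, visual, w_audio, w_visual):
        total = wa + wv
        if total <= 0:
            raise ValueError("weights for each component must sum to a positive value")
        out.append((wa / total) * a + (wv / total) * v)
    return out


def gated_mouth_values(
    audio: Sequence[float],
    visual: Sequence[float],
    speech_detected: bool,
) -> List[float]:
    """Claim 6 style selection: audio values during speech, visual values otherwise."""
    return list(audio) if speech_detected else list(visual)
```

With equal weights the first function reduces to a plain mean of the two streams, and a weight of zero on one stream reproduces the other stream unchanged.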

11. Apparatus for generating facial animation values using a sequence of facial image frames and synchronously captured audio data of a speaking actor, comprising:
- means for providing a plurality of visual-facial-animation values based on tracking of facial features in the sequence of facial image frames of the speaking actor;
  
  means for providing a plurality of audio-facial-animation values based on visemes detected using the synchronously captured audio voice data of the speaking actor; and
  
  means for combining the plurality of visual facial animation values and the plurality of audio facial animation values to generate output facial animation values for use in facial animation.
- - 12. Apparatus for generating facial animation values as defined in claim 11, wherein the output facial animation values associated with a mouth for a facial animation are based only on the respective mouth-associated values of the plurality of audio facial animation values.
  - 13. Apparatus for generating facial animation values as defined in claim 11, wherein the output facial animation values associated with a mouth for a facial animation are based on a weighted average of the respective mouth-associated values of the plurality of visual facial animation values and the respective mouth-associated values of the plurality of audio facial animation values.
  - 14. Apparatus for generating facial animation values as defined in claim 13, wherein the output facial animation values are calculated using the following equation:
    - $\left(\underline{f}_{n} = \frac{\underline{\sigma}_{n}^{a}}{\underline{\sigma}_{n}^{a} + \underline{\sigma}_{n}^{v}} \cdot \underline{a}_{n} + \frac{\underline{\sigma}_{n}^{v}}{\underline{\sigma}_{n}^{v} + \underline{\sigma}_{n}^{a}} \cdot \underline{v}_{n}\right)_{i}$ where:

      $\underline{f}_{n}$ are the output facial animation values;

      $\underline{v}_{n}$ are the visual facial animation values;

      $\underline{a}_{n}$ are the respective mouth-associated values of the audio facial animation values;

      $\underline{\sigma}_{n}^{a}$ are the weights for the audio facial animation values; and

      $\underline{\sigma}_{n}^{v}$ are the weights for the visual facial animation values.
  - 15. Apparatus for generating facial animation values as defined in claim 11, wherein the output facial animation values associated with a mouth for a facial animation are based on Kalman filtering of the respective mouth-associated values of the plurality of visual facial animation values and the respective mouth-associated values of the plurality of audio facial animation values.
  - 16. Apparatus for generating facial animation values as defined in claim 11, wherein the means for combining the plurality of visual facial animation values and the plurality of audio facial animation values to generate output facial animation values includes means for detecting whether speech is occurring in the synchronously captured audio voice data of the speaking actor and, while speech is detected as occurring, generating the output facial animation values associated with a mouth based only on the respective mouth-associated values of the plurality of audio facial animation values and, while speech is not detected as occurring, generating the output facial animation values associated with a mouth based only on the respective mouth-associated values of the plurality of visual facial animation values.
  - 17. Apparatus for generating facial animation values as defined in claim 11, wherein the tracking of facial features in the sequence of facial image frames of the speaking actor is performed using bunch graph matching.
  - 18. Apparatus for generating facial animation values as defined in claim 11, wherein the tracking of facial features in the sequence of facial image frames of the speaking actor is performed using transformed facial image frames generated based on wavelet transformations.
  - 19. Apparatus for generating facial animation values as defined in claim 11, wherein the tracking of facial features in the sequence of facial image frames of the speaking actor is performed using transformed facial image frames generated based on Gabor wavelet transformations.
  - 20. Apparatus for generating facial animation values as defined in claim 11, wherein the tracking of facial features in the sequence of facial image frames of the speaking actor is performed using markers-free tracking, that is, without using markers attached to the speaking actor's face.
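
Claims 5 and 15 recite Kalman filtering of the visual and audio mouth values without fixing a particular filter model. The sketch below is one possible reading, under stated assumptions: a scalar filter per mouth value, a random-walk state model, and the visual and audio values treated as two noisy measurements of the same quantity in each frame. The class name, the noise parameters, and their defaults are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalman:
    """Scalar Kalman filter with a random-walk state model (an assumed model)."""

    estimate: float = 0.0       # current estimate of the mouth value
    variance: float = 1.0       # uncertainty of that estimate
    process_noise: float = 0.01  # assumed per-frame growth in uncertainty

    def predict(self) -> None:
        # Random walk: the value is expected to persist, uncertainty grows.
        self.variance += self.process_noise

    def update(self, measurement: float, meas_variance: float) -> None:
        # Standard scalar Kalman update with measurement variance meas_variance.
        gain = self.variance / (self.variance + meas_variance)
        self.estimate += gain * (measurement - self.estimate)
        self.variance *= 1.0 - gain


def fuse_frame(
    kf: ScalarKalman,
    visual_value: float,
    audio_value: float,
    visual_var: float,
    audio_var: float,
) -> float:
    """Advance one frame and fold in both measurements sequentially."""
    kf.predict()
    kf.update(visual_value, visual_var)
    kf.update(audio_value, audio_var)
    return kf.estimate
```

A smaller measurement variance gives that stream more influence on the fused value, so the variances play a role comparable to the inverse of the weights in the weighted-average claims.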

Current Assignee: Google LLC (Alphabet Inc.)
Original Assignee: Nevengineering Incorporated (Alphabet Inc.)
Inventors: Dzhurinskiy, Yevgeniy V.; Paetzold, Frank; Derlich, Karin M.; Neven, Hartmut; Buddemeier, Ulrich F.
Primary Examiner(s): Chen, Shih-Chao
Assistant Examiner(s): MANCUSO, HUEDUNG XUAN CAO
Application Number: US09/929,516
Publication Number: US 20020118195A1
Time in Patent Office: 1,485 Days
Field of Search: 345/473, 345/474, 345/475, 345/619, 704/236, 704/235
US Class Current: 345/473

CPC Class Codes

G06T 2207/20064   Wavelet transform [DWT]
G06T 7/246   using feature-based methods...
G06T 7/262   using transform domain meth...
G06V 10/426   Graphical representations
G06V 10/449   Biologically inspired filte...
G06V 40/10   Human or animal bodies, e.g...
G06V 40/161   Detection; Localisation; No...
G06V 40/165   using facial parts and geom...
