Method for generating mouth features of an animated or physical character
Abstract
A method and system for determining the mouth features, i.e., the lip position and mouth opening, of an animated character. Lip position is the shape and position of the lips of the animated character. Mouth opening is the amount of opening between the lips of the animated character. A time-domain signal corresponding to the speech of the animated character may be digitally sampled. The sampled voice signal is separated into a number of frames of a specific time length. A Hamming window is applied to each frame to de-emphasize the boundary conditions of each frame. A linear predictive coding (LPC) technique is applied to each of the frames, resulting in a gain for each of the frames and a number of k coefficients, or reflection coefficients, including a voiced/nonvoiced coefficient and a pitch coefficient. The reflection coefficients for each frame are mapped to the Cepstral domain resulting in a number of Cepstral coefficients for each frame. The Cepstral coefficients are vector quantized to achieve a vector quantization result representing the character's lip position. For a predetermined number of frames, a local maximum and a local minimum of gain are found. The gain for each of the frames containing a local minimum is set to a fully closed mouth opening and the gain for each of the frames containing a local maximum is set to a fully open mouth opening. The vector quantization result and gain are applied to an empirically derived mapping function to determine the mouth features of the character.
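The abstract's front end (sampling, framing, Hamming windowing) is straightforward to sketch. Below is a minimal illustration in Python/NumPy; the 8 kHz sample rate and 20 ms frame length are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def frame_and_window(signal, sample_rate=8000, frame_ms=20):
    """Split a sampled voice signal into fixed-length frames and apply
    a Hamming window to de-emphasize each frame's boundaries."""
    frame_len = int(sample_rate * frame_ms / 1000)            # samples per frame
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    return frames * np.hamming(frame_len)                     # windowed frames
```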
20 Claims
1. A method for determining the mouth features for a speaking character, comprising the steps of:

sampling a time-domain audio signal;
separating the time-domain audio signal into a plurality of frames;
applying a window to each of the plurality of frames; and
applying a linear predictive coding (LPC) technique to each of the plurality of frames to achieve a plurality of LPC coefficients and a gain for each of the plurality of frames, whereby the LPC coefficients and gain for each frame are used to determine the mouth features for the character on a frame-by-frame basis.

Dependent claims: 2-10.
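As a rough illustration of the LPC step in claim 1, the sketch below runs the Levinson-Durbin recursion on a windowed frame's autocorrelation, yielding reflection (k) coefficients, direct-form coefficients, and a gain. The order-10 analysis is an assumed, typical choice; the claim does not bind the method to a particular order or algorithm.

```python
import numpy as np

def lpc_frame(frame, order=10):
    """Levinson-Durbin recursion on the frame autocorrelation.
    Returns the reflection (k) coefficients, the direct-form
    coefficients of A(z) = 1 + sum a[m] z^-m, and the gain."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]  # lags 0..order
    a = np.zeros(order + 1)
    a[0] = 1.0
    k = np.zeros(order)
    err = r[0]                                 # zero-lag energy
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k[i - 1] = -acc / err                  # i-th reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k[i - 1] * a_prev[i - j]
        a[i] = k[i - 1]
        err *= 1.0 - k[i - 1] ** 2             # residual prediction-error energy
    return k, a, np.sqrt(err)                  # gain taken as sqrt of residual energy
```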
11. A computer-implemented method for generating mouth features of a character, comprising the steps of:

sampling a time-domain voice signal;
separating the time-domain voice signal into a plurality of frames;
applying a windowing technique to each frame;
applying a linear predictive coding (LPC) technique to each of the plurality of frames to generate a plurality of LPC coefficients and a gain for each frame;
mapping the plurality of LPC coefficients to the Cepstral domain to obtain a plurality of Cepstral coefficients for each frame;
vector quantizing the Cepstral coefficients to obtain a lip position for each frame;
determining a local maximum of the gain and a local minimum of the gain within a predetermined number of frames;
adjusting the gain for the frame containing the local minimum to equal a minimum gain level;
adjusting the gain for the frame containing the local maximum to equal a maximum gain level; and
applying the lip position and the gain for each frame to an empirically derived mapping function to obtain the mouth features of the character for each frame.

Dependent claims: 12-15.
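Two of claim 11's steps lend themselves to short sketches: the LPC-to-Cepstral mapping (here via the standard recursion on the direct-form coefficients that lpc_frame above returns) and the gain adjustment that forces a fully closed and fully open mouth within each block of frames. The block size of 8 frames and the [0, 1] gain range are illustrative assumptions, not values from the claim.

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps):
    """Map direct-form LPC coefficients (A(z) = 1 + sum a[m] z^-m,
    a[0] == 1) to cepstral coefficients with the usual recursion."""
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n] if n < len(a) else 0.0
        for m in range(1, n):
            acc += (m / n) * c[m] * (a[n - m] if n - m < len(a) else 0.0)
        c[n] = -acc
    return c[1:]

def normalize_gain(gains, block=8, g_min=0.0, g_max=1.0):
    """Within each block of frames, pin the local-minimum frame to the
    fully-closed level and the local-maximum frame to the fully-open level."""
    g = np.asarray(gains, dtype=float).copy()
    for start in range(0, len(g), block):
        s = slice(start, min(start + block, len(g)))
        g[start + np.argmin(g[s])] = g_min     # mouth fully closed
        g[start + np.argmax(g[s])] = g_max     # mouth fully open
    return g
```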
16. A computer system for synchronizing the mouth features of a speaking performer to a voice signal transmitted by the performer, comprising:

a processor; and
a memory storage device for storing a program module;
the processor, responsive to instructions from the program module, being operative to:
sample the voice signal;
break the voice signal into a number of frames;
apply a windowing technique to each of the frames;
apply a linear predictive coding technique to each frame to obtain a number of reflection coefficients and a gain coefficient for each frame;
transform the reflection coefficients into Cepstral coefficients;
determine a lip position for each frame that corresponds to the Cepstral coefficients for each frame;
adjust the gain of certain frames of the voice signal so that a mouth of the performer fully opens and fully closes within a predetermined number of frames; and
determine the mouth features corresponding to each frame using the gain and lip position for each frame.

Dependent claims: 17-20.
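The last two operative steps can be sketched under the assumption that the "empirically derived mapping function" is representable as a lookup table: the frame's cepstral vector is matched to its nearest codebook entry (the lip position), and that index plus the adjusted gain select the final mouth features. The codebook and feature table here are hypothetical placeholders, not structures disclosed by the claim.

```python
import numpy as np

def lip_position(cepstral_vec, codebook):
    """Vector quantization: return the index of the nearest code vector;
    the index serves as the frame's lip position."""
    return int(np.argmin(np.linalg.norm(codebook - cepstral_vec, axis=1)))

def mouth_features(lip_idx, gain, feature_table):
    """Select mouth features from a (hypothetical) empirically derived
    table keyed by lip position and quantized mouth opening, where the
    gain (assumed normalized to [0, 1]) indexes the opening axis."""
    opening = int(round(gain * (feature_table.shape[1] - 1)))
    return feature_table[lip_idx, opening]
```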
Specification