Animating speech of an avatar representing a participant in a mobile communication

US 8,125,485 B2
Filed: 11/20/2009
Issued: 02/28/2012
Est. Priority Date: 10/11/2007
Status: Active Grant

Abstract

Animating speech of an avatar representing a participant in a mobile communication including selecting one or more images; selecting a generic animation template; fitting the one or more images with the generic animation template; texture wrapping the one or more images over the generic animation template; and displaying the one or more images texture wrapped over the generic animation template. Receiving an audio speech signal; identifying a series of phonemes; and for each phoneme: identifying a new mouth position for the mouth of the generic animation template; altering the mouth position to the new mouth position; texture wrapping a portion of the one or more images corresponding to the altered mouth position; displaying the texture wrapped portion of the one or more images corresponding to the altered mouth position of the mouth of the generic animation template; and playing the portion of the audio speech signal represented by the phoneme.
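
Claim 6 below breaks the fitting step into resizing, identifying facial features, aligning them with the template, and reshaping the template. The following is a minimal Python sketch of the align-and-reshape part only, assuming facial landmarks have already been detected and are supplied as named 2D points. The function names, the landmark names, and the scale-plus-translation fit are illustrative assumptions and are not taken from the patent.

```python
import math


def centroid(points):
    """Mean position of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def spread(points, center):
    """Root-mean-square distance of the points from a center."""
    total = sum((x - center[0]) ** 2 + (y - center[1]) ** 2 for x, y in points)
    return math.sqrt(total / len(points))


def fit_template(template, image):
    """Align template landmarks to image landmarks (hypothetical helper).

    template, image: dict mapping a landmark name to an (x, y) tuple.
    Returns a new dict with every template landmark moved into image space.
    """
    shared = sorted(template.keys() & image.keys())
    t_pts = [template[k] for k in shared]
    i_pts = [image[k] for k in shared]
    ct, ci = centroid(t_pts), centroid(i_pts)
    scale = spread(i_pts, ci) / spread(t_pts, ct)

    # Align: apply one uniform scale and translation to the whole template.
    fitted = {
        name: ((x - ct[0]) * scale + ci[0], (y - ct[1]) * scale + ci[1])
        for name, (x, y) in template.items()
    }
    # Reshape: landmarks actually found in the image override the estimate.
    fitted.update({k: image[k] for k in shared})
    return fitted
```

Landmarks that appear only in the template (for example a chin point the detector missed) are placed by the scale and translation estimated from the shared landmarks. A production fit would likely also handle rotation and non-uniform deformation.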

18 Claims

1. A method of animating speech of an avatar representing a participant in a mobile communication, the method comprising:
- selecting, by a computer, from data storage, one or more images to represent the participant;
  
  selecting, by the computer, from data storage, a generic animation template for the participant, the generic animation template having a mouth and at least one emotive feature, the mouth characterized by a mouth position;
  
  fitting, by the computer, the one or more images with the generic animation template;
  
  texture wrapping, by the computer, the one or more images over the generic animation template;
  
  displaying, by the computer, the one or more images texture wrapped over the generic animation template;
  
  receiving, by the computer, an audio speech signal derived from the mobile communication of the participant;
  
  identifying, by the computer, from the audio speech signal, a series of phonemes and one or more points of voice inflection greater than a predetermined threshold, each phoneme in the series of phonemes representing a portion of the audio speech signal;
  
  for each phoneme in the series of phonemes:
  
  identifying, by the computer, a new mouth position for the mouth of the generic animation template;
  
  altering, by the computer, the mouth position of the mouth of the generic animation template to the new mouth position;
  
  texture wrapping, by the computer, a portion of the one or more images corresponding to the altered mouth position of the mouth of the generic animation template;
  
  displaying, by the computer, the texture wrapped portion of the one or more images corresponding to the altered mouth position of the mouth of the generic animation template; and
  
  playing, by the computer, synchronously with the displayed texture wrapped portion of the one or more images, the portion of the audio speech signal represented by the phoneme; and
  
  for each point of voice inflection of the one or more points of inflection greater than the predetermined threshold, triggering, by the computer, a motion key-frame caption that alters display of the at least one emotive feature synchronously with playing, by the computer, a portion of the audio speech signal including the point of voice inflection greater than the predetermined threshold.
- Dependent claims (2, 3, 4, 5, 6, 7, 11, 12, 13, 14):
  - 2. The method of claim 1 wherein the at least one emotive feature comprises one or more facial features.
  - 3. The method of claim 2 wherein the triggering the motion key-frame caption comprises altering display of the one or more facial features.
  - 4. The method of claim 1 wherein the at least one emotive feature comprises one or more non-facial features.
  - 5. The method of claim 4 wherein the triggering the motion key-frame caption comprises altering display of the one or more non-facial features.
  - 6. The method of claim 1 wherein the fitting the one or more images with the generic animation template comprises:
    - resizing the one or more images to conform with the size of the generic animation template;
      
      identifying specific facial features of the one or more images;
      
      aligning the specific facial features with corresponding facial features of the generic animation template; and
      
      reshaping the generic animation template to conform with the one or more images.
  - 7. The method of claim 1 wherein the identifying the new mouth position for the mouth of the generic animation template comprises retrieving coordinates of the new mouth position from a data structure in dependence upon an identification of the phoneme.
  - 11. The method of claim 2 wherein the one or more facial features include at least one of an ear, a dimple, an eyebrow, a brow, a chin, and a nose.
  - 12. The method of claim 3 wherein the altering display of the one or more facial features includes at least one of moving at least one ear, depressing at least one dimple, raising at least one eyebrow, furrowing a brow, moving a chin, and wrinkling a nose.
  - 13. The method of claim 4 wherein the one or more non-facial features include at least one of a finger, a hand, an arm, a shoulder, and a chest.
  - 14. The method of claim 5 wherein the altering display of the one or more non-facial features includes at least one of moving one or more fingers, waving one or more hands, raising one or more arms, shrugging one or more shoulders, and expanding and retracting the chest.
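
Claim 7 describes retrieving mouth coordinates from a data structure keyed by the identified phoneme, and claim 1 triggers an emotive change at each point of voice inflection above a threshold. One way this could be sketched in Python is shown below. It assumes a hypothetical lookup table and treats an inflection as a jump in pitch between consecutive samples, which the claims do not specify. All labels, coordinates, and names are illustrative.

```python
from dataclasses import dataclass

# Hypothetical table: phoneme label -> mouth control points in normalized
# template coordinates. The values are placeholders, not data from the patent.
MOUTH_POSITIONS = {
    "AA": [(0.40, 0.70), (0.50, 0.62), (0.60, 0.70), (0.50, 0.82)],
    "M": [(0.40, 0.72), (0.50, 0.71), (0.60, 0.72), (0.50, 0.73)],
    "OW": [(0.44, 0.70), (0.50, 0.64), (0.56, 0.70), (0.50, 0.78)],
}
REST = MOUTH_POSITIONS["M"]  # closed mouth used for unknown phonemes


@dataclass
class Phoneme:
    label: str
    start: float  # seconds into the audio speech signal
    end: float


def mouth_position_for(phoneme):
    """Look up mouth coordinates by phoneme identification."""
    return MOUTH_POSITIONS.get(phoneme.label, REST)


def inflection_points(pitch_track, threshold):
    """Times at which pitch changes by more than `threshold` Hz per sample.

    pitch_track: list of (time_seconds, pitch_hz) pairs in time order.
    """
    points = []
    for (_, p0), (t1, p1) in zip(pitch_track, pitch_track[1:]):
        if abs(p1 - p0) > threshold:
            points.append(t1)
    return points


def build_timeline(phonemes, pitch_track, threshold):
    """Merge mouth updates and emotive triggers into one time-ordered list."""
    events = [(p.start, "mouth", mouth_position_for(p)) for p in phonemes]
    events += [
        (t, "emotive", "raise_eyebrow")
        for t in inflection_points(pitch_track, threshold)
    ]
    return sorted(events, key=lambda event: event[0])
```

A renderer could walk the returned timeline while the audio plays, altering the mouth for each `mouth` event and the emotive feature for each `emotive` event, so that both stay synchronized with the corresponding portion of the signal.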

8. A method of animating speech of an avatar representing a participant in a mobile communication, the method comprising:
- selecting, by a computer, from data storage, one or more images to represent the participant;
  
  selecting, by the computer, from data storage, a generic animation template for the participant, the generic animation template having a mouth, the mouth characterized by a mouth position;
  
  fitting, by the computer, the one or more images with the generic animation template;
  
  texture wrapping, by the computer, the one or more images over the generic animation template;
  
  displaying, by the computer, the one or more images texture wrapped over the generic animation template; and
  
  receiving, by the computer, an audio speech signal derived from the mobile communication of the participant;
  
  identifying, by the computer, a vocal pattern from a particular portion of the audio speech signal;
  
  determining, by the computer, whether the vocal pattern matches a predetermined vocal pattern;
  
  identifying, by the computer, from the audio speech signal, a series of phonemes, each phoneme in the series of phonemes representing a portion of the audio speech signal;
  
  for each phoneme in the series of phonemes:
  
  identifying, by the computer, a new mouth position for the mouth of the generic animation template;
  
  altering, by the computer, the mouth position of the mouth of the generic animation template to the new mouth position;
  
  texture wrapping, by the computer, a portion of the one or more images corresponding to the altered mouth position of the mouth of the generic animation template;
  
  displaying, by the computer, the texture wrapped portion of the one or more images corresponding to the altered mouth position of the mouth of the generic animation template; and
  
  playing, by the computer, synchronously with the displayed texture wrapped portion of the one or more images, the portion of the audio speech signal represented by the phoneme; and
  
  if the vocal pattern of the particular portion of the audio speech signal matches the predetermined vocal pattern, displaying, by the computer, an indication of the predetermined vocal pattern synchronously with playing, by the computer, the particular portion of the audio speech signal.
- Dependent claims (9, 10):
  - 9. The method of claim 8 wherein the predetermined vocal pattern comprises one of:
    - a vocal pattern used to authenticate a speaker;
      
      a vocal pattern indicating a lying speaker;
      
      a vocal pattern indicating a sad speaker;
      
      a vocal pattern indicating a frightened speaker;
      
      a vocal pattern indicating a happy speaker;
      
      a vocal pattern indicating an excited speaker; and
      
      a vocal pattern indicating a drowsy speaker.
  - 10. The method of claim 8 wherein:
    - the identifying the vocal pattern from the particular portion of the audio speech signal comprises converting the audio speech signal from an analog audio signal to a digital audio signal and analyzing the digital audio signal to identify the vocal pattern; and
      
      the determining whether the vocal pattern matches the predetermined vocal pattern comprises comparing the identified vocal pattern to a plurality of predetermined vocal patterns stored in a repository of predetermined vocal patterns.
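
Claim 10 describes analyzing a digital audio signal to identify a vocal pattern and comparing it with a repository of predetermined patterns. The sketch below illustrates that comparison in Python under strong simplifying assumptions: the signal is already digitized, a "vocal pattern" is reduced to two crude features (RMS energy and zero-crossing rate), and a match means every feature lies within a relative tolerance. The feature choice, function names, and tolerance are hypothetical and are not drawn from the patent.

```python
import math


def vocal_pattern(samples, sample_rate):
    """Reduce digital audio samples to (rms_energy, zero_crossings_per_second)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    zcr = crossings * sample_rate / len(samples)
    return (rms, zcr)


def match_pattern(pattern, repository, tolerance=0.2):
    """Return the name of the closest stored pattern, or None if none match.

    repository: dict mapping a pattern name to a reference feature tuple
    whose values are all non-zero.
    """
    best_name, best_err = None, tolerance
    for name, reference in repository.items():
        # Worst relative deviation across all features.
        err = max(abs(p - r) / abs(r) for p, r in zip(pattern, reference))
        if err <= best_err:
            best_name, best_err = name, err
    return best_name
```

When `match_pattern` returns a name, a display layer could show an indication of that pattern while the matching portion of the audio plays. When it returns `None`, no indication would be shown.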

15. A system for animating speech of an avatar representing a participant in a mobile communication, the system configured to display the avatar on a display screen of a mobile communications device, the system comprising:
- one or more processors, one or more computer-readable memories, and one or more computer-readable tangible storage devices;
  
  program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to select, from data storage, one or more images to represent the participant;
  
  program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to select, from data storage, a generic animation template for the participant, the generic animation template having a mouth and at least one emotive feature, the mouth characterized by a mouth position;
  
  program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to fit the one or more images with the generic animation template;
  
  program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to texture wrap the one or more images over the generic animation template;
  
  program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to display the one or more images texture wrapped over the generic animation template;
  
  program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to receive an audio speech signal derived from the mobile communication of the participant;
  
  program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to identify, from the audio speech signal, a series of phonemes and one or more points of voice inflection greater than a predetermined threshold, each phoneme in the series of phonemes representing a portion of the audio speech signal;
  
  program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to, for each phoneme in the series of phonemes:
  
  identify a new mouth position for the mouth of the generic animation template;
  
  alter the mouth position of the mouth of the generic animation template to the new mouth position;
  
  texture wrap a portion of the one or more images corresponding to the altered mouth position of the mouth of the generic animation template;
  
  display the texture wrapped portion of the one or more images corresponding to the altered mouth position of the mouth of the generic animation template; and
  
  play synchronously with the displayed texture wrapped portion of the one or more images, the portion of the audio speech signal represented by the phoneme; and
  
  program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to, for each point of voice inflection of the one or more points of inflection greater than the predetermined threshold, trigger a motion key-frame caption that alters display of at least one emotive feature synchronously with playing a portion of the audio speech signal including the point of voice inflection greater than the predetermined threshold.

16. A computer program product for animating speech of an avatar representing a participant in a mobile communication, the computer program product comprising:
- one or more computer-readable tangible storage devices;
  
  program instructions, stored on at least one of the one or more storage devices, to select, from data storage, one or more images to represent the participant;
  
  program instructions, stored on at least one of the one or more storage devices, to select, from data storage, a generic animation template for the participant, the generic animation template having a mouth and at least one emotive feature, the mouth characterized by a mouth position;
  
  program instructions, stored on at least one of the one or more storage devices, to fit the one or more images with the generic animation template;
  
  program instructions, stored on at least one of the one or more storage devices, to texture wrap the one or more images over the generic animation template;
  
  program instructions, stored on at least one of the one or more storage devices, to display the one or more images texture wrapped over the generic animation template;
  
  program instructions, stored on at least one of the one or more storage devices, to receive an audio speech signal derived from the mobile communication of the participant;
  
  program instructions, stored on at least one of the one or more storage devices, to identify from the audio speech signal, a series of phonemes and one or more points of voice inflection greater than a predetermined threshold, each phoneme in the series of phonemes representing a portion of the audio speech signal;
  
  program instructions, stored on at least one of the one or more storage devices, to, for each phoneme in the series of phonemes:
  
  identify a new mouth position for the mouth of the generic animation template;
  
  alter the mouth position of the mouth of the generic animation template to the new mouth position;
  
  texture wrap a portion of the one or more images corresponding to the altered mouth position of the mouth of the generic animation template;
  
  display the texture wrapped portion of the one or more images corresponding to the altered mouth position of the mouth of the generic animation template; and
  
  play synchronously with the displayed texture wrapped portion of the one or more images, the portion of the audio speech signal represented by the phoneme; and
  
  program instructions, stored on at least one of the one or more storage devices, to, for each point of voice inflection of the one or more points of inflection greater than the predetermined threshold, trigger a motion key-frame caption that alters display of at least one emotive feature synchronously with playing a portion of the audio speech signal including the point of voice inflection greater than the predetermined threshold.

17. A system for animating speech of an avatar representing a participant in a mobile communication, the system configured to display the avatar on a display screen of a mobile communications device, the system comprising:
- one or more processors, one or more computer-readable memories, and one or more computer-readable tangible storage devices;
  
  program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to select, from data storage, one or more images to represent the participant;
  
  program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to select, from data storage, a generic animation template for the participant, the generic animation template having a mouth, the mouth characterized by a mouth position;
  
  program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to fit the one or more images with the generic animation template;
  
  program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to texture wrap the one or more images over the generic animation template;
  
  program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to display the one or more images texture wrapped over the generic animation template;
  
  program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to receive an audio speech signal derived from the mobile communication of the participant;
  
  program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to identify a vocal pattern from a particular portion of the audio speech signal;
  
  program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to determine whether the vocal pattern matches a predetermined vocal pattern;
  
  program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to identify, from the audio speech signal, a series of phonemes, each phoneme in the series of phonemes representing a portion of the audio speech signal;
  
  program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to, for each phoneme in the series of phonemes:
  
  identify a new mouth position for the mouth of the generic animation template;
  
  alter the mouth position of the mouth of the generic animation template to the new mouth position;
  
  texture wrap a portion of the one or more images corresponding to the altered mouth position of the mouth of the generic animation template;
  
  display the texture wrapped portion of the one or more images corresponding to the altered mouth position of the mouth of the generic animation template; and
  
  play synchronously with the displayed texture wrapped portion of the one or more images, the portion of the audio speech signal represented by the phoneme; and
  
  program instructions, stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, to, if the vocal pattern of the particular portion of the audio speech signal matches the predetermined vocal pattern, display an indication of the predetermined vocal pattern synchronously with playing the particular portion of the audio speech signal.

18. A computer program product for animating speech of an avatar representing a participant in a mobile communication, the computer program product comprising:
- one or more computer-readable tangible storage devices;
  
  program instructions, stored on at least one of the one or more storage devices, to select, from data storage, one or more images to represent the participant;
  
  program instructions, stored on at least one of the one or more storage devices, to select, from data storage, a generic animation template for the participant, the generic animation template having a mouth, the mouth characterized by a mouth position;
  
  program instructions, stored on at least one of the one or more storage devices, to fit the one or more images with the generic animation template;
  
  program instructions, stored on at least one of the one or more storage devices, to texture wrap the one or more images over the generic animation template;
  
  program instructions, stored on at least one of the one or more storage devices, to display the one or more images texture wrapped over the generic animation template;
  
  program instructions, stored on at least one of the one or more storage devices, to receive an audio speech signal derived from the mobile communication of the participant;
  
  program instructions, stored on at least one of the one or more storage devices, to identify a vocal pattern from a particular portion of the audio speech signal;
  
  program instructions, stored on at least one of the one or more storage devices, to determine whether the vocal pattern matches a predetermined vocal pattern;
  
  program instructions, stored on at least one of the one or more storage devices, to identify from the audio speech signal, a series of phonemes, each phoneme in the series of phonemes representing a portion of the audio speech signal;
  
  program instructions, stored on at least one of the one or more storage devices, to, for each phoneme in the series of phonemes:
  
  identify a new mouth position for the mouth of the generic animation template;
  
  alter the mouth position of the mouth of the generic animation template to the new mouth position;
  
  texture wrap a portion of the one or more images corresponding to the altered mouth position of the mouth of the generic animation template;
  
  display the texture wrapped portion of the one or more images corresponding to the altered mouth position of the mouth of the generic animation template; and
  
  play synchronously with the displayed texture wrapped portion of the one or more images, the portion of the audio speech signal represented by the phoneme; and
  
  program instructions, stored on at least one of the one or more storage devices, to, if the vocal pattern of the particular portion of the audio speech signal matches the predetermined vocal pattern, display an indication of the predetermined vocal pattern synchronously with playing the particular portion of the audio speech signal.

Current Assignee: Activision Publishing Incorporated (Microsoft Corporation)
Original Assignee: International Business Machines Corporation
Inventors: Brown, William A.; Muirhead, Richard W.; Reddington, Francis X.; Wolfe, Martin A.
Primary Examiner(s): HAJNIK, DANIEL F
Application Number: US12/622,553
Publication Number: US 20100060647A1
Time in Patent Office: 830 Days
Field of Search: None
US Class Current: 345/473
CPC Class Codes:
- G06T 13/205   driven by audio data
- G06T 13/40   of characters, e.g. humans,...
- Y10S 345/956   Language driven animation
- Y10S 345/957   Actor
