Periocular and audio synthesis of a full face image

US 10,565,790 B2
Filed: 11/09/2017
Issued: 02/18/2020
Est. Priority Date: 11/11/2016
Status: Active Grant

First Claim

Patent Images

1. A wearable system for animating a user'"'"'s face during speech, the wearable system comprising:

an inward-facing imaging system configured to capture images, the inward-facing imaging system comprising one or more cameras positioned such that, when the wearable system is worn by the user, a periocular region of a user'"'"'s face is observable by the inward-facing imaging system and the user'"'"'s lower face is unobservable by the inward-facing imaging system;

an audio sensor configured to receive the user'"'"'s speech;

a hardware processor programmed to;

acquire an image, via the inward-facing imaging system when the wearable system is worn by the user, of the periocular region of the user;

generate, based at least partly on the image of the periocular region of the user, periocular face parameters encoding a periocular conformation of at least the periocular region of the user;

acquire, by the audio sensor, an audio stream spoken by the user;

identify a phoneme in the audio stream;

access a base model that was generated using images associated with a group of people not including the user;

customize a mapping based at least in part on the base model and the image of the periocular region of the user,wherein an input of the mapping comprises the phoneme and the image of the periocular region of the user, andwherein an output of the mapping comprises lower face parameters that encode a conformation of the lower face of the user and that are deduced from an analysis of the phoneme and the image of the periocular region of the user;

apply the mapping to the image of the periocular region of the user to generate the lower face parameters;

combine the periocular face parameters and the lower face parameters to generate full face parameters associated with a three-dimensional (3D) face model; and

generate an animation of the user'"'"'s face based at least in part on the full face parameters.

View all claims

3 Assignments

Timeline View

Assignment View

0 Petitions

Accused Products

Abstract

Systems and methods for synthesizing an image of the face by a head-mounted device (HMD) are disclosed. The HMD may not be able to observe a portion of the face. The systems and methods described herein can generate a mapping from a conformation of the portion of the face that is not imaged to a conformation of the portion of the face observed. The HMD can receive an image of a portion of the face and use the mapping to determine a conformation of the portion of the face that is not observed. The HMD can combine the observed and unobserved portions to synthesize a full face image.

Citations

23 Claims

1. A wearable system for animating a user'"'"'s face during speech, the wearable system comprising:
- an inward-facing imaging system configured to capture images, the inward-facing imaging system comprising one or more cameras positioned such that, when the wearable system is worn by the user, a periocular region of a user'"'"'s face is observable by the inward-facing imaging system and the user'"'"'s lower face is unobservable by the inward-facing imaging system;
  
  an audio sensor configured to receive the user'"'"'s speech;
  
  a hardware processor programmed to;
  
  acquire an image, via the inward-facing imaging system when the wearable system is worn by the user, of the periocular region of the user;
  
  generate, based at least partly on the image of the periocular region of the user, periocular face parameters encoding a periocular conformation of at least the periocular region of the user;
  
  acquire, by the audio sensor, an audio stream spoken by the user;
  
  identify a phoneme in the audio stream;
  
  access a base model that was generated using images associated with a group of people not including the user;
  
  customize a mapping based at least in part on the base model and the image of the periocular region of the user,wherein an input of the mapping comprises the phoneme and the image of the periocular region of the user, andwherein an output of the mapping comprises lower face parameters that encode a conformation of the lower face of the user and that are deduced from an analysis of the phoneme and the image of the periocular region of the user;
  
  apply the mapping to the image of the periocular region of the user to generate the lower face parameters;
  
  combine the periocular face parameters and the lower face parameters to generate full face parameters associated with a three-dimensional (3D) face model; and
  
  generate an animation of the user'"'"'s face based at least in part on the full face parameters.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
- - 2. The wearable system of claim 1, wherein the 3D face model comprises a deformable linear model and wherein the periocular face parameters and the lower face parameters describe a deformation of the face when the user is speaking.
  - 3. The wearable system of claim 2, wherein to generate the full face parameters, the hardware processor is programmed to update the 3D face model to reflect an update to at least one of the lower face parameters or the periocular face parameters.
  - 4. The wearable system of claim 1, wherein the input of the mapping further comprises at least one of eye specific information, a body movement, or a heart rate.
  - 5. The wearable system of claim 4, wherein the eye specific information comprises at least one of:
    - an eye pose, a pupil dilation state, an eye color, or an eyelid state of the user.
  - 6. The wearable system of claim 1, wherein the lower face parameters encode visemes which visually describe phonemes in the audio stream.
  - 7. The wearable system of claim 1, wherein to customize the mapping, the hardware processor is programmed to infer a skin texture of the face of the user based at least partly on the image of the periocular region, wherein the animation of the user'"'"'s face incorporates the skin texture of the face.
  - 8. The wearable system of claim 1, wherein the inward-facing imaging system comprises an eye camera and the image of the periocular region acquired by the inward-facing imaging system comprises an image of the periocular region for a first eye.
  - 9. The wearable system of claim 8, wherein to generate the full face parameters, the hardware processor is programmed to:
    - determine periocular face parameters for a second eye based on the image of the periocular region acquired by the inward-facing imaging system; and
      
      incorporate the periocular face parameters for the second eye into the full face parameters.
  - 10. The wearable system of claim 1, wherein to cause the full face parameters to be applied to generate the animation of the user'"'"'s face, the hardware processor is programmed to communicate instructions to a wearable device comprising a mixed reality display wherein the instructions cause the full face parameters to be applied to vary the 3D face model from a neutral position.
  - 11. The wearable system of claim 1, wherein the base model comprises a 3D deformable linear model generated based on images of the group of people.
  - 12. The wearable system of claim 1, wherein the lower face of the user is unobserved by any camera associated with the wearable system when the wearable system is worn by the user.

13. A method for animating a user'"'"'s face during speech, the method comprising:
- accessing an image of the periocular region of a user acquired by an inward-facing imaging system configured to capture images, the inward-facing imaging system comprising one or more cameras positioned such that, when the wearable system is worn by the user, a periocular region of the user'"'"'s face is observable by the inward-facing imaging system and the user'"'"'s lower face is unobservable by the inward-facing imaging system;
  
  determining, based at least partly on the images, periocular face parameters encoding a periocular conformation of at least the periocular region of the user;
  
  accessing an audio stream spoken by the user acquired by an audio sensor;
  
  identifying a phoneme in the audio stream;
  
  accessing a base model that was generated using images associated with a group of people not including the user;
  
  customizing a mapping based at least in part on the base model and the image of the periocular region of the user,wherein an input of the mapping comprises the phoneme and the image of the periocular region of the user, andwherein an output of the mapping comprises lower face parameters that encode a conformation of the lower face of the user, and that are deduced from an analysis of the phoneme and the image of the periocular region of the user;
  
  applying the mapping to the image to generate the lower face parameters;
  
  combining the periocular face parameters and the lower face parameters to generate full face parameters associated with a three-dimensional (3D) face model; and
  
  generating a full face image based at least partly on the full face parameters.
- View Dependent Claims (14, 15, 16, 17, 18, 19, 20, 21, 22, 23)
- - 14. The method of claim 13, wherein the 3D face model comprises a deformable linear model and wherein the periocular face parameters and the lower face parameters describe a deformation of the face when the user is speaking.
  - 15. The method of claim 14, wherein to generate the full face parameters, the hardware processor is programmed to update the 3D face model to reflect an update to at least one of the lower face parameters or the periocular face parameters.
  - 16. The method of claim 13, wherein the full face parameters are combined with eye specific information to determine an animation associated with the user'"'"'s face.
  - 17. The method of claim 16, wherein the eye specific information comprises at least one of:
    - an eye pose, a pupil dilation state, an eye color, or an eyelid state of the user.
  - 18. The method of claim 13, wherein the lower face parameters encode visemes which visually describe phonemes in the audio stream.
  - 19. The method of claim 13, wherein the full face image further incorporates skin textures of the user which are determined based at least partly on the image acquired by the inward-facing imaging system.
  - 20. The method of claim 13, wherein the mapping comprises a likelihood that a periocular face parameter is associated with a lower face parameter, and the lower face parameters are selected to generate the full face image in response to a determination that they pass threshold criteria.
  - 21. The method of claim 13, wherein the image comprise at least one of a still image or a video frame.
  - 22. The method of claim 13, further comprising instructing a head-mounted display to render the full face image in a mixed reality environment.
  - 23. The method of claim 13, wherein the base model comprises a 3D deformable linear model generated based on images of the group of people.

Specification

Resources

Litigation Campaign Assessment

Current Assignee
Magic Leap, Inc.
Original Assignee
Magic Leap, Inc.
Inventors
Kaehler, Adrian
Primary Examiner(s)
Wilson, Nicholas R

Application Number

US15/808,516
Publication Number

US 20180137678A1
Time in Patent Office

831 Days
Field of Search

None
US Class Current
CPC Class Codes

G02B 2027/0138   comprising image capture sy...

G02B 2027/014   comprising information/imag...

G02B 2027/0178   Eyeglass type eyeglass deta...

G02B 27/0093   with means for monitoring d...

G02B 27/017   Head mounted

G02B 27/0172   characterised by optical fe...

G06F 2203/011   Emotion or mood input deter...

G06F 3/011   Arrangements for interactio...

G06F 3/012   Head tracking input arrange...

G06F 3/013   Eye tracking input arrangem...

G06T 13/40   of characters, e.g. humans,...

G06T 17/20   Finite element generation, ...

G06T 19/006   Mixed reality object pose d...

G06V 40/169   Holistic features and repre...

G06V 40/171   Local features and componen...

G06V 40/193   Preprocessing; Feature extr...

G10L 2015/025   Phonemes, fenemes or fenone...

G10L 2021/105   Synthesis of the lips movem...

G10L 21/10   Transforming into visible i...

Periocular and audio synthesis of a full face image

First Claim

3 Assignments

0 Petitions

Accused Products

Abstract

Citations

23 Claims

Specification

Solutions

Use Cases

Quick Links

Periocular and audio synthesis of a full face image

First Claim

3 Assignments

Subscription Required

Subscription Required

0 Petitions

Subscription Required

Accused Products

Subscription Required

Abstract

Citations

23 Claims

Specification

Subscription Required

Solutions

Use Cases

Quick Links