Markerless face tracking with synthetic priors
Abstract
Provided are methods, systems, and computer-readable media for synthetically generating training data used to train a learning algorithm capable of generating computer-generated images of a subject from real images that include the subject. The training data can be generated using a facial rig by varying expressions, camera viewpoints, and illumination across the training samples. The trained algorithm can then be used for tracking faces in a real-time video stream; in such cases, the training data can be tuned to the expected environmental conditions and camera properties of that stream. Also provided are strategies for improving training set construction by analyzing which attributes of a computer-generated image (e.g., expression, viewpoint, and illumination) require denser sampling.
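As a concrete illustration of the sampling the abstract describes, the sketch below draws random model states (expression-shape weights plus a camera position) and lighting characteristics. Every name and numeric range here (NUM_SHAPES, sample_model_state, the coordinate bounds) is a hypothetical choice for illustration, not a value taken from the patent.

```python
import random

NUM_SHAPES = 40  # hypothetical number of expression-shape weights in the rig

def sample_model_state(rng):
    """Draw one model state: expression-shape weights plus camera coordinates."""
    weights = [rng.uniform(0.0, 1.0) for _ in range(NUM_SHAPES)]
    camera = (rng.uniform(-0.3, 0.3),  # x offset relative to the subject (m)
              rng.uniform(-0.2, 0.2),  # y offset (m)
              rng.uniform(0.4, 1.0))   # distance from the subject (m)
    return {"weights": weights, "camera": camera}

def sample_lighting(rng):
    """Pick a lighting characteristic: here, a single directional light."""
    return {"direction": (rng.uniform(-1, 1), rng.uniform(-1, 1), 1.0),
            "intensity": rng.uniform(0.5, 1.5)}

rng = random.Random(0)
states = [sample_model_state(rng) for _ in range(1_000)]
lightings = [sample_lighting(rng) for _ in states]
```

Under this scheme, an attribute found to need denser sampling (e.g., viewpoint) could simply be drawn more often, or over a finer range, within the same loop.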
Claims (21)
1. A method for markerless face tracking, the method comprising:
obtaining a facial rig associated with a subject, wherein the facial rig includes a plurality of expression shapes, wherein an expression shape defines at least a portion of an expression of the subject and includes one or more values for one or more facial attributes;
generating a plurality of model states for the facial rig, wherein a model state describes a combination of expression shapes defining an expression of the subject and a set of camera setup location coordinates in relation to the subject;
determining a lighting characteristic to use for rendering a computer-generated image of a model state of the plurality of model states;
rendering a plurality of computer-generated images of a face of the subject, wherein a computer-generated image is rendered using the lighting characteristic and a corresponding model state of the facial rig;
generating a plurality of training samples, wherein a training sample includes a computer-generated image and a corresponding model state; and
training a regressor using the plurality of training samples, wherein the trained regressor is configured to infer a model state that corresponds to the face of the subject captured in a frame.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9)
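The claimed steps of rendering, pairing images with model states, and training a regressor might look like the following minimal sketch. It reuses the sampled states and lightings from the sketch above; the placeholder render function and the choice of ridge regression are stand-ins for the patent's unspecified renderer and regressor.

```python
import numpy as np
from sklearn.linear_model import Ridge

def render(state, lighting):
    """Placeholder for the rig renderer; returns a flat 64x64 grayscale image."""
    seed = abs(hash(str((state, lighting)))) % 2**32
    return np.random.default_rng(seed).random(64 * 64)

def build_training_samples(states, lightings):
    """Pair each rendered computer-generated image with its model state."""
    images = np.stack([render(s, l) for s, l in zip(states, lightings)])
    targets = np.stack([np.concatenate([s["weights"], s["camera"]])
                        for s in states])
    return images, targets

images, targets = build_training_samples(states, lightings)
regressor = Ridge(alpha=1.0).fit(images, targets)  # stand-in regressor
```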
10. A system for face tracking, the system comprising:
a memory storing a plurality of instructions; and
one or more processors configurable to:
obtain a facial rig associated with a subject, wherein the facial rig includes a plurality of expression shapes, wherein an expression shape defines at least a portion of an expression of the subject and includes one or more values for one or more facial attributes;
generate a plurality of model states for the facial rig, wherein a model state describes a combination of expression shapes defining an expression of the subject and a set of camera setup location coordinates in relation to the subject;
determine a lighting characteristic to use for rendering a computer-generated image of a model state of the plurality of model states;
render a plurality of computer-generated images of a face of the subject, wherein a computer-generated image is rendered using the lighting characteristic and a corresponding model state of the facial rig;
generate a plurality of training samples, wherein a training sample includes a computer-generated image and a corresponding model state; and
train a regressor using the plurality of training samples, wherein the trained regressor is configured to infer a model state that corresponds to the face of the subject captured in a frame.
(Dependent claims: 11, 12, 13, 14, 15)
16. A computer-readable memory storing a plurality of instructions executable by one or more processors, the plurality of instructions comprising instructions that cause the one or more processors to:
obtain a facial rig associated with a subject, wherein the facial rig includes a plurality of expression shapes, wherein an expression shape defines at least a portion of an expression of the subject and includes one or more values for one or more facial attributes;
generate a plurality of model states for the facial rig, wherein a model state describes a combination of expression shapes defining an expression of the subject and a set of camera setup location coordinates in relation to the subject;
determine a lighting characteristic to use for rendering a computer-generated image of a model state of the plurality of model states;
render a plurality of computer-generated images of a face of the subject, wherein a computer-generated image is rendered using the lighting characteristic and a corresponding model state of the facial rig;
generate a plurality of training samples, wherein a training sample includes a computer-generated image and a corresponding model state; and
train a regressor using the plurality of training samples, wherein the trained regressor is configured to infer a model state that corresponds to the face of the subject captured in a frame.
(Dependent claims: 17, 18, 19, 20, 21)
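At run time, the trained regressor infers a model state for the face captured in each frame of a real-time video stream, per the final element of each independent claim. A minimal sketch follows, assuming an OpenCV capture loop and the same 64x64 grayscale format used for the rendered training images; neither detail is specified by the patent.

```python
import cv2
import numpy as np

def infer_model_state(regressor, frame):
    """Map a captured frame to the training-image format and regress a state."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, (64, 64)).astype(np.float64) / 255.0
    return regressor.predict(small.reshape(1, -1))[0]

cap = cv2.VideoCapture(0)  # live video stream
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    state = infer_model_state(regressor, frame)  # shape weights + camera coords
cap.release()
```

Matching the preprocessing of captured frames to the rendering setup is what allows the training data to be "tuned to expected environmental conditions and camera properties" of the stream, as the abstract puts it.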
Specification