Markerless face tracking with synthetic priors
Abstract
Provided are methods, systems, and computer-readable media for synthetically generating training data used to train a learning algorithm capable of generating computer-generated images of a subject from real images that include the subject. The training data can be generated using a facial rig by varying expressions, camera viewpoints, and illumination across the training samples. The trained algorithm can then be used for tracking faces in a real-time video stream; in such cases, the training data can be tuned to the expected environmental conditions and camera properties of that stream. Also provided are strategies for improving training set construction by analyzing which attributes of a computer-generated image (e.g., expression, viewpoint, and illumination) require denser sampling.
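As a concrete illustration of the sampling the abstract describes, the sketch below draws random model states (expression-shape weights plus a camera position) and lighting characteristics. Every name and numeric range here (NUM_SHAPES, sample_model_state, the coordinate bounds) is a hypothetical choice for illustration, not a value taken from the patent.

```python
import random

NUM_SHAPES = 40  # hypothetical number of expression-shape weights in the rig

def sample_model_state(rng):
    """Draw one model state: expression-shape weights plus camera coordinates."""
    weights = [rng.uniform(0.0, 1.0) for _ in range(NUM_SHAPES)]
    camera = (rng.uniform(-0.3, 0.3),  # x offset relative to the subject (m)
              rng.uniform(-0.2, 0.2),  # y offset (m)
              rng.uniform(0.4, 1.0))   # distance from the subject (m)
    return {"weights": weights, "camera": camera}

def sample_lighting(rng):
    """Pick a lighting characteristic: here, a single directional light."""
    return {"direction": (rng.uniform(-1, 1), rng.uniform(-1, 1), 1.0),
            "intensity": rng.uniform(0.5, 1.5)}

rng = random.Random(0)
states = [sample_model_state(rng) for _ in range(1_000)]
lightings = [sample_lighting(rng) for _ in states]
```

Under this scheme, an attribute found to need denser sampling (e.g., viewpoint) could simply be drawn more often, or over a finer range, within the same loop.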
Claims (21)
1. A method for markerless face tracking, the method comprising:
obtaining a facial rig associated with a subject, wherein the facial rig includes a plurality of expression shapes, wherein an expression shape defines at least a portion of an expression of the subject and includes one or more values for one or more facial attributes;
generating a plurality of model states for the facial rig, wherein a model state describes a combination of expression shapes defining an expression of the subject and a set of camera setup location coordinates in relation to the subject;
determining a lighting characteristic to use for rendering a computer-generated image of a model state of the plurality of model states;
rendering a plurality of computer-generated images of a face of the subject, wherein a computer-generated image is rendered using the lighting characteristic and a corresponding model state of the facial rig;
generating a plurality of training samples, wherein a training sample includes a computer-generated image and a corresponding model state; and
training a regressor using the plurality of training samples, wherein the trained regressor is configured to infer a model state that corresponds to the face of the subject captured in a frame.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9)
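The claimed steps of rendering, pairing images with model states, and training a regressor might look like the following minimal sketch. It reuses the sampled states and lightings from the sketch above; the placeholder render function and the choice of ridge regression are stand-ins for the patent's unspecified renderer and regressor.

```python
import numpy as np
from sklearn.linear_model import Ridge

def render(state, lighting):
    """Placeholder for the rig renderer; returns a flat 64x64 grayscale image."""
    seed = abs(hash(str((state, lighting)))) % 2**32
    return np.random.default_rng(seed).random(64 * 64)

def build_training_samples(states, lightings):
    """Pair each rendered computer-generated image with its model state."""
    images = np.stack([render(s, l) for s, l in zip(states, lightings)])
    targets = np.stack([np.concatenate([s["weights"], s["camera"]])
                        for s in states])
    return images, targets

images, targets = build_training_samples(states, lightings)
regressor = Ridge(alpha=1.0).fit(images, targets)  # stand-in regressor
```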
10. A system for face tracking, the system comprising:
a memory storing a plurality of instructions; and
one or more processors configurable to:
obtain a facial rig associated with a subject, wherein the facial rig includes a plurality of expression shapes, wherein an expression shape defines at least a portion of an expression of the subject and includes one or more values for one or more facial attributes;
generate a plurality of model states for the facial rig, wherein a model state describes a combination of expression shapes defining an expression of the subject and a set of camera setup location coordinates in relation to the subject;
determine a lighting characteristic to use for rendering a computer-generated image of a model state of the plurality of model states;
render a plurality of computer-generated images of a face of the subject, wherein a computer-generated image is rendered using the lighting characteristic and a corresponding model state of the facial rig;
generate a plurality of training samples, wherein a training sample includes a computer-generated image and a corresponding model state; and
train a regressor using the plurality of training samples, wherein the trained regressor is configured to infer a model state that corresponds to the face of the subject captured in a frame.
(Dependent claims: 11, 12, 13, 14, 15)
16. A computer-readable memory storing a plurality of instructions executable by one or more processors, the plurality of instructions comprising instructions that cause the one or more processors to:
obtain a facial rig associated with a subject, wherein the facial rig includes a plurality of expression shapes, wherein an expression shape defines at least a portion of an expression of the subject and includes one or more values for one or more facial attributes;
generate a plurality of model states for the facial rig, wherein a model state describes a combination of expression shapes defining an expression of the subject and a set of camera setup location coordinates in relation to the subject;
determine a lighting characteristic to use for rendering a computer-generated image of a model state of the plurality of model states;
render a plurality of computer-generated images of a face of the subject, wherein a computer-generated image is rendered using the lighting characteristic and a corresponding model state of the facial rig;
generate a plurality of training samples, wherein a training sample includes a computer-generated image and a corresponding model state; and
train a regressor using the plurality of training samples, wherein the trained regressor is configured to infer a model state that corresponds to the face of the subject captured in a frame.
(Dependent claims: 17, 18, 19, 20, 21)
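At run time, the trained regressor infers a model state for the face captured in each frame of a real-time video stream, per the final element of each independent claim. A minimal sketch follows, assuming an OpenCV capture loop and the same 64x64 grayscale format used for the rendered training images; neither detail is specified by the patent.

```python
import cv2
import numpy as np

def infer_model_state(regressor, frame):
    """Map a captured frame to the training-image format and regress a state."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, (64, 64)).astype(np.float64) / 255.0
    return regressor.predict(small.reshape(1, -1))[0]

cap = cv2.VideoCapture(0)  # live video stream
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    state = infer_model_state(regressor, frame)  # shape weights + camera coords
cap.release()
```

Matching the preprocessing of captured frames to the rendering setup is what allows the training data to be "tuned to expected environmental conditions and camera properties" of the stream, as the abstract puts it.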
Specification