Facial landmark localization using coarse-to-fine cascaded neural networks
First Claim
1. A system for localizing landmarks on face images, the system comprising:
an input for receiving a face image;
an output for presenting landmarks identified by the system; and
a plurality of neural network levels coupled in a cascade from the input to the output;
wherein each neural network level produces an estimate of landmarks that is more refined than an estimate of landmarks of a previous neural network level,
wherein the plurality of neural network levels comprise:
at least three cascaded neural network levels for predicting inner points defining landmarks within a face of the face image, the at least three cascaded neural network levels including the following in order from input to output:
a first bounding box estimator that receives the face image as input and produces a first cropped face image as output, the first cropped face image estimating a location of the face within the face image for purposes of estimating inner points,
a first initial prediction module that receives the first cropped face image as input and produces a first landmarked face image as output, the first landmarked face image containing an initial prediction of inner points within the face image, and
for each of the landmarks to be predicted, a component refinement module that receives the first landmarked face image as input and produces a landmarked component image as output, the landmarked component image containing a refined estimate of inner points defining the landmark, and
two cascaded neural network levels for predicting outer points defining a contour of the face of the face image, the two cascaded neural network levels including the following in order from input to output:
a second bounding box estimator that receives the face image as input and produces a second cropped face image as output, the second cropped face image estimating a location of the face within the face image for purposes of estimating outer points, and
a second initial prediction module that receives the second cropped face image as input and produces a second landmarked face image as output, the second landmarked face image containing a prediction of outer points within the face image.
Abstract
The present invention overcomes the limitations of the prior art by performing facial landmark localization in a coarse-to-fine manner with a cascade of neural network levels, and enforcing geometric constraints for each of the neural network levels. In one approach, the neural network levels may be implemented with a deep convolutional neural network. One aspect concerns a system for localizing landmarks on face images. The system includes an input for receiving a face image, and an output for presenting landmarks identified by the system. Neural network levels are coupled in a cascade from the input to the output of the system. Each neural network level produces an estimate of landmarks that is more refined than the estimate of landmarks of the previous neural network level.
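The coarse-to-fine idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names run_cascade, make_toy_level and mean_error are hypothetical, and each "level" is a toy stand-in for a trained network that simply moves the previous estimate part of the way toward a known target. It shows only the control flow, in which every level receives the previous level's estimate and returns a refined one.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]
Landmarks = List[Point]
# A level maps (image, previous estimate or None) to a new landmark estimate.
Level = Callable[[object, Optional[Landmarks]], Landmarks]


def run_cascade(image: object, levels: List[Level]) -> List[Landmarks]:
    """Run levels in order, feeding each level the previous level's estimate.

    Returns the estimate produced at every level, coarsest first.
    """
    estimates: List[Landmarks] = []
    previous: Optional[Landmarks] = None
    for level in levels:
        previous = level(image, previous)
        estimates.append(previous)
    return estimates


def make_toy_level(target: Landmarks, step: float) -> Level:
    """Hypothetical stand-in for a trained network: moves each point a
    fraction `step` of the way from the previous estimate toward `target`."""
    def level(image: object, previous: Optional[Landmarks]) -> Landmarks:
        start = previous if previous is not None else [(0.0, 0.0)] * len(target)
        return [(px + step * (tx - px), py + step * (ty - py))
                for (px, py), (tx, ty) in zip(start, target)]
    return level


def mean_error(estimate: Landmarks, target: Landmarks) -> float:
    """Mean Euclidean distance between estimated and target points."""
    return sum(((ex - tx) ** 2 + (ey - ty) ** 2) ** 0.5
               for (ex, ey), (tx, ty) in zip(estimate, target)) / len(target)


# Toy target positions (assumed values, for illustration only).
target = [(30.0, 40.0), (70.0, 40.0), (50.0, 80.0)]
levels = [make_toy_level(target, 0.5) for _ in range(3)]
estimates = run_cascade(image=None, levels=levels)
errors = [mean_error(e, target) for e in estimates]
```

In this toy setup the entries of errors shrink from one level to the next, which mirrors the statement that each level's estimate is more refined than that of the previous level.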
20 Claims
1. A system for localizing landmarks on face images, the system comprising:
an input for receiving a face image;
an output for presenting landmarks identified by the system; and
a plurality of neural network levels coupled in a cascade from the input to the output;
wherein each neural network level produces an estimate of landmarks that is more refined than an estimate of landmarks of a previous neural network level,
wherein the plurality of neural network levels comprise:
at least three cascaded neural network levels for predicting inner points defining landmarks within a face of the face image, the at least three cascaded neural network levels including the following in order from input to output:
a first bounding box estimator that receives the face image as input and produces a first cropped face image as output, the first cropped face image estimating a location of the face within the face image for purposes of estimating inner points,
a first initial prediction module that receives the first cropped face image as input and produces a first landmarked face image as output, the first landmarked face image containing an initial prediction of inner points within the face image, and
for each of the landmarks to be predicted, a component refinement module that receives the first landmarked face image as input and produces a landmarked component image as output, the landmarked component image containing a refined estimate of inner points defining the landmark, and
two cascaded neural network levels for predicting outer points defining a contour of the face of the face image, the two cascaded neural network levels including the following in order from input to output:
a second bounding box estimator that receives the face image as input and produces a second cropped face image as output, the second cropped face image estimating a location of the face within the face image for purposes of estimating outer points, and
a second initial prediction module that receives the second cropped face image as input and produces a second landmarked face image as output, the second landmarked face image containing a prediction of outer points within the face image.
View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
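The data flow recited in claim 1 (a three-level branch for inner points and a two-level branch for outer points) can be sketched in Python as follows. This is only an illustration under stated assumptions: every function name here is hypothetical, the "networks" are stubs that return fixed relative positions instead of learned predictions, and the image is a plain nested list rather than a real image type.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Image = List[List[float]]  # row-major grayscale pixels


@dataclass
class Box:
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image covered by `box`."""
    return [row[box.left:box.right] for row in image[box.top:box.bottom]]


def to_image_coords(points: List[Point], box: Box) -> List[Point]:
    """Map points from crop coordinates back to full-image coordinates."""
    return [(x + box.left, y + box.top) for x, y in points]


# --- Hypothetical stand-ins for trained networks (fixed outputs) ---

def inner_bbox_estimator(image: Image) -> Box:
    h, w = len(image), len(image[0])
    return Box(w // 4, h // 4, 3 * w // 4, 3 * h // 4)  # tight central box


def outer_bbox_estimator(image: Image) -> Box:
    return Box(0, 0, len(image[0]), len(image))  # looser box for the contour


def inner_initial_prediction(face: Image) -> Dict[str, List[Point]]:
    h, w = len(face), len(face[0])
    return {
        "left_eye": [(0.3 * w, 0.3 * h)],
        "right_eye": [(0.7 * w, 0.3 * h)],
        "mouth": [(0.5 * w, 0.8 * h)],
    }


def component_refinement(face: Image, name: str, points: List[Point]) -> List[Point]:
    # A trained per-component network would adjust the points here;
    # this stub only clamps them to the crop.
    h, w = len(face), len(face[0])
    return [(min(max(x, 0.0), float(w)), min(max(y, 0.0), float(h)))
            for x, y in points]


def outer_initial_prediction(face: Image) -> List[Point]:
    h, w = len(face), len(face[0])
    return [(0.1 * w, 0.5 * h), (0.5 * w, 0.95 * h), (0.9 * w, 0.5 * h)]


def localize(image: Image) -> Dict[str, object]:
    """Run the inner-point branch (three levels) and the outer-point branch (two levels)."""
    inner_box = inner_bbox_estimator(image)          # inner level 1
    inner_face = crop(image, inner_box)
    initial = inner_initial_prediction(inner_face)   # inner level 2
    inner = {
        name: to_image_coords(component_refinement(inner_face, name, pts), inner_box)
        for name, pts in initial.items()             # inner level 3, one call per landmark
    }
    outer_box = outer_bbox_estimator(image)          # outer level 1
    outer_face = crop(image, outer_box)
    outer = to_image_coords(outer_initial_prediction(outer_face), outer_box)  # outer level 2
    return {"inner": inner, "outer": outer}
```

Calling localize on an 8 by 8 image of zeros returns a dictionary with per-component inner points and a list of outer contour points, all expressed in full-image coordinates. The two branches use separate bounding box estimators, as the claim recites, so each branch can crop the face differently for its own purpose.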
13. A method for localizing landmarks on face images, the method comprising:
receiving a face image;
producing an estimate of landmarks of a neural network that is more refined than an estimate of landmarks of a previous neural network level in a cascaded neural network, wherein producing the estimate of landmarks comprises:
by at least three cascaded neural network levels of the cascaded neural network, predicting inner points defining landmarks within a face of the face image, predicting the inner points comprising:
by a first bounding box estimator, receiving the face image as input and producing a first cropped face image as output, the first cropped face image estimating a location of the face within the face image for purposes of estimating inner points,
by a first initial prediction module, receiving the first cropped face image as input and producing a first landmarked face image as output, the first landmarked face image containing an initial prediction of inner points within the face image, and
for each of the landmarks to be predicted, by a component refinement module, receiving the first landmarked face image as input and producing a landmarked component image as output, the landmarked component image containing a refined estimate of inner points defining the landmark, and
by two cascaded neural network levels of the cascaded neural network, predicting outer points defining a contour of the face of the face image, predicting the outer points comprising:
by a second bounding box estimator, receiving the face image as input and producing a second cropped face image as output, the second cropped face image estimating a location of the face within the face image for purposes of estimating outer points, and
by a second initial prediction module, receiving the second cropped face image as input and producing a second landmarked face image as output, the second landmarked face image containing a prediction of outer points within the face image; and
presenting landmarks identified based on the estimate of landmarks,
wherein the method is performed by one or more processors.
View Dependent Claims (14, 15, 16)
17. A non-transitory computer readable medium configured to store program code, the program code comprising instructions for localizing landmarks on face images, the instructions when executed by a processor cause the processor to:
receive a face image;
produce an estimate of landmarks of a neural network that is more refined than an estimate of landmarks of a previous neural network level in a cascaded neural network, wherein the instructions to produce the estimate of landmarks further comprise instructions when executed by the processor cause the processor to:
by at least three cascaded neural network levels of the cascaded neural network, predict inner points defining landmarks within a face of the face image, wherein the instructions to predict the inner points further comprise instructions when executed by the processor cause the processor to:
by a first bounding box estimator, receive the face image as input and produce a first cropped face image as output, the first cropped face image estimating a location of the face within the face image for purposes of estimating inner points,
by a first initial prediction module, receive the first cropped face image as input and produce a first landmarked face image as output, the first landmarked face image containing an initial prediction of inner points within the face image, and
for each of the landmarks to be predicted, by a component refinement module, receive the first landmarked face image as input and produce a landmarked component image as output, the landmarked component image containing a refined estimate of inner points defining the landmarks, and
by two cascaded neural network levels of the cascaded neural network, predict outer points defining a contour of the face of the face image, wherein the instructions to predict the outer points further comprise instructions that when executed by the processor cause the processor to:
by a second bounding box estimator, receive the face image as input and produce, at the second bounding box estimator, a second cropped face image as output, the second cropped face image estimating a location of the face within the face image for purposes of estimating outer points, and
by a second initial prediction module, receive the second cropped face image as input and produce a second landmarked face image as output, the second landmarked face image containing a prediction of outer points within the face image; and
present landmarks identified based on the estimate of landmarks.
View Dependent Claims (18, 19, 20)
Specification