Facial landmark localization using coarse-to-fine cascaded neural networks

US 9,400,922 B2
Filed: 05/29/2014
Issued: 07/26/2016
Est. Priority Date: 05/29/2014
Status: Active Grant

Abstract

The present invention overcomes the limitations of the prior art by performing facial landmark localization in a coarse-to-fine manner with a cascade of neural network levels, and enforcing geometric constraints for each of the neural network levels. In one approach, the neural network levels may be implemented with deep convolutional neural networks. One aspect concerns a system for localizing landmarks on face images. The system includes an input for receiving a face image, and an output for presenting landmarks identified by the system. Neural network levels are coupled in a cascade from the input to the output of the system. Each neural network level produces an estimate of landmarks that is more refined than the estimate of landmarks of the previous neural network level.
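
The coarse-to-fine cascade described in the abstract can be pictured as a chain of functions, each taking the previous estimate and returning a tighter one. The following is only a minimal sketch under stated assumptions: `Estimate`, `run_cascade`, and the three stand-in levels are hypothetical names, and the stand-ins use fixed arithmetic in place of trained networks, which the patent record does not specify.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1) in full-image pixels
Point = Tuple[float, float]
Image = List[List[float]]         # rows of grayscale pixel values


@dataclass
class Estimate:
    box: Box             # region handed to the next level
    points: List[Point]  # landmark coordinates in full-image pixels


# One cascade level: (image, previous estimate) -> refined estimate.
Level = Callable[[Image, Estimate], Estimate]


def run_cascade(image: Image, levels: List[Level]) -> List[Estimate]:
    """Apply the levels in order and keep every intermediate estimate."""
    h, w = len(image), len(image[0])
    est = Estimate(box=(0, 0, w, h), points=[])
    history = []
    for level in levels:
        est = level(image, est)
        history.append(est)
    return history


# --- Hypothetical stand-ins for trained networks (assumptions, not from the patent) ---

def bbox_level(image: Image, est: Estimate) -> Estimate:
    """Pretend bounding box estimator: trims a 10% margin on every side."""
    x0, y0, x1, y1 = est.box
    dx, dy = (x1 - x0) // 10, (y1 - y0) // 10
    return Estimate((x0 + dx, y0 + dy, x1 - dx, y1 - dy), est.points)


def initial_level(image: Image, est: Estimate) -> Estimate:
    """Pretend initial prediction: two points at fixed relative positions."""
    x0, y0, x1, y1 = est.box
    w, h = x1 - x0, y1 - y0
    rel = [(0.3, 0.4), (0.7, 0.4)]
    return Estimate(est.box, [(x0 + rx * w, y0 + ry * h) for rx, ry in rel])


def refine_level(image: Image, est: Estimate) -> Estimate:
    """Pretend component refinement: narrows the box around the current points."""
    xs = [p[0] for p in est.points]
    ys = [p[1] for p in est.points]
    box = (math.floor(min(xs)) - 2, math.floor(min(ys)) - 2,
           math.ceil(max(xs)) + 2, math.ceil(max(ys)) + 2)
    return Estimate(box, est.points)


image = [[0.0] * 100 for _ in range(100)]
history = run_cascade(image, [bbox_level, initial_level, refine_level])
for i, est in enumerate(history, start=1):
    print(f"level {i}: box={est.box}, points={len(est.points)}")
```

Keeping every intermediate estimate in `history` makes it easy to inspect how the region of interest narrows from level to level; a real system would replace each stand-in with a trained network.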

20 Claims

1. A system for localizing landmarks on face images, the system comprising:
- an input for receiving a face image;
- an output for presenting landmarks identified by the system; and
- a plurality of neural network levels coupled in a cascade from the input to the output;
- wherein each neural network level produces an estimate of landmarks that is more refined than an estimate of landmarks of a previous neural network level,
- wherein the plurality of neural network levels comprise:
  - at least three cascaded neural network levels for predicting inner points defining landmarks within a face of the face image, the at least three cascaded neural network levels including the following in order from input to output:
    - a first bounding box estimator that receives the face image as input and produces a first cropped face image as output, the first cropped face image estimating a location of the face within the face image for purposes of estimating inner points,
    - a first initial prediction module that receives the first cropped face image as input and produces a first landmarked face image as output, the first landmarked face image containing an initial prediction of inner points within the face image, and
    - for each of the landmarks to be predicted, a component refinement module that receives the first landmarked face image as input and produces a landmarked component image as output, the landmarked component image containing a refined estimate of inner points defining the landmark, and
  - two cascaded neural network levels for predicting outer points defining a contour of the face of the face image, the two cascaded neural network levels including the following in order from input to output:
    - a second bounding box estimator that receives the face image as input and produces a second cropped face image as output, the second cropped face image estimating a location of the face within the face image for purposes of estimating outer points, and
    - a second initial prediction module that receives the second cropped face image as input and produces a second landmarked face image as output, the second landmarked face image containing a prediction of outer points within the face image.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
  - 2. The system of claim 1 wherein, for each neural network level, inputs to the neural network level are a same size or larger than outputs for the neural network level.
  - 3. The system of claim 2 wherein, for each neural network level, an input to the neural network level is a clip of the face image with a first bounding box, an output for the neural network level is a clip of the face image with a second bounding box, and the first bounding box is a same size or larger than the second bounding box.
  - 4. The system of claim 3 wherein the second bounding box is a subset of the first bounding box.
  - 5. The system of claim 1 wherein one of the neural network levels receives a landmarked component image as input and produces a rotated version of the landmarked component image as output.
  - 6. The system of claim 1 wherein the inner points include inner points defining eyes, mouth and nose.
  - 7. The system of claim 1 wherein a number of neural network levels in cascade is different for different landmarks.
  - 8. The system of claim 7 wherein a number of neural network levels in cascade for the inner points is greater than a number of neural network levels in cascade for the outer points.
  - 9. The system of claim 1 wherein the neural network levels are convolutional neural networks that include convolution, non-linearity and down-sampling.
  - 10. The system of claim 9 wherein each neural network level includes a convolutional neural network with at least two convolutional layers.
  - 11. The system of claim 9 wherein the down-sampling is not more than 2× down-sampling.
  - 12. The system of claim 1 wherein the system contains between three to five neural network levels in cascade.
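
Claims 9 to 11 name three building blocks for each level: convolution, a non-linearity, and down-sampling of at most 2×. As an illustration only, the sketch below chains two such stages in plain Python. The choice of ReLU as the non-linearity, max pooling as the down-sampling, the 3×3 edge kernel, and all function names are assumptions made for this example; the claims do not fix any of them.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2-D cross-correlation, the operation CNN layers call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(x: Matrix) -> Matrix:
    """Element-wise non-linearity (assumed here to be ReLU)."""
    return [[max(0.0, v) for v in row] for row in x]


def max_pool_2x(x: Matrix) -> Matrix:
    """2x down-sampling: max over non-overlapping 2x2 windows (odd edge dropped)."""
    return [
        [max(x[r][c], x[r][c + 1], x[r + 1][c], x[r + 1][c + 1])
         for c in range(0, len(x[0]) - 1, 2)]
        for r in range(0, len(x) - 1, 2)
    ]


def stage(x: Matrix, kernel: Matrix) -> Matrix:
    """One convolution -> non-linearity -> 2x down-sampling stage."""
    return max_pool_2x(relu(conv2d_valid(x, kernel)))


# A 10x10 test image with a vertical edge: left half 0.0, right half 1.0.
image = [[1.0 if c >= 5 else 0.0 for c in range(10)] for _ in range(10)]
# Hypothetical hand-written kernel that responds to dark-to-bright edges.
edge_kernel = [[-1.0, 0.0, 1.0]] * 3

s1 = stage(image, edge_kernel)  # 10x10 -> 8x8 -> 4x4
s2 = stage(s1, edge_kernel)     # 4x4 -> 2x2 -> 1x1
print(len(s1), len(s1[0]), len(s2), len(s2[0]))
```

Two stacked stages match the "at least two convolutional layers" of claim 10, and limiting each pooling step to 2× keeps spatial detail from being discarded too quickly, which matters when the output is a pixel coordinate.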

13. A method for localizing landmarks on face images, the method comprising:
- receiving a face image;
- producing an estimate of landmarks of a neural network that is more refined than an estimate of landmarks of a previous neural network level in a cascaded neural network, wherein producing the estimate of landmarks comprises:
  - by at least three cascaded neural network levels of the cascaded neural network, predicting inner points defining landmarks within a face of the face image, predicting the inner points comprising:
    - by a first bounding box estimator, receiving the face image as input and producing a first cropped face image as output, the first cropped face image estimating a location of the face within the face image for purposes of estimating inner points,
    - by a first initial prediction module, receiving the first cropped face image as input and producing a first landmarked face image as output, the first landmarked face image containing an initial prediction of inner points within the face image, and
    - for each of the landmarks to be predicted, by a component refinement module, receiving the first landmarked face image as input and producing a landmarked component image as output, the landmarked component image containing a refined estimate of inner points defining the landmark, and
  - by two cascaded neural network levels of the cascaded neural network, predicting outer points defining a contour of the face of the face image, predicting the outer points comprising:
    - by a second bounding box estimator, receiving the face image as input and producing a second cropped face image as output, the second cropped face image estimating a location of the face within the face image for purposes of estimating outer points, and
    - by a second initial prediction module, receiving the second cropped face image as input and producing a second landmarked face image as output, the second landmarked face image containing a prediction of outer points within the face image; and
- presenting landmarks identified based on the estimate of landmarks,
- wherein the method is performed by one or more processors.
- Dependent Claims (14, 15, 16)
  - 14. The method of claim 13, further comprising producing a rotated version of the landmarked component image.
  - 15. The method of claim 13, wherein, for each neural network level, inputs to the neural network level are a same size or larger than outputs for the neural network level.
  - 16. The method of claim 13, wherein the neural network levels are convolutional neural networks that include convolution, non-linearity and down-sampling.
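
The size relation in claim 15 (spelled out in terms of bounding boxes in claims 3 and 4) can be checked mechanically once each level's input and output regions are written as boxes. The sketch below is a hedged illustration: the `(x0, y0, x1, y1)` box convention with exclusive upper bounds and the helper names are assumptions, not notation from the patent.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def size(box: Box) -> Tuple[int, int]:
    """Width and height of a box."""
    x0, y0, x1, y1 = box
    return (x1 - x0, y1 - y0)


def same_size_or_larger(first: Box, second: Box) -> bool:
    """True if `first` is at least as wide and as tall as `second`."""
    fw, fh = size(first)
    sw, sh = size(second)
    return fw >= sw and fh >= sh


def is_subset(first: Box, second: Box) -> bool:
    """True if `second` lies entirely inside `first`."""
    return (first[0] <= second[0] and first[1] <= second[1]
            and second[2] <= first[2] and second[3] <= first[3])


def crop(image: List[List[float]], box: Box) -> List[List[float]]:
    """Clip of the image covered by `box`."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


# Hypothetical boxes for three successive levels on a 100x100 image.
boxes = [(0, 0, 100, 100), (10, 10, 90, 90), (32, 40, 68, 44)]
pairs = list(zip(boxes, boxes[1:]))
shrinking = all(same_size_or_larger(a, b) for a, b in pairs)
nested = all(is_subset(a, b) for a, b in pairs)

image = [[0.0] * 100 for _ in range(100)]
clip = crop(image, boxes[2])
print(shrinking, nested, len(clip), len(clip[0]))
```

Note that the two checks are not equivalent: a box can be smaller than its predecessor without lying inside it, which is why the subset condition appears as a separate dependent claim.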

17. A non-transitory computer readable medium configured to store program code, the program code comprising instructions for localizing landmarks on face images, the instructions when executed by a processor cause the processor to:
- receive a face image;
- produce an estimate of landmarks of a neural network that is more refined than an estimate of landmarks of a previous neural network level in a cascaded neural network, wherein the instructions to produce the estimate of landmarks further comprise instructions when executed by the processor cause the processor to:
  - by at least three cascaded neural network levels of the cascaded neural network, predict inner points defining landmarks within a face of the face image, wherein the instructions to predict the inner points further comprise instructions when executed by the processor cause the processor to:
    - by a first bounding box estimator, receive the face image as input and produce a first cropped face image as output, the first cropped face image estimating a location of the face within the face image for purposes of estimating inner points,
    - by a first initial prediction module, receive the first cropped face image as input and produce a first landmarked face image as output, the first landmarked face image containing an initial prediction of inner points within the face image, and
    - for each of the landmarks to be predicted, by a component refinement module, receive the first landmarked face image as input and produce a landmarked component image as output, the landmarked component image containing a refined estimate of inner points defining the landmarks, and
  - by two cascaded neural network levels of the cascaded neural network, predict outer points defining a contour of the face of the face image, wherein the instructions to predict the outer points further comprise instructions that when executed by the processor cause the processor to:
    - by a second bounding box estimator, receive the face image as input and produce, at the second bounding box estimator, a second cropped face image as output, the second cropped face image estimating a location of the face within the face image for purposes of estimating outer points, and
    - by a second initial prediction module, receive the second cropped face image as input and produce a second landmarked face image as output, the second landmarked face image containing a prediction of outer points within the face image; and
- present landmarks identified based on the estimate of landmarks.
- Dependent Claims (18, 19, 20)
  - 18. The non-transitory computer readable medium of claim 17, further comprising instructions when executed by the processor cause the processor to produce a rotated version of the landmarked component image.
  - 19. The non-transitory computer readable medium of claim 17, wherein, for each neural network level, inputs to the neural network level are a same size or larger than outputs for the neural network level.
  - 20. The non-transitory computer readable medium of claim 17, wherein the neural network levels are convolutional neural networks that include convolution, non-linearity and down-sampling.
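
Claim 18 (like claims 5 and 14) refers to a rotated version of the landmarked component image but does not say how the rotation angle is chosen. Purely as an illustration, the sketch below assumes the angle is picked so that two reference landmarks, such as the corners of an eye, end up on a horizontal line; that choice and the function names are assumptions for this example, not something the claims state.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rotate_points(points: List[Point], center: Point, angle: float) -> List[Point]:
    """Rotate points about `center` by `angle` radians (standard 2-D rotation)."""
    cx, cy = center
    c, s = math.cos(angle), math.sin(angle)
    return [
        (cx + c * (x - cx) - s * (y - cy), cy + s * (x - cx) + c * (y - cy))
        for x, y in points
    ]


def leveling_angle(p_left: Point, p_right: Point) -> float:
    """Angle that turns the segment p_left -> p_right onto the horizontal."""
    return -math.atan2(p_right[1] - p_left[1], p_right[0] - p_left[0])


# Hypothetical eye-corner landmarks on a tilted face, in pixel coordinates.
corners = [(30.0, 40.0), (50.0, 60.0)]
center = ((corners[0][0] + corners[1][0]) / 2, (corners[0][1] + corners[1][1]) / 2)
angle = leveling_angle(corners[0], corners[1])
leveled = rotate_points(corners, center, angle)
print(leveled)
```

The same transform would have to be applied to the image pixels of the component, and inverted afterwards to map refined points back to the original face image; both steps are omitted here for brevity.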

Current Assignee: Beijing Kuangshi Technology Co., Ltd.
Original Assignee: Beijing Kuangshi Technology Co., Ltd.
Inventors: Zhou, Erjin; Fan, Haoqiang; Cao, Zhimin; Jiang, Yuning; Yin, Qi
Primary Examiner(s): Chawan, Sheela C
Application Number: US14/375,674
Publication Number: US 20150347822A1
Time in Patent Office: 789 Days
Field of Search: 382/118, 382/156, 382/155, 382/168, 382/255, 382/263, 382/264, 382/274, 382/275
US Class Current: 1/1
CPC Class Codes:
- G06T 1/0007: Image acquisition
- G06T 2207/10004: Still image; Photographic i...
- G06T 2207/20016: Hierarchical, coarse-to-fin...
- G06T 2207/20084: Artificial neural networks ...
- G06T 2207/30201: Face
- G06V 10/454: Integrating the filters int...
- G06V 10/82: using neural networks
- G06V 40/165: using facial parts and geom...
- G06V 40/171: Local features and componen...
