Synergistic face detection and pose estimation with energy-based models
First Claim
1. A computer-implemented method of face detection and pose estimation, the method comprising the following steps:
- training a convolutional neural network to map facial images to points on a face manifold, parameterized by facial pose, and to map non-facial images to points away from the face manifold; and
- simultaneously determining whether an image is a face from its proximity to the face manifold, and an estimate of the facial pose of that image from its projection onto the face manifold.
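The determining step can be sketched as energy minimization over the latent pose. In the sketch below, the manifold map `F`, the one-dimensional yaw-only pose, the grid search, and the acceptance `threshold` are all illustrative assumptions; the patent does not prescribe this particular implementation.

```python
import math

def F(yaw):
    """Illustrative point on a 1-D face manifold parameterized by yaw (radians)."""
    return (math.cos(yaw), math.sin(yaw))

def energy(g_x, yaw):
    """E(X, Z) = || G(X) - F(Z) ||: distance between the network output
    g_x = G(X) and the manifold point for pose Z = yaw."""
    fx, fy = F(yaw)
    return math.hypot(g_x[0] - fx, g_x[1] - fy)

def detect_and_estimate(g_x, threshold=0.5, steps=181):
    """Minimize the energy over a grid of yaw values in [-pi/2, pi/2].

    Declares the image a face when the minimum energy (its distance to
    the manifold) falls below `threshold`; the minimizing yaw is the
    pose estimate, i.e. the projection onto the manifold.
    """
    best_yaw, best_e = None, float("inf")
    for i in range(steps):
        yaw = -math.pi / 2 + i * math.pi / (steps - 1)
        e = energy(g_x, yaw)
        if e < best_e:
            best_e, best_yaw = e, yaw
    return best_e < threshold, best_yaw

# A point near the manifold at yaw ~ 30 degrees should be accepted as a
# face with a pose estimate near 30 degrees; a distant point is rejected.
near = (math.cos(math.radians(30)) + 0.05, math.sin(math.radians(30)))
far = (3.0, 3.0)
```

The grid search stands in for whatever minimizer an implementation actually uses; because detection and pose estimation both fall out of the same minimization, the two tasks are performed simultaneously, as the claim recites.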
Abstract
A method for human face detection that detects faces independently of their particular poses and simultaneously estimates those poses. Our method exhibits an immunity to variations in skin color, eyeglasses, facial hair, lighting, scale, and facial expression, among other factors. In operation, we train a convolutional neural network to map face images to points on a face manifold, and non-face images to points far away from that manifold, wherein that manifold is parameterized by facial pose. Conceptually, we view a pose parameter as a latent variable, which may be inferred through an energy-minimization process. To train systems based upon our inventive method, we derive a new type of discriminative loss function that is tailored to such detection tasks. Our method enables a multi-view detector that can detect faces in a variety of poses, for example, looking left or right (yaw axis), up or down (pitch axis), or tilting left or right (roll axis). Systems employing our method are highly reliable, run at near real time (5 frames per second on conventional hardware), and are robust against variations in yaw (±90°), roll (±45°), and pitch (±60°).
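The discriminative loss mentioned in the abstract can be illustrated, very loosely, as a contrastive-style pair of terms: pull the energy of a face sample (at its labeled pose) down, and push the minimum energy of a non-face sample (over all poses) up. The soft-hinge form and the `margin` parameter below are hypothetical stand-ins, not the loss the patent derives.

```python
import math

def face_loss(e_correct_pose):
    # For a face sample, take the energy itself at the labeled pose as
    # the loss; minimizing it pulls the sample toward the manifold.
    return e_correct_pose

def nonface_loss(min_e_over_poses, margin=1.0):
    # For a non-face sample, penalize a low best-pose energy: the
    # soft-hinge log(1 + exp(margin - E_min)) decays toward zero once
    # E_min clears the margin, pushing non-faces away from the manifold.
    # NOTE: this particular functional form is an illustrative assumption.
    return math.log(1.0 + math.exp(margin - min_e_over_poses))
```

Training with such a pair of terms shapes the energy landscape so that, at inference time, proximity to the manifold alone separates faces from non-faces.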
34 Citations
7 Claims
1. A computer-implemented method of face detection and pose estimation, the method comprising the following steps:
- training a convolutional neural network to map facial images to points on a face manifold, parameterized by facial pose, and to map non-facial images to points away from the face manifold; and
- simultaneously determining whether an image is a face from its proximity to the face manifold, and an estimate of the facial pose of that image from its projection onto the face manifold.
View dependent claims: 2, 3, 4, 5, 6, 7.
Specification