Pose-invariant face recognition system and process
4 Assignments
0 Petitions
Abstract
A face recognition system and process for identifying a person depicted in an input image and their face pose. This system and process entails locating and extracting face regions belonging to known people from a set of model images, and determining the face pose for each of the face regions extracted. All the extracted face regions are preprocessed by normalizing, cropping, categorizing and finally abstracting them. More specifically, the images are normalized and cropped to show only a person's face, categorized according to the face pose of the depicted person's face by assigning them to one of a series of face pose ranges, and abstracted preferably via an eigenface approach. The preprocessed face images are preferably used to train a neural network ensemble having a first stage made up of a bank of face recognition neural networks each of which is dedicated to a particular pose range, and a second stage constituting a single fusing neural network that is used to combine the outputs from each of the first stage neural networks. Once trained, the input of a face region which has been extracted from an input image and preprocessed (i.e., normalized, cropped and abstracted) will cause just one of the output units of the fusing portion of the neural network ensemble to become active. The active output unit indicates either the identity of the person whose face was extracted from the input image and the associated face pose, or that the identity of the person is unknown to the system.
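The preprocessing pipeline the abstract describes (normalize and crop each extracted face region, then categorize it by pose range) can be sketched as follows. The pose-range boundaries, target scale, and nearest-neighbor resampling are illustrative assumptions; the patent leaves these as "prescribed" design choices, and the eye-alignment step is omitted for brevity:

```python
import numpy as np

# Hypothetical pose ranges (degrees of head yaw); the patent does not
# fix the number or width of the ranges.
POSE_RANGES = [(-90, -30), (-30, -10), (-10, 10), (10, 30), (30, 90)]

def categorize_pose(yaw_degrees):
    """Assign a face pose to one of the prescribed pose ranges."""
    for idx, (lo, hi) in enumerate(POSE_RANGES):
        if lo <= yaw_degrees < hi:
            return idx
    return len(POSE_RANGES) - 1  # clamp extreme poses to the last range

def normalize_and_crop(face_region, target_shape=(32, 32)):
    """Resize an extracted face region to the prescribed scale via
    nearest-neighbor sampling, standing in for the patent's
    normalize-and-crop step."""
    h, w = face_region.shape
    th, tw = target_shape
    rows = np.arange(th) * h // th
    cols = np.arange(tw) * w // tw
    return face_region[np.ix_(rows, cols)]
```

A frontal face (yaw near zero) lands in the middle pose range, and every normalized region comes out at the same prescribed scale, which is what allows the later concatenation into fixed-length column vectors.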
142 Citations
9 Claims
1. A computer-implemented face recognition process for identifying a person depicted in an input image, comprising using a computer to perform the following process actions:
creating a database of a plurality of model image characterizations, each of which represents the face of a known person that it is desired to identify in the input image as well as the person's face pose;
training a neural network ensemble to identify a person and their face pose from a region which has been extracted from said input image and characterized in a manner similar to the plurality of model images, wherein the network ensemble comprises, a first stage having a plurality of classifiers each of which has input and output units and is dedicated to a particular pose range and outputs a measure indicative of the similarity between said characterized input image region and each of said model image characterizations associated with the particular pose range of the classifier, and a fusing neural network as its second stage which combines the outputs of the classifiers to generate an output indicative of the person associated with the characterized input image region and the face pose of that person and which has at least enough output units to allow a different output to represent each person it is desired to identify at each of the pose ranges, and wherein training the neural network ensemble comprises, preparing each model image characterization from a model image depicting the face of a known person that it is desired to identify in the input image by, extracting the portion of the model image depicting said face, normalizing the extracted portion of the model image by resizing it to a prescribed scale if not already at the prescribed scale and adjusting the region so that the eye locations of the depicted subject fall within a prescribed area, and cropping the extracted portion of the model image by eliminating unneeded portions of the image not specifically depicting part of the face of the subject to create a model face image, categorizing the model face images by assigning each to one of a set of pose ranges into which its associated face pose falls, and for each pose range, choosing a prescribed number of the model face images of each person being modeled which have been assigned to the selected pose range, concatenating each of the chosen model face images to create a respective dimensional column vector (DCV) for each, computing a covariance matrix from the DCVs, calculating eigenvectors and corresponding eigenvalues from the covariance matrix, ranking the eigenvalues in descending order, identifying a prescribed number of the top eigenvalues, using the eigenvectors corresponding to the identified eigenvalues to form the rows of a basis vector matrix (BVM) for the pose range, and multiplying each DCV by each BVM to produce a set of principal components analysis (PCA) coefficient vectors for each model face image, and for each face recognition neural network, inputting, one at a time, each of the PCA coefficient vectors associated with the pose range of the face recognition neural network into the inputs of the network until the outputs of the network stabilize, initializing the fusing neural network for training, for each DCV, simultaneously inputting the PCA coefficient vectors generated from the DCV into the respective face recognition neural network associated with the vector's particular pose range group until all the PCA coefficient vectors of every DCV have been input, and repeating until the outputs of the fusing neural network stabilize, and for each DCV, simultaneously inputting the PCA coefficient vectors generated from the DCV into the respective face recognition neural network associated with the vector's particular pose range group and assigning the active output of the fusing neural network as corresponding to the particular person and pose associated with the model image used to create the set of PCA coefficient vectors; and
employing the network ensemble to identify the person associated with the characterized input image region and the face pose of that person.
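The eigenface steps recited in claim 1 (concatenate chosen face images into DCVs, compute a covariance matrix, rank the eigenvalues, take the eigenvectors of the top ones as BVM rows, multiply DCVs by the BVM) map directly onto standard linear algebra. A minimal NumPy sketch, with the mean-centering step made explicit (the claim does not spell it out) and `num_components` standing in for the "prescribed number" of top eigenvalues:

```python
import numpy as np

def build_bvm(dcvs, num_components):
    """Build the basis vector matrix (BVM) for one pose range.
    dcvs: (d, n) array with one dimensional column vector (DCV)
    per chosen model face image assigned to that pose range."""
    centered = dcvs - dcvs.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / dcvs.shape[1]  # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]            # rank eigenvalues descending
    top = order[:num_components]                 # prescribed number of top eigenvalues
    return eigvecs[:, top].T                     # eigenvectors form the BVM rows

def pca_coefficients(dcv, bvm):
    """Multiply a DCV by a BVM to get one PCA coefficient vector."""
    return bvm @ dcv
```

Per the claim, each DCV is multiplied by every pose range's BVM, so each model face image ends up with a set of PCA coefficient vectors, one per pose range.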
3. A face recognition system for identifying a person depicted in an input image, comprising:
a general purpose computing device; and
a computer program comprising program modules executable by the computing device, wherein the computing device is directed by the program modules of the computer program to, capture model images, each of which depicts at least one person of known identity, locate and extract regions within the model images, each of which depicts the face of a known person that it is desired to identify in the input image, determine the face pose for each of the face regions extracted from the model images, categorize each face region by assigning each to one of a set of pose ranges into which its associated face pose falls, train a neural network ensemble to identify a person and their face pose from a region that depicts the face of a person which has been extracted from said input image, wherein the network ensemble comprises, a first stage having a plurality of classifiers each of which has input and output units and is dedicated to a particular pose range and outputs a measure indicative of the similarity between said input image region and each of said model image regions associated with the particular pose range of the classifier, and a fusing neural network as its second stage which combines the outputs of the classifiers to generate an output indicative of the person associated with the characterized input image region and the face pose of that person and which has at least enough output units to allow a different output to represent each person it is desired to identify at each of the pose ranges, and wherein training the neural network ensemble comprises, (a) preparing each face region extracted from said model images by normalizing and cropping the extracted regions, wherein said normalizing comprises resizing each extracted face region to the same prescribed scale if not already at the prescribed scale and adjusting each region so that the eye locations of the depicted subject fall within the same prescribed area, and wherein said cropping comprises eliminating unneeded portions of the image not specifically depicting part of the face of the subject, (b) selecting a previously unselected one of the set of pose ranges, (c) choosing a prescribed number of the prepared face images of each person being modeled which have been assigned to the selected pose range, (d) concatenating each of the chosen prepared face images to create a respective dimensional column vector (DCV) for each, (e) computing a covariance matrix from the DCVs, (f) calculating eigenvectors and corresponding eigenvalues from the covariance matrix, (g) ranking the eigenvalues in descending order, (h) identifying a prescribed number of the top eigenvalues, (i) using the eigenvectors corresponding to the identified eigenvalues to form the rows of a basis vector matrix (BVM) for the selected pose range, (j) repeating actions (b) through (i) for each remaining pose range, (k) multiplying each DCV by each BVM to produce a set of principal components analysis (PCA) coefficient vectors for each face image, (l) for each face recognition neural network, inputting, one at a time, each of the PCA coefficient vectors associated with the pose range of the face recognition neural network into the inputs of the network until the outputs of the network stabilize, (m) initializing the fusing neural network for training, (n) for each DCV, simultaneously inputting the PCA coefficient vectors generated from the DCV into the respective face recognition neural network associated with the vector's particular pose range group until all the PCA coefficient vectors of every DCV have been input, and repeating until the outputs of the fusing neural network stabilize, and (o) for each DCV, simultaneously inputting the PCA coefficient vectors generated from the DCV into the respective face recognition neural network associated with the vector's particular pose range group and assigning the active output of the fusing neural network as corresponding to the particular person and pose associated with the model image used to create the set of PCA coefficient vectors, and employ the network ensemble to identify the person associated with the characterized input image region and their face pose.
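The two-stage ensemble the claims describe can be sketched structurally: each first-stage classifier scores the PCA coefficient vector for its own pose range, and the fusing network combines all first-stage outputs so that one output unit, encoding a (person, pose range) pair, becomes active. The patent fixes no architecture, so the tiny tanh/softmax networks below (`TwoLayerNet`, `ensemble_forward`) are illustrative stand-ins only:

```python
import numpy as np

rng = np.random.default_rng(0)

class TwoLayerNet:
    """Tiny feedforward stand-in for one first-stage face recognition
    classifier or the second-stage fusing network (assumed shape; the
    patent does not prescribe an architecture)."""
    def __init__(self, n_in, n_hidden, n_out):
        self.w1 = rng.normal(scale=0.1, size=(n_hidden, n_in))
        self.w2 = rng.normal(scale=0.1, size=(n_out, n_hidden))

    def forward(self, x):
        h = np.tanh(self.w1 @ x)
        z = self.w2 @ h
        e = np.exp(z - z.max())
        return e / e.sum()  # one similarity score per known person

def ensemble_forward(classifiers, fusing_net, pca_vectors):
    """First stage: each pose-range classifier scores the PCA coefficient
    vector for its pose range. Second stage: the fusing network combines
    all first-stage outputs; the most active unit is the answer."""
    stage1 = np.concatenate([clf.forward(v)
                             for clf, v in zip(classifiers, pca_vectors)])
    fused = fusing_net.forward(stage1)
    return int(np.argmax(fused))  # index of the active output unit
```

With, say, 3 pose ranges and 4 known people, the fusing network needs at least 12 output units so that a distinct unit can represent each person at each pose range, as the claims require.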
5. A computer-readable memory for use in identifying a person depicted in an input image, comprising:
a computer-readable storage medium; and
a computer program comprising program modules stored in the storage medium, wherein the storage medium is so configured by the computer program that it causes a computer to, input model images, each of which depicts at least one person of known identity, locate and extract regions within the model images, each of which depicts the face of a known person that it is desired to identify in the input image, determine the face pose for each of the face regions extracted from the model images, categorize each face region by assigning each to one of a set of pose ranges into which its associated face pose falls, train a neural network ensemble to identify a person and their face pose from a region that depicts the face of a person which has been extracted from said input image, wherein the network ensemble comprises, a first stage having a plurality of classifiers each of which has input and output units and is dedicated to a particular pose range and outputs a measure indicative of the similarity between said input image region and each of said model image regions associated with the particular pose range of the classifier, and a fusing neural network as its second stage which combines the outputs of the classifiers to generate an output indicative of the person associated with the characterized input image region and the face pose of that person and which has at least enough output units to allow a different output to represent each person it is desired to identify at each of the pose ranges, and wherein training the neural network ensemble comprises, (a) preparing each face region extracted from said model images by normalizing and cropping the extracted regions, wherein said normalizing comprises resizing each extracted face region to the same prescribed scale if not already at the prescribed scale and adjusting each region so that the eye locations of the depicted subject fall within the same prescribed area, and wherein said cropping comprises eliminating unneeded portions of the image not specifically depicting part of the face of the subject, (b) selecting a previously unselected one of the set of pose ranges, (c) choosing a prescribed number of the prepared face images of each person being modeled which have been assigned to the selected pose range, (d) concatenating each of the chosen prepared face images to create a respective dimensional column vector (DCV) for each, (e) computing a covariance matrix from the DCVs, (f) calculating eigenvectors and corresponding eigenvalues from the covariance matrix, (g) ranking the eigenvalues in descending order, (h) identifying a prescribed number of the top eigenvalues, (i) using the eigenvectors corresponding to the identified eigenvalues to form the rows of a basis vector matrix (BVM) for the selected pose range, (j) repeating actions (b) through (i) for each remaining pose range, (k) multiplying each DCV by each BVM to produce a set of principal components analysis (PCA) coefficient vectors for each face image, (l) for each face recognition neural network, inputting, one at a time, each of the PCA coefficient vectors associated with the pose range of the face recognition neural network into the inputs of the network until the outputs of the network stabilize, (m) initializing the fusing neural network for training, (n) for each DCV, simultaneously inputting the PCA coefficient vectors generated from the DCV into the respective face recognition neural network associated with the vector's particular pose range group until all the PCA coefficient vectors of every DCV have been input, and repeating until the outputs of the fusing neural network stabilize, and (o) for each DCV, simultaneously inputting the PCA coefficient vectors generated from the DCV into the respective face recognition neural network associated with the vector's particular pose range group and assigning the active output of the fusing neural network as corresponding to the particular person and pose associated with the model image used to create the set of PCA coefficient vectors, and employ the network ensemble to identify the person associated with the characterized input image region and their face pose.
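The claims train each network by repeatedly presenting its training vectors "until the outputs of the network stabilize", without fixing a learning rule. One simple reading, using a delta-rule update on a single linear layer (an illustrative choice, not the patent's), stops when the outputs change by less than a tolerance between passes:

```python
import numpy as np

def train_until_stable(n_in, n_out, inputs, targets,
                       lr=0.5, tol=1e-4, max_epochs=500):
    """Repeatedly present the training vectors and update a single
    linear layer with the delta rule, stopping once the outputs
    change by less than `tol` between epochs -- a stand-in for the
    claims' 'until the outputs of the network stabilize'.
    inputs: (n, n_in) rows; targets: (n, n_out) rows."""
    weights = np.zeros((n_out, n_in))
    prev = None
    for _ in range(max_epochs):
        outs = inputs @ weights.T                                # present every vector
        weights += lr * (targets - outs).T @ inputs / len(inputs)  # delta-rule update
        if prev is not None and np.abs(outs - prev).max() < tol:
            break  # outputs have stabilized
        prev = outs
    return weights
```

In the claimed ordering, the first-stage networks are trained this way one pose range at a time (step (l)), and only then is the fusing network initialized and trained on their combined outputs (steps (m)-(o)).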
7. A computer-implemented face recognition process for identifying a person depicted in an input image, comprising using a computer to perform the following process actions:
creating a database of a plurality of model image characterizations, each of which represents the face of a known person that it is desired to identify in the input image as well as the person's face pose;
training a neural network ensemble to identify a person and their face pose from a region which has been extracted from said input image and characterized in a manner similar to the plurality of model images, wherein the network ensemble comprises, a first stage having a plurality of classifiers each of which has input and output units and is dedicated to a particular pose range and outputs a measure indicative of the similarity between said characterized input image region and each of said model image characterizations associated with the particular pose range of the classifier, and a fusing neural network as its second stage which combines the outputs of the classifiers to generate an output indicative of the person associated with the characterized input image region and the face pose of that person, and wherein training the network ensemble comprises, deriving each model image characterization from a set of model images of people where each model image of the same person shows that person at a different face pose, said deriving comprising, extracting the portion of each model image depicting a face, normalizing the extracted portion of each model image by resizing it to a prescribed scale if not already at the prescribed scale and adjusting the region so that the eye locations of the depicted subject fall within a prescribed area, cropping the extracted portion of each model image by eliminating unneeded portions of the image not specifically depicting part of the face of the subject to create a model face image, concatenating each of the model face images to create a respective model dimensional column vector (DCV) for each, categorizing the model DCVs by assigning each to one of a set of pose ranges into which its associated face pose falls, and inputting the model DCV of each model face image falling in a particular pose range, one at a time, to a pre-selected classifier dedicated to the particular pose range, initializing the fusing neural network for training, simultaneously inputting the respective DCV of each model face image into all classifiers, until the DCV of every model image has been input, and repeating until the outputs of the fusing neural network stabilize, and simultaneously inputting the respective DCV of each model face image into all classifiers, and assigning the active output of the fusing neural network as corresponding to the particular person and pose associated with the model image used to create the DCV; and
employing the network ensemble to identify the person associated with the characterized input image region and the face pose of that person.
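Every independent claim requires the fusing network's active output unit to encode one (person, pose) pair. Under an assumed person-major layout of those units (the patent does not fix an ordering), decoding the active unit is a divmod:

```python
def decode_active_output(unit_index, num_pose_ranges):
    """Map the index of the fusing network's active output unit back
    to a (person, pose range) pair, assuming units are laid out
    person-major: unit = person * num_pose_ranges + pose_range.
    (The layout is a hypothetical convention for illustration.)"""
    return unit_index // num_pose_ranges, unit_index % num_pose_ranges
```

With 3 pose ranges, active unit 7 decodes to person 2 at pose range 1; any units beyond the (person, pose) grid can be reserved for the "identity unknown" outcome the abstract mentions.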
Specification