Pose-invariant face recognition system and process

US 7,127,087 B2
Filed: 11/05/2004
Issued: 10/24/2006
Est. Priority Date: 03/27/2000
Status: Expired due to Term

Abstract

A face recognition system and process for identifying a person depicted in an input image and their face pose. This system and process entails locating and extracting face regions belonging to known people from a set of model images, and determining the face pose for each of the face regions extracted. All the extracted face regions are preprocessed by normalizing, cropping, categorizing and finally abstracting them. More specifically, the images are normalized and cropped to show only a person's face, categorized according to the face pose of the depicted person's face by assigning them to one of a series of face pose ranges, and abstracted preferably via an eigenface approach. The preprocessed face images are preferably used to train a neural network ensemble having a first stage made up of a bank of face recognition neural networks each of which is dedicated to a particular pose range, and a second stage constituting a single fusing neural network that is used to combine the outputs from each of the first stage neural networks. Once trained, the input of a face region which has been extracted from an input image and preprocessed (i.e., normalized, cropped and abstracted) will cause just one of the output units of the fusing portion of the neural network ensemble to become active. The active output unit indicates either the identity of the person whose face was extracted from the input image and the associated face pose, or that the identity of the person is unknown to the system.
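
The two-stage structure described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the class name `PoseEnsemble`, the single-layer sigmoid networks, the random (untrained) weights, and the layout in which fused output index = pose × number of people + person. The record does not specify any of these details.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class PoseEnsemble:
    """Hypothetical two-stage ensemble: one network per pose range, then a fusing network."""

    def __init__(self, n_features, n_people, n_poses, seed=0):
        rng = np.random.default_rng(seed)
        self.n_people = n_people
        self.n_poses = n_poses
        # First stage: one weight matrix per pose range (random placeholders, not trained).
        self.stage1 = [rng.normal(size=(n_people, n_features)) for _ in range(n_poses)]
        # Second stage: one fused output unit per (person, pose) combination.
        n_fused = n_people * n_poses
        self.fuse = rng.normal(size=(n_fused, n_fused))

    def forward(self, coeff_vectors):
        # coeff_vectors[k] is the feature vector prepared for pose range k.
        outs = [sigmoid(W @ v) for W, v in zip(self.stage1, coeff_vectors)]
        return sigmoid(self.fuse @ np.concatenate(outs))

    def identify(self, coeff_vectors):
        # Treat the unit with the largest response as the single active unit.
        active = int(np.argmax(self.forward(coeff_vectors)))
        # Assumed unit layout: index = pose * n_people + person.
        pose, person = divmod(active, self.n_people)
        return person, pose
```

With three people and four pose ranges, `PoseEnsemble(8, 3, 4).identify([np.ones(8)] * 4)` returns a `(person, pose)` index pair; with untrained weights the pair is arbitrary, so the sketch shows only the data flow and shapes.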

30 Claims

1. A computer-implemented face recognition process for identifying a person depicted in an input image, comprising using a computer to perform the following process actions:
- creating a database of a plurality of model image characterizations, each of which represents the face of a known person that it is desired to identify in the input image as well as the person's face pose, wherein the person's face pose refers to pitch, roll and yaw angles that define the position of the person's head;
  
  training a neural network ensemble to identify a person and their face pose from a region which has been extracted from said input image and characterized in a manner similar to the plurality of model images, wherein the network ensemble comprises, a first stage having a plurality of classifiers each of which has input and output units and is dedicated to a particular pose range and outputs a measure of the similarity indicative of the similarity between said characterized input image region and each of said model image characterizations associated with the particular pose range of the classifier, and a fusing neural network as its second stage which combines the outputs of the classifiers to generate an output indicative of the person associated with the characterized input image region and the face pose of that person; and
  
  employing the network ensemble to identify the person associated with the characterized input image region and the face pose of that person.
- View Dependent Claims (2, 3, 4, 5, 6, 19, 20, 21, 22)
- - 2. The process of claim 1, wherein the process action for training the neural network ensemble comprises an action of preparing each model image characterization from a model image depicting the face of a known person that it is desired to identify in the input image by, extracting the portion of the model image depicting said face, normalizing the extracted portion of the model image by resizing it to a prescribed scale if not already at the prescribed scale and adjusting the region so that the eye locations of the depicted subject fall within a prescribed area, and cropping the extracted portion of the model image by eliminating unneeded portions of the image not specifically depicting part of the face of the subject to create a model face image.
  - 3. The process of claim 2, wherein the process action for training the neural network ensemble further comprises actions for:
    - categorizing the model face images by assigning each to one of a set of pose ranges into which its associated face pose falls;
      
      for each pose range, choosing a prescribed number of the model face images of each person being modeled which have been assigned to the selected pose range, concatenating each of the chosen model face images to create a respective dimensional column vector (DCV) for each, computing a covariance matrix from the DCVs, calculating eigenvectors and corresponding eigenvalues from the covariance matrix, ranking the eigenvalues in descending order, identifying a prescribed number of the top eigenvalues, using the eigenvectors corresponding to the identified eigenvalues to form the rows of a basis vector matrix (BVM) for the pose range; and
      
      multiplying each DCV by each BVM to produce a set of principal components analysis (PCA) coefficient vectors for each model face image.
  - 4. The process of claim 3, wherein the fusing neural network has at least enough output units to allow a different output to represent each person it is desired to identify at each of the pose ranges, and wherein the process action of training the neural network ensemble further comprises the actions of:
    - for each face recognition neural network, inputting, one at a time, each of the PCA coefficient vectors associated with the pose range of the face recognition neural network into the inputs of the network until the outputs of the network stabilize;
      
      initializing the fusing neural network for training;
      
      for each DCV, simultaneously inputting the PCA coefficient vectors generated from the DCV into the respective face recognition neural network associated with the vector's particular pose range group until all the PCA coefficient vectors of every DCV have been input, and repeating until the outputs of the fusing neural network stabilize; and
      
      for each DCV, simultaneously inputting the PCA coefficient vectors generated from the DCV into the respective face recognition neural network associated with the vector's particular pose range group and assigning the active output of the fusing neural network as corresponding to the particular person and pose associated with the model image used to create the set of PCA coefficient vectors.
  - 5. The process of claim 4, wherein the process action of employing the neural network ensemble to identify the person depicted in the input image face region, comprises the actions of:
    - preparing the face region extracted from an input image by normalizing and cropping the extracted regions, wherein said normalizing comprises resizing the extracted face region to the same prescribed scale if not already at the prescribed scale and adjusting the region so that the eye locations of the depicted subject fall within a prescribed area, and wherein the cropping comprises eliminating unneeded portions of the image not specifically depicting part of the face of the subject;
      
      concatenating the prepared face region to create a DCV;
      
      multiplying the DCV by each BVM to produce a set of PCA coefficient vectors for the extracted face region;
      
      inputting each PCA coefficient vector in the set of PCA coefficient vectors into the respective face recognition neural network associated with that vector's particular pose range group; and
      
      identifying the active unit of the output of the fusing neural network and designating the person and pose previously assigned to that unit as the person and pose associated with the extracted face region.
  - 6. The process of claim 1, further comprising process actions for:
    - training the neural network ensemble to identify the person associated with the characterized input image region to be an unknown person if it does not match any of the model image characterizations to a prescribed degree; and
      
      employing the neural network ensemble to identify the person associated with the characterized input image region to be an unknown person if it does not match any of the model image characterizations to the prescribed degree.
  - 19. The process of claim 1, wherein the process action for training the network ensemble comprises an action of deriving each model image characterization from a set of model images of people, wherein each model image of the same person shows that person at a different face pose, said deriving action comprising:
    - extracting the portion of each model image depicting a face;
      
      normalizing the extracted portion of each model image by resizing it to a prescribed scale if not already at the prescribed scale and adjusting the region so that the eye locations of the depicted subject fall within a prescribed area;
      
      cropping the extracted portion of each model image by eliminating unneeded portions of the image not specifically depicting part of the face of the subject to create a model face image;
      
      concatenating each of the model face images to create a respective model dimensional column vector (DCV) for each,
      
      categorizing the model DCVs by assigning each to one of a set of pose ranges into which its associated face pose falls;
      
      inputting the model DCV of each model face image falling in a particular pose range, one at a time, to a pre-selected classifier dedicated to the particular pose range.
  - 20. The process of claim 19, wherein the process action of training the network ensemble further comprises the actions of:
    - initializing the fusing neural network for training;
      
      simultaneously inputting the respective DCV of each model face image into all classifiers, until the DCV of every model image has been input, and repeating until the outputs of the neural network stabilize; and
      
      simultaneously inputting the respective DCV of each model face image into all classifiers, and assigning the active output of the neural network as corresponding to the particular person and pose associated with the model image used to create the DCV.
  - 21. The process of claim 20, wherein the process action of employing the network ensemble to identify the person depicted in the input image face region, comprises the actions of:
    - preparing the face region extracted from an input image by normalizing and cropping the extracted regions, wherein said normalizing comprises resizing the extracted face region to the same prescribed scale if not already at the prescribed scale and adjusting the region so that the eye locations of the depicted subject fall within a prescribed area, and wherein the cropping comprises eliminating unneeded portions of the image not specifically depicting part of the face of the subject;
      
      concatenating the prepared face region to create a DCV;
      
      inputting the DCV of the face region into all classifiers; and
      
      identifying the active output of the neural network and designating the person previously assigned to that unit as the person associated with the extracted face region.
  - 22. The process of claim 21, further comprising a process action of specifying that the person designated as associated with the extracted face region has the face pose previously assigned to the identified active output.
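
The eigenvector steps recited in claim 3 (covariance matrix, eigen-decomposition, ranking, top eigenvalues, BVM rows, projection) can be sketched with NumPy. This is an illustrative sketch, not the patented implementation: the function names `basis_vector_matrix` and `pca_coefficients` are hypothetical, and DCVs are stored one per row for convenience. Note that `np.cov` subtracts the per-pixel mean when forming the covariance matrix.

```python
import numpy as np

def basis_vector_matrix(dcvs, n_components):
    """Build a BVM for one pose range.

    dcvs: array of shape (n_images, n_pixels), one flattened face image per row.
    Returns an array of shape (n_components, n_pixels) whose rows are eigenvectors.
    """
    X = np.asarray(dcvs, dtype=float)
    cov = np.cov(X, rowvar=False)            # (n_pixels, n_pixels) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh suits symmetric matrices
    # Rank eigenvalues in descending order and keep the top n_components.
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order].T               # eigenvectors become the rows of the BVM

def pca_coefficients(dcv, bvms):
    """Multiply one DCV by each pose range's BVM, giving one coefficient vector per range."""
    v = np.asarray(dcv, dtype=float)
    return [bvm @ v for bvm in bvms]
```

For full-size face images the pixel-by-pixel covariance matrix becomes large; the sketch ignores that cost and is meant for small inputs only.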

7. A face recognition system for identifying a person depicted in an input image, comprising:
- a general purpose computing device; and
  
  a computer program comprising program modules executable by the computing device, wherein the computing device is directed by the program modules of the computer program to,
  
  capture model images, each of which depicts at least one person of known identity,
  
  locate and extract regions within the model images, each of which depicts the face of a known person that it is desired to identify in the input image,
  
  determine a face pose for each of the face regions extracted from the model images, wherein the face pose refers to pitch, roll and yaw angles that define the position of a person's head,
  
  categorize each face region by assigning each to one of a set of pose ranges into which its associated face pose falls,
  
  train a neural network ensemble to identify a person and their face pose from a region that depicts the face of a person which has been extracted from said input image, wherein the network ensemble comprises, a first stage having a plurality of classifiers each of which has input and output units and is dedicated to a particular pose range and outputs a measure of the similarity indicative of the similarity between said input image region and each of said model image regions associated with the particular pose range of the classifier, and a fusing neural network as its second stage which combines the outputs of the classifiers to generate an output indicative of the person associated with the characterized input image region and the face pose of that person; and
  
  employ the network ensemble to identify the person associated with the characterized input image region and their face pose.
- View Dependent Claims (8, 9, 10, 11, 12)
- - 8. The system of claim 7, wherein the sub-module for training the neural network ensemble comprises a sub-module for preparing each face region extracted from said model images by normalizing and cropping the extracted regions, wherein said normalizing comprises resizing each extracted face region to the same prescribed scale if not already at the prescribed scale and adjusting each region so that the eye locations of the depicted subject fall within the same prescribed area, and wherein said cropping comprises eliminating unneeded portions of the image not specifically depicting part of the face of the subject.
  - 9. The system of claim 8, wherein the sub-module for training the neural network ensemble further comprises sub-modules for:
    - (a) selecting a previously unselected one of the set of pose ranges;
      
      (b) choosing a prescribed number of the prepared face images of each person being modeled which have been assigned to the selected pose range;
      
      (c) concatenating each of the chosen prepared face images to create a respective dimensional column vector (DCV) for each;
      
      (d) computing a covariance matrix from the DCVs;
      
      (e) calculating eigenvectors and corresponding eigenvalues from the covariance matrix;
      
      (f) ranking the eigenvalues in descending order;
      
      (g) identifying a prescribed number of the top eigenvalues;
      
      (h) using the eigenvectors corresponding to the identified eigenvalues to form the rows of a basis vector matrix (BVM) for the selected pose range;
      
      (i) repeating actions (a) through (h) for each remaining pose range;
      
      (j) multiplying each DCV by each BVM to produce a set of principal components analysis (PCA) coefficient vectors for each face image.
  - 10. The system of claim 9, wherein the fusing neural network has at least enough output units to allow a different output to represent each person it is desired to identify at each of the pose ranges, and wherein the sub-module for training the neural network ensemble further comprises sub-modules for:
    - for each face recognition neural network, inputting, one at a time, each of the PCA coefficient vectors associated with the pose range of the face recognition neural network into the inputs of the network until the outputs of the network stabilize;
      
      initializing the fusing neural network for training;
      
      for each DCV, simultaneously inputting the PCA coefficient vectors generated from the DCV into the respective face recognition neural network associated with the vector's particular pose range group until all the PCA coefficient vectors of every DCV have been input, and repeating until the outputs of the fusing neural network stabilize; and
      
      for each DCV, simultaneously inputting the PCA coefficient vectors generated from the DCV into the respective face recognition neural network associated with the vector's particular pose range group and assigning the active output of the fusing neural network as corresponding to the particular person and pose associated with the model image used to create the set of PCA coefficient vectors.
  - 11. The system of claim 10, wherein the sub-module for employing the neural network ensemble to identify the person depicted in the input image face region and the pose associated with the face of the identified person, comprises sub-modules for:
    - preparing the face region extracted from an input image by normalizing and cropping the extracted regions, wherein said normalizing comprises resizing the extracted face region to the same prescribed scale if not already at the prescribed scale and adjusting the region so that the eye locations of the depicted subject fall within a prescribed area, and wherein the cropping comprises eliminating unneeded portions of the image not specifically depicting part of the face of the subject;
      
      concatenating the prepared face region to create a DCV;
      
      multiplying the DCV by each BVM to produce a set of PCA coefficient vectors for the extracted face region;
      
      inputting each PCA coefficient vector in the set of PCA coefficient vectors into the respective face recognition neural network associated with that vector's particular pose range group; and
      
      identifying the active unit of the output of the fusing neural network and designating the person and pose previously assigned to that unit as the person and pose associated with the extracted face region.
  - 12. The system of claim 7, further comprising sub-modules for:
    - training the neural network ensemble to identify the person associated with the input image face region to be an unknown person if it does not match any of the face regions assigned to each pose range to a prescribed degree; and
      
      employing the neural network ensemble to identify the person associated with the input image face region to be an unknown person if it does not match any of the face regions assigned to each pose range to a prescribed degree.
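
The categorization step in claim 7 (assigning each face region to the pose range into which its pitch, roll and yaw fall) can be sketched as a simple lookup. The range boundaries, the range names, and the `PoseRange` and `categorize` identifiers below are all hypothetical; the claims do not state how many ranges exist or where their limits lie.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PoseRange:
    name: str
    yaw: tuple    # (low, high) in degrees; low inclusive, high exclusive
    pitch: tuple
    roll: tuple

    def contains(self, yaw, pitch, roll):
        pairs = ((yaw, self.yaw), (pitch, self.pitch), (roll, self.roll))
        return all(lo <= angle < hi for angle, (lo, hi) in pairs)

# Hypothetical ranges, chosen only for illustration.
POSE_RANGES = [
    PoseRange("left", (-90, -15), (-30, 30), (-30, 30)),
    PoseRange("frontal", (-15, 15), (-30, 30), (-30, 30)),
    PoseRange("right", (15, 90), (-30, 30), (-30, 30)),
]

def categorize(yaw, pitch, roll, ranges=POSE_RANGES):
    """Return the index of the first pose range containing the pose, or None."""
    for index, pose_range in enumerate(ranges):
        if pose_range.contains(yaw, pitch, roll):
            return index
    return None
```

Under these assumed ranges, `categorize(0, 0, 0)` returns `1` (the frontal range), and a pose outside every range returns `None`.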

13. A computer-readable memory for use in identifying a person depicted in an input image, comprising:
- a computer-readable storage medium; and
  
  a computer program comprising program modules stored in the storage medium, wherein the storage medium is so configured by the computer program that it causes a computer to,
  
  input model images, each of which depicts at least one person of known identity,
  
  locate and extract regions within the model images, each of which depicts the face of a known person that it is desired to identify in the input image,
  
  determine a face pose for each of the face regions extracted from the model images, wherein the face pose refers to pitch, roll and yaw angles that define the position of a person's head,
  
  categorize each face region by assigning each to one of a set of pose ranges into which its associated face pose falls,
  
  train a neural network ensemble to identify a person and their face pose from a region depicting the face of a person which has been extracted from said input image, wherein the network ensemble comprises, a first stage having a plurality of classifiers each of which has input and output units and is dedicated to a particular pose range and outputs a measure of the similarity indicative of the similarity between said input image region and each of said model image regions associated with the particular pose range of the classifier, and a fusing neural network as its second stage which combines the outputs of the classifiers to generate an output indicative of the person associated with the characterized input image region and the face pose of that person; and
  
  employ the network ensemble to identify the person associated with the characterized input image region and their face pose.
- View Dependent Claims (14, 15, 16, 17, 18)
- - 14. The computer-readable memory of claim 13, wherein the sub-module for training the neural network ensemble comprises a sub-module for preparing each face region extracted from said model images by normalizing and cropping the extracted regions, wherein said normalizing comprises resizing each extracted face region to the same prescribed scale if not already at the prescribed scale and adjusting each region so that the eye locations of the depicted subject fall within the same prescribed area, and wherein said cropping comprises eliminating unneeded portions of the image not specifically depicting part of the face of the subject.
  - 15. The computer-readable memory of claim 14, wherein the sub-module for training the neural network ensemble further comprises sub-modules for:
    - (a) selecting a previously unselected one of the set of pose ranges;
      
      (b) choosing a prescribed number of the prepared face images of each person being modeled which have been assigned to the selected pose range;
      
      (c) concatenating each of the chosen prepared face images to create a respective dimensional column vector (DCV) for each;
      
      (d) computing a covariance matrix from the DCVs;
      
      (e) calculating eigenvectors and corresponding eigenvalues from the covariance matrix;
      
      (f) ranking the eigenvalues in descending order;
      
      (g) identifying a prescribed number of the top eigenvalues;
      
      (h) using the eigenvectors corresponding to the identified eigenvalues to form the rows of a basis vector matrix (BVM) for the selected pose range;
      
      (i) repeating actions (a) through (h) for each remaining pose range;
      
      (j) multiplying each DCV by each BVM to produce a set of principal components analysis (PCA) coefficient vectors for each face image.
  - 16. The computer-readable memory of claim 15, wherein the fusing neural network has at least enough output units to allow a different output to represent each person it is desired to identify at each of the pose ranges, and wherein the sub-module for training the neural network ensemble further comprises sub-modules for:
    - for each face recognition neural network, inputting, one at a time, each of the PCA coefficient vectors associated with the pose range of the face recognition neural network into the inputs of the network until the outputs of the network stabilize;
      
      initializing the fusing neural network for training;
      
      for each DCV, simultaneously inputting the PCA coefficient vectors generated from the DCV into the respective face recognition neural network associated with the vector's particular pose range group until all the PCA coefficient vectors of every DCV have been input, and repeating until the outputs of the fusing neural network stabilize; and
      
      for each DCV, simultaneously inputting the PCA coefficient vectors generated from the DCV into the respective face recognition neural network associated with the vector's particular pose range group and assigning the active output of the fusing neural network as corresponding to the particular person and pose associated with the model image used to create the set of PCA coefficient vectors.
  - 17. The computer-readable memory of claim 16, wherein the sub-module for employing the neural network ensemble to identify the person depicted in the input image face region and the pose associated with the face of the identified person, comprises sub-modules for:
    - preparing the face region extracted from an input image by normalizing and cropping the extracted regions, wherein said normalizing comprises resizing the extracted face region to the same prescribed scale if not already at the prescribed scale and adjusting the region so that the eye locations of the depicted subject fall within a prescribed area, and wherein the cropping comprises eliminating unneeded portions of the image not specifically depicting part of the face of the subject;
      
      concatenating the prepared face region to create a DCV;
      
      multiplying the DCV by each BVM to produce a set of PCA coefficient vectors for the extracted face region;
      
      inputting each PCA coefficient vector in the set of PCA coefficient vectors into the respective face recognition neural network associated with that vector's particular pose range group; and
      
      identifying the active unit of the output of the fusing neural network and designating the person and pose previously assigned to that unit as the person and pose associated with the extracted face region.
  - 18. The computer-readable memory of claim 13, further comprising sub-modules for:
    - training a neural network ensemble to identify the person associated with the input image face region to be an unknown person if it does not match any of the face regions assigned to each pose range to a prescribed degree; and
      
      employing the neural network ensemble to identify the person associated with the input image face region to be an unknown person if it does not match any of the face regions assigned to each pose range to a prescribed degree.
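
Claims 6, 12 and 18 call for reporting an unknown person when the input does not match any modeled face "to a prescribed degree". One simple stand-in for that behavior is a fixed threshold on the strongest fused output, sketched below. This is a swapped-in technique for illustration only: the claims describe training the ensemble to produce the unknown result, and the function name `identify_or_unknown` and the default threshold of 0.5 are assumptions.

```python
import numpy as np

UNKNOWN = None  # hypothetical marker for "person unknown to the system"

def identify_or_unknown(fused_outputs, labels, threshold=0.5):
    """Return the label of the strongest output unit, or UNKNOWN if it is too weak.

    fused_outputs: responses of the fusing network's output units.
    labels: one (person, pose) label per output unit, in the same order.
    """
    outputs = np.asarray(fused_outputs, dtype=float)
    best = int(np.argmax(outputs))
    if outputs[best] < threshold:
        return UNKNOWN
    return labels[best]
```

For example, with labels `[("alice", "frontal"), ("bob", "left")]`, outputs of `[0.9, 0.2]` select the first label, while `[0.3, 0.4]` fall below the assumed threshold and yield `UNKNOWN`.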

23. A face recognition neural network ensemble for identifying a person depicted in an input image and a face pose range among a set of pose ranges into which the face of each identified person falls, wherein a face pose range refers to ranges of pitch, roll and yaw angles that define the position of a person's head, said ensemble comprising:
- a plurality of face recognition neural networks each of which has input and output units and each of which is dedicated to a particular pose range; and
  
  a fusing neural network whose inputs are in communication with the output units of said face recognition neural networks and which has at least enough output units to allow a different output to represent each person it is desired to identify at each of the pose ranges; and
  
  wherein image feature characterizations derived from the face of a person it is desired to identify as depicted in the input image are respectively input into separate ones of the input units of the face recognition neural networks causing a single one of the output units of the fusing neural network to become active, thereby indicating the identity of the person whose face was depicted in the input image as well as the pose range associated with the pose of the depicted face.
- View Dependent Claims (24, 25, 26, 27, 28, 29)
- - 24. The neural network ensemble of claim 23, wherein the number of input units of each face recognition neural network equals the number of image feature characterizations derived from the face of the person to be identified as depicted in the input image.
  - 25. The neural network ensemble of claim 23, wherein the number of output units of each face recognition neural network at least equals the number of different people it is desired to identify in the input image.
  - 26. The neural network ensemble of claim 23, wherein the output units of each face recognition neural network output real values ranging from 0 to 1.
  - 27. The neural network ensemble of claim 23, wherein the number of input units of the fusing network is equal to the number of face recognition neural networks multiplied by the number of output units of any one of the face recognition neural networks, and wherein the number of output units of the fusing network is equal to the number of its input units.
  - 28. The neural network ensemble of claim 23, wherein the output units of each face recognition neural network are binary in that a particular output unit is active whenever it has the largest output amongst all the output units thereby representing a 1, and otherwise inactive thereby representing a 0.
  - 29. The neural network ensemble of claim 25, wherein there is one output unit of each face recognition neural network in addition to the number required to equal the number of different people it is desired to identify in the input image, and wherein at least one of the output units of the fusing neural network represents a person of unknown identity and unknown face pose.
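
The unit counts in claim 27 and the binary output rule in claim 28 reduce to a little arithmetic, sketched below. The function names are hypothetical, and the example figures (5 networks, 11 output units each) are illustrative values that do not come from the record.

```python
import numpy as np

def fusing_unit_counts(n_networks, n_outputs_per_network):
    """Claim 27: fusing inputs = networks x outputs per network; fusing outputs = inputs."""
    n_inputs = n_networks * n_outputs_per_network
    n_outputs = n_inputs
    return n_inputs, n_outputs

def winner_take_all(outputs):
    """Claim 28: the unit with the largest output becomes 1, every other unit becomes 0."""
    outputs = np.asarray(outputs, dtype=float)
    binary = np.zeros(outputs.shape, dtype=int)
    binary[np.argmax(outputs)] = 1
    return binary
```

With 5 pose-range networks of 11 output units each, `fusing_unit_counts(5, 11)` gives 55 input units and 55 output units. If two units tie for the largest value, `np.argmax` picks the first; the claims do not say how ties are resolved.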

30. A face recognition network ensemble for identifying a person depicted in an input image and a face pose range among a set of pose ranges into which the face of each identified person falls, wherein a face pose range refers to ranges of pitch, roll and yaw angles that define the position of a person's head, said ensemble comprising:
- a plurality of classifiers each of which has input and output units and each of which is dedicated to a particular pose range; and
  
  a fusing neural network whose inputs are in communication with the output units of said classifiers and which has at least enough output units to allow a different output to represent each person it is desired to identify at each of the pose ranges; and
  
  wherein image feature characterizations derived from the face of a person it is desired to identify as depicted in the input image are respectively input into separate ones of the input units of the face recognition classifiers causing a single one of the output units of the fusing neural network to become active, thereby indicating the identity of the person whose face was depicted in the input image as well as the pose range associated with the pose of the depicted face.

Current Assignee: Zhigu Holdings Limited
Original Assignee: Microsoft Corporation
Inventors: Chen, Tsuhan; Zhang, Hong-Jiang; Huang, Fu Jie
Primary Examiner(s): Wu, Jingge
Assistant Examiner(s): CARTER, AARON W
Application Number: US10/983,194
Publication Number: US 20050147292A1
Time in Patent Office: 718 Days
Field of Search: 382/118, 382/155, 382/156, 382/159, 382/170, 382/181, 382/203, 382/216, 340/5.52, 340/5.53, 902/3, 713/186
US Class Current: 382/118
CPC Class Codes:
G06F 18/254   of classification results, ...
G06V 40/161   Detection; Localisation; No...
G06V 40/172   Classification, e.g. identi...
