Face identification using artificial neural network
First Claim
1. A system for performing automated facial recognition, the system comprising:
execution hardware including at least one processor core, a data store, and input/output facilities, the execution hardware configured to implement a convolutional neural network including:
a first group of layers configured to accept as a first input an image containing a face, the image having a plurality of input channels and an input pixel quantity, wherein the first group includes a first convolution layer, a first max-pooling layer, and a first parametric rectified linear unit activation function, and wherein the first group is configured to produce an output having a first predefined quantity of channels that is greater than the plurality of input channels by a factor of at least 80, and having a pixel quantity that is more than 4.2 times smaller than the input pixel quantity;
a second group of layers configured to accept as a second input the output of the first group of layers, the second group including a second convolution layer, a second max-pooling layer, and a second parametric rectified linear unit activation function, wherein the second group is configured to produce an output having a second predefined quantity of channels that is greater than the first predefined quantity of channels by a factor of at least 3, and having a pixel quantity that is more than 4.2 times smaller than the pixel quantity of the output of the first group of layers;
a third group of layers configured to accept as a third input the output of the second group of layers, the third group including a third convolution layer, a third max-pooling layer, and a third parametric rectified linear unit activation function, wherein the third group is configured to produce an output having a third predefined quantity of channels that is greater than the second predefined quantity of channels by a factor of at least 2, and having a pixel quantity that is more than 3.5 times smaller than the pixel quantity of the output of the second group of layers; and
a fourth group of layers configured to accept as a fourth input the output of the third group of layers, the fourth group including a fourth convolution layer and a fourth parametric rectified linear unit activation function, wherein the fourth group is configured to produce an output having a fourth predefined quantity of channels that is greater than the third predefined quantity of channels by a factor of at least 2, and having a pixel quantity that is more than 1.1 times smaller than the pixel quantity of the output of the third group;
a fifth group of layers including a first fully-connected layer that produces an output comprising a feature vector representative of the image; and
wherein the system further comprises a searchable database containing the feature vector, along with a plurality of other feature vectors respectively representing other images containing faces.
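The claim recites only ratios (channel growth of at least 80x, 3x, 2x, and 2x across the four groups; pixel shrinkage of more than 4.2x, 4.2x, 3.5x, and 1.1x), not concrete layer sizes. As a sanity check, the following minimal pure-Python sketch tracks channel and pixel counts through one hypothetical configuration (a 128x128 three-channel input, unpadded 5x5 and 3x3 convolutions, 2x2 max-pooling; none of these specific numbers are disclosed in the claim) and confirms that every recited ratio can be satisfied simultaneously:

```python
# Hypothetical layer sizes chosen to satisfy the claimed ratios;
# the patent claim does not disclose these exact numbers.

def conv(hw, k):          # "valid" (unpadded) convolution: spatial size shrinks by k - 1
    return hw - (k - 1)

def pool(hw):             # 2x2 max-pooling with stride 2
    return hw // 2

h = 128                   # assumed square input resolution (3-channel image)
channels = [3]
pixels = [h * h]

# (output channels, conv kernel size, followed by max-pool?) for each group
groups = [(256, 5, True), (768, 3, True), (1536, 3, True), (3072, 3, False)]
for out_c, k, has_pool in groups:
    h = conv(h, k)
    if has_pool:
        h = pool(h)
    channels.append(out_c)
    pixels.append(h * h)

# Channel growth and pixel shrinkage between consecutive groups
channel_factors = [channels[i + 1] / channels[i] for i in range(4)]
pixel_factors = [pixels[i] / pixels[i + 1] for i in range(4)]
print(channel_factors)    # each entry meets the claimed minimum (80, 3, 2, 2)
print(pixel_factors)      # each entry exceeds the claimed minimum (4.2, 4.2, 3.5, 1.1)
```

With these assumed sizes the first group maps 3 channels to 256 (a factor above 80) and 128x128 = 16384 pixels to 62x62 = 3844 (a factor above 4.2), and each later group likewise clears its recited bound.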
Abstract
Automated facial recognition is performed by operation of a convolutional neural network including groups of layers in which the first, second, and third groups include a convolution layer, a max-pooling layer, and a parametric rectified linear unit activation function layer. A fourth group of layers includes a convolution layer and a parametric rectified linear unit activation function layer.
Claims (17)
1. A system for performing automated facial recognition (independent claim 1, reproduced in full under "First Claim" above). - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
11. A machine-implemented method for performing automated facial recognition, the method comprising:
receiving, by a computing platform, an image containing a face, the image having a plurality of input channels and an input pixel quantity;
processing the image by a first group of layers of a convolutional neural network, including a convolution layer, a max-pooling layer, and a parametric rectified linear unit activation function layer, executed on the computing platform, to produce a first layer output having a first predefined quantity of channels that is greater than the plurality of input channels by a factor of at least 80, and having a pixel quantity that is more than 4.2 times smaller than the input pixel quantity;
processing the first layer output by a second group of layers of the convolutional neural network, including a convolution layer, a max-pooling layer, and a parametric rectified linear unit activation function layer, executed on the computing platform, to produce a second layer output having a second predefined quantity of channels that is greater than the first predefined quantity of channels by a factor of at least 3, and having a pixel quantity that is more than 4.2 times smaller than the pixel quantity of the first layer output;
processing the second layer output by a third group of layers of the convolutional neural network, including a convolution layer, a max-pooling layer, and a parametric rectified linear unit activation function layer, executed on the computing platform, to produce a third layer output having a third predefined quantity of channels that is greater than the second predefined quantity of channels by a factor of at least 2, and having a pixel quantity that is more than 3.5 times smaller than the pixel quantity of the second layer output;
processing the third layer output by a fourth group of layers of the convolutional neural network, including a convolution layer and a parametric rectified linear unit activation function layer, to produce a fourth layer output having a fourth predefined quantity of channels that is greater than the third predefined quantity of channels by a factor of at least 2, and having a pixel quantity that is more than 1.1 times smaller than the pixel quantity of the third layer output;
processing the fourth layer output by a fifth group of layers including a first fully-connected layer that produces an output comprising a feature vector representative of the image; and
storing the feature vector in a searchable database containing a plurality of other feature vectors respectively representing other images containing faces. - View Dependent Claims (12, 13, 14)
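The method ends by storing the feature vector in a searchable database alongside vectors for other faces, so identification reduces to a similarity search over stored vectors. A minimal sketch of such a lookup, using cosine similarity over an in-memory dictionary (the identities, vectors, and threshold here are illustrative assumptions, not taken from the patent):

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical in-memory "searchable database" of enrolled feature vectors.
database = {
    "alice": [0.9, 0.1, 0.0],
    "bob":   [0.1, 0.8, 0.2],
}

def identify(query, db, threshold=0.8):
    """Return (identity, score) of the best match, or (None, score) below threshold."""
    best_id, best_score = None, -1.0
    for identity, vec in db.items():
        score = cosine(query, vec)
        if score > best_score:
            best_id, best_score = identity, score
    return (best_id, best_score) if best_score >= threshold else (None, best_score)

# Probe vector, standing in for the output of the CNN's fully-connected layer.
probe = [0.85, 0.15, 0.05]
match, score = identify(probe, database)
```

A production system would use real CNN embeddings (typically hundreds of dimensions) and an indexed nearest-neighbor structure rather than a linear scan, but the storage-then-search flow is the same as the claimed steps.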
15. A non-transitory machine-readable medium comprising instructions that, when executed on a computing platform, cause the computing platform to execute operations for performing automated facial recognition, the operations comprising:
receiving, by the computing platform, an image containing a face, the image having a plurality of input channels and an input pixel quantity;
processing the image by a first group of layers of a convolutional neural network, including a convolution layer, a max-pooling layer, and a parametric rectified linear unit activation function layer, executed on the computing platform, to produce a first layer output having a first predefined quantity of channels that is greater than the plurality of input channels by a factor of at least 80, and having a pixel quantity that is more than 4.2 times smaller than the input pixel quantity;
processing the first layer output by a second group of layers of the convolutional neural network, including a convolution layer, a max-pooling layer, and a parametric rectified linear unit activation function layer, executed on the computing platform, to produce a second layer output having a second predefined quantity of channels that is greater than the first predefined quantity of channels by a factor of at least 3, and having a pixel quantity that is more than 4.2 times smaller than the pixel quantity of the first layer output;
processing the second layer output by a third group of layers of the convolutional neural network, including a convolution layer, a max-pooling layer, and a parametric rectified linear unit activation function layer, executed on the computing platform, to produce a third layer output having a third predefined quantity of channels that is greater than the second predefined quantity of channels by a factor of at least 2, and having a pixel quantity that is more than 3.5 times smaller than the pixel quantity of the second layer output;
processing the third layer output by a fourth group of layers of the convolutional neural network, including a convolution layer and a parametric rectified linear unit activation function layer, to produce a fourth layer output having a fourth predefined quantity of channels that is greater than the third predefined quantity of channels by a factor of at least 2, and having a pixel quantity that is more than 1.1 times smaller than the pixel quantity of the third layer output;
processing the fourth layer output by a fifth group of layers including a first fully-connected layer that produces an output comprising a feature vector representative of the image; and
storing the feature vector in a searchable database containing a plurality of other feature vectors respectively representing other images containing faces. - View Dependent Claims (16, 17)
Specification