Face identification using artificial neural network
First Claim
1. A system for performing automated facial recognition, the system comprising:
execution hardware including at least one processor core, a data store, and input/output facilities, the execution hardware configured to implement a convolutional neural network including:
a first group of layers configured to accept as a first input an image containing a face, the image having a plurality of input channels and an input pixel quantity, wherein the first group includes a first convolution layer, a first max-pooling layer, and a first parametric rectified linear unit activation function, and wherein the first group is configured to produce an output having a first predefined quantity of channels that is greater than the plurality of input channels by a factor of at least 80, and having a pixel quantity that is more than 4.2 times smaller than the input pixel quantity;
a second group of layers configured to accept as a second input the output of the first group of layers, the second group including a second convolution layer, a second max-pooling layer, and a second parametric rectified linear unit activation function, wherein the second group is configured to produce an output having a second predefined quantity of channels that is greater than the first predefined quantity of channels by a factor of at least 3, and having a pixel quantity that is more than 4.2 times smaller than the pixel quantity of the output of the first group of layers;
a third group of layers configured to accept as a third input the output of the second group of layers, the third group including a third convolution layer, a third max-pooling layer, and a third parametric rectified linear unit activation function, wherein the third group is configured to produce an output having a third predefined quantity of channels that is greater than the second predefined quantity of channels by a factor of at least 2, and having a pixel quantity that is more than 3.5 times smaller than the pixel quantity of the output of the second group of layers; and
a fourth group of layers configured to accept as a fourth input the output of the third group of layers, the fourth group including a fourth convolution layer and a fourth parametric rectified linear unit activation function, wherein the fourth group is configured to produce an output having a fourth predefined quantity of channels that is greater than the third predefined quantity of channels by a factor of at least 2, and having a pixel quantity that is more than 1.1 times smaller than the pixel quantity of the output of the third group;
a fifth group of layers including a first fully-connected layer that produces an output comprising a feature vector representative of the image; and
wherein the system further comprises a searchable database containing the feature vector, along with a plurality of other feature vectors respectively representing other images containing faces.
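The claim recites only ratios (channel growth of at least 80x, 3x, 2x, and 2x across the four groups; pixel shrinkage of more than 4.2x, 4.2x, 3.5x, and 1.1x), not concrete layer sizes. As a sanity check, the following minimal pure-Python sketch tracks channel and pixel counts through one hypothetical configuration (a 128x128 three-channel input, unpadded 5x5 and 3x3 convolutions, 2x2 max-pooling; none of these specific numbers are disclosed in the claim) and confirms that every recited ratio can be satisfied simultaneously:

```python
# Hypothetical layer sizes chosen to satisfy the claimed ratios;
# the patent claim does not disclose these exact numbers.

def conv(hw, k):          # "valid" (unpadded) convolution: spatial size shrinks by k - 1
    return hw - (k - 1)

def pool(hw):             # 2x2 max-pooling with stride 2
    return hw // 2

h = 128                   # assumed square input resolution (3-channel image)
channels = [3]
pixels = [h * h]

# (output channels, conv kernel size, followed by max-pool?) for each group
groups = [(256, 5, True), (768, 3, True), (1536, 3, True), (3072, 3, False)]
for out_c, k, has_pool in groups:
    h = conv(h, k)
    if has_pool:
        h = pool(h)
    channels.append(out_c)
    pixels.append(h * h)

# Channel growth and pixel shrinkage between consecutive groups
channel_factors = [channels[i + 1] / channels[i] for i in range(4)]
pixel_factors = [pixels[i] / pixels[i + 1] for i in range(4)]
print(channel_factors)    # each entry meets the claimed minimum (80, 3, 2, 2)
print(pixel_factors)      # each entry exceeds the claimed minimum (4.2, 4.2, 3.5, 1.1)
```

With these assumed sizes the first group maps 3 channels to 256 (a factor above 80) and 128x128 = 16384 pixels to 62x62 = 3844 (a factor above 4.2), and each later group likewise clears its recited bound.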
Abstract
Automated facial recognition is performed by operation of a convolutional neural network including groups of layers in which the first, second, and third groups include a convolution layer, a max-pooling layer, and a parametric rectified linear unit activation function layer. A fourth group of layers includes a convolution layer and a parametric rectified linear unit activation function layer.
Claims (17)
1. A system for performing automated facial recognition (independent claim 1, reproduced in full under "First Claim" above). - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
11. A machine-implemented method for performing automated facial recognition, the method comprising:
receiving, by a computing platform, an image containing a face, the image having a plurality of input channels and an input pixel quantity;
processing the image by a first group of layers of a convolutional neural network, including a convolution layer, a max-pooling layer, and a parametric rectified linear unit activation function layer, executed on the computing platform, to produce a first layer output having a first predefined quantity of channels that is greater than the plurality of input channels by a factor of at least 80, and having a pixel quantity that is more than 4.2 times smaller than the input pixel quantity;
processing the first layer output by a second group of layers of the convolutional neural network, including a convolution layer, a max-pooling layer, and a parametric rectified linear unit activation function layer, executed on the computing platform, to produce a second layer output having a second predefined quantity of channels that is greater than the first predefined quantity of channels by a factor of at least 3, and having a pixel quantity that is more than 4.2 times smaller than the pixel quantity of the first layer output;
processing the second layer output by a third group of layers of the convolutional neural network, including a convolution layer, a max-pooling layer, and a parametric rectified linear unit activation function layer, executed on the computing platform, to produce a third layer output having a third predefined quantity of channels that is greater than the second predefined quantity of channels by a factor of at least 2, and having a pixel quantity that is more than 3.5 times smaller than the pixel quantity of the second layer output;
processing the third layer output by a fourth group of layers of the convolutional neural network, including a convolution layer and a parametric rectified linear unit activation function layer, to produce a fourth layer output having a fourth predefined quantity of channels that is greater than the third predefined quantity of channels by a factor of at least 2, and having a pixel quantity that is more than 1.1 times smaller than the pixel quantity of the third layer output;
processing the fourth layer output by a fifth group of layers including a first fully-connected layer that produces an output comprising a feature vector representative of the image; and
storing the feature vector in a searchable database containing a plurality of other feature vectors respectively representing other images containing faces. - View Dependent Claims (12, 13, 14)
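The method ends by storing the feature vector in a searchable database alongside vectors for other faces, so identification reduces to a similarity search over stored vectors. A minimal sketch of such a lookup, using cosine similarity over an in-memory dictionary (the identities, vectors, and threshold here are illustrative assumptions, not taken from the patent):

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical in-memory "searchable database" of enrolled feature vectors.
database = {
    "alice": [0.9, 0.1, 0.0],
    "bob":   [0.1, 0.8, 0.2],
}

def identify(query, db, threshold=0.8):
    """Return (identity, score) of the best match, or (None, score) below threshold."""
    best_id, best_score = None, -1.0
    for identity, vec in db.items():
        score = cosine(query, vec)
        if score > best_score:
            best_id, best_score = identity, score
    return (best_id, best_score) if best_score >= threshold else (None, best_score)

# Probe vector, standing in for the output of the CNN's fully-connected layer.
probe = [0.85, 0.15, 0.05]
match, score = identify(probe, database)
```

A production system would use real CNN embeddings (typically hundreds of dimensions) and an indexed nearest-neighbor structure rather than a linear scan, but the storage-then-search flow is the same as the claimed steps.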
15. A non-transitory machine-readable medium comprising instructions that, when executed on a computing platform, cause the computing platform to execute operations for performing automated facial recognition, the operations comprising:
receiving, by the computing platform, an image containing a face, the image having a plurality of input channels and an input pixel quantity;
processing the image by a first group of layers of a convolutional neural network, including a convolution layer, a max-pooling layer, and a parametric rectified linear unit activation function layer, executed on the computing platform, to produce a first layer output having a first predefined quantity of channels that is greater than the plurality of input channels by a factor of at least 80, and having a pixel quantity that is more than 4.2 times smaller than the input pixel quantity;
processing the first layer output by a second group of layers of the convolutional neural network, including a convolution layer, a max-pooling layer, and a parametric rectified linear unit activation function layer, executed on the computing platform, to produce a second layer output having a second predefined quantity of channels that is greater than the first predefined quantity of channels by a factor of at least 3, and having a pixel quantity that is more than 4.2 times smaller than the pixel quantity of the first layer output;
processing the second layer output by a third group of layers of the convolutional neural network, including a convolution layer, a max-pooling layer, and a parametric rectified linear unit activation function layer, executed on the computing platform, to produce a third layer output having a third predefined quantity of channels that is greater than the second predefined quantity of channels by a factor of at least 2, and having a pixel quantity that is more than 3.5 times smaller than the pixel quantity of the second layer output;
processing the third layer output by a fourth group of layers of the convolutional neural network, including a convolution layer and a parametric rectified linear unit activation function layer, to produce a fourth layer output having a fourth predefined quantity of channels that is greater than the third predefined quantity of channels by a factor of at least 2, and having a pixel quantity that is more than 1.1 times smaller than the pixel quantity of the third layer output;
processing the fourth layer output by a fifth group of layers including a first fully-connected layer that produces an output comprising a feature vector representative of the image; and
storing the feature vector in a searchable database containing a plurality of other feature vectors respectively representing other images containing faces. - View Dependent Claims (16, 17)
Specification