Systems and methods for dynamic facial analysis using a recurrent neural network
Abstract
A method, computer readable medium, and system are disclosed for dynamic facial analysis. The method includes the steps of receiving video data representing a sequence of image frames including at least one head and extracting, by a neural network, spatial features comprising pitch, yaw, and roll angles of the at least one head from the video data. The method also includes the step of processing, by a recurrent neural network, the spatial features for two or more image frames in the sequence of image frames to produce head pose estimates for the at least one head.
20 Claims
1. A computer-implemented method for facial analysis, comprising:
- transforming a fully-connected layer of a first neural network into a recurrent layer to produce a recurrent neural network (RNN), wherein, during training, the fully-connected layer learned a first weight matrix, and the recurrent layer uses the first weight matrix to process inputs to the recurrent layer and uses a second weight matrix to process hidden state produced by the recurrent layer for a previous time step;
- receiving video data representing a sequence of image frames including at least one head;
- extracting spatial features comprising pitch, yaw, and roll angles of the at least one head from the video data by a second neural network; and
- processing, by the RNN, the spatial features for two or more image frames in the sequence of image frames to produce head pose estimates for the at least one head.

Dependent claims: 2-8, 19, 20.
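The core transformation recited in claim 1, reusing the weight matrix learned by the fully-connected layer for the recurrent layer's input path while introducing a second weight matrix for the previous hidden state, can be sketched as follows. This is a minimal NumPy illustration; the dimensions, tanh activation, random values, and variable names are assumptions for demonstration and are not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_dim, hidden_dim = 8, 8

# Stand-in for the first weight matrix, learned by the
# fully-connected layer during training (claim 1).
W_input = rng.standard_normal((hidden_dim, feature_dim)) * 0.1
b = np.zeros(hidden_dim)

# The second weight matrix, introduced when the fully-connected
# layer is transformed into a recurrent layer.
W_hidden = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1

def recurrent_layer(x_t, h_prev):
    """One time step: the reused FC weights process the input,
    and the new weights process the previous hidden state."""
    return np.tanh(W_input @ x_t + W_hidden @ h_prev + b)

# Spatial features (e.g., pose-related features) for a short clip,
# processed frame by frame to accumulate temporal context.
frames = [rng.standard_normal(feature_dim) for _ in range(5)]
h = np.zeros(hidden_dim)
for x_t in frames:
    h = recurrent_layer(x_t, h)

# A linear readout over h could then produce per-frame
# pitch/yaw/roll head pose estimates.
```

In this sketch, setting `W_hidden` to zero recovers the original fully-connected behavior, which is why the transformation can reuse the trained first weight matrix unchanged.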
9. A facial analysis system, comprising:
- a first neural network configured to: receive video data representing a sequence of image frames including at least one head; and extract spatial features comprising pitch, yaw, and roll angles of the at least one head from the video data; and
- a recurrent neural network (RNN) that is coupled to the first neural network and configured to process the spatial features for two or more image frames in the sequence of image frames to produce head pose tracking data for the at least one head, wherein a fully-connected layer of a second neural network is transformed into a recurrent layer to produce the RNN, the recurrent layer using a first weight matrix to process inputs to the recurrent layer and using a second weight matrix to process hidden state produced by the recurrent layer for a previous time step, and the first weight matrix is learned by the fully-connected layer during training.

Dependent claims: 10-17.
18. A non-transitory computer-readable medium storing computer instructions for facial analysis that, when executed by one or more processors, cause the one or more processors to perform the steps of:
- transforming a fully-connected layer of a first neural network into a recurrent layer to produce a recurrent neural network (RNN), wherein, during training, the fully-connected layer learned a first weight matrix, and the recurrent layer uses the first weight matrix to process inputs to the recurrent layer and uses a second weight matrix to process hidden state produced by the recurrent layer for a previous time step;
- receiving video data representing a sequence of image frames including at least one head;
- extracting spatial features comprising pitch, yaw, and roll angles of the at least one head from the video data by a second neural network; and
- processing, by the RNN, the spatial features for two or more image frames in the sequence of image frames to produce head pose estimates for the at least one head.
Specification