Systems and methods for dynamic facial analysis using a recurrent neural network
Abstract
A method, computer readable medium, and system are disclosed for dynamic facial analysis. The method includes the steps of receiving video data representing a sequence of image frames including at least one head and extracting, by a neural network, spatial features comprising pitch, yaw, and roll angles of the at least one head from the video data. The method also includes the step of processing, by a recurrent neural network, the spatial features for two or more image frames in the sequence of image frames to produce head pose estimates for the at least one head.
20 Claims
1. A computer-implemented method for facial analysis, comprising:
- transforming a fully-connected layer of a first neural network into a recurrent layer to produce a recurrent neural network (RNN), wherein, during training, the fully-connected layer learned a first weight matrix, and the recurrent layer uses the first weight matrix to process inputs to the recurrent layer and uses a second weight matrix to process hidden state produced by the recurrent layer for a previous time step;
- receiving video data representing a sequence of image frames including at least one head;
- extracting spatial features comprising pitch, yaw, and roll angles of the at least one head from the video data by a second neural network; and
- processing, by the RNN, the spatial features for two or more image frames in the sequence of image frames to produce head pose estimates for the at least one head.

Dependent claims: 2-8, 19, 20.
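The core transformation recited in claim 1, reusing the weight matrix learned by the fully-connected layer for the recurrent layer's input path while introducing a second weight matrix for the previous hidden state, can be sketched as follows. This is a minimal NumPy illustration; the dimensions, tanh activation, random values, and variable names are assumptions for demonstration and are not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)
feature_dim, hidden_dim = 8, 8

# Stand-in for the first weight matrix, learned by the
# fully-connected layer during training (claim 1).
W_input = rng.standard_normal((hidden_dim, feature_dim)) * 0.1
b = np.zeros(hidden_dim)

# The second weight matrix, introduced when the fully-connected
# layer is transformed into a recurrent layer.
W_hidden = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1

def recurrent_layer(x_t, h_prev):
    """One time step: the reused FC weights process the input,
    and the new weights process the previous hidden state."""
    return np.tanh(W_input @ x_t + W_hidden @ h_prev + b)

# Spatial features (e.g., pose-related features) for a short clip,
# processed frame by frame to accumulate temporal context.
frames = [rng.standard_normal(feature_dim) for _ in range(5)]
h = np.zeros(hidden_dim)
for x_t in frames:
    h = recurrent_layer(x_t, h)

# A linear readout over h could then produce per-frame
# pitch/yaw/roll head pose estimates.
```

In this sketch, setting `W_hidden` to zero recovers the original fully-connected behavior, which is why the transformation can reuse the trained first weight matrix unchanged.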
9. A facial analysis system, comprising:
- a first neural network configured to: receive video data representing a sequence of image frames including at least one head; and extract spatial features comprising pitch, yaw, and roll angles of the at least one head from the video data; and
- a recurrent neural network (RNN) that is coupled to the first neural network and configured to process the spatial features for two or more image frames in the sequence of image frames to produce head pose tracking data for the at least one head, wherein a fully-connected layer of a second neural network is transformed into a recurrent layer to produce the RNN, the recurrent layer using a first weight matrix to process inputs to the recurrent layer and using a second weight matrix to process hidden state produced by the recurrent layer for a previous time step, and the first weight matrix is learned by the fully-connected layer during training.

Dependent claims: 10-17.
18. A non-transitory computer-readable medium storing computer instructions for facial analysis that, when executed by one or more processors, cause the one or more processors to perform the steps of:
- transforming a fully-connected layer of a first neural network into a recurrent layer to produce a recurrent neural network (RNN), wherein, during training, the fully-connected layer learned a first weight matrix, and the recurrent layer uses the first weight matrix to process inputs to the recurrent layer and uses a second weight matrix to process hidden state produced by the recurrent layer for a previous time step;
- receiving video data representing a sequence of image frames including at least one head;
- extracting spatial features comprising pitch, yaw, and roll angles of the at least one head from the video data by a second neural network; and
- processing, by the RNN, the spatial features for two or more image frames in the sequence of image frames to produce head pose estimates for the at least one head.
Specification