Compressed recurrent neural network models
First Claim
1. A system comprising:
data processing hardware; and
memory hardware in communication with the data processing hardware and storing instructions that when executed on the data processing hardware cause the data processing hardware to perform operations comprising:
training an uncompressed version of a recurrent neural network (RNN) on training data to learn a respective recurrent weight matrix, Wh, and a respective inter-layer weight matrix, Wx, for each of a plurality of uncompressed recurrent layers of the uncompressed version of the RNN, each recurrent layer of the plurality of uncompressed recurrent layers configured to, for each of a plurality of time steps:
receive a respective layer input for the time step; and
process the respective layer input for the time step to generate a respective layer output for the time step;
re-configuring the trained RNN by, for at least one recurrent layer of the plurality of uncompressed recurrent layers of the uncompressed version of the trained RNN, compressing the recurrent layer by:
determining a respective singular value decomposition (SVD) of the respective recurrent weight matrix, Wh, for the recurrent layer;
generating a first compressed weight matrix, Zhl, and a projection matrix, Pl, based on the respective SVD of the respective recurrent weight matrix, Wh, for the recurrent layer;
generating a second compressed weight matrix, Zxl, based on the first compressed weight matrix, Zhl, and the projection matrix, Pl;
replacing the respective recurrent weight matrix, Wh, with the product of the first compressed weight matrix, Zhl, and the projection matrix, Pl; and
replacing the respective inter-layer weight matrix, Wx, with the product of the second compressed weight matrix, Zxl, and the projection matrix, Pl; and
transmitting the re-configured trained RNN having the at least one compressed recurrent layer to a mobile device in communication with the data processing hardware, the re-configured trained RNN having the at least one compressed recurrent layer configured to receive a respective neural network input at each of multiple time steps and generate a respective neural network output at each of the multiple time steps.
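The compression steps recited above can be sketched in NumPy. This is a hedged illustration, not the claimed implementation: it assumes square n-by-n weight matrices, and the particular split of the truncated SVD into Zhl and Pl (and the least-squares choice of Zxl) is one plausible factorization — the claim only requires that the replaced matrices be products of the compressed matrices with the shared projection matrix.

```python
import numpy as np

def compress_recurrent_layer(W_h, W_x, rank):
    """Sketch of the claimed per-layer compression (assumptions noted above).

    W_h: recurrent weight matrix, shape (n, n)
    W_x: inter-layer weight matrix, assumed here to also be (n, n)
    rank: retained rank r; the claim does not fix how r is chosen
    """
    # Step 1: singular value decomposition of the recurrent matrix,
    # W_h = U @ diag(s) @ Vt.
    U, s, Vt = np.linalg.svd(W_h, full_matrices=False)

    # Step 2: rank-r truncation. The singular values are folded into the
    # first compressed matrix; the right singular vectors become the
    # shared projection matrix, so W_h ~ Z_h @ P.
    Z_h = U[:, :rank] * s[:rank]   # shape (n, r)
    P = Vt[:rank, :]               # shape (r, n), orthonormal rows

    # Step 3: second compressed matrix, chosen so that Z_x @ P is the
    # least-squares best approximation of W_x for this fixed P
    # (valid because P @ P.T == I_r).
    Z_x = W_x @ P.T                # shape (n, r)

    # Steps 4-5: replace each weight matrix with the stated product.
    W_h_compressed = Z_h @ P
    W_x_compressed = Z_x @ P
    return Z_h, Z_x, P, W_h_compressed, W_x_compressed
```

With rank equal to n the reconstruction is exact; smaller ranks trade approximation error for fewer parameters.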
Abstract
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for implementing a compressed recurrent neural network (RNN). One of the systems includes a compressed RNN, the compressed RNN comprising a plurality of recurrent layers, wherein each of the recurrent layers has a respective recurrent weight matrix and a respective inter-layer weight matrix, and wherein at least one of the recurrent layers is compressed such that a respective recurrent weight matrix of the compressed layer is defined by a first compressed weight matrix and a projection matrix, and a respective inter-layer weight matrix of the compressed layer is defined by a second compressed weight matrix and the projection matrix.
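The benefit of defining both factorizations through one shared projection matrix shows up in a quick parameter count. For illustration only, assume every weight matrix is n-by-n, the retained rank is r, and the projection matrix is stored once per layer:

```python
# Per-layer parameter count under the assumptions stated above:
#   uncompressed: Wh (n*n) + Wx (n*n)           = 2 * n**2
#   compressed:   Zh (n*r) + Zx (n*r) + P (r*n) = 3 * n * r
n, r = 1024, 128  # hypothetical hidden size and rank, not from the patent
uncompressed = 2 * n * n
compressed = 3 * n * r
print(compressed / uncompressed)  # 0.1875, i.e. roughly 5.3x fewer parameters
```

The ratio 3r/2n shrinks linearly with the rank, which is what makes the re-configured network small enough to ship to a mobile device.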
15 Claims
1. A system comprising: (independent claim; full text reproduced above as the First Claim) - View Dependent Claims (2, 3, 4, 5, 6, 7)
8. A method for compressing a recurrent neural network (RNN), the method comprising:
training, by data processing hardware, an uncompressed version of a recurrent neural network (RNN) on training data to learn a respective recurrent weight matrix, Wh, and a respective inter-layer weight matrix, Wx, for each of a plurality of uncompressed recurrent layers of the uncompressed version of the RNN, each recurrent layer of the plurality of uncompressed recurrent layers configured to, for each of a plurality of time steps:
receive a respective layer input for the time step; and
process the respective layer input for the time step to generate a respective layer output for the time step;
re-configuring the trained RNN by, for at least one recurrent layer of the plurality of uncompressed recurrent layers of the uncompressed version of the trained RNN, compressing, by the data processing hardware, the recurrent layer by:
determining a respective singular value decomposition (SVD) of the respective recurrent weight matrix, Wh, for the recurrent layer;
generating a first compressed weight matrix, Zhl, and a projection matrix, Pl, based on the respective SVD of the respective recurrent weight matrix, Wh, for the recurrent layer;
generating a second compressed weight matrix, Zxl, based on the first compressed weight matrix, Zhl, and the projection matrix, Pl;
replacing the respective recurrent weight matrix, Wh, with the product of the first compressed weight matrix, Zhl, and the projection matrix, Pl; and
replacing the respective inter-layer weight matrix, Wx, with the product of the second compressed weight matrix, Zxl, and the projection matrix, Pl; and
transmitting, by the data processing hardware, the re-configured trained RNN having the at least one compressed recurrent layer to a mobile device in communication with the data processing hardware, the re-configured trained RNN having the at least one compressed recurrent layer configured to receive a respective neural network input at each of multiple time steps and generate a respective neural network output at each of the multiple time steps. - View Dependent Claims (9, 10, 11, 12, 13, 14, 15)
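On the receiving mobile device, a compressed layer never needs to materialize the full-size products Zhl·Pl and Zxl·Pl. A sketch of one time step, assuming a plain tanh RNN cell (the claims fix neither the cell type nor the nonlinearity):

```python
import numpy as np

def compressed_rnn_step(h_prev, x_t, Z_h, Z_x, P, b):
    """One time step of a compressed simple-RNN layer (illustrative).

    Applying the shared projection P to the vectors first keeps the
    per-step cost at O(n*r) rather than the O(n**2) cost of multiplying
    by the reconstructed full matrices Z_h @ P and Z_x @ P.
    """
    return np.tanh(Z_h @ (P @ h_prev) + Z_x @ (P @ x_t) + b)
```

Because matrix multiplication is associative, this factored step produces exactly the same output as using the reconstructed matrices directly.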
Specification