HARDWARE ACCELERATOR FOR COMPRESSED RNN ON FPGA
Abstract
The present invention relates to recurrent neural networks (RNNs). In particular, the present invention relates to how to implement and accelerate a recurrent neural network based on an embedded FPGA. Specifically, it proposes an overall processing method covering matrix decoding, matrix-vector multiplication, vector accumulation and activation-function computation. In another aspect, the present invention proposes an overall hardware design to implement and accelerate the above process.
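For orientation, the recurrence being accelerated is the standard RNN hidden-layer update implied by the claims below: ht = f(Whx·xt + Whh·ht−1 + b). A minimal NumPy sketch of one step follows; the activation f, the bias b, and the zero initial state are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def rnn_step(x_t, h_prev, Whx, Whh, b, f=np.tanh):
    """One hidden-layer update: ht = f(Whx @ xt + Whh @ h(t-1) + b)."""
    return f(Whx @ x_t + Whh @ h_prev + b)

# A length-T input sequence x = (x1, ..., xT) yields activations h1, ..., hT.
Whx, Whh, b = np.random.randn(8, 4), np.random.randn(8, 8), np.zeros(8)
h = np.zeros(8)
for x_t in np.random.randn(3, 4):
    h = rnn_step(x_t, h, Whx, Whh, b)
```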
16 Claims
1. A device for implementing compressed RNN (recurrent neural network), said device comprising:
- a receiving unit, which is used to receive a plurality of input vectors and distribute them to a plurality of processing elements (PE);
- a plurality of processing elements (PE), each of which comprises:
  - a reading unit configured to read weight matrices W, wherein said W indicates the weights of said RNN;
  - an ALU configured to perform the multiplication and addition calculations of said weight matrices W;
  - a calculation buffer configured to store intermediate results of matrix-vector multiplication and output results to an assembling unit;
- an assembling unit configured to receive results from the PEs and assemble them into a complete result vector; and
- a controller unit configured for controlling said plurality of processing elements.

Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
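As a rough software model of the claim-1 dataflow, the sketch below assumes the receiving unit broadcasts the input vector while the rows of W are partitioned across the PEs; the PE count and the row-wise partitioning are illustrative assumptions, since the claim does not fix a scheme.

```python
import numpy as np

def device_matvec(W, x, num_pes=4):
    # Receiving unit: broadcast input vector x; split W row-wise over the PEs
    # (row-wise partitioning is an assumption; the claim does not specify one).
    row_blocks = np.array_split(W, num_pes, axis=0)
    partials = []
    for pe_rows in row_blocks:                # one iteration per PE
        acc = np.zeros(pe_rows.shape[0])      # calculation buffer
        for j, x_j in enumerate(x):           # ALU: multiply and add
            acc += pe_rows[:, j] * x_j
        partials.append(acc)                  # buffer output to assembling unit
    # Assembling unit: concatenate PE results into the complete result vector.
    return np.concatenate(partials)
```

Row-wise partitioning lets each PE produce a disjoint slice of the result, so the assembling unit only needs to concatenate the partial vectors.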
10. A method for implementing compressed RNN based on FPGA, comprising:
- a) receiving data from off-chip memory and storing the data into on-chip memory of the FPGA, wherein said data are related to RNN computation, including the input vector, the bias vector and the weight matrices;
- b) decoding the data received in step a) using the FPGA on-chip processor in order to obtain the real weights, and storing the real weights into FPGA on-chip memory;
- c) matrix computing, by performing matrix-vector multiplication using FPGA on-chip processing elements and storing the result into FPGA on-chip memory;
- d) vector accumulating, by performing vector accumulation using FPGA on-chip processing elements and storing the results into FPGA on-chip memory, said vectors including both the resultant vector obtained in step c) and said bias vector;
- e) activation function computing, by performing an activation function on the result of step d) and storing the result into FPGA on-chip memory;
- iterating the above steps a), b), c), d), e) to obtain the RNN's activation sequences, and computing the RNN's output sequence according to the activation sequences.

Dependent claims: 11, 12, 13.
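A software sketch of steps a) through e) follows, using a CSR-style sparse encoding as a stand-in for the patent's compressed weight format and splitting the weights into Whx and Whh as in claim 14; the encoding, names and activation are assumptions, since the actual compressed format is defined in the specification.

```python
import numpy as np

def decode_csr(values, col_idx, row_ptr, shape):
    """Step b) stand-in: expand compressed weights into the 'real' dense matrix."""
    W = np.zeros(shape)
    for i in range(shape[0]):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            W[i, col_idx[k]] = values[k]
    return W

def rnn_forward(enc_Whx, enc_Whh, x_seq, bias, h0, f=np.tanh):
    """enc_* are (values, col_idx, row_ptr, shape) tuples for one matrix each."""
    Whx = decode_csr(*enc_Whx)        # b) decode to obtain the real weights
    Whh = decode_csr(*enc_Whh)
    h, activations = h0, []
    for x_t in x_seq:                 # iterate steps a)-e) over the sequence
        v = Whx @ x_t + Whh @ h       # c) matrix-vector multiplication
        v = v + bias                  # d) vector accumulation with the bias
        h = f(v)                      # e) activation function
        activations.append(h)
    return activations
```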
14. A method for implementing a Recurrent Neural Network (RNN), wherein the weights of said RNN are characterized by Whh and Whx, where Whh is the weight matrix of the hidden layers of said RNN and Whx is the weight matrix applied to the input of the hidden layers, where ht is the activation to be applied to an input vector by said hidden layers, and the input of said RNN is a series of input vectors x = (x1, x2, . . . , xT), said method comprises:
- an initialization step of reading the data necessary for computing Whx·x into an FPGA on-chip memory, said data including the input vectors x and Whx, where Whx is the weight matrix to be applied to said input vector x;
- step 1 of computing Whx·x by processing elements of said FPGA, and reading the data necessary for computing Whh·ht−1 into the FPGA on-chip memory;
- step 2 of computing Whh·ht−1 by processing elements of said FPGA, where ht−1 is the activation applied to the previous input vector by the hidden layer, and reading the data necessary for computing the next Whx·x into the FPGA on-chip memory; and
- iteratively repeating said step 1 and step 2.

Dependent claims: 15, 16.
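Claim 14 overlaps computation with data movement: while the PEs compute one matrix-vector product, the operands of the next product are read into on-chip memory. The sequential sketch below models that ping-pong schedule; on the FPGA the "compute" and "read" halves of each step run concurrently, and the elementwise activation f and zero initial state are assumptions.

```python
import numpy as np

def pipelined_rnn(Whx, Whh, x_seq, f=np.tanh):
    h = np.zeros(Whh.shape[0])
    x_buf = x_seq[0]                 # initialization: read x and Whx data on chip
    activations = []
    for t in range(len(x_seq)):
        wx = Whx @ x_buf             # step 1: compute Whx*x ...
        #   ... while reading data for Whh*h(t-1) into on-chip memory
        wh = Whh @ h                 # step 2: compute Whh*h(t-1) ...
        if t + 1 < len(x_seq):
            x_buf = x_seq[t + 1]     #   ... while prefetching the next x
        h = f(wx + wh)               # activation for this time step
        activations.append(h)
    return activations
```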
Specification