Field-programmable gate array based accelerator system
First Claim
1. A neural network computing system comprising:
- a Field Programmable Gate Array (FPGA) configured to have a hardware logic performing computations associated with a neural network training algorithm by receiving streamed data directly from a host computing device, the hardware logic including a processing element that performs computations related to a hidden layer of the neural network training algorithm, the processing element including a plurality of arithmetic logic units each representing a hidden node of the hidden layer; and
- an interface for connecting the FPGA to the host computing device and receiving the streamed data.
Abstract
Accelerator systems and methods are disclosed that utilize FPGA technology to achieve better parallelism and processing speed. A Field Programmable Gate Array (FPGA) is configured to have a hardware logic performing computations associated with a neural network training algorithm, especially a Web relevance ranking algorithm such as LambdaRank. The training data is first processed and organized by a host computing device, and then streamed to the FPGA for direct access by the FPGA to perform high-bandwidth computation with increased training speed. Thus, large data sets such as those related to Web relevance ranking can be processed. The FPGA may include a processing element performing computations of a hidden layer of the neural network training algorithm. Parallel computing may be realized using a single instruction, multiple data streams (SIMD) architecture with multiple arithmetic logic units in the FPGA.
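The SIMD arrangement described in the abstract can be illustrated with a minimal software sketch. This is not the patented FPGA logic; it is a hypothetical Python analogy in which a single vectorized operation updates all hidden nodes at once (standing in for the parallel arithmetic logic units), and a generator stands in for the host streaming training data to the device. All names and sizes (`stream_batches`, 4 inputs, 8 hidden nodes) are illustrative assumptions.

```python
import numpy as np

# Illustrative sizes, not from the patent: 4 input features,
# 8 hidden nodes -- one per simulated arithmetic logic unit.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))   # input-to-hidden weights
b = np.zeros(8)               # one bias per hidden node

def hidden_layer(batch):
    """One SIMD-style step: a single vectorized instruction computes
    the activations of all hidden nodes in parallel."""
    return np.tanh(batch @ W + b)

def stream_batches(data, batch_size):
    """Host-side streaming: yield fixed-size chunks of training data."""
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

data = rng.normal(size=(32, 4))   # stand-in for processed training data
outputs = [hidden_layer(chunk) for chunk in stream_batches(data, 8)]
```

Each chunk of 8 examples yields an 8x8 activation matrix: one row per streamed example, one column per hidden node.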
143 Citations
19 Claims
1. A neural network computing system comprising:

- a Field Programmable Gate Array (FPGA) configured to have a hardware logic performing computations associated with a neural network training algorithm by receiving streamed data directly from a host computing device, the hardware logic including a processing element that performs computations related to a hidden layer of the neural network training algorithm, the processing element including a plurality of arithmetic logic units each representing a hidden node of the hidden layer; and
- an interface for connecting the FPGA to the host computing device and receiving the streamed data.

(Dependent claims: 2-13)
14. A neural network computing system comprising:
- a host computing device for storing and processing training data;
- a Field Programmable Gate Array (FPGA) provided on a substrate, and configured to have a hardware logic performing computations associated with a neural network training algorithm by receiving streamed data directly from the host computing device, the FPGA including a hidden layer processing engine that performs computation associated with a hidden layer of the neural network training algorithm, the hidden layer processing engine including a plurality of processing elements each representing a hidden node of the hidden layer; and
- an interface for connecting the FPGA to the host computing device.

(Dependent claims: 15, 19)
16. A method for neural network training comprising:
- storing and processing training data on a host computing device;
- streaming the processed training data to a Field Programmable Gate Array (FPGA) configured to have a hardware logic performing computations associated with a neural network training algorithm, the hardware logic including one or more processing engines that map to individual hidden nodes of the neural network training algorithm, the one or more processing engines performing at least part of computations of both forward propagation and backward propagation of the neural network training algorithm; and
- enabling the FPGA to perform neural network training with respect to the streamed training data.

(Dependent claims: 17, 18)
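The forward- and backward-propagation steps recited in claim 16 can be sketched in software. The sketch below is a hypothetical analogy, not the claimed FPGA logic: a plain one-hidden-layer regression network in NumPy, where the "host" prepares training data and streams it in fixed batches, and each batch drives one forward pass and one backward pass through the hidden nodes. All sizes, names (`train_batch`), and the synthetic target are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 3 inputs, 4 hidden nodes, 1 output.
W1 = 0.1 * rng.normal(size=(3, 4))   # input -> hidden weights
W2 = 0.1 * rng.normal(size=(4, 1))   # hidden -> output weights

def train_batch(x, y, W1, W2, lr=0.1):
    """One streamed batch: forward propagation, then backward
    propagation with gradient updates for every hidden node."""
    h = np.tanh(x @ W1)                   # forward: all hidden nodes
    y_hat = h @ W2                        # forward: output layer
    err = y_hat - y
    dW2 = h.T @ err / len(x)              # backward: output weights
    dh = (err @ W2.T) * (1.0 - h ** 2)    # backward through tanh
    dW1 = x.T @ dh / len(x)               # backward: hidden weights
    W1 -= lr * dW1                        # in-place NumPy updates
    W2 -= lr * dW2
    return float((err ** 2).mean())

# "Host" stores and processes training data, then streams batches.
x = rng.normal(size=(64, 3))
y = 0.5 * x[:, :1] - 0.25 * x[:, 1:2]     # synthetic target
losses = []
for epoch in range(200):
    for i in range(0, len(x), 16):
        losses.append(train_batch(x[i:i + 16], y[i:i + 16], W1, W2))
```

Because NumPy's in-place `-=` mutates the arrays passed in, the weight updates persist across batches, and the batch loss falls as training proceeds.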
Specification