Field-programmable gate array based accelerator system
First Claim
1. A neural network computing system comprising:
- a Field Programmable Gate Array (FPGA) configured to have a hardware logic performing computations associated with a neural network training algorithm by receiving streamed data directly from a host computing device, the hardware logic including a processing element that performs computations related to a hidden layer of the neural network training algorithm, the processing element including a plurality of arithmetic logic units each representing a hidden node of the hidden layer; and
- an interface for connecting the FPGA to the host computing device and receiving the streamed data.
Abstract
Accelerator systems and methods are disclosed that utilize FPGA technology to achieve better parallelism and processing speed. A Field Programmable Gate Array (FPGA) is configured to have a hardware logic performing computations associated with a neural network training algorithm, especially a Web relevance ranking algorithm such as LambdaRank. The training data is first processed and organized by a host computing device, and then streamed to the FPGA for direct access by the FPGA to perform high-bandwidth computation with increased training speed. Thus, large data sets such as those related to Web relevance ranking can be processed. The FPGA may include a processing element performing computations of a hidden layer of the neural network training algorithm. Parallel computing may be realized using a single instruction, multiple data streams (SIMD) architecture with multiple arithmetic logic units in the FPGA.
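The SIMD arrangement described in the abstract can be illustrated with a minimal software sketch. This is not the patented FPGA logic; it is a hypothetical Python analogy in which a single vectorized operation updates all hidden nodes at once (standing in for the parallel arithmetic logic units), and a generator stands in for the host streaming training data to the device. All names and sizes (`stream_batches`, 4 inputs, 8 hidden nodes) are illustrative assumptions.

```python
import numpy as np

# Illustrative sizes, not from the patent: 4 input features,
# 8 hidden nodes -- one per simulated arithmetic logic unit.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))   # input-to-hidden weights
b = np.zeros(8)               # one bias per hidden node

def hidden_layer(batch):
    """One SIMD-style step: a single vectorized instruction computes
    the activations of all hidden nodes in parallel."""
    return np.tanh(batch @ W + b)

def stream_batches(data, batch_size):
    """Host-side streaming: yield fixed-size chunks of training data."""
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

data = rng.normal(size=(32, 4))   # stand-in for processed training data
outputs = [hidden_layer(chunk) for chunk in stream_batches(data, 8)]
```

Each chunk of 8 examples yields an 8x8 activation matrix: one row per streamed example, one column per hidden node.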
143 Citations
19 Claims
1. A neural network computing system comprising:

- a Field Programmable Gate Array (FPGA) configured to have a hardware logic performing computations associated with a neural network training algorithm by receiving streamed data directly from a host computing device, the hardware logic including a processing element that performs computations related to a hidden layer of the neural network training algorithm, the processing element including a plurality of arithmetic logic units each representing a hidden node of the hidden layer; and
- an interface for connecting the FPGA to the host computing device and receiving the streamed data.

(Dependent claims: 2-13)
14. A neural network computing system comprising:
- a host computing device for storing and processing training data;
- a Field Programmable Gate Array (FPGA) provided on a substrate, and configured to have a hardware logic performing computations associated with a neural network training algorithm by receiving streamed data directly from the host computing device, the FPGA including a hidden layer processing engine that performs computation associated with a hidden layer of the neural network training algorithm, the hidden layer processing engine including a plurality of processing elements each representing a hidden node of the hidden layer; and
- an interface for connecting the FPGA to the host computing device.

(Dependent claims: 15, 19)
16. A method for neural network training comprising:
- storing and processing training data on a host computing device;
- streaming the processed training data to a Field Programmable Gate Array (FPGA) configured to have a hardware logic performing computations associated with a neural network training algorithm, the hardware logic including one or more processing engines that map to individual hidden nodes of the neural network training algorithm, the one or more processing engines performing at least part of computations of both forward propagation and backward propagation of the neural network training algorithm; and
- enabling the FPGA to perform neural network training with respect to the streamed training data.

(Dependent claims: 17, 18)
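The forward- and backward-propagation steps recited in claim 16 can be sketched in software. The sketch below is a hypothetical analogy, not the claimed FPGA logic: a plain one-hidden-layer regression network in NumPy, where the "host" prepares training data and streams it in fixed batches, and each batch drives one forward pass and one backward pass through the hidden nodes. All sizes, names (`train_batch`), and the synthetic target are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 3 inputs, 4 hidden nodes, 1 output.
W1 = 0.1 * rng.normal(size=(3, 4))   # input -> hidden weights
W2 = 0.1 * rng.normal(size=(4, 1))   # hidden -> output weights

def train_batch(x, y, W1, W2, lr=0.1):
    """One streamed batch: forward propagation, then backward
    propagation with gradient updates for every hidden node."""
    h = np.tanh(x @ W1)                   # forward: all hidden nodes
    y_hat = h @ W2                        # forward: output layer
    err = y_hat - y
    dW2 = h.T @ err / len(x)              # backward: output weights
    dh = (err @ W2.T) * (1.0 - h ** 2)    # backward through tanh
    dW1 = x.T @ dh / len(x)               # backward: hidden weights
    W1 -= lr * dW1                        # in-place NumPy updates
    W2 -= lr * dW2
    return float((err ** 2).mean())

# "Host" stores and processes training data, then streams batches.
x = rng.normal(size=(64, 3))
y = 0.5 * x[:, :1] - 0.25 * x[:, 1:2]     # synthetic target
losses = []
for epoch in range(200):
    for i in range(0, len(x), 16):
        losses.append(train_batch(x[i:i + 16], y[i:i + 16], W1, W2))
```

Because NumPy's in-place `-=` mutates the arrays passed in, the weight updates persist across batches, and the batch loss falls as training proceeds.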
Specification