Field-programmable gate array based accelerator system
First Claim
1. A system comprising:
- a Field Programmable Gate Array (FPGA) to perform a machine learning algorithm using training data;
- a Peripheral Component Interface (PCI) controller to communicate with a Central Processing Unit (CPU) of a host computing device;
- a memory hierarchy composed of Static Random Access Memory (SRAM) and Synchronous Dynamic Random Access Memory (SDRAM) associated with the FPGA and embedded Random Access Memory (RAM) within the FPGA, the training data being loaded onto at least a portion of the memory hierarchy and organized according to a streaming memory access order for streaming memory access by logic associated with the FPGA; and
- a control unit within the FPGA to direct the FPGA to:
  - build a histogram based in part on at least a subset of the training data;
  - build an integral histogram based in part on the histogram; and
  - send a result of the machine learning algorithm to a First In First Out (FIFO) queue for presentation to the host computing device, the result being based at least in part on the integral histogram.
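The histogram and integral-histogram steps the claim names can be illustrated in software. A minimal sketch in plain Python, assuming feature values have already been quantized into bin indices; the bin count and sample values are illustrative, not from the patent:

```python
def build_histogram(bin_indices, num_bins):
    """Count how many training samples fall into each bin."""
    hist = [0] * num_bins
    for b in bin_indices:
        hist[b] += 1
    return hist

def build_integral_histogram(hist):
    """Running (cumulative) sum over the histogram's bins."""
    integral, total = [], 0
    for count in hist:
        total += count
        integral.append(total)
    return integral

# Feature values already quantized into bin indices.
bins = [0, 2, 1, 2, 3, 1, 0, 2]
hist = build_histogram(bins, 4)            # [2, 2, 3, 1]
integral = build_integral_histogram(hist)  # [2, 4, 7, 8]
```

In hardware the same two passes would be done by FPGA logic over streamed memory, with the final value landing in the FIFO queue; this sketch only shows the arithmetic.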
Abstract
Accelerator systems and methods are disclosed that utilize FPGA technology to achieve better parallelism and flexibility. The accelerator system may be used to implement a relevance-ranking algorithm, such as RankBoost, for a training process. The algorithm and related data structures may be organized to enable streaming data access and, thus, increase the training speed. The data may be compressed to enable the system and method to be operable with larger data sets. At least a portion of the approximated RankBoost algorithm may be implemented as a single instruction multiple data streams (SIMD) architecture with multiple processing engines (PEs) in the FPGA. Thus, large data sets can be loaded on memories associated with an FPGA to increase the speed of the relevance ranking algorithm.
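The SIMD organization described in the abstract can be mimicked in software: one "instruction" (here, a histogram update) is applied in lockstep to several independent data streams, one per processing engine. A hedged sketch, assuming equal-length per-PE streams of bin indices; the function name and two-PE layout are illustrative:

```python
def simd_histograms(pe_streams, num_bins=4):
    """Apply the same histogram-update step to every PE's data stream.

    pe_streams: one list of bin indices per processing engine (PE),
    laid out so each PE reads its stream sequentially (streaming order).
    """
    hists = [[0] * num_bins for _ in pe_streams]
    # Lockstep: at each step, every PE consumes one element of its stream.
    for step in range(len(pe_streams[0])):
        for pe, stream in enumerate(pe_streams):
            hists[pe][stream[step]] += 1
    return hists

streams = [[0, 1, 1, 3], [2, 2, 0, 1]]  # two PEs, same-length streams
print(simd_histograms(streams))  # [[1, 2, 0, 1], [1, 1, 2, 0]]
```

The streaming memory access order matters because each PE only ever reads the next element of its own stream, which maps naturally onto sequential SDRAM bursts.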
117 Citations
22 Claims
1. A system comprising:
- a Field Programmable Gate Array (FPGA) to perform a machine learning algorithm using training data;
- a Peripheral Component Interface (PCI) controller to communicate with a Central Processing Unit (CPU) of a host computing device;
- a memory hierarchy composed of Static Random Access Memory (SRAM) and Synchronous Dynamic Random Access Memory (SDRAM) associated with the FPGA and embedded Random Access Memory (RAM) within the FPGA, the training data being loaded onto at least a portion of the memory hierarchy and organized according to a streaming memory access order for streaming memory access by logic associated with the FPGA; and
- a control unit within the FPGA to direct the FPGA to:
  - build a histogram based in part on at least a subset of the training data;
  - build an integral histogram based in part on the histogram; and
  - send a result of the machine learning algorithm to a First In First Out (FIFO) queue for presentation to the host computing device, the result being based at least in part on the integral histogram.

Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
17. A system comprising:
- a Field Programmable Gate Array (FPGA) to perform a machine learning algorithm using training data, the FPGA comprising multiple processing engines (PEs);
- a Peripheral Component Interface (PCI) controller to communicate with a Central Processing Unit (CPU) of a host computing device; and
- a memory hierarchy composed of Static Random Access Memory (SRAM) and Synchronous Dynamic Random Access Memory (SDRAM) associated with the FPGA and embedded Random Access Memory (RAM) within the FPGA, the training data being loaded onto the memory hierarchy and organized according to a streaming memory access order for streaming memory access by each of the PEs, the streaming memory access order enabling the streaming memory access by each of the PEs without interaction with software or drivers, the training data being stored in the memory hierarchy in a compressed format, the FPGA being configured to decompress the training data prior to using the training data; and
- a control unit within the FPGA to direct one or more components of the FPGA to:
  - build a histogram based in part on at least a subset of the training data;
  - build an integral histogram based in part on the histogram; and
  - provide a result of the machine learning algorithm for presentation to the host computing device, the result being based at least in part on the integral histogram.

Dependent claims: 18, 19, 20, 21
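Claim 17 adds that the training data is stored compressed and decompressed by the FPGA before use. The claim does not name a compression scheme, so the following is purely illustrative: a run-length coding of repeated bin indices, one plausible way to shrink highly repetitive binned feature data:

```python
def rle_decompress(pairs):
    """Expand (bin_index, run_length) pairs into a flat bin-index stream.

    Run-length coding is a hypothetical choice here; the patent claim
    does not specify the compressed format.
    """
    out = []
    for value, run in pairs:
        out.extend([value] * run)
    return out

compressed = [(0, 3), (2, 1), (1, 2)]
print(rle_decompress(compressed))  # [0, 0, 0, 2, 1, 1]
```

Decompressing on-chip trades a little FPGA logic for a larger effective data set in SDRAM, which matches the abstract's point about operating on larger data sets.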
22. A system comprising:
- a Field Programmable Gate Array (FPGA) to perform a machine learning algorithm using training data, the machine learning algorithm comprising a document-based relevance-ranking algorithm implemented at least in part as a single instruction multiple data streams (SIMD) architecture using processing engines (PEs) in the FPGA;
- a memory hierarchy composed of Double Data Rate (DDR) memory associated with the FPGA and embedded Random Access Memory (RAM) within the FPGA, the training data being loaded onto the memory hierarchy and organized according to a streaming memory access order for streaming memory access by each of the PEs, the training data including feature values of documents classified into bins that define at least in part the streaming memory access order; and
- a control unit within the FPGA to direct the PEs to:
  - build histograms based in part on the feature values of the documents as classified into the bins; and
  - build multiple integral histograms based in part on at least a subset of the histograms, the multiple integral histograms being built at the same time.
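Claim 22's "multiple integral histograms being built at the same time" can be sketched as advancing every feature's cumulative sum one bin per step, so all rows progress in lockstep the way parallel PEs would. A minimal sketch, assuming one histogram per feature with a shared bin count (values illustrative):

```python
def integral_histograms(hists):
    """Turn each feature's histogram into an integral histogram.

    The outer loop walks bins; at each step every feature row is
    advanced once, mirroring concurrent construction across PEs.
    """
    num_features, num_bins = len(hists), len(hists[0])
    integrals = [[0] * num_bins for _ in range(num_features)]
    for b in range(num_bins):
        for f in range(num_features):
            prev = integrals[f][b - 1] if b > 0 else 0
            integrals[f][b] = prev + hists[f][b]
    return integrals

hists = [[2, 0, 3, 1], [1, 1, 1, 1]]  # two features, four bins each
print(integral_histograms(hists))  # [[2, 2, 5, 6], [1, 2, 3, 4]]
```

In software the loop order is interchangeable; the point of the claim is that in hardware the per-feature rows need not be serialized.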
Specification