MACHINE LEARNING CLASSIFICATION ON HARDWARE ACCELERATORS WITH STACKED MEMORY
First Claim
1. A method for processing on an acceleration component a machine learning classification model comprising a plurality of decision trees, the decision trees comprising a first amount of decision tree data, the acceleration component comprising an acceleration component die and a memory stack disposed in an integrated circuit package, the memory die comprising an acceleration component memory having a second amount of memory less than the first amount of decision tree data, the memory stack comprising a memory bandwidth greater than about 50 GB/sec and a power efficiency of greater than about 20 MB/sec/mW, the method comprising:
- slicing the model into a plurality of model slices, each of the model slices having a third amount of decision tree data less than or equal to the second amount of memory;
- storing the plurality of model slices on the memory stack; and
- for each of the model slices:
  - copying the model slice to the acceleration component memory; and
  - processing the model slice using a set of input data on the acceleration component to produce a slice result.
Abstract
A method is provided for processing on an acceleration component a machine learning classification model. The machine learning classification model includes a plurality of decision trees, the decision trees including a first amount of decision tree data. The acceleration component includes an acceleration component die and a memory stack disposed in an integrated circuit package. The memory die includes an acceleration component memory having a second amount of memory less than the first amount of decision tree data. The memory stack includes a memory bandwidth greater than about 50 GB/sec and a power efficiency of greater than about 20 MB/sec/mW. The method includes slicing the model into a plurality of model slices, each of the model slices having a third amount of decision tree data less than or equal to the second amount of memory, storing the plurality of model slices on the memory stack, and for each of the model slices, copying the model slice to the acceleration component memory, and processing the model slice using a set of input data on the acceleration component to produce a slice result.
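The slice, store, copy, and process steps summarized above can be sketched in software. This is a minimal illustration and not the patented implementation: the names `DecisionTree`, `slice_model`, and `process_model` are hypothetical, slices are assumed to be built from whole trees by greedy packing, and the slice result is assumed to be the sum of the tree scores in the slice. The text does not specify any of these choices.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class DecisionTree:
    size_bytes: int                                # decision tree data held by this tree
    evaluate: Callable[[Sequence[float]], float]   # maps input features to a score


def slice_model(trees: List[DecisionTree], on_chip_bytes: int) -> List[List[DecisionTree]]:
    """Greedily pack whole trees into slices of at most on_chip_bytes each."""
    slices: List[List[DecisionTree]] = []
    current: List[DecisionTree] = []
    used = 0
    for tree in trees:
        if tree.size_bytes > on_chip_bytes:
            # Assumption: a tree is never split across slices in this sketch.
            raise ValueError("a single tree exceeds the on-chip memory")
        if used + tree.size_bytes > on_chip_bytes:
            slices.append(current)
            current, used = [], 0
        current.append(tree)
        used += tree.size_bytes
    if current:
        slices.append(current)
    return slices


def process_model(trees: List[DecisionTree], on_chip_bytes: int,
                  inputs: Sequence[float]) -> List[float]:
    """Return one slice result per model slice."""
    memory_stack = slice_model(trees, on_chip_bytes)   # slices stored on the memory stack
    slice_results = []
    for model_slice in memory_stack:
        on_chip = list(model_slice)                    # stands in for the copy to on-chip memory
        # Assumption: the slice result is the sum of the tree scores.
        slice_results.append(sum(t.evaluate(inputs) for t in on_chip))
    return slice_results
```

Because every slice is no larger than the on-chip memory, only one slice needs to be resident at a time while the full model stays on the memory stack.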
20 Claims
1. A method for processing on an acceleration component a machine learning classification model comprising a plurality of decision trees, the decision trees comprising a first amount of decision tree data, the acceleration component comprising an acceleration component die and a memory stack disposed in an integrated circuit package, the memory die comprising an acceleration component memory having a second amount of memory less than the first amount of decision tree data, the memory stack comprising a memory bandwidth greater than about 50 GB/sec and a power efficiency of greater than about 20 MB/sec/mW, the method comprising:
- slicing the model into a plurality of model slices, each of the model slices having a third amount of decision tree data less than or equal to the second amount of memory;
- storing the plurality of model slices on the memory stack; and
- for each of the model slices:
  - copying the model slice to the acceleration component memory; and
  - processing the model slice using a set of input data on the acceleration component to produce a slice result.

Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A system for processing a machine learning classification model comprising a plurality of decision trees, the decision trees comprising a first amount of decision tree data, the system comprising:
- an acceleration component comprising an acceleration component die and a memory stack disposed in an integrated circuit package, the memory die comprising an acceleration component memory having a second amount of memory less than the first amount of decision tree data, the memory stack comprising a memory bandwidth greater than about 50 GB/sec and a power efficiency of greater than about 20 MB/sec/mW; and
- a model slicing component configured to slice the model into a plurality of model slices, each of the model slices having a third amount of decision tree data less than or equal to the second amount of memory, and store the plurality of model slices on the memory stack,
- wherein for each of the model slices, the acceleration component is configured to copy the model slice to the acceleration component memory and is configured to process the model slice using a set of input data on the acceleration component to produce a slice result.

Dependent claims: 10, 11, 12, 13, 14, 15, 16
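To give a feel for the two memory stack thresholds recited above, the arithmetic below works through them at their boundary values. It is only a sketch: decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes) and the 4 MB slice size are assumptions made for illustration, and the function names are hypothetical.

```python
MB = 10**6   # assumption: decimal megabytes
GB = 10**9   # assumption: decimal gigabytes


def copy_time_seconds(slice_bytes: int, bandwidth_bytes_per_s: int) -> float:
    """Time to move one model slice from the memory stack at the given bandwidth."""
    return slice_bytes / bandwidth_bytes_per_s


def power_milliwatts(bandwidth_bytes_per_s: int, efficiency_mb_per_s_per_mw: float) -> float:
    """Power drawn when sustaining the bandwidth at the given MB/sec/mW efficiency."""
    return (bandwidth_bytes_per_s / MB) / efficiency_mb_per_s_per_mw


bandwidth = 50 * GB                                   # boundary value of 50 GB/sec
slice_copy_time = copy_time_seconds(4 * MB, bandwidth)   # hypothetical 4 MB slice
stack_power = power_milliwatts(bandwidth, 20)            # boundary value of 20 MB/sec/mW
print(slice_copy_time, stack_power)
```

Under these assumptions a 4 MB slice is copied in roughly 80 microseconds, and sustaining 50 GB/sec at 20 MB/sec/mW corresponds to about 2.5 W. Since both thresholds are lower bounds, a stack that exceeds them would copy a slice faster or draw less power for the same bandwidth.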
17. A method for processing on an acceleration component a machine learning classification model comprising a plurality of decision trees, the decision trees comprising a first amount of decision tree data, the acceleration component comprising an acceleration component die and a memory stack disposed in an integrated circuit package, the memory die comprising an acceleration component memory having a second amount of memory less than the first amount of decision tree data, the memory stack comprising a memory bandwidth greater than about 50 GB/sec and a power efficiency of greater than about 20 MB/sec/mW, the method comprising:
- storing the plurality of decision trees on the memory stack;
- for each of the decision trees:
  - copying a first portion of the decision tree to the acceleration component memory;
  - processing the first portion using a set of input data on the acceleration component; and
  - copying a second portion of the decision tree to the acceleration component memory based on a result of processing the first portion of the decision tree.

Dependent claims: 18, 19, 20
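The on-demand loading in claim 17 can be illustrated with a small software model. This is a hedged sketch under stated assumptions, not the claimed hardware: the types `Leaf` and `Split`, the `"portion/node"` key scheme, and the function `evaluate_tree` are all hypothetical, and a Python dictionary copy stands in for the transfer from the memory stack to the acceleration component memory.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Union


@dataclass
class Leaf:
    value: float


@dataclass
class Split:
    feature: int      # index into the input data
    threshold: float
    left: str         # next node key when inputs[feature] <= threshold
    right: str        # next node key otherwise


Node = Union[Leaf, Split]
Portion = Dict[str, Node]   # nodes stored together; keys look like "p0/root"


def portion_of(key: str) -> str:
    """Return the portion id encoded in a node key (assumed 'portion/node' format)."""
    return key.split("/")[0]


def evaluate_tree(memory_stack: Dict[str, Portion], root_key: str,
                  inputs: Sequence[float], copies: List[str]) -> float:
    """Walk one tree, copying a portion on chip only when the walk reaches it."""
    key = root_key
    on_chip: Portion = {}
    loaded = None
    while True:
        pid = portion_of(key)
        if pid != loaded:
            on_chip = dict(memory_stack[pid])   # stands in for the copy to on-chip memory
            loaded = pid
            copies.append(pid)                  # record which portions were copied
        node = on_chip[key]
        if isinstance(node, Leaf):
            return node.value
        # The result of processing this portion decides which portion is copied next.
        key = node.left if inputs[node.feature] <= node.threshold else node.right
```

In this sketch only the portions that lie on the path taken for a given input are ever copied, so portions on untaken branches never occupy the acceleration component memory.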
Specification