Machine learning classification on hardware accelerators with stacked memory
First Claim
1. A method for processing on an acceleration component a machine learning classification model comprising a plurality of decision trees, the decision trees comprising a first amount of decision tree data, the acceleration component comprising an acceleration component die and a memory stack disposed in an integrated circuit package, the memory stack comprising an acceleration component memory having a second amount of memory less than the first amount of decision tree data, the memory stack comprising a memory bandwidth greater than 50 GB/sec and a power efficiency of greater than 20 MB/sec/mW, the method comprising:
slicing the model into a plurality of model slices, each of the model slices having a third amount of decision tree data less than or equal to the second amount of memory;
storing the plurality of model slices on the memory stack;
copying a first model slice to the acceleration component memory;
processing the first model slice using a set of input data on the acceleration component to produce a first slice result;
selecting, based at least in part on the first slice result, a second model slice; and
repeating the copying and the processing for the second model slice;
wherein the selecting of the second model slice results in a third model slice not being processed.
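Read as an algorithm, claim 1 streams a sliced ensemble through a small accelerator memory, with each slice result steering which slice is fetched next. The following is a hypothetical software analogue only: decision stumps stand in for the trees, and a cascade-style early exit stands in for the "selecting" step; none of the names, sizes, or thresholds come from the patent.

```python
# Hypothetical sketch of the claimed method. The ensemble's decision-tree
# data exceeds the accelerator memory, so the model is sliced, the slices
# are staged on the memory stack, and each slice is copied in and processed
# in turn. "Selecting" is modeled as an early exit: once the running score
# is decisive, the remaining slices are never copied or processed (the
# unprocessed slice of the claim's wherein clause).

def slice_model(stumps, slice_capacity):
    """Pack decision stumps into slices that fit the accelerator memory."""
    return [stumps[i:i + slice_capacity]
            for i in range(0, len(stumps), slice_capacity)]

def classify(stumps, x, slice_capacity=2, decisive=3.0):
    memory_stack = slice_model(stumps, slice_capacity)  # store slices
    score, processed = 0.0, 0
    for model_slice in memory_stack:
        accel_mem = list(model_slice)           # copy slice to accel memory
        for feat, thresh, weight in accel_mem:  # process slice on accel
            score += weight if x[feat] > thresh else -weight
        processed += 1
        if abs(score) >= decisive:              # select next slice (or stop)
            break                               # later slices never processed
    return score > 0, processed

# Six stumps of (feature index, threshold, weight), grouped two per slice.
stumps = [(0, 0.5, 2.0), (1, 1.0, 2.0), (0, 2.0, 1.0),
          (1, 0.0, 1.0), (0, 1.5, 1.0), (1, 2.5, 1.0)]
label, n = classify(stumps, x=[3.0, 3.0])
# The first slice alone reaches the decisive margin, so only 1 of the 3
# slices is ever copied and processed.
```

The early exit is one possible reading of the data-dependent "selecting" step; the claim itself only requires that the selection leave some slice unprocessed.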
Abstract
A method is provided for processing a machine learning classification model on an acceleration component. The machine learning classification model includes a plurality of decision trees, the decision trees including a first amount of decision tree data. The acceleration component includes an acceleration component die and a memory stack disposed in an integrated circuit package. The memory stack includes an acceleration component memory having a second amount of memory less than the first amount of decision tree data. The memory stack includes a memory bandwidth greater than about 50 GB/sec and a power efficiency of greater than about 20 MB/sec/mW. The method includes slicing the model into a plurality of model slices, each of the model slices having a third amount of decision tree data less than or equal to the second amount of memory; storing the plurality of model slices on the memory stack; and, for each of the model slices, copying the model slice to the acceleration component memory and processing the model slice using a set of input data on the acceleration component to produce a slice result.
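The abstract's three "amounts" form an invariant chain: the total tree data (first amount) exceeds the accelerator memory (second amount), and every slice (third amount) must fit within that memory. A small check with invented sizes (nothing below comes from the patent):

```python
# Illustrative only: verify the abstract's size invariants for a
# hypothetical slicing. All sizes are made up for the example.
tree_sizes_mb = [96, 64, 80, 40, 56]  # per-tree data (hypothetical)
second_amount_mb = 128                # accelerator memory capacity

first_amount_mb = sum(tree_sizes_mb)  # 336 MB: the model does not fit

# Greedy first-fit slicing: start a new slice when the next tree overflows.
slices, current = [], []
for size in tree_sizes_mb:
    if sum(current) + size > second_amount_mb and current:
        slices.append(current)
        current = []
    current.append(size)
if current:
    slices.append(current)

third_amounts = [sum(s) for s in slices]
assert first_amount_mb > second_amount_mb          # first > second
assert all(t <= second_amount_mb for t in third_amounts)  # third <= second
```

Greedy first-fit is just one way to satisfy the constraint; the claims only require that each slice's data not exceed the accelerator memory.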
20 Claims
1. (Set forth above as the First Claim.) Dependent claims: 2, 3, 4, 5, 6, 7, 8.
9. A system for processing a machine learning classification model comprising a plurality of decision trees, the decision trees comprising a first amount of decision tree data, the system comprising:
an acceleration component die;
a memory stack disposed with the acceleration component die in an integrated circuit package, the memory stack comprising an acceleration component memory having a second amount of memory less than the first amount of decision tree data, the memory stack comprising a memory bandwidth greater than 50 GB/sec and a power efficiency of greater than 20 MB/sec/mW; and
a computer readable storage medium comprising computer-executable instructions which, when executed, slice the model into a plurality of model slices, each of the model slices having a third amount of decision tree data less than or equal to the second amount of memory, and store the plurality of model slices on the memory stack;
wherein the acceleration component die comprises circuitry configured to copy a first model slice to the acceleration component memory, process the first model slice using a set of input data on the acceleration component die to produce a first slice result, select, based at least in part on the first slice result, a second model slice, and repeat the copying and the processing for the second model slice, the selecting of the second model slice resulting in a third model slice not being processed.
Dependent claims: 10, 11, 12, 13, 14, 15, 16.
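Claim 9 splits the work between host software, which slices the model and stores the slices, and the die's circuitry, which copies, processes, and selects. The sketch below is a software model of that division of labor; every class, number, and stopping rule is invented for illustration.

```python
# Hypothetical model of the claimed system's split: host "computer-executable
# instructions" slice and store; accelerator-die logic copies, processes,
# and selects slices, which can leave a later slice unprocessed.

class MemoryStack:
    """Stacked DRAM holding every model slice (more than fits on die)."""
    def __init__(self):
        self.slices = {}

    def store(self, slices):               # host instructions: store slices
        self.slices = dict(enumerate(slices))

    def fetch(self, idx):                  # die circuitry: copy one slice
        return list(self.slices[idx])

class AcceleratorDie:
    """Stands in for the on-die copy/process/select circuitry."""
    def __init__(self, stack):
        self.stack = stack

    def run(self, inputs):
        idx, score, visited = 0, 0.0, []
        while idx is not None:
            local = self.stack.fetch(idx)            # copy to accel memory
            score += sum(w * inputs for w in local)  # process the slice
            visited.append(idx)
            # select: stop once the score is decisive, so at least one
            # remaining slice is never copied or processed.
            nxt = idx + 1
            idx = nxt if abs(score) < 2.5 and nxt in self.stack.slices else None
        return score, visited

# Host side: slice a toy model of six tree weights, two per slice.
weights = [1.0] * 6
model_slices = [weights[i:i + 2] for i in range(0, len(weights), 2)]
stack = MemoryStack()
stack.store(model_slices)
score, visited = AcceleratorDie(stack).run(inputs=1.0)
# The third slice (index 2) is never fetched once the score reaches 4.0.
```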
17. A method for processing on an acceleration component a machine learning classification model comprising a decision tree comprising a first amount of decision tree data, the acceleration component comprising an acceleration component die and a memory stack disposed in an integrated circuit package, the memory stack comprising an acceleration component memory having a second amount of memory less than the first amount of decision tree data, the memory stack comprising a memory bandwidth greater than 50 GB/sec and a power efficiency of greater than 20 MB/sec/mW, the method comprising:
storing the decision tree on the memory stack;
copying a first portion of the decision tree to the acceleration component memory;
processing the first portion using a set of input data on the acceleration component to produce a first portion result;
selecting, based at least in part on the first portion result, a second portion of the decision tree; and
repeating the copying and the processing for the second portion of the decision tree;
wherein the selecting of the second portion of the decision tree results in a third portion of the decision tree not being processed.
Dependent claims: 18, 19, 20.
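Claim 17 applies the same streaming idea inside a single oversized tree: the tree is stored as portions (for example, subtrees), and the path an input takes decides which portion is fetched next, so the untaken sibling portion is never copied or processed. A minimal sketch with invented structures:

```python
# Hypothetical single-tree variant: portions of one decision tree live on
# the memory stack; traversal of one portion names the next portion to
# fetch, and the sibling subtree's portion is never touched.

def process_portion(portion, x):
    """Walk the in-memory portion; return ('leaf', label) or ('portion', name)."""
    node = portion["entry"]
    while True:
        kind, payload = node
        if kind == "leaf":
            return "leaf", payload
        if kind == "jump":          # subtree lives in another portion
            return "portion", payload
        feat, thresh, left, right = payload
        node = left if x[feat] <= thresh else right

# Toy tree split into three portions: a root split and two leaf subtrees.
memory_stack = {
    "p0": {"entry": ("split", (0, 1.0,
                               ("jump", "p1"),     # left subtree portion
                               ("jump", "p2")))},  # right subtree portion
    "p1": {"entry": ("leaf", "A")},
    "p2": {"entry": ("leaf", "B")},
}

def classify(x):
    name, fetched = "p0", []
    while True:
        portion = dict(memory_stack[name])  # copy portion to accel memory
        fetched.append(name)
        kind, result = process_portion(portion, x)
        if kind == "leaf":
            return result, fetched          # untaken portions never fetched
        name = result                       # select the next portion

label, fetched = classify([2.0])
# x[0] = 2.0 > 1.0 sends traversal right, so portion "p1" (the third
# portion of the claim) is never copied or processed.
```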
Specification