Parallel processing machine learning decision tree training
First Claim
1. A computing system comprising:
- a parallel processing pipeline comprising;
a plurality of processing blocks each including a plurality of graphical processing units (GPUs), each of the plurality of GPUs in a same processing block sharing a memory block that is not used by GPUs from other processing blocks; and
a global memory shared by all GPUs from all processing blocks of the parallel processing pipeline; and
a decision tree training program configured to generate a decision tree including a plurality of nodes organized into levels, the decision tree training program representing instructions executable by the parallel processing pipeline to, for each level of the decision tree;
perform, at each GPU of the parallel processing pipeline, a feature test for a feature in a feature set on every example in an example set for every node in a level;
accumulate, at each memory block, a result of each feature test performed on each example processed by the plurality of GPUs that share that memory block, wherein such accumulating includes performing an atomic increment to a portion of the memory block when a plurality of feature test results are in contention for the portion of the memory block, and performing a non-atomic increment to a portion of the memory block when one feature test result is to be stored in the portion of the memory block;
write the accumulated results from each memory block to the global memory to generate a histogram of features for every node in the level, wherein such writing includes performing an atomic add to the global memory for each portion of the memory block that includes an accumulated feature test value; and
for each node in the level, assign a feature having a lowest entropy in accordance with the histograms to the node.
2 Assignments
0 Petitions
Accused Products
Abstract
Embodiments are disclosed herein that relate to generating a decision tree through graphical processing unit (GPU) based machine learning. For example, one embodiment provides a method including, for each level of the decision tree: performing, at each GPU of the parallel processing pipeline, a feature test for a feature in a feature set on every example in an example set. The method further includes accumulating results of the feature tests in local memory blocks. The method further includes writing the accumulated results from each local memory block to global memory to generate a histogram of features for every node in the level, and for each node in the level, assigning a feature having a lowest entropy in accordance with the histograms to the node.
212 Citations
15 Claims
-
1. A computing system comprising:
-
a parallel processing pipeline comprising; a plurality of processing blocks each including a plurality of graphical processing units (GPUs), each of the plurality of GPUs in a same processing block sharing a memory block that is not used by GPUs from other processing blocks; and a global memory shared by all GPUs from all processing blocks of the parallel processing pipeline; and a decision tree training program configured to generate a decision tree including a plurality of nodes organized into levels, the decision tree training program representing instructions executable by the parallel processing pipeline to, for each level of the decision tree; perform, at each GPU of the parallel processing pipeline, a feature test for a feature in a feature set on every example in an example set for every node in a level; accumulate, at each memory block, a result of each feature test performed on each example processed by the plurality of GPUs that share that memory block, wherein such accumulating includes performing an atomic increment to a portion of the memory block when a plurality of feature test results are in contention for the portion of the memory block, and performing a non-atomic increment to a portion of the memory block when one feature test result is to be stored in the portion of the memory block; write the accumulated results from each memory block to the global memory to generate a histogram of features for every node in the level, wherein such writing includes performing an atomic add to the global memory for each portion of the memory block that includes an accumulated feature test value; and for each node in the level, assign a feature having a lowest entropy in accordance with the histograms to the node. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A method for generating a decision tree including a plurality of nodes organized into levels from a parallel processing pipeline including a plurality of processing blocks, each processing block including a plurality of graphical processing units (GPUs), each of the plurality of GPUs in a same processing block sharing a memory block that is not used by GPUs from other processing blocks, and all GPUs from all processing blocks of the parallel processing pipeline sharing a global memory, the method comprising, for each level of the decision tree:
-
performing, at each GPU of the parallel processing pipeline, a feature test for a feature in a feature set on every pixel example selected from one of a plurality of depth maps included in an example set, wherein the example set includes example pixels from each of the plurality of depth maps, and wherein all example pixels from a depth map are processed by GPUs of a same processing block, and all GPUs in the same processing block process one depth map at a time; accumulating, at each memory block, a result of each feature test performed on each pixel example processed by the plurality of GPUs that share that memory block, wherein the accumulating includes performing an atomic increment to a portion of the memory block when a plurality of feature test results are in contention for the portion of the memory block, and performing a non-atomic increment to a portion of the memory block when one feature test result is to be stored in the portion of the memory block; writing the accumulated results from each memory block to the global memory to generate a histogram of features for every node in the level, wherein the writing includes performing an atomic add to the global memory for each portion of the memory block that includes an accumulated feature test result value; and for each node in the level, assigning a feature having a lowest entropy in accordance with the histograms to the node. - View Dependent Claims (9, 10, 11)
-
-
12. A computing system comprising:
-
a parallel processing pipeline comprising; a plurality of processing blocks each including a plurality of graphical processing units (GPUs); a plurality of memory blocks, each being shared by the plurality of GPUs of an associated processing block; and a global memory shared by each GPU of the parallel processing pipeline; and a decision tree training program configured to generate a decision tree including a plurality of nodes organized into levels, the decision tree training program representing instructions executable by the parallel processing pipeline to, for each level of the decision tree; perform, at each GPU of the parallel processing pipeline, a feature test for a feature in a feature set on every example in an example set for every node in a level; accumulate, at each memory block, a result of each feature test performed on each example processed by the plurality of GPUs that share that memory block, wherein such accumulating includes performing an atomic increment to a portion of the memory block when a plurality of feature test results are in contention for the portion of the memory block, and performing a non-atomic increment to a portion of the memory block when one feature test result is to be stored in the portion of the memory block; write the accumulated results from each memory block to the global memory to generate a histogram of features for every node in the level, wherein such writing includes performing an atomic add to the global memory for each portion of the memory block that includes an accumulated feature test value; and for each node in the level, assign a feature having a lowest entropy in accordance with the histograms to the node. - View Dependent Claims (13, 14, 15)
-
Specification