Machine learning classification on hardware accelerators with stacked memory
First Claim
1. A method for processing on an acceleration component a machine learning classification model comprising a plurality of decision trees, the decision trees comprising a first amount of decision tree data, the acceleration component comprising an acceleration component die and a memory stack disposed in an integrated circuit package, the memory stack comprising an acceleration component memory having a second amount of memory less than the first amount of decision tree data, the memory stack comprising a memory bandwidth greater than 50 GB/sec and a power efficiency of greater than 20 MB/sec/mW, the method comprising:
slicing the model into a plurality of model slices, each of the model slices having a third amount of decision tree data less than or equal to the second amount of memory;
storing the plurality of model slices on the memory stack;
copying a first model slice to the acceleration component memory;
processing the first model slice using a set of input data on the acceleration component to produce a first slice result;
selecting, based at least in part on the first slice result, a second model slice; and
repeating the copying and the processing for the second model slice;
wherein the selecting of the second model slice results in a third model slice not being processed.
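Read as an algorithm, claim 1 streams a sliced ensemble through a small accelerator memory, with each slice result steering which slice is fetched next. The following is a hypothetical software analogue only: decision stumps stand in for the trees, and a cascade-style early exit stands in for the "selecting" step; none of the names, sizes, or thresholds come from the patent.

```python
# Hypothetical sketch of the claimed method. The ensemble's decision-tree
# data exceeds the accelerator memory, so the model is sliced, the slices
# are staged on the memory stack, and each slice is copied in and processed
# in turn. "Selecting" is modeled as an early exit: once the running score
# is decisive, the remaining slices are never copied or processed (the
# unprocessed slice of the claim's wherein clause).

def slice_model(stumps, slice_capacity):
    """Pack decision stumps into slices that fit the accelerator memory."""
    return [stumps[i:i + slice_capacity]
            for i in range(0, len(stumps), slice_capacity)]

def classify(stumps, x, slice_capacity=2, decisive=3.0):
    memory_stack = slice_model(stumps, slice_capacity)  # store slices
    score, processed = 0.0, 0
    for model_slice in memory_stack:
        accel_mem = list(model_slice)           # copy slice to accel memory
        for feat, thresh, weight in accel_mem:  # process slice on accel
            score += weight if x[feat] > thresh else -weight
        processed += 1
        if abs(score) >= decisive:              # select next slice (or stop)
            break                               # later slices never processed
    return score > 0, processed

# Six stumps of (feature index, threshold, weight), grouped two per slice.
stumps = [(0, 0.5, 2.0), (1, 1.0, 2.0), (0, 2.0, 1.0),
          (1, 0.0, 1.0), (0, 1.5, 1.0), (1, 2.5, 1.0)]
label, n = classify(stumps, x=[3.0, 3.0])
# The first slice alone reaches the decisive margin, so only 1 of the 3
# slices is ever copied and processed.
```

The early exit is one possible reading of the data-dependent "selecting" step; the claim itself only requires that the selection leave some slice unprocessed.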
Abstract
A method is provided for processing a machine learning classification model on an acceleration component. The machine learning classification model includes a plurality of decision trees, the decision trees including a first amount of decision tree data. The acceleration component includes an acceleration component die and a memory stack disposed in an integrated circuit package. The memory stack includes an acceleration component memory having a second amount of memory less than the first amount of decision tree data. The memory stack includes a memory bandwidth greater than about 50 GB/sec and a power efficiency of greater than about 20 MB/sec/mW. The method includes slicing the model into a plurality of model slices, each of the model slices having a third amount of decision tree data less than or equal to the second amount of memory; storing the plurality of model slices on the memory stack; and, for each of the model slices, copying the model slice to the acceleration component memory and processing the model slice using a set of input data on the acceleration component to produce a slice result.
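The abstract's three "amounts" form an invariant chain: the total tree data (first amount) exceeds the accelerator memory (second amount), and every slice (third amount) must fit within that memory. A small check with invented sizes (nothing below comes from the patent):

```python
# Illustrative only: verify the abstract's size invariants for a
# hypothetical slicing. All sizes are made up for the example.
tree_sizes_mb = [96, 64, 80, 40, 56]  # per-tree data (hypothetical)
second_amount_mb = 128                # accelerator memory capacity

first_amount_mb = sum(tree_sizes_mb)  # 336 MB: the model does not fit

# Greedy first-fit slicing: start a new slice when the next tree overflows.
slices, current = [], []
for size in tree_sizes_mb:
    if sum(current) + size > second_amount_mb and current:
        slices.append(current)
        current = []
    current.append(size)
if current:
    slices.append(current)

third_amounts = [sum(s) for s in slices]
assert first_amount_mb > second_amount_mb          # first > second
assert all(t <= second_amount_mb for t in third_amounts)  # third <= second
```

Greedy first-fit is just one way to satisfy the constraint; the claims only require that each slice's data not exceed the accelerator memory.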
20 Claims
1. (Set forth above as the First Claim.) Dependent claims: 2, 3, 4, 5, 6, 7, 8.
9. A system for processing a machine learning classification model comprising a plurality of decision trees, the decision trees comprising a first amount of decision tree data, the system comprising:
an acceleration component die;
a memory stack disposed with the acceleration component die in an integrated circuit package, the memory stack comprising an acceleration component memory having a second amount of memory less than the first amount of decision tree data, the memory stack comprising a memory bandwidth greater than 50 GB/sec and a power efficiency of greater than 20 MB/sec/mW; and
a computer readable storage medium comprising computer-executable instructions which, when executed, slice the model into a plurality of model slices, each of the model slices having a third amount of decision tree data less than or equal to the second amount of memory, and store the plurality of model slices on the memory stack;
wherein the acceleration component die comprises circuitry configured to copy a first model slice to the acceleration component memory, process the first model slice using a set of input data on the acceleration component die to produce a first slice result, select, based at least in part on the first slice result, a second model slice, and repeat the copying and the processing for the second model slice, the selecting of the second model slice resulting in a third model slice not being processed.
Dependent claims: 10, 11, 12, 13, 14, 15, 16.
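Claim 9 splits the work between host software, which slices the model and stores the slices, and the die's circuitry, which copies, processes, and selects. The sketch below is a software model of that division of labor; every class, number, and stopping rule is invented for illustration.

```python
# Hypothetical model of the claimed system's split: host "computer-executable
# instructions" slice and store; accelerator-die logic copies, processes,
# and selects slices, which can leave a later slice unprocessed.

class MemoryStack:
    """Stacked DRAM holding every model slice (more than fits on die)."""
    def __init__(self):
        self.slices = {}

    def store(self, slices):               # host instructions: store slices
        self.slices = dict(enumerate(slices))

    def fetch(self, idx):                  # die circuitry: copy one slice
        return list(self.slices[idx])

class AcceleratorDie:
    """Stands in for the on-die copy/process/select circuitry."""
    def __init__(self, stack):
        self.stack = stack

    def run(self, inputs):
        idx, score, visited = 0, 0.0, []
        while idx is not None:
            local = self.stack.fetch(idx)            # copy to accel memory
            score += sum(w * inputs for w in local)  # process the slice
            visited.append(idx)
            # select: stop once the score is decisive, so at least one
            # remaining slice is never copied or processed.
            nxt = idx + 1
            idx = nxt if abs(score) < 2.5 and nxt in self.stack.slices else None
        return score, visited

# Host side: slice a toy model of six tree weights, two per slice.
weights = [1.0] * 6
model_slices = [weights[i:i + 2] for i in range(0, len(weights), 2)]
stack = MemoryStack()
stack.store(model_slices)
score, visited = AcceleratorDie(stack).run(inputs=1.0)
# The third slice (index 2) is never fetched once the score reaches 4.0.
```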
17. A method for processing on an acceleration component a machine learning classification model comprising a decision tree comprising a first amount of decision tree data, the acceleration component comprising an acceleration component die and a memory stack disposed in an integrated circuit package, the memory stack comprising an acceleration component memory having a second amount of memory less than the first amount of decision tree data, the memory stack comprising a memory bandwidth greater than 50 GB/sec and a power efficiency of greater than 20 MB/sec/mW, the method comprising:
storing the decision tree on the memory stack;
copying a first portion of the decision tree to the acceleration component memory;
processing the first portion using a set of input data on the acceleration component to produce a first portion result;
selecting, based at least in part on the first portion result, a second portion of the decision tree; and
repeating the copying and the processing for the second portion of the decision tree;
wherein the selecting of the second portion of the decision tree results in a third portion of the decision tree not being processed.
Dependent claims: 18, 19, 20.
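Claim 17 applies the same streaming idea inside a single oversized tree: the tree is stored as portions (for example, subtrees), and the path an input takes decides which portion is fetched next, so the untaken sibling portion is never copied or processed. A minimal sketch with invented structures:

```python
# Hypothetical single-tree variant: portions of one decision tree live on
# the memory stack; traversal of one portion names the next portion to
# fetch, and the sibling subtree's portion is never touched.

def process_portion(portion, x):
    """Walk the in-memory portion; return ('leaf', label) or ('portion', name)."""
    node = portion["entry"]
    while True:
        kind, payload = node
        if kind == "leaf":
            return "leaf", payload
        if kind == "jump":          # subtree lives in another portion
            return "portion", payload
        feat, thresh, left, right = payload
        node = left if x[feat] <= thresh else right

# Toy tree split into three portions: a root split and two leaf subtrees.
memory_stack = {
    "p0": {"entry": ("split", (0, 1.0,
                               ("jump", "p1"),     # left subtree portion
                               ("jump", "p2")))},  # right subtree portion
    "p1": {"entry": ("leaf", "A")},
    "p2": {"entry": ("leaf", "B")},
}

def classify(x):
    name, fetched = "p0", []
    while True:
        portion = dict(memory_stack[name])  # copy portion to accel memory
        fetched.append(name)
        kind, result = process_portion(portion, x)
        if kind == "leaf":
            return result, fetched          # untaken portions never fetched
        name = result                       # select the next portion

label, fetched = classify([2.0])
# x[0] = 2.0 > 1.0 sends traversal right, so portion "p1" (the third
# portion of the claim) is never copied or processed.
```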
Specification