DEEP NEURAL NETWORK PROCESSING ON HARDWARE ACCELERATORS WITH STACKED MEMORY
First Claim
1. A method for processing on an acceleration component a deep neural network, the acceleration component comprising an acceleration component die and a memory stack disposed in an integrated circuit package, the memory stack comprising a memory bandwidth greater than about 50 GB/sec and a power efficiency of greater than about 20 MB/sec/mW, the method comprising:
- configuring the acceleration component to perform forward propagation and backpropagation stages of the deep neural network.
Abstract
A method is provided for processing on an acceleration component a deep neural network. The method includes configuring the acceleration component to perform forward propagation and backpropagation stages of the deep neural network. The acceleration component includes an acceleration component die and a memory stack disposed in an integrated circuit package. The memory stack has a memory bandwidth greater than about 50 GB/sec and a power efficiency of greater than about 20 MB/sec/mW.
121 Citations
20 Claims
1. A method for processing on an acceleration component a deep neural network, the acceleration component comprising an acceleration component die and a memory stack disposed in an integrated circuit package, the memory stack comprising a memory bandwidth greater than about 50 GB/sec and a power efficiency of greater than about 20 MB/sec/mW, the method comprising:
configuring the acceleration component to perform forward propagation and backpropagation stages of the deep neural network. (Dependent claims: 2-9)
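Claim 1 recites configuring the accelerator to perform both forward propagation and backpropagation stages. As a minimal host-side illustration of those two stages (not the claimed hardware implementation; the layer sizes, values, and names here are hypothetical), a single fully connected layer with a sigmoid activation can be sketched as:

```python
import math
import random

random.seed(0)

# Hypothetical single fully connected layer: 3 inputs -> 2 outputs.
W = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
x = [0.5, -0.2, 0.1]          # input activations
t = [1.0, 0.0]                # training target

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Forward propagation stage: weighted sums, then the nonlinearity.
z = [sum(W[i][j] * x[j] for j in range(3)) for i in range(2)]
y = [sigmoid(zi) for zi in z]

# Backpropagation stage: loss L = 0.5 * sum((y - t)^2), so dL/dy = y - t;
# chain through the sigmoid derivative y * (1 - y) to get per-output deltas.
delta = [(y[i] - t[i]) * y[i] * (1.0 - y[i]) for i in range(2)]

# Gradient of L with respect to each weight, and one gradient-descent step.
grad_W = [[delta[i] * x[j] for j in range(3)] for i in range(2)]
lr = 0.1
W = [[W[i][j] - lr * grad_W[i][j] for j in range(3)] for i in range(2)]
```

On the claimed accelerator these stages would run in on-die logic against the stacked memory rather than in host software; the sketch only shows the dataflow the claim names.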
10. A system for processing a deep neural network, the system comprising:
an acceleration component comprising an acceleration component die and a memory stack disposed in an integrated circuit package, the memory stack comprising a memory bandwidth greater than about 50 GB/sec and a power efficiency of greater than about 20 MB/sec/mW; and
a plurality of neural engines configured on the acceleration component die, wherein the neural engines comprise logic to implement forward propagation and backpropagation stages of the deep neural network. (Dependent claims: 11-17, 20)
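Claim 10 adds a plurality of neural engines on the die, each carrying forward/backpropagation logic. A hedged software analogy (engine count, layer sizes, and all names here are hypothetical, and threads stand in for the claimed hardware engines): partition a layer's output neurons across the engines and let each engine compute its slice of the forward pass concurrently.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_ENGINES = 4

# Hypothetical layer: 8 output neurons, 3 inputs each.
weights = [[(i + 1) * 0.1, -0.2, 0.05 * i] for i in range(8)]
x = [1.0, 0.5, -1.0]

def engine_forward(rows):
    """One 'neural engine' computes the weighted sums for its slice of neurons."""
    return [sum(w * xi for w, xi in zip(weights[r], x)) for r in rows]

# Assign neurons round-robin to the engines and run the slices concurrently.
partitions = [list(range(e, len(weights), NUM_ENGINES)) for e in range(NUM_ENGINES)]
with ThreadPoolExecutor(max_workers=NUM_ENGINES) as pool:
    slices = list(pool.map(engine_forward, partitions))

# Reassemble the full output vector from the per-engine slices.
z = [0.0] * len(weights)
for rows, vals in zip(partitions, slices):
    for r, v in zip(rows, vals):
        z[r] = v
```

The same partitioning applies to the backpropagation stage, since each engine's gradient slice depends only on its own rows of the weight matrix.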
18. A system for processing a deep neural network, the system comprising:
an acceleration component comprising an acceleration component die and a memory stack disposed in an integrated circuit package, the memory stack comprising a memory bandwidth greater than about 50 GB/sec and a power efficiency of greater than about 20 MB/sec/mW;
a plurality of neural engines configured on the acceleration component die; and
a plurality of DRAM channels on the memory stack, each of the DRAM channels coupled to the neural engines, wherein the neural engines comprise logic to implement forward propagation and backpropagation stages of the deep neural network. (Dependent claim: 19)
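Claim 18 further recites multiple DRAM channels on the memory stack, each coupled to the neural engines, so weight traffic can be striped across channels and per-channel bandwidth aggregates. A toy model of that aggregation (every number and name below is hypothetical and chosen only to illustrate the >50 GB/sec and >20 MB/sec/mW figures the claims recite):

```python
# Toy model of a stacked-DRAM memory system: weight reads are striped
# across independent channels, so usable bandwidth is the sum over channels.
NUM_CHANNELS = 8
CHANNEL_BW_GB_S = 8.0          # hypothetical per-channel bandwidth
POWER_MW = 2500.0              # hypothetical memory-stack power draw

aggregate_bw_gb_s = NUM_CHANNELS * CHANNEL_BW_GB_S

# Power efficiency in MB/sec/mW, the unit used in the claims.
efficiency_mb_s_mw = (aggregate_bw_gb_s * 1000.0) / POWER_MW

def channel_for_address(addr, line_bytes=64):
    """Interleave line-sized blocks round-robin across the DRAM channels."""
    return (addr // line_bytes) % NUM_CHANNELS
```

Under these made-up figures the model yields 64 GB/sec aggregate bandwidth and 25.6 MB/sec/mW, clearing both claimed thresholds; the interleaving function shows why coupling every engine to every channel lets the engines draw on the full aggregate bandwidth rather than a single channel's share.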