Deep neural network processing on hardware accelerators with stacked memory
Abstract
A method is provided for processing a deep neural network on an acceleration component. The method includes configuring the acceleration component to perform the forward propagation and backpropagation stages of the deep neural network. The acceleration component includes an acceleration component die and a memory stack disposed in an integrated circuit package. The memory stack has a memory bandwidth greater than about 50 GB/sec and a power efficiency greater than about 20 MB/sec/mW.
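As a rough sanity check (not stated in the patent itself), the two floor figures in the abstract jointly imply an upper bound on the memory stack's power draw: bandwidth divided by power efficiency.

```python
# Back-of-the-envelope check of the abstract's figures; the derived power
# bound is our arithmetic, not a number recited in the patent.
bandwidth_mb_s = 50 * 1000        # 50 GB/sec expressed in MB/sec
efficiency_mb_s_per_mw = 20       # 20 MB/sec/mW power efficiency

implied_power_mw = bandwidth_mb_s / efficiency_mb_s_per_mw
print(implied_power_mw)           # 2500.0 mW, i.e. about 2.5 W at those floors
```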
20 Claims
1. A method for processing a deep neural network, the method comprising:
configuring an acceleration component to perform forward propagation and backpropagation stages of the deep neural network, the acceleration component comprising an acceleration component die and a memory stack both disposed in a single integrated circuit package, the acceleration component comprising multiple discrete neural engine processing units, each neural engine processing unit comprising both input buffer memory and logic circuitry that implements at least one of: dot-products, derivatives or non-linear functions, the configuring comprising:
storing at least one of: weights, input activations or errors in the memory stack;
assigning, to individual ones of the neural engine processing units, a portion of the weights; and
streaming portions of the weights, input activations or errors from the memory stack to the input buffer memory of respective ones of the neural engine processing units, the input buffer memory of each of the respective ones of the neural engine processing units individually comprising at least one of:
a weights input memory into which the portions of the weights are stored;
an activations input memory into which the portions of the input activations are stored; or
an error input memory into which the portions of the errors are stored;
wherein at least one of the weights input memory, the activations input memory or the error input memory of one neural engine processing unit is communicationally coupled to a corresponding one of the weights input memory, the activations input memory or the error input memory of a preceding neural engine processing unit and to a corresponding one of the weights input memory, the activations input memory or the error input memory of a subsequent neural engine processing unit.
(Dependent claims: 2, 3, 4, 5, 6, 7)
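The configuring steps recited above (store weights in the memory stack, assign portions to units, stream each portion into that unit's weights input memory) can be sketched as a minimal software model. All class and variable names here are invented for illustration and do not come from the patent.

```python
# Illustrative model of claim 1's configuring steps; names are hypothetical.

class NeuralEngineUnit:
    """One neural engine processing unit: an input buffer plus dot-product logic."""
    def __init__(self):
        self.weights_input_memory = []   # per-unit weights input buffer

    def dot_product(self, activations):
        # Logic circuitry implementing a dot-product over the buffered weights.
        return sum(w * a for w, a in zip(self.weights_input_memory, activations))

# Step 1: store the weights in the (modelled) memory stack.
memory_stack = {"weights": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]}

# Step 2: assign a contiguous portion of the weights to each unit.
units = [NeuralEngineUnit() for _ in range(3)]
portion = len(memory_stack["weights"]) // len(units)
assignments = [memory_stack["weights"][i * portion:(i + 1) * portion]
               for i in range(len(units))]

# Step 3: stream each assigned portion into that unit's weights input memory.
for unit, chunk in zip(units, assignments):
    unit.weights_input_memory.extend(chunk)

activations = [1.0, 1.0]
partial_sums = [u.dot_product(activations) for u in units]
print(partial_sums)  # each unit's dot-product over its streamed weight portion
```

Contiguous partitioning is one choice; the claim itself only requires that some portion of the weights be assigned per unit.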
8. A system for processing a deep neural network, the system comprising:
an acceleration component that performs forward propagation and backpropagation stages of the deep neural network, the acceleration component comprising an acceleration component die and a memory stack both disposed in a single integrated circuit package; and
multiple discrete neural engine processing units on the acceleration component die, the multiple discrete neural engine processing units each comprising both input buffer memory and logic circuitry that implements at least one of: dot-products, derivatives or non-linear functions;
wherein the memory stack has stored thereon at least one of: weights, input activations or errors;
wherein individual ones of the multiple discrete neural engine processing units are assigned a portion of the weights; and
wherein portions of the weights, input activations or errors are streamed from the memory stack to the input buffer memory of respective ones of the neural engine processing units, the input buffer memory of each of the respective ones of the neural engine processing units individually comprising at least one of:
a weights input memory into which the portions of the weights are stored;
an activations input memory into which the portions of the input activations are stored; or
an error input memory into which the portions of the errors are stored;
wherein at least one of the weights input memory, the activations input memory or the error input memory of one neural engine processing unit is communicationally coupled to a corresponding one of the weights input memory, the activations input memory or the error input memory of a preceding neural engine processing unit and to a corresponding one of the weights input memory, the activations input memory or the error input memory of a subsequent neural engine processing unit.
(Dependent claims: 9, 10, 11, 12, 13, 14)
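The neighbor-to-neighbor coupling of input memories recited above can be sketched as a chain of buffers. Shifting values along the chain (systolic style) is an assumed use of that coupling, not something the claim recites, and all names are invented for the sketch.

```python
# Sketch of neighbor-coupled input buffers; names and the shift
# operation are hypothetical illustrations of the recited coupling.

class Unit:
    def __init__(self, value):
        self.weights_input_memory = value
        self.prev = None   # coupling to the preceding unit's buffer
        self.next = None   # coupling to the subsequent unit's buffer

def couple(units):
    """Communicationally couple each unit's buffer to its neighbors."""
    for a, b in zip(units, units[1:]):
        a.next, b.prev = b, a

def shift_forward(units):
    """Pass each buffered value to the subsequent unit over the coupling."""
    values = [u.weights_input_memory for u in units]
    for u, v in zip(units[1:], values[:-1]):
        u.weights_input_memory = v

chain = [Unit(v) for v in (10, 20, 30)]
couple(chain)
shift_forward(chain)
print([u.weights_input_memory for u in chain])  # [10, 10, 20]
```

Such coupling lets data move between adjacent units without a round trip through the memory stack, which is the usual motivation for chained buffers in accelerator arrays.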
15. A system for processing a deep neural network, the system comprising:
an acceleration component comprising an acceleration component die and a memory stack both disposed in a single integrated circuit package;
multiple discrete neural engine processing units on the acceleration component die, the multiple discrete neural engine processing units each comprising both input buffer memory and logic circuitry that implements at least one of: dot-products, derivatives or non-linear functions; and
a plurality of DRAM channels on the memory stack, wherein individual ones of the DRAM channels are coupled to individual ones of the multiple discrete neural engine processing units;
wherein the memory stack has stored thereon at least one of: weights, input activations or errors;
wherein individual ones of the multiple discrete neural engine processing units are assigned a portion of the weights by storing the weights in the individual ones of the DRAM channels that are coupled to the individual ones of the multiple discrete neural engine processing units that are assigned the portion of the weights; and
wherein the portion of the weights is copied from the individual ones of the DRAM channels to a weights input memory buffer of each of the individual ones of the multiple discrete neural engine processing units, the weights input memory buffer of a first neural engine processing unit being communicationally coupled to a weights input memory buffer of a preceding neural engine processing unit and to a weights input memory buffer of a subsequent neural engine processing unit.
(Dependent claims: 16, 17, 18, 19, 20)
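The per-channel assignment recited above (one DRAM channel coupled to one unit, weights stored in that channel and then copied into the unit's weight buffer) can be sketched as a simple mapping. The interleaved partition scheme and all names here are illustrative choices, not claim limitations.

```python
# Illustrative mapping for claim 15: one DRAM channel per neural engine
# unit; the interleaved weight partition is our choice for the sketch.

num_units = 4
weights = list(range(8))  # 8 weights to partition across the units

# Assign portions by storing them in the DRAM channel coupled to each unit.
dram_channels = {u: weights[u::num_units] for u in range(num_units)}

# Copy each channel's contents into the coupled unit's weights input buffer.
weight_buffers = [list(dram_channels[u]) for u in range(num_units)]

print(weight_buffers)  # [[0, 4], [1, 5], [2, 6], [3, 7]]
```

Because each unit reads only from its own channel, every copy can proceed in parallel, which is the usual reason for pairing stacked-DRAM channels with compute units one-to-one.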
Specification