Dataflow Triggered Tasks for Accelerated Deep Learning
Abstract
Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a compute element and a routing element. Each compute element has memory. Each router enables communication via wavelets with nearest neighbors in a 2D mesh. Routing is controlled by respective virtual channel specifiers in each wavelet and routing configuration information in each router. A compute element receives a particular wavelet comprising a particular virtual channel specifier and a particular data element. Instructions are read from the memory of the compute element based at least in part on the particular virtual channel specifier. The particular data element is used as an input operand to execute at least one of the instructions.
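The receive-side behavior described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the names `Wavelet`, `ComputeElement`, `TASK_BASE`, and `TASK_STRIDE` are hypothetical, the address mapping (base plus channel times stride) is just one possible reading of "based at least in part on the virtual channel specifier", and instructions are modeled as Python callables acting on an accumulator.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# An "instruction" here is a function of (accumulator, operand) -> accumulator.
Instruction = Callable[[float, float], float]


@dataclass(frozen=True)
class Wavelet:
    virtual_channel: int  # virtual channel specifier carried by the wavelet
    payload: float        # data element carried by the wavelet


class ComputeElement:
    # Hypothetical address mapping: one fixed-size task slot per channel.
    TASK_BASE = 0x100
    TASK_STRIDE = 0x10

    def __init__(self) -> None:
        self.memory: Dict[int, List[Instruction]] = {}
        self.accumulator = 0.0

    def task_address(self, virtual_channel: int) -> int:
        """Derive the instruction address from the virtual channel specifier."""
        return self.TASK_BASE + virtual_channel * self.TASK_STRIDE

    def load_task(self, virtual_channel: int, instructions: List[Instruction]) -> None:
        """Store a task's instructions at the address tied to the channel."""
        self.memory[self.task_address(virtual_channel)] = list(instructions)

    def receive(self, wavelet: Wavelet) -> float:
        """Arrival of a wavelet triggers the task selected by its channel."""
        address = self.task_address(wavelet.virtual_channel)
        for instruction in self.memory[address]:
            # The payload is used as an input operand of each instruction.
            self.accumulator = instruction(self.accumulator, wavelet.payload)
        return self.accumulator


ce = ComputeElement()
ce.load_task(0, [lambda acc, x: acc + x])  # channel 0: accumulate
ce.load_task(1, [lambda acc, x: acc * x])  # channel 1: scale
ce.receive(Wavelet(virtual_channel=0, payload=3.0))
ce.receive(Wavelet(virtual_channel=1, payload=4.0))
print(ce.accumulator)
```

In this model, no central scheduler decides what runs next: the arriving data selects the code, which is the sense in which the tasks are dataflow triggered.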
42 Claims
1. A method comprising:
sending a fabric packet by a sending processing element to a fabric, the fabric packet comprising a virtual channel specifier and a fabric packet payload;
routing the fabric packet via the fabric from the sending processing element to a receiving processing element via zero or more routing processing elements, the routing in accordance with a particular one of a plurality of virtual channels as specified by the virtual channel specifier; and
in the receiving processing element, receiving the fabric packet from the fabric, reading one or more instructions from a memory of the receiving processing element at an address based at least in part on the virtual channel specifier, and using at least a portion of the fabric packet payload as an input operand to execute at least one of the one or more instructions.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
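The routing step of claim 1 can be sketched in the same spirit. The model below assumes, as the abstract states, a 2D mesh in which each router holds its own routing configuration; the per-router table format, the direction names, the `LOCAL` delivery marker, and the `route` function are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]

# Nearest-neighbor moves in a 2D mesh (x grows east, y grows south).
STEP: Dict[str, Coord] = {"N": (0, -1), "S": (0, 1), "E": (1, 0), "W": (-1, 0)}
LOCAL = "LOCAL"  # hypothetical marker: deliver to this element's compute side


def route(
    virtual_channel: int,
    source: Coord,
    tables: Dict[Coord, Dict[int, str]],
    max_hops: int = 64,
) -> List[Coord]:
    """Follow each router's configuration for the packet's virtual channel.

    Returns the list of elements visited, from sender to receiver.
    """
    here = source
    path = [here]
    for _ in range(max_hops):
        direction = tables[here][virtual_channel]
        if direction == LOCAL:
            return path  # the receiving processing element has been reached
        dx, dy = STEP[direction]
        here = (here[0] + dx, here[1] + dy)
        path.append(here)
    raise RuntimeError("no delivery within max_hops; check routing tables")


# Routing configuration per element, keyed by virtual channel.
tables = {
    (0, 0): {3: "E", 5: "S"},
    (1, 0): {3: "E"},
    (2, 0): {3: LOCAL},
    (0, 1): {5: LOCAL},
}

print(route(3, (0, 0), tables))  # one intermediate routing element
print(route(5, (0, 0), tables))  # zero intermediate routing elements
```

The packet itself carries only the virtual channel specifier and the payload; the path is determined hop by hop from the configuration held in each router, so the same sender reaches different receivers on different channels.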
15. A system comprising:
means for sending a fabric packet by a sending processing element to a fabric, the fabric packet comprising a virtual channel specifier and a fabric packet payload;
means for routing the fabric packet via the fabric from the sending processing element to a receiving processing element via zero or more routing processing elements, the means for routing operating in accordance with a particular one of a plurality of virtual channels as specified by the virtual channel specifier; and
in the receiving processing element, means for receiving the fabric packet from the fabric, means for reading one or more instructions from a memory of the receiving processing element at an address based at least in part on the virtual channel specifier, and means for using at least a portion of the fabric packet payload as an input operand to execute at least one of the one or more instructions.
Dependent claims: 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28
29. A system comprising:
a sending processing element, one or more routing elements, and a receiving processing element, all comprising respective couplings to a fabric, and the receiving processing element further comprising a memory, an instruction fetcher, a data sequencer, and an instruction execution unit;
wherein the sending processing element is enabled to transmit a fabric packet comprising a virtual channel specifier and a fabric packet payload to the fabric via the coupling of the sending processing element;
wherein the one or more routing elements are enabled to route the fabric packet in accordance with a particular one of a plurality of virtual channels as specified by the virtual channel specifier to the receiving processing element via the fabric couplings of the one or more routing elements;
wherein the receiving processing element is enabled to receive the fabric packet via the coupling of the receiving processing element;
wherein the memory is enabled to store one or more instructions at an address based at least in part on the virtual channel specifier;
wherein the instruction fetcher is enabled to read the one or more instructions at the address;
wherein the data sequencer is enabled to provide a portion of the packet payload to the instruction execution unit; and
wherein the instruction execution unit is enabled to execute at least one of the one or more instructions.
Dependent claims: 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42
Specification