Dataflow triggered tasks for accelerated deep learning
Abstract
Techniques in advanced deep learning provide improvements in one or more of accuracy, performance, and energy efficiency. An array of processing elements performs flow-based computations on wavelets of data. Each processing element has a compute element and a routing element. Each compute element has memory. Each router enables communication via wavelets with nearest neighbors in a 2D mesh. Routing is controlled by respective virtual channel specifiers in each wavelet and routing configuration information in each router. A compute element receives a particular wavelet comprising a particular virtual channel specifier and a particular data element. Instructions are read from the memory of the compute element based at least in part on the particular virtual channel specifier. The particular data element is used as an input operand to execute at least one of the instructions.
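The last three sentences of the abstract describe a dataflow trigger: the arrival of a wavelet selects which instructions run, and the wavelet's data element becomes an operand. The following is a minimal Python sketch of that idea, not the patented implementation. The names `Wavelet`, `ComputeElement`, `task_address`, `TASK_BASE`, and `TASK_STRIDE` are hypothetical, and the linear address mapping and the use of Python callables as stand-ins for stored instructions are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass(frozen=True)
class Wavelet:
    channel: int  # virtual channel specifier
    data: float   # data element carried as payload


# Assumed layout: each virtual channel owns a fixed-size slot of
# instruction memory. The abstract only says the address is "based at
# least in part on" the specifier; this linear mapping is illustrative.
TASK_BASE = 0x100
TASK_STRIDE = 0x10


def task_address(channel: int) -> int:
    """Derive an instruction address from a virtual channel specifier."""
    return TASK_BASE + channel * TASK_STRIDE


@dataclass
class ComputeElement:
    # Memory maps an address to a callable standing in for instructions.
    memory: Dict[int, Callable[["ComputeElement", float], None]] = field(
        default_factory=dict
    )
    weight: float = 1.0
    accumulator: float = 0.0

    def load_task(self, channel: int, task) -> None:
        """Store a task at the address associated with a channel."""
        self.memory[task_address(channel)] = task

    def receive(self, wavelet: Wavelet) -> None:
        """Read the task selected by the channel and run it on the payload."""
        task = self.memory[task_address(wavelet.channel)]
        task(self, wavelet.data)


def set_weight(ce: ComputeElement, value: float) -> None:
    ce.weight = value


def multiply_accumulate(ce: ComputeElement, value: float) -> None:
    ce.accumulator += ce.weight * value


ce = ComputeElement()
ce.load_task(0, set_weight)           # channel 0 carries weights
ce.load_task(1, multiply_accumulate)  # channel 1 carries activations

for w in (Wavelet(0, 0.5), Wavelet(1, 4.0), Wavelet(1, 2.0)):
    ce.receive(w)

print(ce.accumulator)  # 0.5 * 4.0 + 0.5 * 2.0 = 3.0
```

In this sketch no program counter decides what runs next. The order of arriving wavelets does, which is the sense in which the tasks are dataflow triggered.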
57 Claims
1. A method comprising:
sending a fabric packet by a sending processing element to a fabric, the fabric packet comprising a virtual channel specifier and a fabric packet payload;
routing the fabric packet via the fabric from the sending processing element to a receiving processing element via zero or more routing processing elements, the routing in accordance with the virtual channel specifier;
in the receiving processing element, receiving the fabric packet from the fabric, reading one or more instructions from a memory of the receiving processing element at an address based at least in part on the virtual channel specifier, and using at least a portion of the fabric packet payload as an input operand to execute at least one of the one or more instructions;
wherein the virtual channel specifier is one of a plurality of virtual channel specifiers, each of the plurality of virtual channel specifiers is associated with a respective set of one or more sets of fabric packets, and the receiving comprises associating the fabric packet with the respective set associated with the virtual channel specifier; and
wherein a block/unblock state is maintained for each of the virtual channel specifiers, and the block/unblock state of a particular one of the virtual channel specifiers is set to a block state in response to a block instruction specifying the particular one of the virtual channel specifiers and the block/unblock state of the particular one of the virtual channel specifiers is set to an unblock state in response to an unblock instruction specifying the particular one of the virtual channel specifiers.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A method comprising:
sending a fabric packet by a sending processing element to a fabric, the fabric packet comprising a virtual channel specifier and a fabric packet payload;
routing the fabric packet via the fabric from the sending processing element to a receiving processing element via zero or more routing processing elements, the routing in accordance with the virtual channel specifier;
in the receiving processing element, receiving the fabric packet from the fabric, reading one or more instructions from a memory of the receiving processing element at an address based at least in part on the virtual channel specifier, and using at least a portion of the fabric packet payload as an input operand to execute at least one of the one or more instructions; and
wherein the portion of the fabric packet payload comprises one or more data elements and the one or more data elements comprise at least a portion of one of: a weight of a neural network, an activation of a neural network, a partial sum of activations of a neural network, an error of a neural network, a gradient estimate of a neural network, and a weight update of a neural network.
Dependent claims: 10, 11
12. A method comprising:
sending a fabric packet by a sending processing element to a fabric, the fabric packet comprising a virtual channel specifier and a fabric packet payload;
routing the fabric packet via the fabric from the sending processing element to a receiving processing element via zero or more routing processing elements, the routing in accordance with the virtual channel specifier;
in the receiving processing element, receiving the fabric packet from the fabric, reading one or more instructions from a memory of the receiving processing element at an address based at least in part on the virtual channel specifier, and using at least a portion of the fabric packet payload as an input operand to execute at least one of the one or more instructions; and
wherein the one or more instructions implement at least a portion of one or more of: computing an activation of a neural network, computing a partial sum of activations of a neural network, computing an error of a neural network, computing a gradient estimate of a neural network, and updating a weight of a neural network.
Dependent claims: 13, 14, 15
16. A method comprising:
sending a fabric packet by a sending processing element to a fabric, the fabric packet comprising a virtual channel specifier and a fabric packet payload;
routing the fabric packet via the fabric from the sending processing element to a receiving processing element via zero or more routing processing elements, the routing in accordance with the virtual channel specifier;
in the receiving processing element, receiving the fabric packet from the fabric, reading one or more instructions from a memory of the receiving processing element at an address based at least in part on the virtual channel specifier, and using at least a portion of the fabric packet payload as an input operand to execute at least one of the one or more instructions; and
wherein a virtual channel associated with the virtual channel specifier is used for communicating at least one of control and data associated with one or more of: computing an activation of a neural network, computing a partial sum of activations of a neural network, computing an error of a neural network, computing a gradient estimate of a neural network, and updating a weight of a neural network.
Dependent claims: 17, 18, 19
20. A system comprising:
means for sending a fabric packet by a sending processing element to a fabric, the fabric packet comprising a virtual channel specifier and a fabric packet payload;
means for routing the fabric packet via the fabric from the sending processing element to a receiving processing element via zero or more routing processing elements, the means for routing operating in accordance with the virtual channel specifier;
in the receiving processing element, means for receiving the fabric packet from the fabric, means for reading one or more instructions from a memory of the receiving processing element at an address based at least in part on the virtual channel specifier, and means for using at least a portion of the fabric packet payload as an input operand to execute at least one of the one or more instructions;
wherein the virtual channel specifier is one of a plurality of virtual channel specifiers, each of the plurality of virtual channel specifiers associated with a respective set of one or more sets of fabric packets, and the means for receiving comprises means for associating the fabric packet with the respective set associated with the virtual channel specifier; and
further comprising means for maintaining respective block/unblock state for each of the virtual channel specifiers; and
wherein the block/unblock state of a particular one of the virtual channel specifiers is set to a block state in response to a block instruction specifying the particular one of the virtual channel specifiers and wherein the block/unblock state of the particular one of the virtual channel specifiers is set to an unblock state in response to an unblock instruction specifying the particular one of the virtual channel specifiers.
Dependent claims: 21, 22, 23, 24, 25, 26, 27
28. A system comprising:
means for sending a fabric packet by a sending processing element to a fabric, the fabric packet comprising a virtual channel specifier and a fabric packet payload;
means for routing the fabric packet via the fabric from the sending processing element to a receiving processing element via zero or more routing processing elements, the means for routing operating in accordance with the virtual channel specifier;
in the receiving processing element, means for receiving the fabric packet from the fabric, means for reading one or more instructions from a memory of the receiving processing element at an address based at least in part on the virtual channel specifier, and means for using at least a portion of the fabric packet payload as an input operand to execute at least one of the one or more instructions; and
wherein the portion of the fabric packet payload comprises one or more data elements and the one or more data elements comprise at least a portion of one of: a weight of a neural network, an activation of a neural network, a partial sum of activations of a neural network, an error of a neural network, a gradient estimate of a neural network, and a weight update of a neural network.
Dependent claims: 29, 30
31. A system comprising:
means for sending a fabric packet by a sending processing element to a fabric, the fabric packet comprising a virtual channel specifier and a fabric packet payload;
means for routing the fabric packet via the fabric from the sending processing element to a receiving processing element via zero or more routing processing elements, the means for routing operating in accordance with the virtual channel specifier;
in the receiving processing element, means for receiving the fabric packet from the fabric, means for reading one or more instructions from a memory of the receiving processing element at an address based at least in part on the virtual channel specifier, and means for using at least a portion of the fabric packet payload as an input operand to execute at least one of the one or more instructions; and
wherein the one or more instructions implement at least a portion of one or more of: computing an activation of a neural network, computing a partial sum of activations of a neural network, computing an error of a neural network, computing a gradient estimate of a neural network, and updating a weight of a neural network.
Dependent claims: 32, 33, 34
35. A system comprising:
means for sending a fabric packet by a sending processing element to a fabric, the fabric packet comprising a virtual channel specifier and a fabric packet payload;
means for routing the fabric packet via the fabric from the sending processing element to a receiving processing element via zero or more routing processing elements, the means for routing operating in accordance with the virtual channel specifier;
in the receiving processing element, means for receiving the fabric packet from the fabric, means for reading one or more instructions from a memory of the receiving processing element at an address based at least in part on the virtual channel specifier, and means for using at least a portion of the fabric packet payload as an input operand to execute at least one of the one or more instructions; and
wherein a virtual channel associated with the virtual channel specifier is used for communicating at least one of control and data associated with one or more of: computing an activation of a neural network, computing a partial sum of activations of a neural network, computing an error of a neural network, computing a gradient estimate of a neural network, and updating a weight of a neural network.
Dependent claims: 36, 37, 38
39. A system comprising:
a sending processing element, one or more routing elements, and a receiving processing element, all comprising respective couplings to a fabric, and the receiving processing element further comprising a memory, an instruction fetcher, a data sequencer, and an instruction execution unit;
wherein the sending processing element is enabled to transmit a fabric packet comprising a virtual channel specifier and a fabric packet payload to the fabric via the coupling of the sending processing element;
wherein the one or more routing elements are enabled to route the fabric packet in accordance with the virtual channel specifier to the receiving processing element via the couplings of the one or more routing elements;
wherein the receiving processing element is enabled to receive the fabric packet via the coupling of the receiving processing element;
wherein the memory is enabled to store one or more instructions at an address based at least in part on the virtual channel specifier;
wherein the instruction fetcher is enabled to read the one or more instructions at the address;
wherein the data sequencer is enabled to provide a portion of the fabric packet payload to the instruction execution unit;
wherein the instruction execution unit is enabled to execute at least one of the one or more instructions;
wherein the virtual channel specifier is one of a plurality of virtual channel specifiers, each of the plurality of virtual channel specifiers associated with a respective set of one or more sets of fabric packets, and the receiving comprises associating the fabric packet with the respective set associated with the virtual channel specifier; and
further comprising respective block/unblock state maintained for each of the virtual channel specifiers; and
wherein the block/unblock state of a particular one of the virtual channel specifiers is set to a block state in response to a block instruction specifying the particular one of the virtual channel specifiers and wherein the block/unblock state of the particular one of the virtual channel specifiers is set to an unblock state in response to an unblock instruction specifying the particular one of the virtual channel specifiers.
Dependent claims: 40, 41, 42, 43, 44, 45, 46
47. A system comprising:
a sending processing element, one or more routing elements, and a receiving processing element, all comprising respective couplings to a fabric, and the receiving processing element further comprising a memory, an instruction fetcher, a data sequencer, and an instruction execution unit;
wherein the sending processing element is enabled to transmit a fabric packet comprising a virtual channel specifier and a fabric packet payload to the fabric via the coupling of the sending processing element;
wherein the one or more routing elements are enabled to route the fabric packet in accordance with the virtual channel specifier to the receiving processing element via the couplings of the one or more routing elements;
wherein the receiving processing element is enabled to receive the fabric packet via the coupling of the receiving processing element;
wherein the memory is enabled to store one or more instructions at an address based at least in part on the virtual channel specifier;
wherein the instruction fetcher is enabled to read the one or more instructions at the address;
wherein the data sequencer is enabled to provide a portion of the fabric packet payload to the instruction execution unit;
wherein the instruction execution unit is enabled to execute at least one of the one or more instructions; and
wherein the portion of the fabric packet payload comprises one or more data elements and the one or more data elements comprise at least a portion of one of: a weight of a neural network, an activation of a neural network, a partial sum of activations of a neural network, an error of a neural network, a gradient estimate of a neural network, and a weight update of a neural network.
Dependent claims: 48, 49
50. A system comprising:
a sending processing element, one or more routing elements, and a receiving processing element, all comprising respective couplings to a fabric, and the receiving processing element further comprising a memory, an instruction fetcher, a data sequencer, and an instruction execution unit;
wherein the sending processing element is enabled to transmit a fabric packet comprising a virtual channel specifier and a fabric packet payload to the fabric via the coupling of the sending processing element;
wherein the one or more routing elements are enabled to route the fabric packet in accordance with the virtual channel specifier to the receiving processing element via the couplings of the one or more routing elements;
wherein the receiving processing element is enabled to receive the fabric packet via the coupling of the receiving processing element;
wherein the memory is enabled to store one or more instructions at an address based at least in part on the virtual channel specifier;
wherein the instruction fetcher is enabled to read the one or more instructions at the address;
wherein the data sequencer is enabled to provide a portion of the fabric packet payload to the instruction execution unit;
wherein the instruction execution unit is enabled to execute at least one of the one or more instructions; and
wherein the one or more instructions implement at least a portion of one or more of: computing an activation of a neural network, computing a partial sum of activations of a neural network, computing an error of a neural network, computing a gradient estimate of a neural network, and updating a weight of a neural network.
Dependent claims: 51, 52
53. A system comprising:
a sending processing element, one or more routing elements, and a receiving processing element, all comprising respective couplings to a fabric, and the receiving processing element further comprising a memory, an instruction fetcher, a data sequencer, and an instruction execution unit;
wherein the sending processing element is enabled to transmit a fabric packet comprising a virtual channel specifier and a fabric packet payload to the fabric via the coupling of the sending processing element;
wherein the one or more routing elements are enabled to route the fabric packet in accordance with the virtual channel specifier to the receiving processing element via the couplings of the one or more routing elements;
wherein the receiving processing element is enabled to receive the fabric packet via the coupling of the receiving processing element;
wherein the memory is enabled to store one or more instructions at an address based at least in part on the virtual channel specifier;
wherein the instruction fetcher is enabled to read the one or more instructions at the address;
wherein the data sequencer is enabled to provide a portion of the fabric packet payload to the instruction execution unit;
wherein the instruction execution unit is enabled to execute at least one of the one or more instructions; and
wherein a virtual channel associated with the virtual channel specifier is used for communicating at least one of control and data associated with one or more of: computing an activation of a neural network, computing a partial sum of activations of a neural network, computing an error of a neural network, computing a gradient estimate of a neural network, and updating a weight of a neural network.
Dependent claims: 54, 55, 56, 57
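Claims 1, 20, and 39 recite a block/unblock state kept per virtual channel specifier and changed by block and unblock instructions, with received packets associated with a per-channel set. The claims as quoted here do not say how that state is consumed, so the Python sketch below rests on stated assumptions: each per-channel set is modelled as a FIFO queue, a blocked channel's packets are held rather than dispatched, and the lowest-numbered ready channel is served first. The class `ChannelQueues` and all of its method names are hypothetical.

```python
from collections import deque
from typing import Any, Optional, Tuple


class ChannelQueues:
    """Per-channel packet sets plus a block/unblock flag per channel."""

    def __init__(self, num_channels: int) -> None:
        self.pending = [deque() for _ in range(num_channels)]
        self.blocked = [False] * num_channels

    def receive(self, channel: int, payload: Any) -> None:
        """Associate an arriving packet with the set for its channel."""
        self.pending[channel].append(payload)

    def block(self, channel: int) -> None:
        """Effect of a block instruction naming this channel."""
        self.blocked[channel] = True

    def unblock(self, channel: int) -> None:
        """Effect of an unblock instruction naming this channel."""
        self.blocked[channel] = False

    def next_ready(self) -> Optional[Tuple[int, Any]]:
        """Pop a packet from the first unblocked, non-empty channel.

        Assumption: blocked channels keep their packets queued, and
        lower channel numbers win ties. Neither policy is from the claims.
        """
        for channel, queue in enumerate(self.pending):
            if queue and not self.blocked[channel]:
                return channel, queue.popleft()
        return None


queues = ChannelQueues(num_channels=2)
queues.receive(0, "activation")
queues.receive(1, "control")

queues.block(0)
first = queues.next_ready()   # channel 0 is blocked, so channel 1 is served
second = queues.next_ready()  # only blocked work remains
queues.unblock(0)
third = queues.next_ready()   # the held packet is now eligible

print(first, second, third)
```

Under these assumptions, blocking a channel lets a task defer work arriving on that channel, for example until a different channel delivers a control packet, without dropping the deferred packets.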
Specification