NEURAL HARDWARE ACCELERATOR FOR PARALLEL AND DISTRIBUTED TENSOR COMPUTATIONS
Abstract
Networks, and encodings therefor, are provided that increase the energy efficiency and speed of convolutional operations. In various embodiments, a neural network comprises a plurality of neural cores. Each of the plurality of neural cores comprises a memory. A network interconnects the plurality of neural cores. The memory of each of the plurality of neural cores comprises at least a portion of a weight tensor, the weight tensor comprising a plurality of weights. Each neural core is adapted to retrieve locally or receive a portion of an input image, apply the portion of the weight tensor thereto, and store locally or send a result therefrom via the network to others of the plurality of neural cores.
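The per-core dataflow in the abstract (local weight portion, shared input, results forwarded over a network) can be sketched in a few lines. This is a minimal illustration only, not the patented hardware: the names `NeuralCore` and `run_layer`, the 1x1-filter matrix shapes, and the use of in-process concatenation in place of a core-to-core interconnect are all assumptions.

```python
import numpy as np

class NeuralCore:
    """Sketch of a core whose memory holds a portion of the weight tensor."""
    def __init__(self, weight_portion):
        self.memory = weight_portion  # local memory: (filters, channels)

    def apply(self, input_portion):
        # One dot product per filter row:
        # (filters, channels) @ (channels, pixels) -> (filters, pixels)
        return self.memory @ input_portion

def run_layer(cores, input_tensor):
    # Hand the input portion to every core and concatenate the partial
    # results, standing in for sends over the interconnect.
    return np.concatenate([c.apply(input_tensor) for c in cores], axis=0)

rng = np.random.default_rng(0)
weights = rng.random((8, 3))                           # 8 filters over 3 channels
cores = [NeuralCore(w) for w in np.split(weights, 4)]  # 2 filters per core
x = rng.random((3, 5))                                 # 3 channels, 5 pixels
out = run_layer(cores, x)
print(out.shape)  # (8, 5)
```

Because the filter dimension is partitioned, the concatenated partial results equal the full-tensor product `weights @ x`; only the placement of the work differs.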
32 Claims
1. A system comprising:

a plurality of neural cores, each of the plurality of neural cores comprising at least one memory; and

a network interconnecting the plurality of neural cores, wherein:

the at least one memory of each of the plurality of neural cores comprises at least a portion of a weight tensor, the weight tensor comprising a plurality of filters,

each neural core is adapted to retrieve locally or receive a portion of an input data tensor, apply the portion of the weight tensor thereto, and store locally or send a result therefrom via the network to other of the plurality of neural cores.

(Dependent claims 2-25 not shown.)
26. A method comprising:

retrieving locally or receiving a portion of an input data tensor at a neural core, the neural core comprising a memory;

reading from the memory at least a portion of a weight tensor, the weight tensor comprising a plurality of filters;

applying the portion of the weight tensor to the portion of the input data tensor to obtain a result; and

storing locally or sending the result via a network to at least one other neural core.

(Dependent claims 27-31 not shown.)
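The four steps of claim 26 read naturally as a single per-core step: receive, read, apply, send. A hedged sketch follows, assuming Python `queue.Queue` objects in place of the claimed network, a dict in place of the core's memory, and a matrix product as the filter application; `core_step` is an illustrative name, not the patent's.

```python
from queue import Queue
import numpy as np

def core_step(memory, inbox, network):
    x = inbox.get()               # retrieve/receive an input data tensor portion
    w = memory["weight_portion"]  # read the weight tensor portion from memory
    result = w @ x                # apply the filters to the input portion
    network.put(result)           # send the result toward another core

inbox, network = Queue(), Queue()
inbox.put(np.ones(3))
core_step({"weight_portion": 2.0 * np.eye(3)}, inbox, network)
result = network.get()
print(result)  # [2. 2. 2.]
```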
32. A system comprising:

a plurality of neural cores, each of the plurality of neural cores comprising a memory;

a network interconnecting the plurality of neural cores; and

a computing node comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor of the computing node to cause the processor to perform a method comprising:

encoding a plurality of filters in a weight tensor; and

providing at least a portion of the weight tensor to each of the plurality of neural cores;

wherein each of the plurality of neural cores is adapted to:

store in its memory at least a portion of the weight tensor;

retrieve locally or receive a portion of an input data tensor;

apply the portion of the weight tensor to the portion of the input data tensor to obtain a result; and

store locally or send the result via a network to at least one other neural core.
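Claim 32 adds a host-side role: a computing node encodes filters into one weight tensor and provides each core its portion. A minimal sketch of that role, under assumptions: `encode_filters` and `distribute` are illustrative names, stacking along a new leading axis is one plausible encoding, and the split along the filter dimension stands in for delivery over the network.

```python
import numpy as np

def encode_filters(filters):
    # Encode a plurality of filters as a single stacked weight tensor.
    return np.stack(filters)

def distribute(weight_tensor, num_cores):
    # Provide each core a contiguous slice along the filter dimension.
    return np.array_split(weight_tensor, num_cores)

filters = [np.full((3, 3), float(i)) for i in range(6)]  # six 3x3 filters
portions = distribute(encode_filters(filters), 3)        # two filters per core
print([p.shape for p in portions])  # [(2, 3, 3), (2, 3, 3), (2, 3, 3)]
```

`np.array_split` (unlike `np.split`) tolerates a filter count that does not divide evenly by the core count, which is the common case on real hardware.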
Specification