NEURAL HARDWARE ACCELERATOR FOR PARALLEL AND DISTRIBUTED TENSOR COMPUTATIONS
Abstract
Networks, and encodings therefor, are provided that increase the energy efficiency and speed of convolutional operations. In various embodiments, a neural network comprises a plurality of neural cores. Each of the plurality of neural cores comprises a memory. A network interconnects the plurality of neural cores. The memory of each of the plurality of neural cores comprises at least a portion of a weight tensor, the weight tensor comprising a plurality of weights. Each neural core is adapted to retrieve locally or receive a portion of an input image, apply the portion of the weight tensor thereto, and store locally or send a result therefrom via the network to others of the plurality of neural cores.
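The per-core dataflow in the abstract (local weight portion, shared input, results forwarded over a network) can be sketched in a few lines. This is a minimal illustration only, not the patented hardware: the names `NeuralCore` and `run_layer`, the 1x1-filter matrix shapes, and the use of in-process concatenation in place of a core-to-core interconnect are all assumptions.

```python
import numpy as np

class NeuralCore:
    """Sketch of a core whose memory holds a portion of the weight tensor."""
    def __init__(self, weight_portion):
        self.memory = weight_portion  # local memory: (filters, channels)

    def apply(self, input_portion):
        # One dot product per filter row:
        # (filters, channels) @ (channels, pixels) -> (filters, pixels)
        return self.memory @ input_portion

def run_layer(cores, input_tensor):
    # Hand the input portion to every core and concatenate the partial
    # results, standing in for sends over the interconnect.
    return np.concatenate([c.apply(input_tensor) for c in cores], axis=0)

rng = np.random.default_rng(0)
weights = rng.random((8, 3))                           # 8 filters over 3 channels
cores = [NeuralCore(w) for w in np.split(weights, 4)]  # 2 filters per core
x = rng.random((3, 5))                                 # 3 channels, 5 pixels
out = run_layer(cores, x)
print(out.shape)  # (8, 5)
```

Because the filter dimension is partitioned, the concatenated partial results equal the full-tensor product `weights @ x`; only the placement of the work differs.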
32 Claims
1. A system comprising:

a plurality of neural cores, each of the plurality of neural cores comprising at least one memory; and

a network interconnecting the plurality of neural cores, wherein:

the at least one memory of each of the plurality of neural cores comprises at least a portion of a weight tensor, the weight tensor comprising a plurality of filters,

each neural core is adapted to retrieve locally or receive a portion of an input data tensor, apply the portion of the weight tensor thereto, and store locally or send a result therefrom via the network to other of the plurality of neural cores.

(Dependent claims 2-25 not shown.)
26. A method comprising:

retrieving locally or receiving a portion of an input data tensor at a neural core, the neural core comprising a memory;

reading from the memory at least a portion of a weight tensor, the weight tensor comprising a plurality of filters;

applying the portion of the weight tensor to the portion of the input data tensor to obtain a result; and

storing locally or sending the result via a network to at least one other neural core.

(Dependent claims 27-31 not shown.)
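The four steps of claim 26 read naturally as a single per-core step: receive, read, apply, send. A hedged sketch follows, assuming Python `queue.Queue` objects in place of the claimed network, a dict in place of the core's memory, and a matrix product as the filter application; `core_step` is an illustrative name, not the patent's.

```python
from queue import Queue
import numpy as np

def core_step(memory, inbox, network):
    x = inbox.get()               # retrieve/receive an input data tensor portion
    w = memory["weight_portion"]  # read the weight tensor portion from memory
    result = w @ x                # apply the filters to the input portion
    network.put(result)           # send the result toward another core

inbox, network = Queue(), Queue()
inbox.put(np.ones(3))
core_step({"weight_portion": 2.0 * np.eye(3)}, inbox, network)
result = network.get()
print(result)  # [2. 2. 2.]
```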
32. A system comprising:

a plurality of neural cores, each of the plurality of neural cores comprising a memory;

a network interconnecting the plurality of neural cores; and

a computing node comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor of the computing node to cause the processor to perform a method comprising:

encoding a plurality of filters in a weight tensor; and

providing at least a portion of the weight tensor to each of the plurality of neural cores;

wherein each of the plurality of neural cores is adapted to:

store in its memory at least a portion of the weight tensor;

retrieve locally or receive a portion of an input data tensor;

apply the portion of the weight tensor to the portion of the input data tensor to obtain a result; and

store locally or send the result via a network to at least one other neural core.
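Claim 32 adds a host-side role: a computing node encodes filters into one weight tensor and provides each core its portion. A minimal sketch of that role, under assumptions: `encode_filters` and `distribute` are illustrative names, stacking along a new leading axis is one plausible encoding, and the split along the filter dimension stands in for delivery over the network.

```python
import numpy as np

def encode_filters(filters):
    # Encode a plurality of filters as a single stacked weight tensor.
    return np.stack(filters)

def distribute(weight_tensor, num_cores):
    # Provide each core a contiguous slice along the filter dimension.
    return np.array_split(weight_tensor, num_cores)

filters = [np.full((3, 3), float(i)) for i in range(6)]  # six 3x3 filters
portions = distribute(encode_filters(filters), 3)        # two filters per core
print([p.shape for p in portions])  # [(2, 3, 3), (2, 3, 3), (2, 3, 3)]
```

`np.array_split` (unlike `np.split`) tolerates a filter count that does not divide evenly by the core count, which is the common case on real hardware.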
Specification