NEURAL NETWORK ACCELERATOR
Abstract
A neural network implementation is disclosed. The implementation allows the computations for the neural network to be performed on either an accelerator or a processor. The accelerator and the processor share a memory and communicate over a bus to perform the computations and to share data. The implementation uses weight compression and pruning, as well as parallel processing, to reduce computing, storage, and power requirements.
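The weight compression and pruning the abstract mentions can be sketched as a magnitude-based prune followed by codebook (lookup-table) compression. The threshold and the five-entry codebook below are illustrative assumptions, not values from the patent:

```python
# Minimal sketch of magnitude-based weight pruning followed by a simple
# codebook ("lookup table") compression. Threshold and codebook size are
# illustrative assumptions, not values taken from the patent.

def prune(weights, threshold=0.05):
    """Zero out weights whose magnitude falls below the threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def compress(weights, codebook):
    """Replace each weight with the index of its nearest codebook entry."""
    return [min(range(len(codebook)), key=lambda i: abs(codebook[i] - w))
            for w in weights]

def decompress(indices, codebook):
    """Recover approximate weights by lookup-table indexing."""
    return [codebook[i] for i in indices]

weights = [0.9, -0.02, 0.4, 0.01, -0.7]
codebook = [-1.0, -0.5, 0.0, 0.5, 1.0]   # hypothetical 5-entry codebook
pruned = prune(weights)
indices = compress(pruned, codebook)
print(decompress(indices, codebook))      # approximate reconstructed weights
```

Storing a small index per weight instead of a full-precision value is what reduces the storage and memory-bandwidth requirements the abstract refers to.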
22 Claims
1. A method for implementing a neural network, the method comprising:
receiving input data;
fetching, from a memory, weights of the neural network;
performing a first portion of processing for the neural network, the first portion implemented in hardware by an accelerator and including a plurality of parallel multiply and accumulate (MAC) operations; and
performing a second portion of processing for the neural network, the second portion implemented in software by a processor, the accelerator and the processor using a bus to communicate and to share access to the memory.
(Dependent claims: 2-9)
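The hardware/software split recited in claim 1 can be modeled in a few lines: the accelerator stage performs the parallel MAC operations, and a software stage on the processor finishes the computation. Treating the software portion as a ReLU activation is an assumption for illustration; the claim does not specify which work runs on the processor:

```python
# Sketch of claim 1's split: a (modeled) hardware accelerator performs the
# parallel multiply-and-accumulate stage; a software step on the host
# processor performs the remaining work. ReLU here is an assumed example.

def accelerator_macs(inputs, weight_rows):
    """First portion (hardware in the claim): one MAC per output neuron,
    modeled as a set of dot products that would run in parallel."""
    return [sum(x * w for x, w in zip(inputs, row)) for row in weight_rows]

def processor_stage(sums):
    """Second portion (software in the claim): host-side post-processing."""
    return [max(0.0, s) for s in sums]   # ReLU, an illustrative choice

inputs = [1.0, 2.0]
weight_rows = [[0.5, -1.0], [1.0, 1.0]]  # weights fetched from shared memory
print(processor_stage(accelerator_macs(inputs, weight_rows)))
```

In the claimed arrangement the two stages would exchange these intermediate sums over the shared bus rather than through a function call.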
10. A neural network system comprising:
a memory configured to store compressed weights of a neural network;
a processor;
a processor data bus coupled between the processor and the memory; and
an accelerator coupled to and sharing the processor data bus with the processor, wherein the accelerator is configured to:
fetch and decompress the compressed weights of the neural network from the memory; and
perform at least a portion of processing for the neural network while the processor performs other tasks, the at least a portion of the processing including a plurality of multiply and accumulate (MAC) operations that operate in parallel.
(Dependent claims: 11-20)
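One way to model the shared-bus arrangement of claim 10 is to represent the processor data bus as a lock the accelerator acquires to fetch compressed weights from memory, while the processor performs other tasks concurrently. The codebook, data, and thread layout below are illustrative assumptions:

```python
import threading

# Sketch of claim 10: a memory holding compressed weights, and an
# accelerator that shares the processor's data bus (modeled as a lock)
# to fetch and decompress them while the processor does other work.
# All values here are illustrative assumptions.

bus = threading.Lock()   # stands in for the shared processor data bus
memory = {
    "compressed_weights": [4, 2, 3],            # indices into the codebook
    "codebook": [-1.0, -0.5, 0.0, 0.5, 1.0],    # hypothetical lookup table
}
result = {}

def accelerator(inputs):
    with bus:                                   # arbitrate for the shared bus
        idx = memory["compressed_weights"]
        lut = memory["codebook"]
    weights = [lut[i] for i in idx]             # decompress via lookup table
    result["mac"] = sum(x * w for x, w in zip(inputs, weights))  # MAC stage

def processor_other_task():
    result["other"] = "done"                    # placeholder for host work

t = threading.Thread(target=accelerator, args=([1.0, 2.0, 3.0],))
t.start()
processor_other_task()                          # processor works in parallel
t.join()
print(result["mac"])
```

Sharing one bus and one memory is what lets the accelerator operate without its own dedicated weight storage, at the cost of arbitration between the two masters.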
21. An accelerator for implementing a neural network, the accelerator comprising:
a plurality of multiply and accumulate (MAC) units operating in parallel, each MAC unit configured to repetitively multiply an input value and a weight to accumulate a full sum of products representing a value corresponding to a neuron in the neural network;
a lookup table for decompressing compressed weights stored in a memory to produce the weight for each MAC unit at each repetition; and
a circular buffer that feeds the input value to each MAC unit at each repetition.
(Dependent claim: 22)
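The datapath of claim 21 (parallel MAC units, lookup-table weight decompression, and a circular buffer feeding the input value at each repetition) can be sketched as follows, with all sizes and values assumed for illustration:

```python
from collections import deque

# Sketch of claim 21's datapath: two modeled MAC units, a lookup table
# that produces one decompressed weight per unit per repetition, and a
# circular buffer that supplies the current input value to every unit.
# Sizes and values are illustrative assumptions.

codebook = [-1.0, 0.0, 1.0]        # lookup table for weight decompression
compressed = [[2, 1], [0, 2]]      # per-unit weight indices, one per step
inputs = deque([3.0, 4.0])         # circular buffer of input values

accumulators = [0.0, 0.0]          # one running sum per MAC unit
for step in range(2):              # one repetition per input value
    x = inputs[0]
    inputs.rotate(-1)              # advance the circular buffer
    for unit in range(2):
        w = codebook[compressed[unit][step]]  # decompress this step's weight
        accumulators[unit] += x * w           # multiply and accumulate

print(accumulators)                # full sums of products, one per neuron
```

Each accumulator ends up holding the full sum of products for one neuron, which is the value the claim says each MAC unit accumulates over its repetitions.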
Specification