NEURAL NETWORK ACCELERATOR
Abstract
A neural network implementation is disclosed. The implementation allows the computations for the neural network to be performed on either an accelerator or a processor. The accelerator and the processor share a memory and communicate over a bus to perform the computations and to share data. The implementation uses weight compression and pruning, as well as parallel processing, to reduce computing, storage, and power requirements.
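The weight compression and pruning the abstract mentions can be sketched as a magnitude-based prune followed by codebook (lookup-table) compression. The threshold and the five-entry codebook below are illustrative assumptions, not values from the patent:

```python
# Minimal sketch of magnitude-based weight pruning followed by a simple
# codebook ("lookup table") compression. Threshold and codebook size are
# illustrative assumptions, not values taken from the patent.

def prune(weights, threshold=0.05):
    """Zero out weights whose magnitude falls below the threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def compress(weights, codebook):
    """Replace each weight with the index of its nearest codebook entry."""
    return [min(range(len(codebook)), key=lambda i: abs(codebook[i] - w))
            for w in weights]

def decompress(indices, codebook):
    """Recover approximate weights by lookup-table indexing."""
    return [codebook[i] for i in indices]

weights = [0.9, -0.02, 0.4, 0.01, -0.7]
codebook = [-1.0, -0.5, 0.0, 0.5, 1.0]   # hypothetical 5-entry codebook
pruned = prune(weights)
indices = compress(pruned, codebook)
print(decompress(indices, codebook))      # approximate reconstructed weights
```

Storing a small index per weight instead of a full-precision value is what reduces the storage and memory-bandwidth requirements the abstract refers to.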
22 Claims
1. A method for implementing a neural network, the method comprising:
receiving input data;
fetching, from a memory, weights of the neural network;
performing a first portion of processing for the neural network, the first portion implemented in hardware by an accelerator and including a plurality of parallel multiply and accumulate (MAC) operations; and
performing a second portion of processing for the neural network, the second portion implemented in software by a processor, the accelerator and the processor using a bus to communicate and to share access to the memory.
(Dependent claims: 2-9)
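The hardware/software split recited in claim 1 can be modeled in a few lines: the accelerator stage performs the parallel MAC operations, and a software stage on the processor finishes the computation. Treating the software portion as a ReLU activation is an assumption for illustration; the claim does not specify which work runs on the processor:

```python
# Sketch of claim 1's split: a (modeled) hardware accelerator performs the
# parallel multiply-and-accumulate stage; a software step on the host
# processor performs the remaining work. ReLU here is an assumed example.

def accelerator_macs(inputs, weight_rows):
    """First portion (hardware in the claim): one MAC per output neuron,
    modeled as a set of dot products that would run in parallel."""
    return [sum(x * w for x, w in zip(inputs, row)) for row in weight_rows]

def processor_stage(sums):
    """Second portion (software in the claim): host-side post-processing."""
    return [max(0.0, s) for s in sums]   # ReLU, an illustrative choice

inputs = [1.0, 2.0]
weight_rows = [[0.5, -1.0], [1.0, 1.0]]  # weights fetched from shared memory
print(processor_stage(accelerator_macs(inputs, weight_rows)))
```

In the claimed arrangement the two stages would exchange these intermediate sums over the shared bus rather than through a function call.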
10. A neural network system comprising:
a memory configured to store compressed weights of a neural network;
a processor;
a processor data bus coupled between the processor and the memory; and
an accelerator coupled to and sharing the processor data bus with the processor, wherein the accelerator is configured to:
fetch and decompress the compressed weights of the neural network from the memory; and
perform at least a portion of processing for the neural network while the processor performs other tasks, the at least a portion of the processing including a plurality of multiply and accumulate (MAC) operations that operate in parallel.
(Dependent claims: 11-20)
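One way to model the shared-bus arrangement of claim 10 is to represent the processor data bus as a lock the accelerator acquires to fetch compressed weights from memory, while the processor performs other tasks concurrently. The codebook, data, and thread layout below are illustrative assumptions:

```python
import threading

# Sketch of claim 10: a memory holding compressed weights, and an
# accelerator that shares the processor's data bus (modeled as a lock)
# to fetch and decompress them while the processor does other work.
# All values here are illustrative assumptions.

bus = threading.Lock()   # stands in for the shared processor data bus
memory = {
    "compressed_weights": [4, 2, 3],            # indices into the codebook
    "codebook": [-1.0, -0.5, 0.0, 0.5, 1.0],    # hypothetical lookup table
}
result = {}

def accelerator(inputs):
    with bus:                                   # arbitrate for the shared bus
        idx = memory["compressed_weights"]
        lut = memory["codebook"]
    weights = [lut[i] for i in idx]             # decompress via lookup table
    result["mac"] = sum(x * w for x, w in zip(inputs, weights))  # MAC stage

def processor_other_task():
    result["other"] = "done"                    # placeholder for host work

t = threading.Thread(target=accelerator, args=([1.0, 2.0, 3.0],))
t.start()
processor_other_task()                          # processor works in parallel
t.join()
print(result["mac"])
```

Sharing one bus and one memory is what lets the accelerator operate without its own dedicated weight storage, at the cost of arbitration between the two masters.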
21. An accelerator for implementing a neural network, the accelerator comprising:
a plurality of multiply and accumulate (MAC) units operating in parallel, each MAC unit configured to repetitively multiply an input value and a weight to accumulate a full sum of products representing a value corresponding to a neuron in the neural network;
a lookup table for decompressing compressed weights stored in a memory to produce the weight for each MAC unit at each repetition; and
a circular buffer that feeds the input value to each MAC unit at each repetition.
(Dependent claim: 22)
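The datapath of claim 21 (parallel MAC units, lookup-table weight decompression, and a circular buffer feeding the input value at each repetition) can be sketched as follows, with all sizes and values assumed for illustration:

```python
from collections import deque

# Sketch of claim 21's datapath: two modeled MAC units, a lookup table
# that produces one decompressed weight per unit per repetition, and a
# circular buffer that supplies the current input value to every unit.
# Sizes and values are illustrative assumptions.

codebook = [-1.0, 0.0, 1.0]        # lookup table for weight decompression
compressed = [[2, 1], [0, 2]]      # per-unit weight indices, one per step
inputs = deque([3.0, 4.0])         # circular buffer of input values

accumulators = [0.0, 0.0]          # one running sum per MAC unit
for step in range(2):              # one repetition per input value
    x = inputs[0]
    inputs.rotate(-1)              # advance the circular buffer
    for unit in range(2):
        w = codebook[compressed[unit][step]]  # decompress this step's weight
        accumulators[unit] += x * w           # multiply and accumulate

print(accumulators)                # full sums of products, one per neuron
```

Each accumulator ends up holding the full sum of products for one neuron, which is the value the claim says each MAC unit accumulates over its repetitions.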
Specification