
NEURAL NETWORK UNIT WITH MEMORY LAYOUT TO PERFORM EFFICIENT 3-DIMENSIONAL CONVOLUTIONS

  • US 20180157962A1
  • Filed: 12/01/2016
  • Published: 06/07/2018
  • Est. Priority Date: 12/01/2016
  • Status: Active Grant
First Claim

1. A neural network unit (NNU) configured to convolve an input of H rows by W columns by C channels with F filters each of R rows by S columns by C channels to generate F outputs each of Q rows by P columns, the neural network unit comprising:

    a first memory configured to hold rows of N words logically partitioned as G input blocks of B words each;

    a second memory configured to hold rows of N words logically partitioned as G filter blocks of B words each;

    wherein B is the smallest factor of N that is greater than W, and wherein N is at least 512;

    an array of N processing units (PU), wherein each PU of the array has an accumulator, a register configured to receive a respective word of the N words from a row of the second memory, a multiplexed-register configured to selectively receive a respective word of the N words from a row of the first memory or a word rotated from the multiplexed-register of a logically adjacent PU, and an arithmetic logic unit coupled to the accumulator, register and multiplexed-register, wherein the N PUs are logically partitioned as G PU blocks of B PUs each;

    wherein the input blocks are held in H rows of the first memory, wherein each row of the H rows of the first memory holds a respective 2-dimensional slice of a corresponding row of the H rows of the input, wherein the respective 2-dimensional slice is held within at least C input blocks of the G input blocks, wherein each input block of the at least C input blocks holds a row of words of the 2-dimensional slice specified by a respective channel of the C channels;

    wherein the filter blocks are held in R×C rows of the second memory, wherein each filter block of F of the G filter blocks of each row of the R×C rows of the second memory holds P copies of a weight of a corresponding filter of the F filters at a respective row and a respective column and a respective channel of the corresponding filter; and

    wherein to convolve the input with the filters, the G PU blocks perform multiply-accumulate operations on the input blocks and filter blocks in a column-channel-row order, wherein the G PU blocks read a row of the H rows of the at least C input blocks from the first memory and rotate the row around the N PUs while performing a portion of the multiply-accumulate operations such that each of F of the G PU blocks receives each of the at least C input blocks of the row before reading another row of the H rows from the first memory.
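
The partitioning and rotation recited in the claim can be illustrated with a minimal Python sketch. It is not the patented hardware and makes several assumptions: the function names, the rotation direction, the example values N = 1024 and W = 12, and the simplification that all G blocks take part (the claim only requires at least C input blocks and F PU blocks) are all hypothetical. It computes B as the smallest factor of N greater than W, derives G = N / B, and then checks that rotating a row one word at a time around the N PUs lets every PU block see every input block.

```python
def block_size(n: int, w: int) -> int:
    """Return the smallest factor of n that is strictly greater than w."""
    for b in range(w + 1, n + 1):
        if n % b == 0:
            return b
    raise ValueError("w must be smaller than n")


def rotate_one_word(row: list) -> list:
    """Rotate the row by one word, so each PU takes its neighbour's word.

    The direction is an assumption made for this sketch.
    """
    return row[-1:] + row[:-1]


def blocks_seen_by_each_pu_block(n: int, w: int) -> list[set[int]]:
    """Simulate a full rotation and record which input blocks each PU block saw."""
    b = block_size(n, w)
    g = n // b
    # Tag every word with the index of the input block it belongs to.
    row = [word // b for word in range(n)]
    seen = [set() for _ in range(g)]
    for _ in range(g):
        # Sample the first word of each PU block while the row is block-aligned.
        for pu_block in range(g):
            seen[pu_block].add(row[pu_block * b])
        # B single-word rotations move each input block to the next PU block.
        for _ in range(b):
            row = rotate_one_word(row)
    return seen


N, W = 1024, 12          # example values; the claim requires N to be at least 512
B = block_size(N, W)     # words per block
G = N // B               # number of blocks per row
seen = blocks_seen_by_each_pu_block(N, W)
print(B, G, all(s == set(range(G)) for s in seen))
```

With these example values the blocks are 16 words wide and there are 64 of them per row. In the sketch a row is read once and then rotated in place, which mirrors the claim's requirement that each PU block receive each input block of a row before another row is read from the first memory.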
