NEURAL NETWORK UNIT THAT PERFORMS EFFICIENT 3-DIMENSIONAL CONVOLUTIONS

  • US 20180157966A1
  • Filed: 12/01/2016
  • Published: 06/07/2018
  • Est. Priority Date: 12/01/2016
  • Status: Active Grant
First Claim

1. A neural network unit (NNU) configured to convolve an input of H rows by W columns by C channels with F filters each of R rows by S columns by C channels to generate F outputs each of Q rows by P columns, the neural network unit comprising:

  • at least one memory that outputs a row of N words, wherein N is at least 512;

    an array of N processing units (PU), wherein each PU of the array has an accumulator, a register configured to receive a respective word of the N words from a row of the at least one memory, a multiplexed-register configured to selectively receive a respective word of the N words from a row of the at least one memory or a word rotated from the multiplexed-register of a logically adjacent PU, and an arithmetic logic unit coupled to the accumulator, register and multiplexed-register;

    wherein the N PUs are logically partitioned as G blocks each of B respective PUs, wherein B is a smallest factor of N that is at least as great as W;

    for each output row of the Q output rows;

        for each filter row of the R filter rows;

            the NNU reads into the N multiplexed-registers from the at least one memory a row of N words logically partitioned as G input blocks corresponding to the G blocks of PUs, wherein at least C of the G input blocks include a row of a respective channel of the C channels of the input; and

            for at least each channel of the C channels;

                for each filter column of the S filter columns;

                    the NNU reads into the N registers from the at least one memory a row of N words logically partitioned as G filter blocks corresponding to the G input blocks, wherein each of F filter blocks of the G filter blocks corresponds to a respective filter of the F filters and comprises at least Q copies of a weight of the respective filter at the filter column and the filter row and the respective channel of the corresponding input block;

                    each PU of the array multiplies the register and the multiplexed-register to generate a product and accumulates the product with the accumulator; and

                    the NNU rotates the multiplexed-registers by one; and

                the NNU rotates the multiplexed-registers to align the G input blocks with the adjacent G blocks of B PUs; and

        the NNU writes the N accumulators to the at least one memory.
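
The loop nest recited in the claim can be illustrated with a small software model. The following is only a sketch under stated assumptions, not the patented hardware or its program. It assumes a stride-1 convolution without padding (so Q = H - R + 1 and P = W - S + 1), a rotation in which each PU receives the word of its higher-numbered neighbor, channel c placed in input block c, filter f accumulated in PU block f, G at least as great as both C and F, and a channel loop that runs G times (which is at least C). The names `smallest_factor_at_least`, `rotate`, `nnu_convolve` and `direct_convolve` are hypothetical and do not appear in the claim.

```python
def smallest_factor_at_least(n, w):
    """Block size B: the smallest factor of n that is at least as great as w."""
    for b in range(w, n + 1):
        if n % b == 0:
            return b
    raise ValueError("w must not exceed n")


def rotate(words, k):
    """Rotate so each PU receives the word held k positions above it (assumed direction)."""
    k %= len(words)
    return words[k:] + words[:k]


def nnu_convolve(inp, filt, n):
    """Model of the claimed loop nest. inp[h][w][c], filt[f][r][s][c] -> out[f][q][p]."""
    H, W, C = len(inp), len(inp[0]), len(inp[0][0])
    F, R, S = len(filt), len(filt[0]), len(filt[0][0])
    B = smallest_factor_at_least(n, W)   # PUs per block
    G = n // B                           # number of blocks
    if G < max(C, F):
        raise ValueError("this sketch assumes G >= C and G >= F")
    Q, P = H - R + 1, W - S + 1          # assumption: stride 1, no padding
    out = [[None] * Q for _ in range(F)]
    for q in range(Q):                   # for each output row
        acc = [0] * n                    # one accumulator per PU
        for r in range(R):               # for each filter row
            # Read a row of N words into the multiplexed-registers:
            # input block c holds row q + r of channel c, zero-padded to B words.
            mux = [0] * n
            for c in range(C):
                for x in range(W):
                    mux[c * B + x] = inp[q + r][x][c]
            for k in range(G):           # "at least each channel": G >= C passes
                for s in range(S):       # for each filter column
                    # Read a row of N words into the registers: filter block f
                    # holds copies of the weight for the channel now aligned with it.
                    reg = [0] * n
                    for f in range(F):
                        c = (f + k) % G  # input block currently over PU block f
                        if c < C:
                            reg[f * B:(f + 1) * B] = [filt[f][r][s][c]] * B
                    # Every PU multiplies and accumulates, then rotate by one.
                    acc = [a + w * x for a, w, x in zip(acc, reg, mux)]
                    mux = rotate(mux, 1)
                # Finish the rotation so input blocks align with the adjacent PU blocks.
                mux = rotate(mux, B - S)
        # Write the accumulators: the first P words of PU block f are row q of output f.
        for f in range(F):
            out[f][q] = acc[f * B:f * B + P]
    return out


def direct_convolve(inp, filt):
    """Plain reference computation used only to check the model."""
    H, W, C = len(inp), len(inp[0]), len(inp[0][0])
    F, R, S = len(filt), len(filt[0]), len(filt[0][0])
    Q, P = H - R + 1, W - S + 1
    return [[[sum(inp[q + r][p + s][c] * filt[f][r][s][c]
                  for r in range(R) for s in range(S) for c in range(C))
              for p in range(P)] for q in range(Q)] for f in range(F)]


# Small illustrative case (values are arbitrary): N = 32, H = W = 6, C = 3, F = 2, R = S = 3.
inp = [[[(7 * h + 3 * w + 5 * c) % 11 - 5 for c in range(3)]
        for w in range(6)] for h in range(6)]
filt = [[[[(2 * f + 3 * r + 5 * s + 7 * c) % 5 - 2 for c in range(3)]
          for s in range(3)] for r in range(3)] for f in range(2)]
assert nnu_convolve(inp, filt, 32) == direct_convolve(inp, filt)
```

In this illustrative case W is 6 and N is 32, so B is 8 and G is 4. Only the first P words of each PU block carry valid results in the model; the remaining words of a block receive values that wrap in from the neighboring block and are discarded.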
