Approximating fully-connected layers with multiple arrays of 3x3 convolutional filter kernels in a CNN based integrated circuit
First Claim
1. A digital integrated circuit comprising:
- a plurality of cellular neural networks (CNN) processing engines operatively coupled to at least one input/output data bus, the plurality of CNN processing engines being connected in a loop with a clock-skew circuit, each CNN processing engine comprising:
a CNN processing block configured for simultaneously performing convolutional operations using input data and pre-trained filter coefficients of a plurality of ordered convolutional layers, and further configured for classifying the input data using a plurality of 3×3 filter kernels to approximate operations of fully-connected (FC) layers, wherein output of the plurality of ordered convolutional layers has P feature maps with F×F pixels of data per feature map and the plurality of 3×3 filter kernels comprises L layers with each of the L layers organized in an array of R×Q of 3×3 filter kernels, wherein Q and R are respective numbers of input and output feature maps of a particular layer of the L layers, wherein L is equal to (F−1)/2 when F is an odd number, and wherein P, F, Q and R are positive integers;
a first set of memory buffers operatively coupling to the CNN processing block for storing the input data; and
a second set of memory buffers operatively coupling to the CNN processing block for storing the pre-trained filter coefficients.
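To make the layer-count rule concrete, the Python sketch below computes L from F and the side length of the useful region before and after each layer. It assumes, following the abstract's description of a particular layer, that each of the L layers trims one pixel from every edge of the useful region, so the side shrinks F, F−2, and so on down to 1. The function names are hypothetical, and the sketch only covers the odd-F case the claim addresses.

```python
def num_approx_layers(f: int) -> int:
    """Number of 3x3 layers L = (F - 1) / 2, as stated in the claim for odd F."""
    if f <= 0 or f % 2 == 0:
        # The claim only fixes L for odd F; even F is outside this sketch.
        raise ValueError("F must be a positive odd integer")
    return (f - 1) // 2


def useful_sizes(f: int) -> list[int]:
    """Side length of the useful region before and after each of the L layers.

    Assumption: every layer maps an n x n useful region to (n-2) x (n-2),
    so the sequence is F, F-2, ..., 1.
    """
    return [f - 2 * i for i in range(num_approx_layers(f) + 1)]


print(num_approx_layers(7))  # 3
print(useful_sizes(7))       # [7, 5, 3, 1]
```

Under this reading, L = (F−1)/2 is exactly the number of layers needed to reduce an F×F map to a single pixel per output feature map.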
Abstract
Multiple 3×3 convolutional filter kernels are used to approximate the operations of fully-connected (FC) layers. The image classification task is performed entirely within a CNN based integrated circuit. The output at the end of the ordered convolutional layers contains P feature maps with F×F pixels of data per feature map. The 3×3 filter kernels comprise L layers, each organized in an R×Q array of 3×3 filter kernels, where Q and R are the respective numbers of input and output feature maps of a particular layer of the L layers. Each input feature map of the particular layer comprises F×F pixels of data with one-pixel padding added around its perimeter. Each output feature map of the particular layer comprises (F−2)×(F−2) pixels of useful data. The output of the last layer of the L layers contains Z classes. L equals (F−1)/2 if F is an odd number. P, F, Q, R and Z are positive integers.
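The layer-by-layer behavior described in the abstract can be modeled in NumPy as a minimal software sketch; it is not the circuit's implementation. Several details are assumptions: the one-pixel border is zero padding, random weights stand in for the pre-trained filter coefficients, no activation function is applied between layers (the abstract does not specify one), and the helper names (`conv3x3_layer`, `approximate_fc`) and the intermediate width `hidden` are hypothetical. Cropping each padded output to its (F−2)×(F−2) center keeps exactly the pixels computed without touching the padding, so after L = (F−1)/2 layers each of the Z output maps holds a single value.

```python
import numpy as np


def conv3x3_layer(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Apply an R x Q array of 3x3 kernels to Q input maps.

    x: (Q, F, F) input feature maps.
    w: (R, Q, 3, 3) kernels (stand-ins for pre-trained coefficients).
    Returns (R, F, F): one-pixel zero padding keeps the spatial size.
    """
    _, f, _ = x.shape
    r = w.shape[0]
    # Assumption: the one-pixel border is zero padding.
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((r, f, f))
    for i in range(f):
        for j in range(f):
            patch = xp[:, i:i + 3, j:j + 3]  # (Q, 3, 3) window
            # Sum over input maps and the 3x3 window for every output map.
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


def approximate_fc(x: np.ndarray, kernels: list) -> np.ndarray:
    """Run L layers of 3x3 kernels over P maps of F x F pixels (F odd).

    kernels[k] has shape (R_k, Q_k, 3, 3) with Q_1 = P and Q_{k+1} = R_k;
    the last layer's R equals the number of classes Z.
    """
    for w in kernels:
        y = conv3x3_layer(x, w)
        # Keep only the (F-2) x (F-2) useful pixels, i.e. those computed
        # without any contribution from the padded border.
        x = y[:, 1:-1, 1:-1]
    # With L = (F - 1) / 2 layers each map is 1 x 1: one score per class.
    return x.reshape(-1)


rng = np.random.default_rng(0)
P, F, Z = 4, 7, 10
L = (F - 1) // 2          # 3 layers
hidden = 8                # hypothetical width of the intermediate layers
widths = [P] + [hidden] * (L - 1) + [Z]
kernels = [rng.standard_normal((widths[k + 1], widths[k], 3, 3)) for k in range(L)]
scores = approximate_fc(rng.standard_normal((P, F, F)), kernels)
print(scores.shape)       # (10,)

# Sanity check: with F = 3 a single layer is exactly an FC layer over P*3*3 inputs.
w = rng.standard_normal((Z, P, 3, 3))
x3 = rng.standard_normal((P, 3, 3))
assert np.allclose(approximate_fc(x3, [w]), w.reshape(Z, -1) @ x3.reshape(-1))
```

The F = 3 check shows why the approach works: a 3×3 kernel applied to a 3×3 map with no padding involved is a full dot product, so one R×Q layer of 3×3 kernels is an FC layer from Q·9 inputs to R outputs. For larger odd F, the stack of L layers lets every output pixel see the whole F×F input.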
102 Citations
10 Claims
Specification