Efficient data layouts for convolutional neural networks
First Claim
1. A system for executing a convolutional neural network (CNN), the system comprising:
- non-transitory memory configured to store:
a convolutional layer of a convolutional neural network, wherein the convolutional layer comprises kernels in a kernel stack, wherein the kernels of the kernel stack are in a basic kernel layout, wherein weight values of the kernels of the kernel stack are reordered from the basic kernel layout into a tile kernel layout comprising a plurality of kernel tiles, wherein a kernel tile comprises a plurality of kernel runnels, and wherein a kernel runnel comprises a number of the weight values of the kernels of the kernel stack; and
a hardware processor in communication with the non-transitory memory, the hardware processor programmed by executable instructions to:
receive input activation maps of the convolutional layer, wherein the input activation maps are in a basic input activation map layout;
reorder pixel values of the input activation maps from the basic input activation map layout into an interleaved input activation map layout comprising a plurality of clusters of input activation map pixels; and
determine output activation maps of the convolutional layer from the plurality of kernel tiles and the plurality of clusters of input activation map pixels, wherein the output activation maps are in an interleaved output activation map layout comprising a plurality of clusters of output activation map pixels.
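The reordering step recited in the claim can be illustrated with a minimal sketch. It assumes, since the claim text does not spell this out, that the basic layout stores one complete activation map after another and that a cluster gathers the values of all maps at a single pixel position. The function name `interleave_activation_maps` is hypothetical and chosen only for this illustration.

```python
def interleave_activation_maps(maps):
    """Reorder maps[m][r][c] (assumed basic layout: one map after another)
    into a list of clusters, one per pixel position, where each cluster
    holds the values of every map at that position (assumed interleaving)."""
    num_maps = len(maps)
    height = len(maps[0])
    width = len(maps[0][0])
    clusters = []
    for r in range(height):
        for c in range(width):
            # One cluster: the same pixel position taken from every map.
            clusters.append([maps[m][r][c] for m in range(num_maps)])
    return clusters


maps = [
    [[1, 2], [3, 4]],      # input activation map 0
    [[10, 20], [30, 40]],  # input activation map 1
]
clusters = interleave_activation_maps(maps)
# clusters == [[1, 10], [2, 20], [3, 30], [4, 40]]
```

With this arrangement, the values needed at one pixel position sit next to each other in memory, which is the kind of locality an interleaved layout is meant to provide.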
Abstract
Systems and methods for efficient implementation of a convolutional layer of a convolutional neural network are disclosed. In one aspect, weight values of kernels in a kernel stack of a convolutional layer can be reordered into a tile layout with tiles of runnels. Pixel values of input activation maps of the convolutional layer can be reordered into an interleaved layout comprising a plurality of clusters of input activation map pixels. The output activation maps can be determined using the clusters of the input activation map pixels and kernels tile by tile.
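The tile-by-tile computation described in the abstract might look roughly like the following sketch. Several details are assumptions rather than statements from the text: a runnel is taken to hold the weights of `runnel_width` consecutive output kernels at one (input channel, kernel row, kernel column) position, a tile is taken to hold all runnels for one such group of output kernels, and the convolution uses stride 1 with no padding. The names `tile_kernels` and `convolve_tiled` are hypothetical.

```python
def tile_kernels(kernels, runnel_width):
    """Reorder kernels[o][i][ky][kx] into (first_output, tile) pairs.
    Each tile is a list of (i, ky, kx, runnel) entries; a runnel holds the
    weights of up to runnel_width consecutive output kernels (assumed layout)."""
    num_out, num_in = len(kernels), len(kernels[0])
    kh, kw = len(kernels[0][0]), len(kernels[0][0][0])
    tiles = []
    for o0 in range(0, num_out, runnel_width):
        outs = range(o0, min(o0 + runnel_width, num_out))
        tile = []
        for i in range(num_in):
            for ky in range(kh):
                for kx in range(kw):
                    tile.append((i, ky, kx, [kernels[o][i][ky][kx] for o in outs]))
        tiles.append((o0, tile))
    return tiles


def convolve_tiled(clusters, height, width, tiles, kh, kw):
    """clusters[r * width + c][i] is the interleaved input. Returns the
    interleaved output, out[r * out_w + c][o], accumulated tile by tile."""
    out_h, out_w = height - kh + 1, width - kw + 1
    num_out = sum(len(tile[0][3]) for _, tile in tiles)
    out = [[0] * num_out for _ in range(out_h * out_w)]
    for o0, tile in tiles:
        for i, ky, kx, runnel in tile:
            for r in range(out_h):
                for c in range(out_w):
                    x = clusters[(r + ky) * width + (c + kx)][i]
                    cluster = out[r * out_w + c]
                    # One input value updates every output lane of the runnel.
                    for lane, w in enumerate(runnel):
                        cluster[o0 + lane] += x * w
    return out


maps = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],  # input activation map 0
    [[1, 0, 1], [0, 1, 0], [1, 0, 1]],  # input activation map 1
]
clusters = []
for r in range(3):
    for c in range(3):
        clusters.append([maps[0][r][c], maps[1][r][c]])

kernels = [  # 3 output kernels x 2 input channels x 2 x 2 weights
    [[[1, 0], [0, 1]], [[1, 1], [1, 1]]],
    [[[0, 1], [1, 0]], [[0, 0], [0, 0]]],
    [[[1, 1], [1, 1]], [[1, 0], [0, 0]]],
]
tiles = tile_kernels(kernels, runnel_width=2)
out = convolve_tiled(clusters, 3, 3, tiles, 2, 2)
# out[0] == [8, 6, 13]: the output cluster at pixel (0, 0)
```

In this sketch the innermost loop multiplies one input value by a short run of adjacent weights, which is the access pattern a vectorized implementation could map onto a single register-wide operation. The pure-Python loops only illustrate the data movement and are not tuned for speed.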
20 Claims
Specification