Data-optimized neural network traversal
Abstract
Executing a neural network includes generating an output tile of a first layer of the neural network by processing an input tile to the first layer and storing the output tile of the first layer in an internal memory of a processor. An output tile of a second layer of the neural network can be generated using the processor by processing the output tile of the first layer stored in the internal memory.
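The tile-by-tile flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the two "layers" are plain 1-D valid convolutions, and the names `conv1d_valid`, `run_layer_by_layer`, `run_tiled`, and `internal_buffer` are hypothetical, chosen only for this illustration. Under those assumptions, the sketch shows that producing the second layer's output one tile at a time, from a first-layer tile held in a small buffer, gives the same result as computing each layer over the whole input.

```python
def conv1d_valid(signal, kernel):
    """Valid (no padding) 1-D sliding dot product, used here as a toy layer."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def run_layer_by_layer(x, k1, k2):
    """Baseline: the whole layer-1 feature map is produced before layer 2 starts."""
    return conv1d_valid(conv1d_valid(x, k1), k2)


def run_tiled(x, k1, k2, out_tile):
    """Produce the layer-2 output in tiles of at most `out_tile` samples.

    Each layer-1 output tile lives only in `internal_buffer`, a stand-in
    (assumption) for the processor's internal memory.
    """
    # Extra input samples needed so one output tile can be computed in isolation.
    halo = (len(k1) - 1) + (len(k2) - 1)
    total_out = len(x) - halo
    result = []
    for start in range(0, total_out, out_tile):
        width = min(out_tile, total_out - start)
        input_tile = x[start:start + width + halo]
        internal_buffer = conv1d_valid(input_tile, k1)    # first-layer output tile
        result.extend(conv1d_valid(internal_buffer, k2))  # second-layer output tile
    return result


x = list(range(12))
k1 = [1, 0, -1]
k2 = [1, 2, 1]
tiled = run_tiled(x, k1, k2, out_tile=4)
baseline = run_layer_by_layer(x, k1, k2)
```

In this toy setting the input tile is wider than the output tile by the combined kernel overlap, which is why the region touched by one tile widens toward the input, layer by layer.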
20 Claims
1. A method of executing a neural network by a processor that includes a neural network engine circuit having an internal memory, comprising:
generating a first output tile of a first layer of the neural network by processing an input tile to the first layer, a size of the first output tile is based at least in part on a size of the internal memory;
storing the first output tile of the first layer in the internal memory; and
processing, by the neural network engine circuit, and using the first output tile, a plurality of adjacent layers of the neural network, wherein the plurality of adjacent layers is grouped and partitioned into one or more frustums and a rectangular intersection of the frustums determines at least in part the size of the first output tile;
generating, using the neural network engine circuit, a second output tile of a second output layer of the frustum of the neural network by processing the first output tile of the first layer stored in the internal memory.
Dependent claims: 2, 3, 4, 5, 6, 7
8. An apparatus comprising a processor configured to execute a neural network, the processor comprising:
an internal memory;
a first compute unit coupled to the internal memory and configured to perform executable operations including;
generating a first output tile of a first layer of the neural network by processing an input tile to the first layer, wherein a size of the first output tile is determined at least in part by a size of the internal memory;
storing the first output tile of the first layer in an internal memory of a processor; and
processing, by the processor and using the first output tile, a plurality of adjacent layers of the neural network, wherein the plurality of adjacent layers is grouped and partitioned into one or more frustums and wherein a rectangular intersection of the frustums determines at least in part the size of the first output tile;
generating, using the processor, a second output tile of a second output layer of the frustum of the neural network by processing the first output tile of the first layer stored in the internal memory.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer program product comprising a computer readable storage medium having program code stored thereon for executing a neural network, the program code executable by a processor to perform operations comprising:
generating a first output tile of a first layer of the neural network by processing an input tile to the first layer, wherein a size of the first output tile is determined at least in part by a size of the internal memory;
storing the first output tile of the first layer in an internal memory of a processor; and
processing, by the processor and using the first output tile, a plurality of adjacent layers of the neural network, wherein the plurality of adjacent layers is grouped and partitioned into at least one frustum and a rectangular intersection of the frustums determines at least in part the size of the first output tile;
generating, using the processor, a second output tile of a second output layer of the frustum of the neural network by processing the first output tile of the first layer stored in the internal memory.
Dependent claims: 16, 17, 18, 19, 20
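Each independent claim ties the size of the first output tile, at least in part, to the size of the internal memory. The following sketch shows one hypothetical way such a bound could be computed. The claims do not specify a formula, and the function name `max_square_tile_side`, the square-tile shape, and the assumption that a single tile must fit entirely in the memory are introduced here only for illustration.

```python
import math


def max_square_tile_side(internal_memory_bytes, channels, bytes_per_element):
    """Largest side length of a square tile whose data fits in internal memory.

    Assumption: the tile occupies side * side * channels * bytes_per_element
    bytes and nothing else shares the memory.
    """
    elements_per_channel = internal_memory_bytes // (channels * bytes_per_element)
    return math.isqrt(elements_per_channel)


# Example: 64 KiB of internal memory, 16 feature maps, 2 bytes per value.
side = max_square_tile_side(64 * 1024, channels=16, bytes_per_element=2)
```

In the claims the frustum geometry also constrains the tile size, so a bound of this kind would be only one of the inputs.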
Specification