Method to Map Convolutional Layers of Deep Neural Network on a Plurality of Processing Elements with SIMD Execution Units, Private Memories, and Connected as a 2D Systolic Processor Array
Abstract
A method for improving performance of a predefined Deep Neural Network (DNN) convolution processing on a computing device includes inputting parameters as input data into a processor on a computer that formalizes a design space exploration of a convolution mapping on a predefined computer architecture that will execute the predefined convolution processing. The parameters are predefined as guided by a specification for the predefined convolution processing to be implemented by the convolution mapping and by a microarchitectural specification for the processor that will execute the predefined convolution processing. The processor calculates performance metrics for executing the predefined convolution processing on the computing device, as functions of the predefined parameters, as proxy estimates of performance of different possible design choices to implement the predefined convolution processing.
20 Claims
1. A method for improving performance of a predefined Deep Neural Network (DNN) convolution processing on a computing device, the method comprising:
- inputting parameters as input data into a processor on a computer that formalizes a design space exploration of a convolution mapping on a predefined computer architecture that will execute the predefined convolution processing, wherein the parameters are predefined as guided by a specification for the predefined convolution processing to be implemented by the convolution mapping and by a microarchitectural specification for the processor that will execute the predefined convolution processing; and
- calculating, by the processor, performance metrics for executing the predefined convolution processing on the computing device, as functions of the predefined parameters, as proxy estimates of performance of different possible design choices to implement the predefined convolution processing.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
10. A method for exploring a design space for mapping convolutional layers of a Deep Neural Network (DNN) onto a plurality of processing elements connected as a 2-dimensional (2D) systolic processor array, the method comprising:
- inputting parameter values into a processor on a computer from a microarchitecture specification that defines configuration aspects of the processing elements;
- inputting parameter values into the processor from a specification that defines a convolutional processing; and
- calculating, by the processor, performance metrics for executing the convolution processing on the 2D systolic processor array, as functions of the predefined parameters, as proxy estimates of performance of different possible design choices to implement the predefined convolution processing.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18.
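As an illustration of what "performance metrics as functions of the predefined parameters" could look like, the following is a minimal sketch and not the patent's own formulas. The names `ConvSpec`, `ArraySpec`, and `proxy_metrics` are hypothetical, and the mapping is an assumption chosen for simplicity: output channels are spread across array rows, output pixels across array columns, and input channels across SIMD lanes.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class ConvSpec:
    """Hypothetical convolution specification (layer shape)."""
    out_h: int
    out_w: int
    in_ch: int
    out_ch: int
    kernel: int  # square kernel side length


@dataclass(frozen=True)
class ArraySpec:
    """Hypothetical microarchitecture specification of the 2D PE array."""
    rows: int
    cols: int
    simd_width: int  # SIMD lanes per processing element


def proxy_metrics(conv: ConvSpec, arr: ArraySpec) -> dict:
    """Estimate cycles and utilization under the assumed mapping.

    Assumption: one multiply-accumulate per SIMD lane per cycle, with no
    memory stalls or communication overhead modeled.
    """
    macs = conv.out_h * conv.out_w * conv.in_ch * conv.out_ch * conv.kernel ** 2
    row_passes = ceil(conv.out_ch / arr.rows)               # channels over rows
    col_passes = ceil(conv.out_h * conv.out_w / arr.cols)   # pixels over columns
    simd_steps = ceil(conv.in_ch / arr.simd_width)          # input channels over lanes
    cycles = row_passes * col_passes * simd_steps * conv.kernel ** 2
    peak_macs = cycles * arr.rows * arr.cols * arr.simd_width
    return {"macs": macs, "cycles": cycles, "utilization": macs / peak_macs}
```

Calling `proxy_metrics(ConvSpec(8, 8, 16, 32, 3), ArraySpec(8, 8, 8))` evaluates one design point; layer shapes that do not divide evenly into the array dimensions show up as utilization below 1.0, which is the kind of signal a proxy estimate is meant to expose before anything is run on hardware.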
19. An apparatus, comprising:
- a processor; and
- a memory device accessible by the processor, the memory device storing a set of instructions that permit the processor to execute a method of optimizing a mapping of convolutional layers of a Deep Neural Network (DNN) onto a plurality of processing elements connected as a 2-dimensional (2D) systolic processor array, the method comprising:
  - inputting parameter values into a processor on a computer from a microarchitecture specification that defines configuration aspects of the processing elements;
  - inputting parameter values into the processor from a specification that defines a convolution processing;
  - calculating, by the processor, performance metrics for executing the convolution processing on the 2D systolic processor array, as functions of the predefined parameters, as proxy estimates of performance of different possible design choices to implement the convolution processing;
  - inputting one or more constraints that permit the processor to eliminate invalid design choices; and
  - determining an optimal mapping onto the 2D systolic processor array for the convolution processing.
Dependent claims: 20.
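The last two steps of claim 19 (eliminating invalid design choices through constraints, then determining an optimal mapping) can be pictured as a filtered search. The sketch below is only one possible reading under stated assumptions: the function `explore_mappings`, its parameters, the private-memory constraint, and the cycle estimate are all hypothetical and are not taken from the specification.

```python
from math import ceil


def explore_mappings(out_ch, pixels, in_ch, kernel, num_pes, mem_words):
    """Enumerate array shapes and dimension assignments, drop the ones that
    violate a private-memory constraint, and return the candidate with the
    lowest estimated cycle count (or None if nothing is valid).

    Assumptions (hypothetical): each PE stores the weights for the output
    channels assigned to it, and performs one multiply-accumulate per cycle.
    """
    candidates = []
    for rows in range(1, num_pes + 1):
        if num_pes % rows:
            continue  # only rectangular arrays that use every PE
        cols = num_pes // rows
        for ch_on_rows in (True, False):
            ch_lanes = rows if ch_on_rows else cols
            px_lanes = cols if ch_on_rows else rows
            ch_per_pe = ceil(out_ch / ch_lanes)
            px_per_pe = ceil(pixels / px_lanes)
            weight_words = ch_per_pe * in_ch * kernel * kernel
            if weight_words > mem_words:
                continue  # constraint: weights must fit in private memory
            cycles = ch_per_pe * px_per_pe * in_ch * kernel * kernel
            candidates.append({
                "rows": rows,
                "cols": cols,
                "out_ch_on_rows": ch_on_rows,
                "weight_words": weight_words,
                "cycles": cycles,
            })
    # "Optimal" here means lowest proxy cycle count; ties keep the first found.
    return min(candidates, key=lambda c: c["cycles"]) if candidates else None
```

For example, `explore_mappings(out_ch=32, pixels=64, in_ch=16, kernel=3, num_pes=64, mem_words=1000)` searches every factorization of 64 processing elements. Tightening `mem_words` shrinks the valid set, and a sufficiently small value leaves no valid mapping at all, in which case the function returns `None`.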
Specification