NEURAL NETWORK PROCESSOR BASED ON APPLICATION SPECIFIC SYNTHESIS SPECIALIZATION PARAMETERS
Abstract
Neural network processors that have been customized based on application-specific synthesis specialization parameters, and related methods, are described. Certain example neural network processors and methods described in the present disclosure expose several major synthesis specialization parameters that can be used to specialize a microarchitecture instance of a neural network processor to specific neural network models, including: (1) aligning the native vector dimension to the parameters of the model to minimize padding and waste during model evaluation, (2) increasing lane widths to drive up intra-row-level parallelism, or (3) increasing the number of matrix-multiply tiles to exploit sub-matrix parallelism for large neural network models.
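The first specialization listed in the abstract, aligning the native vector dimension to the model's parameters, can be illustrated with a small sketch. This is not code from the patent; the function names, the candidate list, and the example layer sizes are all hypothetical, chosen only to show how padding waste drives the choice of native dimension.

```python
# Hypothetical sketch: choosing a native vector dimension that minimizes
# padding waste for a model's layer dimensions (names are illustrative).

def padding_waste(layer_dims, native_dim):
    """Total padded-out elements when each layer dimension is rounded up
    to a multiple of the native vector dimension."""
    waste = 0
    for d in layer_dims:
        padded = -(-d // native_dim) * native_dim  # ceiling to a multiple
        waste += padded - d
    return waste

def best_native_dim(layer_dims, candidates):
    """Pick the candidate native dimension with the least padding waste."""
    return min(candidates, key=lambda n: padding_waste(layer_dims, n))

# Example: a model with layer dimensions 200 and 500. A native dimension of
# 100 divides both exactly, so it wastes nothing; 128 or 256 would pad.
dims = [200, 500]
print(best_native_dim(dims, [100, 128, 256]))  # 100
```

A hardware-synthesis flow would evaluate such a cost model once per target model, before generating the specialized microarchitecture instance.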
45 Citations
20 Claims
1. A method, implemented by a processor, for synthesizing a neural network processor comprising a plurality of tile engines, wherein each of the plurality of tile engines is configured to process matrix elements and vector elements, the method comprising:
using the processor, analyzing a neural network model corresponding to an application to determine: (1) a first minimum number of units required to express a shared exponent value required to satisfy a first precision requirement corresponding to each of the matrix elements and corresponding to each of the vector elements, (2) a second minimum number of units required to express a first mantissa value required to satisfy a second precision requirement corresponding to the each of the matrix elements, and (3) a third minimum number of units required to express a second mantissa value required to satisfy a third precision requirement corresponding to the each of the vector elements;
obtaining code representative of at least a portion of at least one hardware node for implementing the neural network processor;
obtaining a synthesis model comprising a plurality of synthesis specialization parameters including: (1) a first synthesis specialization parameter corresponding to a first native dimension of the each of the matrix elements, (2) a second synthesis specialization parameter corresponding to a second native dimension of the each of the vector elements, and (3) a third synthesis specialization parameter corresponding to a number of the plurality of tile engines, wherein each of a first value corresponding to the first synthesis specialization parameter, a second value corresponding to the second synthesis specialization parameter, and a third value corresponding to the third synthesis specialization parameter is selected to meet or exceed a performance metric associated with the at least one hardware node; and
using the processor, modifying the code, based on at least the first minimum number of units, the second minimum number of units, the third minimum number of units and at least the first value and the second value, to generate a modified version of the code and storing the modified version of the code.
View Dependent Claims (2, 3, 4, 5, 6, 7)
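The analysis step of claim 1, determining minimum bit widths for a shared exponent and for the matrix and vector mantissas, corresponds to a block-floating-point encoding in which a block of values shares one exponent while each value keeps a short private mantissa. The sketch below is illustrative only: the linear search, tolerance model, and function names are assumptions, not the claimed implementation.

```python
# Illustrative sketch (not the patented method): find the smallest mantissa
# width for which a block-floating-point rounding of every value in a block
# stays within a given error tolerance, the block sharing the exponent of
# its largest-magnitude member.
import math

def min_mantissa_bits(values, tol):
    """Smallest mantissa width whose rounding error is <= `tol` for every
    value when the whole block shares the exponent of its maximum."""
    shared_exp = max(math.frexp(abs(v))[1] for v in values if v != 0)
    for bits in range(1, 24):
        scale = 2.0 ** (shared_exp - bits)   # value of one mantissa ULP
        err = max(abs(round(v / scale) * scale - v) for v in values)
        if err <= tol:
            return bits
    return 24

# Matrix weights may tolerate a different precision than vector activations,
# which is why the claim determines the two mantissa widths separately.
weights = [0.5, -0.25, 1.0]
print(min_mantissa_bits(weights, tol=0.0))  # 3
```

Running this per tensor class yields the "second" and "third" minimum numbers of units that the synthesis step then bakes into the generated hardware description.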
8. A system comprising:
a processor; and
a memory comprising: (1) code representative of at least a portion of at least one hardware node for implementing a neural network processor comprising a plurality of tile engines, wherein each of the plurality of tile engines is configured to process matrix elements and vector elements, (2) a synthesis model comprising a plurality of synthesis specialization parameters including: (a) a first synthesis specialization parameter corresponding to a first native dimension of the each of the matrix elements, (b) a second synthesis specialization parameter corresponding to a second native dimension of the each of the vector elements, and (c) a third synthesis specialization parameter corresponding to a number of the plurality of tile engines, wherein each of a first value corresponding to the first synthesis specialization parameter, a second value corresponding to the second synthesis specialization parameter, and a third value corresponding to the third synthesis specialization parameter is selected to meet or exceed a performance metric associated with the at least one hardware node, and (3) instructions for synthesizing the neural network processor, the instructions configured to:
using the processor, analyze a neural network model corresponding to an application to determine: (1) a first minimum number of units required to express a shared exponent value required to satisfy a first precision requirement corresponding to each of the matrix elements and corresponding to each of the vector elements, (2) a second minimum number of units required to express a first mantissa value required to satisfy a second precision requirement corresponding to the each of the matrix elements, and (3) a third minimum number of units required to express a second mantissa value required to satisfy a third precision requirement corresponding to the each of the vector elements, and
using the processor, modify the code, based on at least the first minimum number of units, the second minimum number of units, the third minimum number of units and at least the first value, the second value, and the third value, to generate a modified version of the code and store the modified version of the code.
View Dependent Claims (9, 10, 11, 12, 13, 14)
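The synthesis model of claim 8 bundles three specialization parameters and requires their values to meet or exceed a performance metric. A minimal sketch of such a bundle, under the assumption of a simple peak-throughput model (one native-dimension sub-matrix multiply per tile engine per cycle), might look as follows; all names and the cost formula are hypothetical.

```python
# Hypothetical configuration sketch of the synthesis specialization
# parameters named in claim 8, with a simple performance-metric check.
from dataclasses import dataclass

@dataclass
class SynthesisModel:
    matrix_native_dim: int   # first parameter: native dimension (matrix side)
    vector_native_dim: int   # second parameter: native dimension (vector side)
    num_tile_engines: int    # third parameter: number of tile engines

    def peak_macs_per_cycle(self):
        # Simplified model: each tile engine multiplies a native-dimension
        # sub-matrix by a native-dimension vector every cycle.
        return (self.num_tile_engines
                * self.matrix_native_dim
                * self.vector_native_dim)

def meets_metric(model, target_macs_per_cycle):
    """True when the chosen parameter values meet or exceed the target."""
    return model.peak_macs_per_cycle() >= target_macs_per_cycle

m = SynthesisModel(matrix_native_dim=128, vector_native_dim=128,
                   num_tile_engines=4)
print(meets_metric(m, 50_000))  # True: 4 * 128 * 128 = 65536 MACs/cycle
```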
15. A method, implemented by a processor, for synthesizing a neural network processor comprising a plurality of tile engines, wherein each of the plurality of tile engines is configured to process matrix elements and vector elements, and wherein each of the plurality of tile engines comprises a plurality of dot product units and wherein each of the dot product units is configured to receive the matrix elements from a matrix register file, the method comprising:
using the processor, analyzing a neural network model corresponding to an application to determine: (1) a first minimum number of units required to express a shared exponent value required to satisfy a first precision requirement corresponding to each of the matrix elements and corresponding to each of the vector elements, (2) a second minimum number of units required to express a first mantissa value required to satisfy a second precision requirement corresponding to the each of the matrix elements, and (3) a third minimum number of units required to express a second mantissa value required to satisfy a third precision requirement corresponding to the each of the vector elements;
obtaining code representative of at least a portion of at least one hardware node for implementing the neural network processor;
obtaining a synthesis model comprising a plurality of synthesis specialization parameters including: (1) a first synthesis specialization parameter corresponding to whether the matrix register file is private to each one of the plurality of tile engines or whether the matrix register file is shared among the plurality of tile engines and (2) a second synthesis specialization parameter corresponding to whether each of the plurality of dot product units comprises an add-reduction tree; and
using the processor, modifying the code, based on at least the first minimum number of units, the second minimum number of units, the third minimum number of units and at least the first synthesis specialization parameter and the second synthesis specialization parameter, and storing a modified version of the code.
View Dependent Claims (16, 17, 18, 19, 20)
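Claim 15's second parameter selects whether each dot product unit carries an add-reduction tree, a pairwise adder structure that sums n partial products in about log2(n) levels rather than a serial chain. A behavioral sketch of that reduction pattern, assumed rather than taken from the specification, is:

```python
# Illustrative sketch of an add-reduction tree: partial products from one
# multiplier per lane are summed pairwise, level by level, mirroring the
# log-depth adder tree a synthesis tool would instantiate in hardware.
def dot_with_reduction_tree(a, b):
    partials = [x * y for x, y in zip(a, b)]   # one multiplier per lane
    while len(partials) > 1:
        if len(partials) % 2:                  # odd count: pad one lane
            partials.append(0)
        partials = [partials[i] + partials[i + 1]
                    for i in range(0, len(partials), 2)]
    return partials[0]

print(dot_with_reduction_tree([1, 2, 3, 4], [5, 6, 7, 8]))  # 70
```

Whether this tree is synthesized, and whether the matrix register file feeding the units is private or shared, are exactly the kinds of choices the claim exposes as synthesis specialization parameters.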
Specification