NEURAL NETWORK PROCESSOR BASED ON APPLICATION SPECIFIC SYNTHESIS SPECIALIZATION PARAMETERS

US 20190325296A1
Filed: 04/21/2018
Published: 10/24/2019
Est. Priority Date: 04/21/2018
Status: Active Grant

Abstract

Neural network processors that have been customized based on application specific synthesis specialization parameters and related methods are described. Certain example neural network processors and methods described in the present disclosure expose several major synthesis specialization parameters that can be used for specializing a microarchitecture instance of a neural network processor to specific neural network models including: (1) aligning the native vector dimension to the parameters of the model to minimize padding and waste during model evaluation, (2) increasing lane widths to drive up intra-row-level parallelism, or (3) increasing matrix multiply tiles to exploit sub-matrix parallelism for large neural network models.
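As a rough illustration of item (1) in the abstract, the sketch below estimates how much of a padded vector is wasted for a few candidate native dimensions and picks the candidate with the least padding. The function names `padding_waste` and `best_native_dim`, the model dimension of 2000, and the candidate list are assumptions made for this example; the disclosure does not specify them.

```python
def padding_waste(model_dim: int, native_dim: int) -> float:
    """Fraction of padded (unused) elements when a vector of length
    model_dim is split into chunks of native_dim elements."""
    if model_dim <= 0 or native_dim <= 0:
        raise ValueError("dimensions must be positive")
    chunks = -(-model_dim // native_dim)  # ceiling division
    padded = chunks * native_dim
    return (padded - model_dim) / padded


def best_native_dim(model_dim: int, candidates: list[int]) -> int:
    """Pick the candidate with the least padding; ties go to the larger
    native dimension (an arbitrary tie-break chosen for this sketch)."""
    return min(candidates, key=lambda n: (padding_waste(model_dim, n), -n))


if __name__ == "__main__":
    for n in (128, 256, 400, 512):
        print(n, round(padding_waste(2000, n), 4))
    print("best:", best_native_dim(2000, [128, 256, 400, 512]))
```

In this hypothetical case a native dimension of 400 divides the model dimension exactly, so no padding is needed, whereas each power-of-two candidate pads the vector out to 2048 elements.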

45 Citations

20 Claims

1. A method, implemented by a processor, for synthesizing a neural network processor comprising a plurality of tile engines, wherein each of the plurality of tile engines is configured to process matrix elements and vector elements, the method comprising:
- using the processor, analyzing a neural network model corresponding to an application to determine: (1) a first minimum number of units required to express a shared exponent value required to satisfy a first precision requirement corresponding to each of the matrix elements and corresponding to each of the vector elements, (2) a second minimum number of units required to express a first mantissa value required to satisfy a second precision requirement corresponding to the each of the matrix elements, and (3) a third minimum number of units required to express a second mantissa value required to satisfy a third precision requirement corresponding to the each of the vector elements;
- obtaining code representative of at least a portion of at least one hardware node for implementing the neural network processor;
- obtaining a synthesis model comprising a plurality of synthesis specialization parameters including: (1) a first synthesis specialization parameter corresponding to a first native dimension of the each of the matrix elements, (2) a second synthesis specialization parameter corresponding to a second native dimension of the each of the vector elements, and (3) a third synthesis specialization parameter corresponding to a number of the plurality of tile engines, wherein each of a first value corresponding to the first synthesis specialization parameter, a second value corresponding to the second synthesis specialization parameter, and a third value corresponding to the third synthesis specialization parameter is selected to meet or exceed a performance metric associated with the at least one hardware node; and
- using the processor, modifying the code, based on at least the first minimum number of units, the second minimum number of units, the third minimum number of units and at least the first value and the second value, to generate a modified version of the code and storing a modified version of the code.

2. The method of claim 1, wherein the plurality of synthesis specialization parameters further comprises a fourth synthesis specialization parameter corresponding to a number of parallel multipliers that can process the matrix elements and the vector elements to produce a partial dot-product.

3. The method of claim 2, wherein the plurality of synthesis specialization parameters further comprises a fifth synthesis specialization parameter corresponding to a number of independent parallel channels of the plurality of tile engines.

4. The method of claim 3, wherein the plurality of synthesis specialization parameters further comprises a sixth synthesis specialization parameter corresponding to a number of groups, wherein each of the groups has a group size equal to the number of the plurality of tile engines divided by the number of the independent parallel channels.

5. The method of claim 1, wherein the at least one hardware node comprises a field programmable gate array (FPGA) including adaptive logic modules, digital signal processors, and random-access memories, and wherein the performance metric corresponds to an area required to implement the adaptive logic modules, the digital signal processors, and the random-access memories as part of the FPGA.

6. The method of claim 1, wherein each of the plurality of tile engines comprises a plurality of dot product units and wherein each of the plurality of dot product units is configured to receive the matrix elements from a matrix register file, and wherein the plurality of synthesis specialization parameters further comprises an eighth synthesis specialization parameter corresponding to whether the matrix register file is private to each one of the plurality of tile engines or whether the matrix register file is shared among the plurality of tile engines.

7. The method of claim 1, wherein each of the plurality of tile engines comprises a plurality of dot product units and wherein the plurality of synthesis specialization parameters further comprises a ninth synthesis specialization parameter corresponding to whether each of the plurality of dot product units comprises an add-reduction tree.
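
The analysis step of claim 1 determines minimum bit widths for a shared exponent and for the matrix and vector mantissas, but the claim does not say how those minimums are found. The sketch below shows one plausible approach under stated assumptions: values are grouped into blocks that share one exponent, each value is rounded to a signed mantissa, and the smallest mantissa width whose worst-case error (relative to each block's peak value) stays within a tolerance is selected. All function names, the error measure, and the sample data are hypothetical and are not taken from the patent.

```python
import math


def shared_exponent(block):
    """Exponent e such that every |x| in the block is below 2**e."""
    peak = max(abs(x) for x in block)
    return math.frexp(peak)[1] if peak else 0


def quantize(block, mantissa_bits):
    """Round each value to a signed mantissa with `mantissa_bits` magnitude
    bits, scaled by the block's shared exponent."""
    scale = math.ldexp(1.0, shared_exponent(block) - mantissa_bits)
    limit = (1 << mantissa_bits) - 1  # largest representable magnitude
    return [max(-limit, min(limit, round(x / scale))) * scale for x in block]


def min_mantissa_bits(blocks, tolerance, max_bits=16):
    """Smallest mantissa width whose worst error, relative to each block's
    peak value, does not exceed `tolerance` (an assumed precision metric)."""
    for bits in range(1, max_bits + 1):
        worst = 0.0
        for block in blocks:
            peak = max(abs(x) for x in block)
            if peak == 0:
                continue
            err = max(abs(x - q) for x, q in zip(block, quantize(block, bits)))
            worst = max(worst, err / peak)
        if worst <= tolerance:
            return bits
    raise ValueError("tolerance not reachable within max_bits")


def min_exponent_bits(blocks):
    """Bits needed to index the range of shared exponents seen in the data."""
    exps = [shared_exponent(b) for b in blocks]
    span = max(exps) - min(exps) + 1
    return max(1, (span - 1).bit_length())


if __name__ == "__main__":
    sample = [[0.5, -0.25, 0.125], [8.0, 1.0, -3.0]]  # made-up values
    print(min_exponent_bits(sample), min_mantissa_bits(sample, 0.05))
```

In practice the matrix elements and vector elements would be analyzed separately, each against its own tolerance, which is how separate matrix and vector mantissa widths could arise from a single model.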

8. A system comprising:
- a processor; and
- a memory comprising:
  - (1) code representative of at least a portion of at least one hardware node for implementing the neural network processor comprising a plurality of tile engines, wherein each of the plurality of tile engines is configured to process matrix elements and vector elements,
  - (2) a synthesis model comprising a plurality of synthesis specialization parameters including: (a) a first synthesis specialization parameter corresponding to a first native dimension of the each of the matrix elements, (b) a second synthesis specialization parameter corresponding to a second native dimension of the each of the vector elements, and (c) a third synthesis specialization parameter corresponding to a number of the plurality of tile engines, wherein each of a first value corresponding to the first synthesis specialization parameter, a second value corresponding to the second synthesis specialization parameter, and a third value corresponding to the third synthesis specialization parameter is selected to meet or exceed a performance metric associated with the at least one hardware node, and
  - (3) instructions for synthesizing a neural network processor comprising a plurality of tile engines, wherein each of the plurality of tile engines is configured to process matrix elements and vector elements, the instructions configured to:
    - using the processor, analyze a neural network model corresponding to an application to determine: (1) a first minimum number of units required to express a shared exponent value required to satisfy a first precision requirement corresponding to each of the matrix elements and corresponding to each of the vector elements, (2) a second minimum number of units required to express a first mantissa value required to satisfy a second precision requirement corresponding to the each of the matrix elements, and (3) a third minimum number of units required to express a second mantissa value required to satisfy a third precision requirement corresponding to the each of the vector elements, and
    - using the processor, modify the code, based on at least the first minimum number of units, the second minimum number of units, the third minimum number of units and at least the first value, the second value, and the third value, to generate a modified version of the code and store a modified version of the code.

9. The system of claim 8, wherein the plurality of synthesis specialization parameters further comprises a third synthesis specialization parameter corresponding to a number of parallel multipliers that can process the matrix elements and the vector elements to produce a partial dot-product.

10. The system of claim 9, wherein the plurality of synthesis specialization parameters further comprises a fourth synthesis specialization parameter corresponding to a number of independent parallel channels of the plurality of tile engines.

11. The system of claim 10, wherein the plurality of synthesis specialization parameters further comprises a fifth synthesis specialization parameter corresponding to a number of groups, wherein each of the groups has a group size equal to the number of the plurality of tile engines divided by the number of the independent parallel channels.

12. The system of claim 8, wherein the at least one hardware node comprises a field programmable gate array (FPGA) including adaptive logic modules, digital signal processors, and random-access memories, and wherein the performance metric corresponds to an area required to implement the adaptive logic modules, the digital signal processors, and the random-access memories as part of the FPGA.

13. The system of claim 8, wherein each of the plurality of tile engines comprises a plurality of dot product units and wherein each of the plurality of dot product units is configured to receive the matrix elements from a matrix register file, and wherein the plurality of synthesis specialization parameters further comprises a seventh synthesis specialization parameter corresponding to whether the matrix register file is private to each one of the plurality of tile engines or whether the matrix register file is shared among the plurality of tile engines.

14. The system of claim 8, wherein each of the plurality of tile engines comprises a plurality of dot product units and wherein the plurality of synthesis specialization parameters further comprises an eighth synthesis specialization parameter corresponding to whether each of the plurality of dot product units comprises an add-reduction tree.
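
The synthesis model and the "modify the code" step in the claims above can be pictured as a parameter record that is rendered into the hardware description. The sketch below is only an illustration under assumptions: the class `SynthesisParams`, its field names, the Verilog-style `localparam` output, and the example values are all hypothetical and do not reflect the actual code or tooling described in the patent. The group size follows the relationship stated in the dependent claims (number of tile engines divided by number of independent parallel channels).

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SynthesisParams:
    """Hypothetical container for synthesis specialization parameters."""
    matrix_native_dim: int     # native dimension of the matrix elements
    vector_native_dim: int     # native dimension of the vector elements
    tile_engines: int          # number of tile engines
    lanes: int                 # parallel multipliers per partial dot-product
    channels: int              # independent parallel channels
    shared_mrf: bool           # matrix register file shared (True) or private
    add_reduction_tree: bool   # dot product units include an add-reduction tree
    exponent_bits: int         # shared exponent width
    matrix_mantissa_bits: int  # mantissa width for matrix elements
    vector_mantissa_bits: int  # mantissa width for vector elements

    def __post_init__(self):
        if self.channels <= 0 or self.tile_engines <= 0:
            raise ValueError("tile_engines and channels must be positive")
        if self.tile_engines % self.channels != 0:
            raise ValueError("tile_engines must be divisible by channels")

    @property
    def group_size(self) -> int:
        # Group size = number of tile engines / number of parallel channels.
        return self.tile_engines // self.channels


def render_parameter_header(params: SynthesisParams) -> str:
    """Render the parameters as Verilog-style localparam lines."""
    lines = [f"localparam {name.upper()} = {int(value)};"
             for name, value in asdict(params).items()]
    lines.append(f"localparam GROUP_SIZE = {params.group_size};")
    return "\n".join(lines)


if __name__ == "__main__":
    example = SynthesisParams(400, 400, 6, 40, 2, False, True, 5, 2, 5)
    print(render_parameter_header(example))
```

Keeping the record immutable and validating it on construction means an inconsistent combination, such as a tile-engine count that the channel count does not divide, is rejected before any code is generated.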

15. A method, implemented by a processor, for synthesizing a neural network processor comprising a plurality of tile engines, wherein each of the plurality of tile engines is configured to process matrix elements and vector elements, and wherein each of the plurality of tile engines comprises a plurality of dot product units and wherein each of the dot product units is configured to receive the matrix elements from a matrix register file, the method comprising:
- using the processor, analyzing a neural network model corresponding to an application to determine: (1) a first minimum number of units required to express a shared exponent value required to satisfy a first precision requirement corresponding to each of the matrix elements and corresponding to each of the vector elements, (2) a second minimum number of units required to express a first mantissa value required to satisfy a second precision requirement corresponding to the each of the matrix elements, and (3) a third minimum number of units required to express a second mantissa value required to satisfy a third precision requirement corresponding to the each of the vector elements;
- obtaining code representative of at least a portion of at least one hardware node for implementing the neural network processor;
- obtaining a synthesis model comprising a plurality of synthesis specialization parameters including: (1) a first synthesis specialization parameter corresponding to whether the matrix register file is private to each one of the plurality of tile engines or whether the matrix register file is shared among the plurality of tile engines and (2) a second synthesis specialization parameter corresponding to whether each of the plurality of dot product units comprises an add-reduction tree; and
- using the processor, modifying the code, based on at least the first minimum number of units, the second minimum number of units, the third minimum number of units and at least the first synthesis specialization parameter and the second synthesis specialization parameter, and storing a modified version of the code.

16. The method of claim 15, wherein the plurality of synthesis specialization parameters further comprises a third synthesis specialization parameter corresponding to a number of parallel multipliers that can process the matrix elements and the vector elements to produce a partial dot-product.

17. The method of claim 16, wherein the plurality of synthesis specialization parameters further comprises a fourth synthesis specialization parameter corresponding to a number of independent parallel channels of the plurality of tile engines.

18. The method of claim 17, wherein the plurality of synthesis specialization parameters further comprises a fifth synthesis specialization parameter corresponding to a number of groups, wherein each of the groups has a group size equal to the number of the plurality of tile engines divided by the number of the independent parallel channels of the plurality of tile engines.

19. The method of claim 15, wherein the at least one hardware node comprises a field programmable gate array (FPGA) including adaptive logic modules, digital signal processors, and random-access memories, and wherein the performance metric corresponds to an area required to implement the adaptive logic modules, the digital signal processors, and the random-access memories as part of the FPGA.

20. The method of claim 15, wherein the performance metric corresponds to an area required to implement each of the plurality of tile engines.

Current Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Original Assignee: Microsoft Technology Licensing LLC (Microsoft Corporation)
Inventors: Fowers, Jeremy; Ovtcharov, Kalin; Chung, Eric S.; Massengill, Todd Michael; Liu, Ming Gang; Weisz, Gabriel Leonard

Granted Patent: US 11,556,762 B2
CPC Class Codes

G06F 15/8053   Vector processors

G06F 17/16   Matrix or vector computatio...

G06N 3/044   Recurrent networks, e.g. Ho...

G06N 3/045   Combinations of networks

G06N 3/063   using electronic means
