NEURAL NETWORK UNIT WITH NEURAL PROCESSING UNITS DYNAMICALLY CONFIGURABLE TO PROCESS MULTIPLE DATA SIZES
Abstract
A neural network unit. A register holds an indicator that specifies narrow and wide configurations. A first memory holds rows of 2N/N narrow/wide weight words in the narrow/wide configuration. A second memory holds rows of 2N/N narrow/wide data words in the narrow/wide configuration. An array of neural processing units (NPU) is configured as 2N/N narrow/wide NPUs and to receive the 2N/N narrow/wide weight words of rows from the first memory and to receive the 2N/N narrow/wide data words of rows from the second memory in the narrow/wide configuration. In the narrow configuration, the 2N NPUs perform narrow arithmetic operations on the 2N narrow weight words and the 2N narrow data words received from the first and second memories. In the wide configuration, the N NPUs perform wide arithmetic operations on the N wide weight words and the N wide data words received from the first and second memories.
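The narrow/wide switching described in the abstract can be sketched in software. The following is only an illustrative model, not the patented hardware: the 8-bit narrow and 16-bit wide word sizes, the little-endian packing, the unsigned interpretation, the choice of multiply-accumulate as the arithmetic operation, and every name (`NeuralNetworkUnit`, `split_row`, `multiply_accumulate`) are assumptions introduced for the example. The point it shows is that the same fixed-size memory row is read as 2N narrow words or as N wide words depending on the indicator.

```python
NARROW_BYTES = 1  # assumed narrow word size (8 bits); not stated in the claims
WIDE_BYTES = 2    # assumed wide word size (16 bits); not stated in the claims


def split_row(row: bytes, wide: bool) -> list[int]:
    """Interpret one memory row as 2N narrow or N wide unsigned words."""
    step = WIDE_BYTES if wide else NARROW_BYTES
    return [int.from_bytes(row[i:i + step], "little")
            for i in range(0, len(row), step)]


class NeuralNetworkUnit:
    """Toy model: an indicator selects 2N narrow or N wide NPUs."""

    def __init__(self, n: int, wide: bool = False):
        self.n = n
        self.configure(wide)

    @property
    def npu_count(self) -> int:
        # N NPUs in the wide configuration, 2N in the narrow configuration.
        return self.n if self.wide else 2 * self.n

    def configure(self, wide: bool) -> None:
        # Models writing the indicator register; accumulators are reset.
        self.wide = wide
        self.accumulators = [0] * self.npu_count

    def multiply_accumulate(self, weight_row: bytes, data_row: bytes) -> list[int]:
        # Each NPU receives one weight word and one data word from the rows.
        weights = split_row(weight_row, self.wide)
        data = split_row(data_row, self.wide)
        if not (len(weights) == len(data) == self.npu_count):
            raise ValueError("row size does not match the NPU configuration")
        for i, (w, d) in enumerate(zip(weights, data)):
            self.accumulators[i] += w * d
        return list(self.accumulators)
```

With `n=2`, a 4-byte row feeds four narrow NPUs in the narrow configuration and two wide NPUs after `configure(wide=True)`; the row contents are unchanged, only their interpretation differs.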
20 Claims
1. A neural network unit, comprising:
a register that holds an indicator that specifies narrow and wide configurations;
a first memory that holds rows of 2N narrow or N wide weight words when the indicator indicates the narrow or wide configuration, respectively;
a second memory that holds rows of 2N narrow or N wide data words when the indicator indicates the narrow or wide configuration, respectively; and
an array of neural processing units (NPU), the array configured as 2N narrow or N wide NPUs and configured to receive the 2N narrow or N wide weight words of rows from the first memory and configured to receive the 2N narrow or N wide data words of rows from the second memory when the indicator indicates the narrow or wide configuration, respectively;
when the indicator indicates the narrow configuration, the 2N NPUs are configured to perform narrow arithmetic operations on the 2N narrow weight words and the 2N narrow data words received from the first and second memories; and
when the indicator indicates the wide configuration, the N NPUs are configured to perform wide arithmetic operations on the N wide weight words and the N wide data words received from the first and second memories.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10.
11. A neural network unit, comprising:
a register that holds an indicator that specifies first and second configurations;
an array of neural processing units (NPU), the array configured as 2N or N NPUs when the indicator indicates the first or second configuration, respectively, each NPU having:
    an accumulator having an output;
    an arithmetic unit having first, second and third inputs and that performs an operation thereon to generate a result to store in the accumulator, the first input receives the output of the accumulator;
    a weight input that is received by the second input to the arithmetic unit; and
    a multiplexed register having first, second and third data inputs, an output received by the third input to the arithmetic unit, and a control input that controls selection of the first, second and third data inputs; and
    the output of the multiplexed register is also received by the third data input of the multiplexed register and by the second data input of the multiplexed register of an adjacent NPU, the multiplexed registers of the N NPUs collectively operate as an N-word rotater when the control input specifies the second data input, and the multiplexed registers of the N NPUs collectively operate as a 2N-word rotater when the control input specifies the third data input;
a first memory that holds W rows of 2N or N weight words and provides the 2N or N weight words of a row of the W rows to the corresponding weight inputs of the 2N or N NPUs when the indicator indicates the first or second configuration, respectively; and
a second memory that holds D rows of 2N or N data words and provides the 2N or N data words of a row of the D rows to the corresponding first data inputs of the multiplexed register of the 2N or N NPUs when the indicator indicates the first or second configuration, respectively.
Dependent claims: 12, 13, 14.
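The rotater behavior recited in claim 11 can be illustrated with a small software model. This is a simplified sketch under stated assumptions, not the claimed circuit: it models only two selections of the multiplexed register (load a data row, or latch the neighboring register's output), it assumes the arithmetic operation is multiply-accumulate, it assumes a rotation direction, and it does not model the wiring that distinguishes the N-word and 2N-word rotaters. The function names `rotate` and `run` are hypothetical.

```python
def rotate(regs: list[int]) -> list[int]:
    """One rotater step: each register latches its neighbor's output.

    The direction (element i takes the value of element i-1, with
    wrap-around) is an assumption made for this example.
    """
    return [regs[-1]] + regs[:-1]


def run(weight_rows: list[list[int]], data_row: list[int]) -> list[int]:
    """Accumulate weight * data per NPU, rotating the data after each step."""
    regs = list(data_row)        # mux-regs select the memory input (load)
    acc = [0] * len(regs)        # one accumulator per NPU
    for weights in weight_rows:  # one weight row is consumed per step
        acc = [a + w * x for a, w, x in zip(acc, weights, regs)]
        regs = rotate(regs)      # mux-regs select the neighbor's output
    return acc
```

After as many steps as there are registers, every NPU has seen every data word once, which is why a rotater lets a single data row be shared across the whole array without rereading memory.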
15. A method for operating a neural network unit having an array of neural processing units (NPU) configurable as 2N narrow or N wide NPUs and to receive the 2N narrow or N wide weight words of rows from the first memory and to receive the 2N narrow or N wide data words of rows from the second memory when the indicator indicates the narrow or wide configuration, respectively, the method comprising:
holding, by a register, an indicator that specifies narrow and wide configurations;
holding, by a first memory, rows of 2N narrow or N wide weight words when the indicator indicates the narrow or wide configuration, respectively;
holding, by a second memory, rows of 2N narrow or N wide data words when the indicator indicates the narrow or wide configuration, respectively; and
performing, by the 2N NPUs when the indicator indicates the narrow configuration, narrow arithmetic operations on the 2N narrow weight words and the 2N narrow data words received from the first and second memories; and
performing, by the N NPUs when the indicator indicates the wide configuration, wide arithmetic operations on the N wide weight words and the N wide data words received from the first and second memories.
Dependent claims: 16, 17, 18, 19, 20.
Specification