MULTICAST NETWORK AND MEMORY TRANSFER OPTIMIZATIONS FOR NEURAL NETWORK HARDWARE ACCELERATION
Abstract
Neural-network-specific hardware acceleration optimizations are disclosed, including an optimized multicast network and an optimized DRAM transfer unit, each designed to operate in constant or linear time. The multicast network is a set of switch nodes organized into layers and configured to operate as a Beneš network. Configuration data may be accessed by all switch nodes in the network. Each layer is configured to perform a Beneš network transformation of the previous layer within a computer instruction. Since the computer instructions are pipelined, the entire network of switch nodes may be configured in constant or linear time. Similarly, a DRAM transfer unit configured to access memory in strides organizes memory into banks indexed by prime or relatively prime number amounts. The index value is selected so as not to cause memory address collisions. Upon receiving a memory specification, the DRAM transfer unit may calculate the strides, thereby accessing an entire tile of a tensor in constant or linear time.
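The abstract describes the multicast network only as layers of switch nodes. As a rough illustration, the Python sketch below models one common layout of an N-input Beneš network (N = 2^k): 2k − 1 stages of N/2 two-input switches, where stage s pairs positions that differ in a single bit, walking the bits up and back down (0, 1, ..., k−1, ..., 1, 0). The function names and the straight/cross switch model are assumptions made for illustration. The sketch does not show how switch settings are computed for an arbitrary permutation (for example, with the looping algorithm), nor the patent's multicast modes.

```python
def benes_bits(k):
    """Bit paired by each stage of a 2**k-input Benes network (2k - 1 stages)."""
    return list(range(k)) + list(range(k - 2, -1, -1))


def run_benes(data, settings):
    """Route `data` through straight/cross 2x2 switches.

    settings[s][j] is True if switch j of stage s crosses its two inputs
    and False if it passes them straight through.
    """
    n = len(data)
    k = n.bit_length() - 1
    if n != 1 << k or k == 0:
        raise ValueError("number of inputs must be a power of two >= 2")
    bits = benes_bits(k)
    if len(settings) != len(bits):
        raise ValueError("need one settings row per stage")
    cur = list(data)
    for stage, bit in enumerate(bits):
        mask = 1 << bit
        j = 0  # index of the switch within this stage
        for lo in range(n):
            if lo & mask:
                continue  # each pair is visited once, from its lower index
            hi = lo | mask
            if settings[stage][j]:
                cur[lo], cur[hi] = cur[hi], cur[lo]
            j += 1
    return cur


# Example: N = 8 gives 5 stages of 4 switches each.
N, k = 8, 3
straight = [[False] * (N // 2) for _ in benes_bits(k)]
print(run_benes(list("abcdefgh"), straight))  # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
# Crossing every switch in the stages for bits 0, 1, 2 maps position i to i ^ 7.
reverse = [[s < k] * (N // 2) for s in range(len(benes_bits(k)))]
print(run_benes(list("abcdefgh"), reverse))  # ['h', 'g', 'f', 'e', 'd', 'c', 'b', 'a']
```

The stage count grows as 2·log2(N) − 1, which is the layer structure the abstract refers to when it configures one layer per pipelined instruction.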
20 Claims
1. A system to configure input data for multicast to data receivers, comprising:
a computer memory storing configuration data at a known address, and input data;
a set of switch nodes configured into a Beneš multicast network ordered into multiple layers, each layer with a plurality of switch nodes, the first layer proximate to the input data and the last layer proximate to a data receiver, each switch node storing received input data comprised in a plurality of entries, a configuration indicator, and a controller indicator, the configuration indicator to specify whether to perform a broadcast mode wherein input data is to be forwarded according to the configuration data, or a passthru mode wherein input data is to be forwarded regardless of the configuration data, and the controller indicator to specify whether to update at least one switch node entry; and
a set of control registers communicatively connected to each switch node in the set of switch nodes, the set of registers configured to store received configuration data, such that the set of nodes is configured within two operations, the first operation to read the configuration data from the known address in the computer memory, and a second operation to populate the set of control registers, and subsequent operations comprising multicast operations by the set of switch nodes according to the populated set of control registers.
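Claim 1 configures the whole network in two operations. The first is one read of the configuration data from a known address. The second is one write that populates the control registers, after which every switch node takes its indicators from the registers it is wired to. The Python sketch below is a hypothetical software model of that sequence. The address `CONFIG_ADDR`, the `SwitchNode` fields, and the one-tuple-per-node layout of the configuration block are assumptions. The loop stands in for what would be parallel wiring in hardware.

```python
from dataclasses import dataclass

CONFIG_ADDR = 0x1000  # hypothetical "known address" of the configuration block


@dataclass
class SwitchNode:
    broadcast: bool = False     # configuration indicator (False = passthru)
    update_entry: bool = False  # controller indicator


def configure(memory, nodes):
    # Operation 1: a single read of the configuration data at the known address.
    config = memory[CONFIG_ADDR]
    # Operation 2: populate the control registers in one step.
    control_registers = list(config)
    # Each node is wired to its own register slot. In hardware every node sees
    # its slot at once, so this loop involves no further memory accesses.
    for node, (broadcast, update_entry) in zip(nodes, control_registers):
        node.broadcast = broadcast
        node.update_entry = update_entry
    return control_registers


memory = {CONFIG_ADDR: [(True, False), (False, True), (False, False)]}
nodes = [SwitchNode() for _ in range(3)]
configure(memory, nodes)
print(nodes[0])  # SwitchNode(broadcast=True, update_entry=False)
```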
10. A system to deterministically transfer partitions of contiguous computer readable data in constant time, comprising:
a computer readable memory organized into D banks, containing contiguous data containing a plurality of data elements of size M which are constituent data elements of a vector with N data elements, the data elements starting at an offset address O; and
a modulo address generator configured to generate the addresses of the data elements of a vector with i data elements stored in the computer readable memory, the modulo address generator comprising at least one forward permutation configured to permute data elements with addresses of the form O + M*i, where 0 ≤ i < N.
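The modulo address generator in claim 10 relies on a simple property. When the bank count D is prime and the stride M is not a multiple of D, any D consecutive elements at O + M*i fall in D different banks, because M is invertible modulo D. The Python sketch below checks this property and builds the corresponding forward permutation, in which lane b receives the element whose address lies in bank b. It assumes banks are interleaved at single-address granularity (bank = address mod D). The function names are hypothetical.

```python
def banks_for_vector(O, M, N, D):
    """Bank (address mod D) of each element at address O + M*i, 0 <= i < N."""
    return [(O + M * i) % D for i in range(N)]


def is_conflict_free(O, M, D):
    """True if D consecutive strided elements hit D distinct banks."""
    return len(set(banks_for_vector(O, M, D, D))) == D


def forward_permutation(O, M, D):
    """lanes[b] = index i of the element stored in bank b (one group of D)."""
    if not is_conflict_free(O, M, D):
        raise ValueError("stride collides in the banks")
    lanes = [0] * D
    for i, bank in enumerate(banks_for_vector(O, M, D, D)):
        lanes[bank] = i
    return lanes


print(banks_for_vector(3, 4, 7, 7))  # [3, 0, 4, 1, 5, 2, 6]
print(is_conflict_free(3, 4, 8))     # False: with 8 banks only banks 3 and 7 are used
print(forward_permutation(3, 4, 7))  # [1, 3, 5, 0, 2, 4, 6]
```

A vector longer than D would be handled one group of D elements at a time under this model.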
17. A method to configure input data for multicast to data receivers, comprising:
retrieving configuration data from a known address in a computer memory, the configuration data to configure a set of switch nodes configured into a Beneš multicast network ordered into multiple layers, each layer with a plurality of switch nodes, the first layer proximate to the input data and the last layer proximate to a data receiver, each switch node storing received input data comprised in a plurality of entries, a configuration indicator, and a controller indicator, wherein the configuration data comprises a configuration indicator to specify whether to perform a broadcast mode wherein input data is to be forwarded according to the configuration data, or a passthru mode wherein input data is to be forwarded regardless of the configuration data, and a controller indicator to specify whether to update at least one switch node entry;
storing the retrieved configuration data in a set of control registers;
populating the configuration indicator and controller indicator at each switch node from the set of control registers; and
starting a Beneš multicast operation at the first level of switch nodes based at least on the configuration indicators and controller indicators of the first-level switch nodes.
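Claim 17 starts the multicast at the first layer using each node's configuration and controller indicators. The hypothetical Python model below shows one way to picture a single node. In broadcast mode, the node copies inputs to outputs as its configuration data dictates, so one input can be sent to both outputs. In passthru mode, it forwards inputs unchanged. The controller indicator decides whether the node's stored entries are overwritten with the incoming data. The `route` encoding and the meaning given to an entry update are assumptions, not the claimed circuit.

```python
def node_step(inputs, broadcast, update_entry, route, entries):
    """One 2x2 switch node (hypothetical model).

    inputs:       (upper, lower) values arriving at the node
    broadcast:    configuration indicator; True = forward according to `route`
    update_entry: controller indicator; True = overwrite the stored entries
    route:        per output port, the input index (0 or 1) to copy there
    entries:      list of the node's stored entries, updated in place
    """
    if broadcast:
        outputs = (inputs[route[0]], inputs[route[1]])
    else:
        outputs = tuple(inputs)  # passthru: configuration data is ignored
    if update_entry:
        entries[:] = list(inputs)
    return outputs


entries = [None, None]
print(node_step(("a", "b"), True, True, (0, 0), entries))    # ('a', 'a')
print(entries)                                               # ['a', 'b']
print(node_step(("c", "d"), False, False, (1, 1), entries))  # ('c', 'd')
print(entries)                                               # ['a', 'b']
```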
19. A method to access computer readable data stored in a computer readable memory in strides, comprising:
receiving a parameter o, where o = O % D and O is the initial offset address for a computer readable memory storing a number of data elements in contiguous memory and organized into D banks;
receiving a parameter r which specifies the number of rotations to perform for a cyclic group less than D, wherein r is based at least on the discrete log of generator g;
performing a rotation of the data in data banks D−1 to D; and
performing a rotation of the data in data banks O through D.
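Claim 19 relies on a property of prime bank counts. The nonzero residues modulo a prime D form a cyclic group under multiplication, generated by a primitive root g. If the nonzero banks are listed in the order g^0, g^1, ..., g^(D-2), then multiplying every element by the stride M is the same as rotating that list by r = log_g(M mod D). In one reading of the claim, the offset o = O % D then adds a second, ordinary rotation across the banks. The Python sketch below checks the first identity for small D. The brute-force discrete log and the function names are illustrative assumptions, not the claimed hardware.

```python
def generator_order(g, D):
    """Nonzero residues mod prime D in the order g^0, g^1, ..., g^(D-2)."""
    return [pow(g, e, D) for e in range(D - 1)]


def discrete_log(x, g, D):
    """Smallest e with g^e == x (mod D); brute force, fine for small D."""
    for e in range(D - 1):
        if pow(g, e, D) == x % D:
            return e
    raise ValueError("x is 0 mod D or g is not a generator")


def multiply_as_rotation(M, g, D):
    order = generator_order(g, D)
    r = discrete_log(M, g, D)
    rotated = order[r:] + order[:r]            # rotate left by r positions
    multiplied = [(M * x) % D for x in order]  # multiply each residue by M
    return r, rotated, multiplied


r, rotated, multiplied = multiply_as_rotation(M=2, g=3, D=7)
print(r, rotated)             # 2 [2, 6, 4, 5, 1, 3]
print(rotated == multiplied)  # True
```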
20. A method to access computer readable data stored in a computer readable memory in strides, comprising:
receiving, at a modulo address generator, a memory address and a length;
generating, by the modulo address generator, a set of memory addresses corresponding to data elements stored in a computer readable memory separated by strides;
receiving, at an enqueuing controller, the generated set of memory addresses;
queuing, by the enqueuing controller, addresses of data elements into corresponding address queues respectively, and concurrently queuing, by the enqueuing controller, control data in a control queue;
decoding the memory addresses of the data elements into data at a decoder and queuing the data into a corresponding data queue respectively;
at a dequeuing controller, receiving data elements from the data queues and forwarding the received data elements to a reverse permutation, based at least on control data received from the control queue;
restoring, via the reverse permutation, the data received from the dequeuing controller; and
forwarding, at the reverse permutation, the restored data to data out.
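The read path in claim 20 can be modeled in software to show why the control queue is enough to undo the bank reordering. Each bank's queue is first-in first-out, and elements headed for the same bank are enqueued in vector order. Popping from the bank named by each control entry therefore returns the elements in their original order. The Python sketch below is a hypothetical model under those assumptions (bank = address mod D, reads modeled as list indexing). It is not the claimed circuit, and `strided_read` is an illustrative name.

```python
from collections import deque


def strided_read(memory, D, start, stride, length):
    """Hypothetical model of the claimed read pipeline over D banks."""
    # Modulo address generator: addresses separated by the stride.
    addrs = [start + stride * i for i in range(length)]

    # Enqueuing controller: one address queue per bank, plus a control queue
    # recording which bank each element went to, in vector order.
    addr_q = [deque() for _ in range(D)]
    ctrl_q = deque()
    for a in addrs:
        addr_q[a % D].append(a)
        ctrl_q.append(a % D)

    # Decoder: each bank services its own address queue independently.
    data_q = [deque(memory[a] for a in q) for q in addr_q]

    # Dequeuing controller + reverse permutation: pop from the bank named by
    # each control entry, restoring the original element order.
    out = []
    while ctrl_q:
        out.append(data_q[ctrl_q.popleft()].popleft())
    return out


print(strided_read(list(range(100)), 7, 3, 4, 5))  # [3, 7, 11, 15, 19]
```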
Specification