Optimized collectives using a DMA on a parallel computer
First Claim
1. A method for optimizing collective operations using direct memory access controller on a parallel computer, comprising:
- establishing a byte counter associated with direct memory access controller for each submessage in a message, the byte counter storing at least a base address of memory and a byte count associated with a submessage, the byte counter including at least a reception counter and an injection counter;
- monitoring the byte counter associated with a submessage to determine whether at least a block of data of the submessage has been received, the block of data having a predetermined size;
- processing the block when said block has been fully received; and
- continuing the monitoring and processing step until all blocks in all submessages in the message have been processed.
Abstract
Optimizing collective operations using direct memory access controller on a parallel computer, in one aspect, may comprise establishing a byte counter associated with a direct memory access controller for each submessage in a message. The byte counter includes at least a base address of memory and a byte count associated with a submessage. A byte counter associated with a submessage is monitored to determine whether at least a block of data of the submessage has been received. The block of data has a predetermined size, for example, a number of bytes. The block is processed when the block has been fully received, for example, when the byte count indicates all bytes of the block have been received. The monitoring and processing may continue for all blocks in all submessages in the message.
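The monitoring loop described in the abstract can be illustrated with a small single-threaded simulation. This is only a sketch under stated assumptions, not the patented implementation: `ReceptionCounter`, `process_as_received`, and the `arrivals` list are hypothetical names, the "DMA" is simulated by ordinary byte copies, each submessage length is assumed to be a multiple of the block size, and "processing" a block is reduced to summing its bytes.

```python
from dataclasses import dataclass


@dataclass
class ReceptionCounter:
    """Hypothetical stand-in for one DMA byte counter (one per submessage)."""
    base: int          # offset of the submessage in the receive buffer
    received: int = 0  # bytes the simulated DMA has written so far


def process_as_received(counters, buffer, block_size, arrivals):
    """Process each block as soon as its counter shows it is complete.

    `arrivals` is a list of (submessage index, bytes) pairs standing in for
    DMA writes; real hardware would update the counters asynchronously.
    """
    done = [0] * len(counters)  # blocks already processed per submessage
    order = []                  # (submessage, block, byte sum) in processing order
    for sub, chunk in arrivals:
        ctr = counters[sub]
        # Simulated DMA: store the bytes, then advance the byte count.
        start = ctr.base + ctr.received
        buffer[start:start + len(chunk)] = chunk
        ctr.received += len(chunk)
        # Monitoring step: poll every counter for newly completed blocks.
        for i, c in enumerate(counters):
            while (done[i] + 1) * block_size <= c.received:
                lo = c.base + done[i] * block_size
                order.append((i, done[i], sum(buffer[lo:lo + block_size])))
                done[i] += 1
    return order


# Two 8-byte submessages, 4-byte blocks, bytes arriving in uneven pieces.
counters = [ReceptionCounter(base=0), ReceptionCounter(base=8)]
buffer = bytearray(16)
arrivals = [
    (0, bytes([1, 1, 1])),
    (1, bytes([2, 2, 2, 2])),
    (0, bytes([1, 1, 1, 1, 1])),
    (1, bytes([2, 2, 2, 2])),
]
result = process_as_received(counters, buffer, 4, arrivals)
print(result)  # [(1, 0, 8), (0, 0, 4), (0, 1, 4), (1, 1, 8)]
```

In this sketch the first block of submessage 1 is processed before any block of submessage 0, because its bytes complete first. That overlap of processing with reception is the point of block-level monitoring, as opposed to waiting for the whole message.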
20 Claims
1. A method for optimizing collective operations using direct memory access controller on a parallel computer, comprising:
- establishing a byte counter associated with direct memory access controller for each submessage in a message, the byte counter storing at least a base address of memory and a byte count associated with a submessage, the byte counter including at least a reception counter and an injection counter;
- monitoring the byte counter associated with a submessage to determine whether at least a block of data of the submessage has been received, the block of data having a predetermined size;
- processing the block when said block has been fully received; and
- continuing the monitoring and processing step until all blocks in all submessages in the message have been processed.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A system for optimizing collective operations using direct memory access controller on a parallel computer, comprising:
- one or more processors in a node;
- memory in the node, the memory including at least an injection fifo and a receive buffer;
- direct memory access controller in the node, the direct memory access controller including at least a byte counter for each submessage of a message, the byte counter including at least a base address in memory for storing associated submessage and a counter value, the byte counter including at least a reception counter and an injection counter, the direct memory access controller operable to update the counter value as a result of receiving one or more bytes of the associated submessage into the node;
- said one or more processors operable to monitor the counter value and when a predetermined number of bytes of the submessage is received, the one or more processors further operable to process a block of data comprising the received predetermined number of bytes of the submessage.
Dependent claims: 13, 14, 15, 16, 17, 18, 19
20. A method for optimizing collective operations using direct memory access controller on a parallel computer, comprising:
- allocating a reception buffer on a node for receiving a message, the reception buffer having different slots for storing data received from different nodes on a parallel computer;
- defining a counter identifier associated with a counter on the node, the counter identifier being common among all nodes on the parallel computer and the counter comprising at least a base address of the reception buffer for placing a message and a counter value indicating a count of received data;
- receiving at a node a message from all the nodes;
- polling the counter value until the count of received data equals a predetermined size of data that is expected to be received; and
- performing a short reduction on the received message.
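The steps of claim 20 can be sketched as a sequential simulation. Everything here is an illustrative assumption rather than the patented implementation: `Node`, `dma_put`, `short_allreduce`, `COUNTER_ID`, and the 8-byte slot size are hypothetical, each node contributes one signed 64-bit integer, the reduction is a sum, and the simulated DMA writes complete before polling begins.

```python
import struct

SLOT_BYTES = 8   # assumed: each node contributes one signed 64-bit integer
COUNTER_ID = 0   # assumed: the same counter identifier is used on every node


class Node:
    """Hypothetical node with a slotted reception buffer and one counter."""

    def __init__(self, rank, num_nodes):
        self.rank = rank
        # Reception buffer with a distinct slot for each sending node.
        self.recv_buffer = bytearray(SLOT_BYTES * num_nodes)
        # The counter holds the buffer base address and a received-byte count.
        self.counters = {COUNTER_ID: {"base": 0, "received": 0}}

    def dma_put(self, src_rank, payload):
        """Simulated DMA write into the slot reserved for src_rank."""
        ctr = self.counters[COUNTER_ID]
        offset = ctr["base"] + src_rank * SLOT_BYTES
        self.recv_buffer[offset:offset + len(payload)] = payload
        ctr["received"] += len(payload)

    def reduce_sum(self):
        """Short reduction: sum the values stored in all slots."""
        n = len(self.recv_buffer) // SLOT_BYTES
        return sum(struct.unpack(f"<{n}q", self.recv_buffer))


def short_allreduce(values):
    num_nodes = len(values)
    nodes = [Node(rank, num_nodes) for rank in range(num_nodes)]
    # Every node sends its contribution to every node, itself included.
    for src, value in enumerate(values):
        payload = struct.pack("<q", value)
        for dst in nodes:
            dst.dma_put(src, payload)
    expected = SLOT_BYTES * num_nodes
    results = []
    for node in nodes:
        # Poll until the count of received data equals the expected size.
        # In this sequential simulation the condition holds on the first check.
        while node.counters[COUNTER_ID]["received"] != expected:
            pass
        results.append(node.reduce_sum())
    return results


print(short_allreduce([3, 5, 7, 11]))  # [26, 26, 26, 26]
```

Because each sender writes into its own slot, arrivals from different nodes never overlap, and a single counter per node is enough to detect that the whole message has arrived.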
Specification