INCREASING AVAILABLE FIFO SPACE TO PREVENT MESSAGING QUEUE DEADLOCKS IN A DMA ENVIRONMENT
First Claim
1. A method for managing message queues in a parallel computing system having a plurality of compute nodes, comprising:
- determining that a first queue, on a first compute node, storing a set of message descriptors has become full, wherein a direct memory access controller (DMA) is configured to inject message descriptors into the first queue; and
generating an interrupt delivered to an interrupt handler, wherein the interrupt handler is configured to perform the steps of:
stopping the DMA controller;
allocating a region of memory, wherein the memory region is large enough to store the set of messaging descriptors from the first queue;
moving the stored descriptors in the first queue into a second queue local to a messaging manager;
notifying the messaging manager about the memory region; and
restarting the DMA controller.
Abstract
Embodiments of the invention may be used to manage message queues in a parallel computing environment to prevent message queue deadlock. A direct memory access controller of a compute node may determine when a messaging queue is full. In response, the DMA may generate an interrupt. An interrupt handler may stop the DMA and swap all descriptors from the full messaging queue into a larger queue (or enlarge the original queue). The interrupt handler then restarts the DMA. Alternatively, the interrupt handler stops the DMA, allocates a memory block to hold queue data, and then moves descriptors from the full messaging queue into the allocated memory block. The interrupt handler then restarts the DMA. During a normal messaging advance cycle, a messaging manager attempts to inject the descriptors in the memory block into other messaging queues until the descriptors have all been processed.
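The advance-cycle behavior described at the end of the abstract can be illustrated with a minimal sketch. It assumes the overflow descriptors already sit in a memory block (modeled as a deque) and that each messaging queue has a fixed capacity. The names BoundedFifo and advance are hypothetical and do not come from the patent; this is an illustration of the idea, not the patented implementation.

```python
from collections import deque


class BoundedFifo:
    """Hypothetical fixed-capacity messaging queue (not from the patent)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def free_slots(self):
        return self.capacity - len(self.items)

    def inject(self, descriptor):
        # Refuse the descriptor when the queue is full.
        if self.free_slots() == 0:
            return False
        self.items.append(descriptor)
        return True


def advance(overflow, fifos):
    """One advance cycle: re-inject overflow descriptors, in order,
    into any queue that has room; stop when no queue has free space."""
    moved = 0
    while overflow:
        target = next((f for f in fifos if f.free_slots() > 0), None)
        if target is None:
            break  # every queue is full; retry on the next cycle
        target.inject(overflow.popleft())
        moved += 1
    return moved


# Five descriptors parked in the memory block, three free slots in total.
overflow = deque(f"desc{i}" for i in range(5))
fifos = [BoundedFifo(2), BoundedFifo(1)]

first = advance(overflow, fifos)   # fills all three free slots
fifos[0].items.clear()             # assumption: the DMA drained queue 0
second = advance(overflow, fifos)  # remaining descriptors now fit
```

Repeating advance on each cycle empties the memory block without blocking the sender, which is how the described scheme avoids the deadlock.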
21 Claims
1. A method for managing message queues in a parallel computing system having a plurality of compute nodes, comprising:
determining that a first queue, on a first compute node, storing a set of message descriptors has become full, wherein a direct memory access controller (DMA) is configured to inject message descriptors into the first queue; and
generating an interrupt delivered to an interrupt handler, wherein the interrupt handler is configured to perform the steps of:
stopping the DMA controller;
allocating a region of memory, wherein the memory region is large enough to store the set of messaging descriptors from the first queue;
moving the stored descriptors in the first queue into a second queue local to a messaging manager;
notifying the messaging manager about the memory region; and
restarting the DMA controller.
View Dependent Claims (2, 3, 4, 5, 6, 7, 8)
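The ordering of the interrupt-handler steps recited in claim 1 can be sketched as follows. This is a simplified software model, not the claimed hardware behavior. The classes DmaController and MessagingManager, the function handle_queue_full, and the constant CAPACITY are hypothetical names introduced for illustration. The sketch also assumes that the "second queue" lives in the newly allocated memory region, which is one possible reading of the claim.

```python
CAPACITY = 4  # assumed size of the first queue, for illustration only


class DmaController:
    """Hypothetical stand-in for the DMA controller."""

    def __init__(self):
        self.running = True
        self.log = []

    def stop(self):
        self.running = False
        self.log.append("stop")

    def restart(self):
        self.running = True
        self.log.append("restart")


class MessagingManager:
    """Hypothetical messaging manager that owns the second queue."""

    def __init__(self):
        self.local_queue = None

    def notify(self, region):
        # The manager learns where the moved descriptors now live.
        self.local_queue = region


def handle_queue_full(dma, first_queue, manager):
    dma.stop()                            # 1. stop the DMA controller
    region = [None] * len(first_queue)    # 2. allocate a large-enough region
    for i, descriptor in enumerate(first_queue):
        region[i] = descriptor            # 3. move the stored descriptors
    first_queue.clear()
    manager.notify(region)                # 4. notify the messaging manager
    dma.restart()                         # 5. restart the DMA controller


def inject(dma, first_queue, manager, descriptor):
    first_queue.append(descriptor)
    if len(first_queue) == CAPACITY:      # queue has become full
        handle_queue_full(dma, first_queue, manager)  # models the interrupt


dma = DmaController()
manager = MessagingManager()
first_queue = []
for n in range(CAPACITY):
    inject(dma, first_queue, manager, f"desc{n}")
```

After the fourth injection the first queue is empty again, the manager holds the four descriptors in order, and the DMA controller is running.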
9. A computer-readable storage-medium containing a program which, when executed, performs an operation for managing message queues in a parallel computing system having a plurality of compute nodes, the operation comprising:
determining that a first queue, on a first compute node, storing a set of message descriptors has become full, wherein a direct memory access controller (DMA) is configured to inject message descriptors into the first queue; and
generating an interrupt delivered to an interrupt handler, wherein the interrupt handler is configured to perform the steps of:
stopping the DMA controller;
allocating a region of memory, wherein the memory region is large enough to store the set of messaging descriptors from the first queue;
moving the stored descriptors in the first queue into a second queue local to a messaging manager;
notifying the messaging manager about the memory region; and
restarting the DMA controller.
View Dependent Claims (10, 11, 12, 13, 14, 15, 16)
17. A parallel computing system, comprising:
a plurality of compute nodes, each having at least a processor, a memory and a direct memory access controller (DMA), wherein the plurality of compute nodes are configured to move messages between two compute nodes of the plurality, and wherein the DMA on a first compute node is configured to:
determine that a first queue, on a first compute node, storing a set of message descriptors has become full, wherein a direct memory access controller (DMA) is configured to inject message descriptors into the first queue; and
generate an interrupt delivered to an interrupt handler, wherein the interrupt handler is configured to perform the steps of:
stopping the DMA controller;
allocating a region of memory, wherein the memory region is large enough to store the set of messaging descriptors from the first queue;
moving the stored descriptors in the first queue into a second queue local to a messaging manager;
notifying the messaging manager about the memory region; and
restarting the DMA controller.
View Dependent Claims (18, 20, 21)
19. The parallel computing system of claim 17, wherein the message descriptors are sent to the first compute node from a second compute node over a network connection connecting the first compute node and the second compute node.
Specification