Contention free pipelined broadcasting within a constant bisection bandwidth network topology
Abstract
In an interconnection network, each of multiple nodes is connected to one of a first layer of switches. The first-layer switches are connected to one another through a second layer of switches. Each of the nodes is connected through one of multiple shared links connecting the first layer of switches and the second layer of switches. A pipelined broadcast manager schedules a hierarchical pipelined broadcast through at least one first-layer switch that has non-root nodes attached: it selects two of the non-root nodes connected to that switch and schedules each of multiple broadcast steps for the pipelined broadcast with at least one of an inter-switch broadcast phase and an intra-switch broadcast phase using the two selected nodes.
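The selection step described in the abstract can be illustrated with a short Python sketch. It is not taken from the patent: the function names `group_by_leaf_switch` and `select_pair`, the node and switch labels, and the rule of picking the first two nodes in sorted order are assumptions made purely for illustration. The abstract only states that two non-root nodes connected to a first-layer switch are selected.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_leaf_switch(node_to_switch: Dict[str, str], root: str) -> Dict[str, List[str]]:
    """Group the non-root nodes by the first-layer switch they attach to."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for node, switch in sorted(node_to_switch.items()):
        if node != root:
            groups[switch].append(node)
    return dict(groups)


def select_pair(nodes: List[str]) -> Tuple[str, str, List[str]]:
    """Pick two nodes for the inter-switch phase; return them plus the rest.

    Hypothetical rule: take the first two nodes in list order.
    """
    if len(nodes) < 2:
        raise ValueError("need at least two non-root nodes on the switch")
    return nodes[0], nodes[1], nodes[2:]


if __name__ == "__main__":
    # Hypothetical six-node, two-switch layout with n0 as the root node.
    layout = {"n0": "s0", "n1": "s0", "n2": "s0",
              "n3": "s1", "n4": "s1", "n5": "s1"}
    for switch, nodes in group_by_leaf_switch(layout, root="n0").items():
        print(switch, select_pair(nodes))
```

A real selection policy might weigh adapter load or link placement; the sketch only fixes a deterministic choice so the grouping is easy to follow.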
17 Claims
1. A parallel computing system, comprising:
- a plurality of nodes each of which comprises at least one processor and at least one adapter;
- an interconnection network comprising a first plurality of switches wherein each of the plurality of nodes is connected to one of the first plurality of switches and wherein each of the first plurality of switches is connected through a second plurality of switches, wherein each of the plurality of nodes is connected through a plurality of shared links connecting the first plurality of switches and the second plurality of switches;
- a user space layer distributed in at least one of the plurality of nodes comprising a message passing interface (MPI) layer for receiving a message passing interface broadcast command from an application, wherein the message passing interface broadcast command triggers an MPI broadcast for passing a plurality of data packets from one of the plurality of nodes ranked as a root node to a selection of non-root nodes from among the plurality of nodes;
- the MPI layer for triggering a pipelined broadcast manager to schedule a hierarchical pipelined broadcast for the MPI broadcast; and
- the pipelined broadcast manager for scheduling the hierarchical pipelined broadcast through at least one switch of the first plurality of switches from among a plurality of non-root nodes by selecting two nodes of the plurality of non-root nodes connected to the at least one switch and scheduling each of a plurality of broadcast steps for the hierarchical pipelined broadcast with at least one of an inter-switch broadcast phase with a first node of the two nodes receiving a first data packet from another switch from among the first plurality of switches and a second node of the two nodes sending a second data packet previously received from the another switch to one other switch from among the plurality of switches and an intra-switch broadcast phase with the first node acting as a source for sending a previously received data packet to at least one other non-root node and the second node acting as a sink for receiving the previously received data packet, wherein each of the non-root nodes sends and receives the previously received data packet once throughout the plurality of broadcast steps and the sink receives the previously received data packet last, wherein for each next step of the plurality of broadcast steps the first node and the second node alternate roles.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8)
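One possible reading of the alternating schedule recited in claim 1 is sketched below in Python for a single first-layer switch. This is an illustration under stated assumptions, not the patented implementation: it assumes one packet arrives per broadcast step in order, that the node receiving packet k forwards it to one other switch in step k+1 and sources it into the intra-switch phase in step k+2, and that the intra-switch phase is a simple chain from the source through the remaining nodes to the sink. The names `Step` and `schedule` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Step:
    index: int
    first: str                   # receives from another switch; intra-switch source
    second: str                  # sends to one other switch; intra-switch sink
    recv_packet: Optional[int]   # packet arriving from another switch, if any
    fwd_packet: Optional[int]    # packet sent on to one other switch, if any
    chain: List[str]             # assumed intra-switch order: source, others, sink
    chain_packet: Optional[int]  # packet passed along the chain, if any


def schedule(pair: Tuple[str, str], others: List[str], num_packets: int) -> List[Step]:
    """Build per-step roles for one switch (assumed model, see lead-in)."""
    a, b = pair

    def packet_or_none(p: int) -> Optional[int]:
        return p if 0 <= p < num_packets else None

    steps = []
    # Two extra steps drain packets still held by the pair (an assumption).
    for k in range(num_packets + 2):
        # The two selected nodes alternate roles on every step.
        first, second = (a, b) if k % 2 == 0 else (b, a)
        steps.append(Step(
            index=k,
            first=first,
            second=second,
            recv_packet=packet_or_none(k),       # new packet for the first node
            fwd_packet=packet_or_none(k - 1),    # second node received it last step
            chain=[first, *others, second],      # the sink receives last
            chain_packet=packet_or_none(k - 2),  # first node received it two steps ago
        ))
    return steps


if __name__ == "__main__":
    for s in schedule(("n1", "n2"), ["n3", "n4"], num_packets=3):
        print(s)
```

In this model each node on the chain takes part in at most one intra-switch send and one intra-switch receive per step, and the sink is always the node that is busy forwarding to another switch, which is one way to read the claim's requirement that the sink receives last.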
9. A method for pipelined broadcasting within an interconnection network, comprising:
- connecting communicatively a plurality of nodes each of which comprises at least one adapter through an interconnection network comprising a first plurality of switches, wherein each of the plurality of nodes is connected to one of the first plurality of switches and wherein each of the first plurality of switches is connected through a second plurality of switches, wherein each of the plurality of nodes is connected through a plurality of shared links connecting the first plurality of switches and the second plurality of switches;
- implementing a user space layer distributed in at least one of the plurality of nodes comprising a message passing interface (MPI) layer for receiving a message passing interface broadcast command from an application, wherein the message passing interface broadcast command triggers an MPI broadcast for passing a plurality of data packets from one of the plurality of nodes ranked as a root node to a selection of non-root nodes from among the plurality of nodes;
- triggering, by the MPI layer, scheduling of a hierarchical pipelined broadcast for the MPI broadcast; and
- scheduling, using a processor, the hierarchical pipelined broadcast through at least one switch of the first plurality of switches from among a plurality of non-root nodes by:
- selecting, using the processor, two nodes of the plurality of non-root nodes connected to the at least one switch; and
- scheduling, using the processor, each of a plurality of broadcast steps for the hierarchical pipelined broadcast with at least one of an inter-switch broadcast phase with a first node of the two nodes receiving a first data packet from another switch from among the first plurality of switches and a second node of the two nodes sending a second data packet previously received from the another switch to one other switch from among the plurality of switches and an intra-switch broadcast phase with the first node acting as a source for sending a previously received data packet to at least one other non-root node and the second node acting as a sink for receiving the previously received data packet, wherein each of the non-root nodes sends and receives the previously received data packet once throughout the plurality of broadcast steps and the sink receives the previously received data packet last, wherein for each next step of the plurality of broadcast steps the first node and the second node alternate roles.
(Dependent claims: 10, 11, 12, 13, 14, 15, 16)
17. A computer program product for pipelined broadcasting within an interconnection network, the computer program product comprising:
- one or more computer-readable, tangible storage devices;
- program instructions, stored on at least one of the one or more storage devices, to connect communicatively a plurality of nodes each of which comprises at least one adapter through an interconnection network comprising a first plurality of switches, wherein each of the plurality of nodes is connected to one of the first plurality of switches and wherein each of the first plurality of switches is connected through a second plurality of switches, wherein each of the plurality of nodes is connected through a plurality of shared links connecting the first plurality of switches and the second plurality of switches;
- program instructions, stored on at least one of the one or more storage devices, to implement a user space layer distributed in at least one of the plurality of nodes comprising a message passing interface (MPI) layer for receiving a message passing interface broadcast command from an application, wherein the message passing interface broadcast command triggers an MPI broadcast for passing a plurality of data packets from one of the plurality of nodes ranked as a root node to a selection of non-root nodes from among the plurality of nodes;
- program instructions, stored on at least one of the one or more storage devices, to trigger, by the MPI layer, scheduling of a hierarchical pipelined broadcast for the MPI broadcast; and
- program instructions, stored on at least one of the one or more storage devices, to schedule the hierarchical pipelined broadcast through at least one switch of the first plurality of switches from among a plurality of non-root nodes by selecting two nodes of the plurality of non-root nodes connected to the at least one switch and scheduling each of a plurality of broadcast steps for the hierarchical pipelined broadcast with at least one of an inter-switch broadcast phase with a first node of the two nodes receiving a first data packet from another switch from among the first plurality of switches and a second node of the two nodes sending a second data packet previously received from the another switch to one other switch from among the plurality of switches and an intra-switch broadcast phase with the first node acting as a source for sending a previously received data packet to at least one other non-root node and the second node acting as a sink for receiving the previously received data packet, wherein each of the non-root nodes sends and receives the previously received data packet once throughout the plurality of broadcast steps and the sink receives the previously received data packet last, wherein for each next step of the plurality of broadcast steps the first node and the second node alternate roles.
Specification