Performing optimized collective operations in an irregular subcommunicator of compute nodes in a parallel computer
First Claim
1. A method of performing optimized collective operations in an irregular subcommunicator of compute nodes in a parallel computer, the method comprising:
identifying, within the irregular subcommunicator of the compute nodes in the parallel computer, regular neighborhoods of compute nodes within the irregular subcommunicator of the compute nodes in the parallel computer, wherein the irregular subcommunicator of the compute nodes in the parallel computer has topological communication gaps and the regular neighborhoods of the compute nodes in the parallel computer are logical planes with no topological communication gaps in each neighborhood, wherein identifying regular neighborhoods of compute nodes within the irregular subcommunicator of the compute nodes in the parallel computer comprises:
establishing, by each respective compute node within the irregular subcommunicator of the compute nodes in the parallel computer, at least one logical plane that includes the respective compute node, wherein establishing the at least one logical plane comprises:
identifying, in a positive direction of a first dimension, each logical plane that includes the respective compute node and a first compute node of the irregular subcommunicator that is one or more hops away from the respective compute node in a positive direction of a second dimension, wherein the second dimension is orthogonal to the first dimension;
identifying, in a negative direction of the first dimension, each logical plane that includes the respective compute node and a second compute node of the irregular subcommunicator that is one or more hops away from the respective compute node in the positive direction of the second dimension;
selecting, for each neighborhood from the compute nodes of the neighborhood, a local root node;
assigning each local root node to a node of a neighborhood-wide tree topology;
mapping, for each neighborhood, the compute nodes of the neighborhood to a local tree topology having, at its root, the local root node of the neighborhood; and
performing a one way, rooted collective operation within the irregular subcommunicator including:
performing, in one phase, the collective operation within each neighborhood; and performing, in another phase, the collective operation amongst the local root nodes.
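The identifying step of the claim can be illustrated with a small sketch. It is not the claimed procedure: the claim has each compute node establish its own logical planes by probing neighbors along two orthogonal dimensions, whereas the code below is a centralized, greedy simplification restricted to a single 2D plane. The function name `regular_neighborhoods` and the `(x, y)` coordinate representation are assumptions made for illustration only.

```python
from typing import List, Set, Tuple

Coord = Tuple[int, int]  # hypothetical (x, y) position of a node within one plane


def regular_neighborhoods(members: Set[Coord]) -> List[Set[Coord]]:
    """Split an irregular set of nodes into gap-free rectangular blocks.

    Greedy simplification: start at the smallest unassigned coordinate,
    grow in +x while nodes are present, then grow in +y while every node
    of the next row is present.
    """
    unassigned = set(members)
    neighborhoods: List[Set[Coord]] = []
    while unassigned:
        x0, y0 = min(unassigned)
        # Extend along the positive first dimension until a gap is hit.
        width = 1
        while (x0 + width, y0) in unassigned:
            width += 1
        # Extend along the positive second dimension while the whole row exists.
        height = 1
        while all((x0 + dx, y0 + height) in unassigned for dx in range(width)):
            height += 1
        block = {(x0 + dx, y0 + dy) for dx in range(width) for dy in range(height)}
        neighborhoods.append(block)
        unassigned -= block
    return neighborhoods


# A 3x3 plane whose center node is not part of the subcommunicator.
members = {(x, y) for x in range(3) for y in range(3)} - {(1, 1)}
hoods = regular_neighborhoods(members)
```

Every block returned is a filled rectangle, so it contains no topological communication gap, and together the blocks cover the irregular subcommunicator exactly once.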
1 Assignment
0 Petitions
Abstract
In a parallel computer, performing optimized collective operations in an irregular subcommunicator of compute nodes may be carried out by: identifying, within the irregular subcommunicator, regular neighborhoods of compute nodes; selecting, for each neighborhood from the compute nodes of the neighborhood, a local root node; assigning each local root node to a node of a neighborhood-wide tree topology; mapping, for each neighborhood, the compute nodes of the neighborhood to a local tree topology having, at its root, the local root node of the neighborhood; and performing a one way, rooted collective operation within the subcommunicator including: performing, in one phase, the collective operation within each neighborhood; and performing, in another phase, the collective operation amongst the local root nodes.
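The two phases described in the abstract can be sketched as a simulated sum reduction. This is a minimal illustration under stated assumptions, not the patented implementation: ranks are plain integers, the local root is chosen as the lowest rank in each neighborhood, both tree topologies are implicit binary trees, and the helper names `tree_reduce` and `two_phase_reduce` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def tree_reduce(order: Sequence[int], values: Dict[int, int]) -> int:
    """Sum values up an implicit binary tree laid over `order` (order[0] is the root)."""
    partial = [values[rank] for rank in order]
    # Visit positions from last to first so each child already holds the
    # sum of its own subtree before it is folded into its parent.
    for i in range(len(order) - 1, 0, -1):
        parent = (i - 1) // 2
        partial[parent] += partial[i]
    return partial[0]


def two_phase_reduce(
    neighborhoods: List[List[int]], values: Dict[int, int]
) -> Tuple[int, int]:
    """Return (global root rank, reduced value) for a rooted sum reduction."""
    local_results: Dict[int, int] = {}
    # Phase 1: reduce inside each neighborhood onto its local root.
    for hood in neighborhoods:
        root = min(hood)  # assumption: lowest rank serves as local root
        order = [root] + sorted(r for r in hood if r != root)
        local_results[root] = tree_reduce(order, values)
    # Phase 2: reduce amongst the local roots over a neighborhood-wide tree.
    roots = sorted(local_results)
    return roots[0], tree_reduce(roots, local_results)


neighborhoods = [[0, 1, 2], [5, 6], [9]]
values = {rank: rank + 1 for hood in neighborhoods for rank in hood}
global_root, total = two_phase_reduce(neighborhoods, values)
```

For a reduction, the intra-neighborhood phase runs first and the phase amongst local roots second. A broadcast would plausibly run the same two phases in the opposite order, from the global root to the local roots and then within each neighborhood.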
23 Citations
18 Claims
1. A method of performing optimized collective operations in an irregular subcommunicator of compute nodes in a parallel computer, the method comprising:
identifying, within the irregular subcommunicator of the compute nodes in the parallel computer, regular neighborhoods of compute nodes within the irregular subcommunicator of the compute nodes in the parallel computer, wherein the irregular subcommunicator of the compute nodes in the parallel computer has topological communication gaps and the regular neighborhoods of the compute nodes in the parallel computer are logical planes with no topological communication gaps in each neighborhood, wherein identifying regular neighborhoods of compute nodes within the irregular subcommunicator of the compute nodes in the parallel computer comprises:
establishing, by each respective compute node within the irregular subcommunicator of the compute nodes in the parallel computer, at least one logical plane that includes the respective compute node, wherein establishing the at least one logical plane comprises:
identifying, in a positive direction of a first dimension, each logical plane that includes the respective compute node and a first compute node of the irregular subcommunicator that is one or more hops away from the respective compute node in a positive direction of a second dimension, wherein the second dimension is orthogonal to the first dimension;
identifying, in a negative direction of the first dimension, each logical plane that includes the respective compute node and a second compute node of the irregular subcommunicator that is one or more hops away from the respective compute node in the positive direction of the second dimension;
selecting, for each neighborhood from the compute nodes of the neighborhood, a local root node;
assigning each local root node to a node of a neighborhood-wide tree topology;
mapping, for each neighborhood, the compute nodes of the neighborhood to a local tree topology having, at its root, the local root node of the neighborhood; and
performing a one way, rooted collective operation within the irregular subcommunicator including:
performing, in one phase, the collective operation within each neighborhood; and performing, in another phase, the collective operation amongst the local root nodes.
Dependent claims: 2, 3, 4, 5, 6
7. An apparatus for performing optimized collective operations in an irregular subcommunicator of compute nodes in a parallel computer, the apparatus comprising a computer processor, a computer memory operatively coupled to the computer processor, the computer memory having disposed within it computer program instructions that, when executed, cause the apparatus to carry out the steps of:
identifying, within the irregular subcommunicator of the compute nodes in the parallel computer, regular neighborhoods of compute nodes within the irregular subcommunicator of the compute nodes in the parallel computer, wherein the irregular subcommunicator of the compute nodes in the parallel computer has topological communication gaps and the regular neighborhoods of the compute nodes in the parallel computer are logical planes with no topological communication gaps in each neighborhood, wherein identifying regular neighborhoods of compute nodes within the irregular subcommunicator of the compute nodes in the parallel computer comprises:
establishing, by each respective compute node within the irregular subcommunicator of the compute nodes in the parallel computer, at least one logical plane that includes the respective compute node, wherein establishing the at least one logical plane comprises:
identifying, in a positive direction of a first dimension, each logical plane that includes the respective compute node and a first compute node of the irregular subcommunicator that is one or more hops away from the respective compute node in a positive direction of a second dimension, wherein the second dimension is orthogonal to the first dimension;
identifying, in a negative direction of the first dimension, each logical plane that includes the respective compute node and a second compute node of the irregular subcommunicator that is one or more hops away from the respective compute node in the positive direction of the second dimension;
selecting, for each neighborhood from the compute nodes of the neighborhood, a local root node;
assigning each local root node to a node of a neighborhood-wide tree topology;
mapping, for each neighborhood, the compute nodes of the neighborhood to a local tree topology having, at its root, the local root node of the neighborhood; and
performing a one way, rooted collective operation within the irregular subcommunicator including:
performing, in one phase, the collective operation within each neighborhood; and performing, in another phase, the collective operation amongst the local root nodes.
Dependent claims: 8, 9, 10, 11, 12
13. A computer program product for performing optimized collective operations in an irregular subcommunicator of compute nodes in a parallel computer, the computer program product disposed upon a non-transitory computer readable medium, the computer program product comprising computer program instructions that, when executed, cause a computer to carry out the steps of:
identifying, within the irregular subcommunicator of the compute nodes in the parallel computer, regular neighborhoods of compute nodes within the irregular subcommunicator of the compute nodes in the parallel computer, wherein the irregular subcommunicator of the compute nodes in the parallel computer has topological communication gaps and the regular neighborhoods of the compute nodes in the parallel computer are logical planes with no topological communication gaps in each neighborhood, wherein identifying regular neighborhoods of compute nodes within the irregular subcommunicator of the compute nodes in the parallel computer comprises:
establishing, by each respective compute node within the irregular subcommunicator of the compute nodes in the parallel computer, at least one logical plane that includes the respective compute node, wherein establishing the at least one logical plane comprises:
identifying, in a positive direction of a first dimension, each logical plane that includes the respective compute node and a first compute node of the irregular subcommunicator that is one or more hops away from the respective compute node in a positive direction of a second dimension, wherein the second dimension is orthogonal to the first dimension;
identifying, in a negative direction of the first dimension, each logical plane that includes the respective compute node and a second compute node of the irregular subcommunicator that is one or more hops away from the respective compute node in the positive direction of the second dimension;
selecting, for each neighborhood from the compute nodes of the neighborhood, a local root node;
assigning each local root node to a node of a neighborhood-wide tree topology;
mapping, for each neighborhood, the compute nodes of the neighborhood to a local tree topology having, at its root, the local root node of the neighborhood; and
performing a one way, rooted collective operation within the irregular subcommunicator including:
performing, in one phase, the collective operation within each neighborhood; and performing, in another phase, the collective operation amongst the local root nodes.
Dependent claims: 14, 15, 16, 17, 18
Specification