Performing collective operations in a distributed processing system
First Claim
1. A method of performing collective operations on a hybrid distributed processing system, wherein:
- the hybrid distributed processing system includes a plurality of compute nodes, each compute node having a plurality of tasks, each task assigned a unique rank, each compute node coupled for data communications by at least one data communications network implementing at least two different networking topologies, wherein the hybrid distributed processing system utilizes the at least two networking topologies to perform the collective operations, wherein a first networking topology comprises a tiered tree topology having a root task, and at least two child tasks, where the two child tasks are peers of one another in the same tier, the method further comprising:
- determining by at least one task that a parent of the task has failed to send the task data through the tree topology; and
- determining whether to request the data from a grandparent of the task or a peer of the task in the same tier in the tree topology;
- if the task requests the data from the grandparent, requesting the data and receiving the data from the grandparent of the task through the second networking topology; and
- if the task requests the data from a peer of the task in the same tier in the tree, requesting the data and receiving the data from a peer of the task through the second networking topology.
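The parent, grandparent, and same-tier peer relationships recited in the claim can be illustrated with a small sketch. The claim does not say how ranks map onto the tree, so the code below assumes a binary tree with ranks laid out in heap order (rank 0 is the root, and the children of rank r are 2r+1 and 2r+2). The function names are hypothetical and chosen only for illustration.

```python
from typing import List, Optional


def parent(rank: int) -> Optional[int]:
    """Parent rank in the assumed heap-ordered binary tree; the root has none."""
    return None if rank == 0 else (rank - 1) // 2


def grandparent(rank: int) -> Optional[int]:
    """Parent of the parent, or None for the root and its direct children."""
    p = parent(rank)
    return None if p is None else parent(p)


def tier(rank: int) -> int:
    """Depth of a rank in the tree: the root is tier 0, its children tier 1."""
    return (rank + 1).bit_length() - 1


def peers(rank: int, n_tasks: int) -> List[int]:
    """All other ranks in the same tier, limited to ranks that exist."""
    t = tier(rank)
    first = 2 ** t - 1                           # lowest rank in tier t
    last = min(2 ** (t + 1) - 2, n_tasks - 1)    # highest existing rank in tier t
    return [r for r in range(first, last + 1) if r != rank]
```

With seven tasks under this assumed layout, rank 5 has parent 2, grandparent 0, and peers 3, 4, and 6. A real system may use a different tree arity or rank mapping.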
Abstract
Methods, apparatuses, and computer program products for performing collective operations on a hybrid distributed processing system, including: determining, by at least one task, that a parent of the task has failed to send the task data through the tree topology; determining whether to request the data from a grandparent of the task or from a peer of the task in the same tier of the tree topology; if the task requests the data from the grandparent, requesting and receiving the data from the grandparent through the second networking topology; and if the task requests the data from a peer in the same tier of the tree, requesting and receiving the data from that peer through the second networking topology.
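The recovery flow summarized in the abstract can be sketched as a toy simulation. This is a minimal sketch under stated assumptions, not the patented implementation: the abstract does not specify how a task chooses between the grandparent and a peer, so the policy here (try the grandparent first, then peers in order) is hypothetical. The `store` dictionary and the `recover` function are likewise illustrative stand-ins. A plain dictionary copy takes the place of a transfer over the second networking topology.

```python
from typing import Dict, List, Optional


def recover(task: int,
            grandparent: Optional[int],
            peers: List[int],
            store: Dict[int, bytes]) -> Optional[int]:
    """Fetch missing data for `task` and return the rank it came from.

    `store` maps each rank to the data it currently holds. A task absent
    from `store` models a task whose parent failed to send through the tree.
    """
    if task in store:
        return None  # the parent delivered the data; no recovery needed

    # Hypothetical policy: grandparent first, then same-tier peers in order.
    candidates = ([grandparent] if grandparent is not None else []) + list(peers)
    for source in candidates:
        if source in store:
            # Stand-in for a request and reply over the second topology.
            store[task] = store[source]
            return source
    return None  # no reachable source holds the data yet


# Rank 3's parent (rank 1) failed; rank 0 and peer rank 4 hold the data.
store = {0: b"payload", 4: b"payload"}
source = recover(3, grandparent=0, peers=[4, 5, 6], store=store)
```

In this run the task recovers from its grandparent. If the grandparent did not hold the data, the same call would fall through to the first peer that does.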
6 Claims
1. A method of performing collective operations on a hybrid distributed processing system (recited in full under First Claim). Dependent claims: 2, 3, 4, 5, 6.
Specification