PERFORMING COLLECTIVE OPERATIONS IN A DISTRIBUTED PROCESSING SYSTEM
First Claim
1. A method of performing collective operations on a hybrid distributed processing system, the hybrid distributed processing system including a plurality of compute nodes, each compute node having a plurality of tasks, each task assigned a unique rank, each compute node coupled for data communications by at least one data communications network implementing at least two different networking topologies, wherein a first networking topology comprises a tiered tree topology having a root task, and at least two child tasks, where the two child tasks are peers of one another in the same tier, the method comprising:
determining by at least one task that a parent of the task has failed to send the task data through the tree topology; and
determining whether to request the data from a grandparent of the task or a peer of the task in the same tier in the tree topology;
if the task requests the data from the grandparent, requesting the data and receiving the data from the grandparent of the task through the second networking topology; and
if the task requests the data from a peer of the task in the same tier in the tree, requesting the data and receiving the data from a peer of the task through the second networking topology.
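The parent, grandparent, and same-tier peer relationships named in the claim can be illustrated with a small sketch. The claim does not say how ranks map to positions in the tree, so the layout below is an assumption made only for illustration: a binary tree stored in rank order with the root at rank 0. The names `FANOUT`, `parent`, `grandparent`, `tier`, and `peers` are hypothetical and do not come from the patent.

```python
from typing import List, Optional

# Assumption: a binary tree laid out in rank order, root task at rank 0.
FANOUT = 2


def parent(rank: int) -> Optional[int]:
    """Rank of the parent task, or None for the root."""
    return None if rank == 0 else (rank - 1) // FANOUT


def grandparent(rank: int) -> Optional[int]:
    """Rank of the parent's parent, or None if the task is in tier 0 or 1."""
    p = parent(rank)
    return None if p is None else parent(p)


def tier(rank: int) -> int:
    """Depth of the task in the tree; the root is in tier 0."""
    depth = 0
    while rank != 0:
        rank = (rank - 1) // FANOUT
        depth += 1
    return depth


def peers(rank: int, num_tasks: int) -> List[int]:
    """Ranks of all other tasks in the same tier as `rank`."""
    t = tier(rank)
    return [r for r in range(num_tasks) if r != rank and tier(r) == t]
```

With seven tasks under this assumed layout, rank 5 has parent 2, grandparent 0, and peers 3, 4, and 6. A real system could assign ranks to tree positions differently.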
Abstract
Methods, apparatuses, and computer program products for performing collective operations on a hybrid distributed processing system, including: determining by at least one task that a parent of the task has failed to send the task data through the tree topology; determining whether to request the data from a grandparent of the task or a peer of the task in the same tier in the tree topology; if the task requests the data from the grandparent, requesting the data and receiving the data from the grandparent of the task through the second networking topology; and if the task requests the data from a peer of the task in the same tier in the tree, requesting the data and receiving the data from a peer of the task through the second networking topology.
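The recovery flow described above might be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the patented implementation: a `None` return from `tree_recv` stands in for whatever mechanism detects that the parent failed to send, `alt_request` stands in for a request over the second networking topology, and the `prefer_peer` flag is a hypothetical policy, since the text does not specify how the task chooses between grandparent and peer.

```python
from typing import Callable, List, Optional, Tuple

Payload = bytes


def receive_with_fallback(
    tree_recv: Callable[[], Optional[Payload]],
    alt_request: Callable[[int], Optional[Payload]],
    grandparent: Optional[int],
    peers: List[int],
    prefer_peer: bool,
) -> Tuple[Optional[Payload], str]:
    """Return (data, source); source is 'parent', 'grandparent', 'peer', or 'none'."""
    # Step 1: try the tree topology. None models "parent failed to send".
    data = tree_recv()
    if data is not None:
        return data, "parent"

    # Step 2: decide whether to ask a same-tier peer or the grandparent.
    # Hypothetical policy: honor prefer_peer, fall back to whichever exists.
    ask_peer = prefer_peer or grandparent is None
    if ask_peer and peers:
        target, source = peers[0], "peer"
    elif grandparent is not None:
        target, source = grandparent, "grandparent"
    else:
        return None, "none"

    # Step 3: request and receive the data over the second topology.
    data = alt_request(target)
    return (data, source) if data is not None else (None, "none")


# Usage: ranks 0 (grandparent) and 3 (peer) hold the data; the parent is silent.
store = {0: b"payload", 3: b"payload"}
from_grandparent = receive_with_fallback(lambda: None, store.get, 0, [3], False)
from_peer = receive_with_fallback(lambda: None, store.get, 0, [3], True)
```

In this sketch the choice of target is a single boolean; a real system might weigh load, distance in the second topology, or whether the peer is known to hold the data.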
7 Claims
1. A method of performing collective operations on a hybrid distributed processing system, the hybrid distributed processing system including a plurality of compute nodes, each compute node having a plurality of tasks, each task assigned a unique rank, each compute node coupled for data communications by at least one data communications network implementing at least two different networking topologies, wherein a first networking topology comprises a tiered tree topology having a root task, and at least two child tasks, where the two child tasks are peers of one another in the same tier, the method comprising:
determining by at least one task that a parent of the task has failed to send the task data through the tree topology; and
determining whether to request the data from a grandparent of the task or a peer of the task in the same tier in the tree topology;
if the task requests the data from the grandparent, requesting the data and receiving the data from the grandparent of the task through the second networking topology; and
if the task requests the data from a peer of the task in the same tier in the tree, requesting the data and receiving the data from a peer of the task through the second networking topology.
7-18. (canceled)
Specification