System and Method for Performing Collective Operations Using Software Setup and Partial Software Execution at Leaf Nodes in a Multi-Tiered Full-Graph Interconnect Architecture
First Claim
1. A method, in a data processing system, for performing collective operations, the data processing system comprising a plurality of supernodes, the plurality of supernodes comprising a plurality of processor books, and the plurality of processor books comprising a plurality of processors, the method comprising:
determining, in software executing on a parent processor in a first processor book of the data processing system, a number of other processors in a same or different processor book of the data processing system needed to execute the collective operation, thereby establishing a subset of processors comprising the parent processor and the other processors;
logically arranging, in the software executing on the parent processor, the subset of processors as a plurality of nodes in a hierarchical structure;
transmitting the collective operation to the subset of processors based on the hierarchical structure;
receiving, in hardware of the parent processor, results from the execution of the collective operation from the other processors;
generating, in hardware of the parent processor, a final result of the collective operation based on the results received from execution of the collective operation by the other processors; and
outputting the final result.
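The logical-arrangement and transmission steps of the claim can be pictured as a tree rooted at the parent processor. The following is a minimal sketch under stated assumptions: the claim does not specify the shape of the hierarchical structure, so a k-ary tree over a list of processor IDs is assumed here, and the names `build_tree`, `transmit_order`, and `fanout` are hypothetical and do not come from the patent.

```python
from typing import Dict, List


def build_tree(processors: List[str], fanout: int = 2) -> Dict[str, List[str]]:
    """Map each processor to its children in a k-ary tree.

    processors[0] is treated as the parent (root) processor. The k-ary
    layout is an illustrative assumption, not taken from the claim.
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    children: Dict[str, List[str]] = {p: [] for p in processors}
    for i in range(1, len(processors)):
        # Node i hangs under node (i - 1) // fanout, as in an array-backed heap.
        parent = processors[(i - 1) // fanout]
        children[parent].append(processors[i])
    return children


def transmit_order(children: Dict[str, List[str]], root: str) -> List[str]:
    """Return the order in which the operation reaches nodes, level by level."""
    order: List[str] = []
    frontier = [root]
    while frontier:
        order.extend(frontier)
        # The next level is every child of every node in the current level.
        frontier = [c for node in frontier for c in children[node]]
    return order
```

With five processors and a fanout of 2, the root has two children and the first child has the remaining two, so the operation reaches every processor in three levels rather than through four separate sends from the root.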
Abstract
A method, computer program product, and system are provided for performing collective operations. In software executing on a parent processor in a first processor book, a number of other processors are determined in a same or different processor book of the data processing system that is needed to execute the collective operation, thereby establishing a plurality of processors comprising the parent processor and the other processors. In software executing on the parent processor, the plurality of processors are logically arranged as a plurality of nodes in a hierarchical structure. The collective operation is transmitted to the plurality of processors based on the hierarchical structure. In hardware of the parent processor, results are received from the execution of the collective operation from the other processors, a final result is generated of the collective operation based on the received results, and the final result is output.
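The result-gathering flow described in the abstract can be sketched as a reduction over the hierarchy. This is only an illustration under stated assumptions: the abstract says the parent receives results and generates the final result in hardware, whereas the sketch below models the same data flow in plain software; the combining operator (addition by default), the integer values, and the names `collective_reduce`, `children`, and `local_values` are hypothetical.

```python
from functools import reduce
from operator import add
from typing import Callable, Dict, List


def collective_reduce(
    children: Dict[str, List[str]],
    local_values: Dict[str, int],
    node: str,
    combine: Callable[[int, int], int] = add,
) -> int:
    """Combine a node's local value with the results of its whole subtree.

    Software model only: the patent places the receive and combine steps
    in hardware at the parent processor.
    """
    # Each child first reduces its own subtree, then reports one result upward.
    child_results = [
        collective_reduce(children, local_values, child, combine)
        for child in children.get(node, [])
    ]
    # Fold the children's results into this node's own contribution.
    return reduce(combine, child_results, local_values[node])
```

For instance, with the hierarchy `{"P0": ["P1", "P2"], "P1": ["P3"], "P2": [], "P3": []}` and local values 1, 2, 3, and 4 on P0 through P3, calling `collective_reduce` at `"P0"` with the default operator returns 10; passing `max` as `combine` returns 4.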
145 Citations
20 Claims
1. A method, in a data processing system, for performing collective operations, the data processing system comprising a plurality of supernodes, the plurality of supernodes comprising a plurality of processor books, and the plurality of processor books comprising a plurality of processors, the method comprising:
determining, in software executing on a parent processor in a first processor book of the data processing system, a number of other processors in a same or different processor book of the data processing system needed to execute the collective operation, thereby establishing a subset of processors comprising the parent processor and the other processors;
logically arranging, in the software executing on the parent processor, the subset of processors as a plurality of nodes in a hierarchical structure;
transmitting the collective operation to the subset of processors based on the hierarchical structure;
receiving, in hardware of the parent processor, results from the execution of the collective operation from the other processors;
generating, in hardware of the parent processor, a final result of the collective operation based on the results received from execution of the collective operation by the other processors; and
outputting the final result.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computer program product, for performing collective operations, comprising a computer useable medium having a computer readable program, wherein the computer readable program, when executed in a parent processor in a first processor book of a data processing system, causes the parent processor to:
determining, in software executing on the parent processor, a number of other processors in a same or different processor book of the data processing system needed to execute the collective operation, thereby establishing a subset of processors comprising the parent processor and the other processors;
logically arranging, in the software executing on the parent processor, the subset of processors as a plurality of nodes in a hierarchical structure;
transmitting the collective operation to the subset of processors based on the hierarchical structure;
receiving, in hardware of the parent processor, results from the execution of the collective operation from the other processors;
generating, in hardware of the parent processor, a final result of the collective operation based on the results received from execution of the collective operation by the other processors; and
outputting the final result, wherein the data processing system comprises a plurality of supernodes, the plurality of supernodes comprising a plurality of processor books, and the plurality of processor books comprising a plurality of processors.
Dependent claims: 10, 11, 12, 13, 14
15. A data processing system for performing collective operations, comprising:
a parent processor in a first processor book of the data processing system; and
a memory coupled to the parent processor, wherein the memory comprises instructions which, when executed by the parent processor, cause the parent processor to:
determining, in software executing on a parent processor, a number of other processors in a same or different processor book of the data processing system needed to execute the collective operation, thereby establishing a subset of processors comprising the parent processor and the other processors;
logically arranging, in the software executing on the parent processor, the subset of processors as a plurality of nodes in a hierarchical structure;
transmitting the collective operation to the subset of processors based on the hierarchical structure;
receiving, in hardware of the parent processor, results from the execution of the collective operation from the other processors;
generating, in hardware of the parent processor, a final result of the collective operation based on the results received from execution of the collective operation by the other processors; and
outputting the final result, wherein the data processing system comprises a plurality of supernodes, the plurality of supernodes comprising a plurality of processor books, and the plurality of processor books comprising a plurality of processors.
Dependent claims: 16, 17, 18, 19, 20
Specification