System and method for cluster management based on HPC architecture
Abstract
Cluster management software comprises a plurality of cluster agents, with each cluster agent associated with an HPC node including an integrated fabric and the cluster agent operable to determine a status of the associated HPC node. The software further includes a cluster management engine communicably coupled with the plurality of the HPC nodes and operable to execute an HPC job using a dynamically allocated subset of the plurality of HPC nodes based on the determined status of the plurality of HPC nodes.
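The relationship the abstract describes, in which per-node agents report status and an engine allocates a subset of nodes from those reports, can be illustrated with a minimal sketch. The class names `ClusterAgent` and `ClusterManagementEngine`, the three status values, and the lowest-id-first allocation policy are all assumptions made for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass
from enum import Enum


class NodeStatus(Enum):
    # Hypothetical status values; the abstract only says "a status".
    IDLE = "idle"
    BUSY = "busy"
    FAILED = "failed"


@dataclass
class ClusterAgent:
    """Hypothetical per-node agent that reports the status of its node."""
    node_id: int
    status: NodeStatus = NodeStatus.IDLE

    def report(self):
        return self.node_id, self.status


class ClusterManagementEngine:
    """Hypothetical engine that allocates nodes based on agent reports."""

    def __init__(self, agents):
        self.agents = {agent.node_id: agent for agent in agents}

    def available_nodes(self):
        # A node counts as available only if its agent reports IDLE.
        return sorted(
            node_id
            for node_id, agent in self.agents.items()
            if agent.report()[1] is NodeStatus.IDLE
        )

    def allocate(self, count):
        # Assumed policy: take the lowest-numbered idle nodes, or refuse.
        free = self.available_nodes()
        if len(free) < count:
            return None
        chosen = free[:count]
        for node_id in chosen:
            self.agents[node_id].status = NodeStatus.BUSY
        return chosen
```

With four agents of which node 1 has failed, `allocate(2)` under these assumptions picks nodes 0 and 2 and leaves node 3 free; a second `allocate(2)` then returns `None`.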
20 Claims
1. A method, comprising:
identifying, by one or more first hardware processors of a cluster management engine, which nodes in a virtual cluster are available, the virtual cluster including a plurality of communicatively coupled nodes, each node including a cluster agent in communication with the cluster management engine, and the virtual cluster comprising a logical grouping of nodes configured to process jobs;
identifying, by one or more of the one or more first hardware processors or one or more second hardware processors, a job of the jobs that is compatible with the identified available nodes including:
selecting the job from a job queue;
identifying a shape that sufficiently matches a shape of the job, the shape indicating a specific topology of a sub-cluster of nodes from the cluster of nodes suitable to execute the selected job;
identifying one or more shapes of the available nodes including one or more of (1) a cube in which nodes of the available nodes are allocated in a logical cubic volume so as to allow tasks of the job to exchange data with other tasks of the job so as to minimize a distance between nodes that exchange data, and (2) a sphere in which nodes of the available nodes are allocated in a logical spherical volume so as to allow a first task of the job to be placed in a center node of the sphere with remaining tasks of the job placed on nodes surrounding the center node so as to minimize a distance between the first task and the remaining tasks; and
determining whether the available nodes are sufficient to execute the job based on the identified shape that sufficiently matches the shape of the job and the one or more identified shapes of the available nodes;
in response to determining whether the available nodes are sufficient, allocating, by the one or more of the one or more first hardware processors, the one or more second hardware processors, or one or more third hardware processors, a plurality of the available nodes for the job that sufficiently fit the identified shape of the job; and
executing the job using the allocated nodes.
Dependent claims: 2, 3, 4, 5, 6, 7
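The queue-selection and cube-allocation steps recited above can be sketched as a brute-force search over a 3D grid. This is only one possible reading, not the patented implementation: the coordinate model, the function names `find_box` and `schedule_next`, and the choice to leave a job queued when no box fits are all assumptions.

```python
from collections import deque
from itertools import product


def find_box(available, dims, shape):
    """Return the nodes of the first fully free box of size `shape`, or None.

    `available` is a set of (x, y, z) coordinates, `dims` is the grid size,
    and `shape` is the requested box size; all three are hypothetical models.
    """
    origins = product(*(range(d - s + 1) for d, s in zip(dims, shape)))
    for origin in origins:
        box = [
            tuple(o + k for o, k in zip(origin, offset))
            for offset in product(*(range(s) for s in shape))
        ]
        if all(node in available for node in box):
            return box
    return None


def schedule_next(queue, available, dims):
    """Try to allocate a box for the job at the head of the queue.

    Returns (job_name, nodes) on success. On failure returns None and
    leaves the job queued (an assumed policy).
    """
    if not queue:
        return None
    name, shape = queue[0]
    box = find_box(available, dims, shape)
    if box is None:
        return None
    queue.popleft()
    available.difference_update(box)  # allocated nodes are no longer free
    return name, box
```

For example, on a 3×3×3 grid where only node (0, 0, 0) is busy, a queued job requesting a 2×2×2 cube receives the eight nodes whose box starts at (0, 0, 1); a following job requesting the full 3×3×3 volume stays queued.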
8. A system comprising:
a cluster management engine;
a plurality of communicatively coupled nodes, each node comprising a processing device and a cluster agent, each of the nodes in communication with a cluster management engine; and
the cluster management engine configured to:
identify which nodes in a virtual cluster are available, the virtual cluster comprising a logical grouping of nodes configured to process related jobs;
identify a job that is compatible with the available nodes including the cluster management engine configured to:
select the job from a job queue;
identify a shape that sufficiently matches a shape of the job, the shape indicating a specific topology of a sub-cluster of nodes from the cluster of nodes suitable to execute the selected job;
identify one or more shapes of the available nodes, the shape including one or more of (1) a cube in which nodes of the available nodes are allocated in a cubic volume so as to allow tasks of the job to exchange data with other tasks of the job so as to minimize the distance between nodes that exchange data, and (2) a sphere in which nodes of the available nodes are allocated in a spherical volume so as to allow a first task of the job to be placed in a center node of the sphere with remaining tasks of the job placed on nodes surrounding the center node so as to minimize the distance between the first task and the remaining tasks; and
determine whether the available nodes are sufficient to execute the job based on the identified shape that sufficiently matches the shape of the job and the one or more identified shapes of the available nodes;
the cluster management engine further configured to, in response to determining the available nodes are sufficient, allocate a plurality of the available nodes for the job that sufficiently fit the identified shape of the job; and
the allocated nodes of the virtual cluster are configured to execute the job.
Dependent claims: 9, 10, 11, 12, 13, 14
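The sphere shape recited above places a first task on a center node and the remaining tasks on surrounding nodes so that their distance to the center is small. A minimal sketch follows, assuming nodes are identified by integer grid coordinates and that distance is measured in mesh hops (Manhattan distance); both the metric and the function name `place_sphere` are assumptions, and the exhaustive search is chosen for clarity rather than efficiency.

```python
def manhattan(a, b):
    """Hop count between two grid coordinates (an assumed distance metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def place_sphere(available, num_tasks):
    """Pick a center node and its nearest free neighbours.

    Returns (center, others), where `others` holds num_tasks - 1 nodes,
    chosen so that the total distance from the center is minimal over all
    candidate centers. Returns None if there are too few free nodes.
    """
    if num_tasks < 1 or len(available) < num_tasks:
        return None
    best = None
    for center in sorted(available):
        # Nearest free nodes first; ties are broken by coordinate order.
        others = sorted(
            (node for node in available if node != center),
            key=lambda node: (manhattan(center, node), node),
        )[: num_tasks - 1]
        cost = sum(manhattan(center, node) for node in others)
        if best is None or cost < best[0]:
            best = (cost, center, others)
    return best[1], best[2]
```

On a fully free 3×3×3 grid with seven tasks, this sketch selects (1, 1, 1) as the center and its six face-adjacent nodes for the remaining tasks, each one hop away.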
15. A non-transitory computer readable storage device including instructions stored thereon which, when executed by a machine, configure the machine to:
identify which nodes in a virtual cluster are available, the virtual cluster including a plurality of communicatively coupled nodes, each node including a cluster agent in communication with the cluster management engine, and the virtual cluster comprising a logical grouping of nodes configured to process jobs;
identify a job of the jobs that is compatible with the available nodes in the virtual cluster including:
select the job from a job queue;
identify a shape that sufficiently matches a shape of the job, the shape indicating a specific topology of a sub-cluster of nodes from the cluster of nodes suitable to execute the selected job;
identify one or more shapes of the available nodes, the shape including one or more of (1) a cube in which nodes of the available nodes are allocated in a cubic volume so as to allow tasks of the job to exchange data with other tasks of the job so as to minimize the distance between nodes that exchange data, and (2) a sphere in which nodes of the available nodes are allocated in a spherical volume so as to allow a first task of the job to be placed in a center node of the sphere with remaining tasks of the job placed on nodes surrounding the center node so as to minimize the distance between the first task and the remaining tasks; and
determine whether the available nodes are sufficient to execute the job based on the identified shape that sufficiently matches the shape of the job and one or more identified shapes of the available nodes;
in response to determining the available nodes are sufficient, allocate a plurality of the available nodes for the job that sufficiently fit the identified shape of the job; and
execute the job using the allocated nodes.
Dependent claims: 16, 17, 18, 19, 20
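The claims do not define what makes a shape "sufficiently" match the shape of a job. One hedged interpretation, offered purely as an illustration, is that a requested box may be reoriented along the grid axes before it is compared with the free space. The function names below and the idea of summarizing free space as a single bounding extent are assumptions, not taken from the claim text.

```python
from itertools import permutations


def candidate_shapes(requested):
    """Distinct axis orderings of the requested shape, in sorted order."""
    return sorted(set(permutations(requested)))


def fits(shape, free_extent):
    """True if `shape` is no larger than `free_extent` on every axis."""
    return all(s <= f for s, f in zip(shape, free_extent))


def first_fitting_shape(requested, free_extent):
    """Return the first orientation of `requested` that fits, or None."""
    for shape in candidate_shapes(requested):
        if fits(shape, free_extent):
            return shape
    return None
```

Under this reading, a job requesting a 4×2×1 box fits a free region of extent 2×1×4 once reoriented as (2, 1, 4), while a 3×3×3 request cannot fit a 2×4×4 region in any orientation.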
Specification