System and method for cluster management based on HPC architecture
Abstract
Cluster management software comprises a plurality of cluster agents, with each cluster agent associated with an HPC node including an integrated fabric and the cluster agent operable to determine a status of the associated HPC node. The software further includes a cluster management engine communicably coupled with the plurality of the HPC nodes and operable to execute an HPC job using a dynamically allocated subset of the plurality of HPC nodes based on the determined status of the plurality of HPC nodes.
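The relationship the abstract describes, where per-node agents report status and an engine allocates a subset of nodes based on it, can be illustrated with a minimal Python sketch. All names here (`NodeStatus`, `ClusterAgent`, `ClusterManagementEngine`, `report_status`, `execute`) are hypothetical and chosen for illustration; the patent does not specify an API, and the lowest-id-first selection policy is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class NodeStatus(Enum):
    IDLE = "idle"
    BUSY = "busy"
    FAILED = "failed"


@dataclass
class ClusterAgent:
    """Hypothetical per-node agent that reports the status of its node."""
    node_id: int
    status: NodeStatus = NodeStatus.IDLE

    def report_status(self) -> NodeStatus:
        return self.status


class ClusterManagementEngine:
    """Hypothetical engine that allocates nodes based on agent reports."""

    def __init__(self, agents):
        self.agents = {a.node_id: a for a in agents}

    def available_nodes(self):
        # Ask every agent for its status; only idle nodes are candidates.
        return sorted(
            nid for nid, agent in self.agents.items()
            if agent.report_status() is NodeStatus.IDLE
        )

    def execute(self, job_name, nodes_needed):
        # Allocate a subset of idle nodes at request time, or refuse.
        free = self.available_nodes()
        if len(free) < nodes_needed:
            return None
        chosen = free[:nodes_needed]  # assumed policy: lowest ids first
        for nid in chosen:
            self.agents[nid].status = NodeStatus.BUSY
        return chosen


if __name__ == "__main__":
    agents = [ClusterAgent(i) for i in range(4)]
    agents[1].status = NodeStatus.FAILED
    engine = ClusterManagementEngine(agents)
    print(engine.execute("demo-job", 2))
```

Because the subset is chosen only when the job is submitted, a node whose agent reports a failure is simply never offered to the job.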
18 Claims
1. A method, comprising:
determining, by a cluster management engine, available space in a virtual cluster of a plurality of communicatively coupled nodes included in a computing environment, each node including a cluster agent in communication with the cluster management engine, the virtual cluster associated with a group of users that submit similar jobs, and comprising a logical grouping of nodes configured to process related jobs;
prior to job execution, determining an optimum job that is compatible with the available space in the virtual cluster of nodes; and
following the determining, executing the optimum job in the available space in the virtual cluster of nodes;
wherein determining the optimum job that is compatible with the available space in the virtual cluster of nodes further comprises:
determining a number of available nodes in the virtual cluster;
selecting a first job from a job queue;
dynamically determining an optimum shape of the first job;
determining whether the number of available nodes is enough to execute the first job, based on the optimum shape thereof; and
dynamically allocating one or more of the available nodes for the first job, in the event that the determined number of available nodes is enough to execute the first job;
wherein the optimum shape comprises one or more of:
a best fit cube in which the one or more available nodes are allocated in a cubic volume so as to allow cooperating tasks to exchange data with any other tasks by minimizing the distance between any two nodes; and
a best fit sphere in which the one or more available nodes are allocated in a spherical volume such that a first task is placed in a center node of the sphere with remaining tasks placed on nodes surrounding the center node so as to minimize the distance between the first task and the remaining tasks, wherein the remaining tasks communicate with the first task, but not with each other.
Dependent claims: 2, 3, 4, 5, 6
7. A system comprising:
a plurality of communicatively coupled nodes of a computing environment, each node comprising a processing device and including a cluster agent in communication with a cluster management engine; and
the cluster management engine configured to:
determine available space in a virtual cluster of a plurality of the communicatively coupled nodes, the virtual cluster associated with a group of users that submit similar jobs, and comprising a logical grouping of nodes configured to process related jobs;
prior to job execution, determine an optimum job that is compatible with the available space in the virtual cluster of nodes; and
following the determining, execute the optimum job in the available space in the virtual cluster of nodes;
wherein the cluster management engine is further configured to:
determine a number of available nodes in the virtual cluster;
select a first job from a job queue;
dynamically determine an optimum shape of the first job;
determine whether the number of available nodes is enough to execute the first job, based on the optimum shape thereof; and
dynamically allocate one or more of the available nodes for the first job, in the event that the determined number of available nodes is enough to execute the first job;
wherein the optimum shape comprises one or more of:
a best fit cube in which the one or more available nodes are allocated in a cubic volume so as to allow cooperating tasks to exchange data with any other tasks by minimizing the distance between any two nodes; and
a best fit sphere in which the one or more available nodes are allocated in a spherical volume such that a first task is placed in a center node of the sphere with remaining tasks placed on nodes surrounding the center node so as to minimize the distance between the first task and the remaining tasks, wherein the remaining tasks communicate with the first task, but not with each other.
Dependent claims: 8, 9, 10, 11, 12
13. A non-transitory, computer readable storage medium having computer readable instructions stored thereon that, when executed by a computer, implement a method, the method comprising:
determining, by a cluster management engine, available space in a virtual cluster of a plurality of communicatively coupled nodes included in a computing environment, each node including a cluster agent in communication with the cluster management engine, the virtual cluster associated with a group of users that submit similar jobs, and comprising a logical grouping of nodes configured to process related jobs;
prior to job execution, determining an optimum job that is compatible with the available space in the virtual cluster of nodes; and
following the determining, executing the optimum job in the available space in the virtual cluster of nodes;
wherein determining the optimum job that is compatible with the available space in the virtual cluster of nodes further comprises:
determining a number of available nodes in the virtual cluster;
selecting a first job from a job queue;
dynamically determining an optimum shape of the first job;
determining whether the number of available nodes is enough to execute the first job, based on the optimum shape thereof; and
dynamically allocating one or more of the available nodes for the first job, in the event that the determined number of available nodes is enough to execute the first job;
wherein the optimum shape comprises one or more of:
a best fit cube in which the one or more available nodes are allocated in a cubic volume so as to allow cooperating tasks to exchange data with any other tasks by minimizing the distance between any two nodes; and
a best fit sphere in which the one or more available nodes are allocated in a spherical volume such that a first task is placed in a center node of the sphere with remaining tasks placed on nodes surrounding the center node so as to minimize the distance between the first task and the remaining tasks, wherein the remaining tasks communicate with the first task, but not with each other.
Dependent claims: 14, 15, 16, 17, 18
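The steps recited in the independent claims (count the available nodes, select a job from the queue, determine its shape, check that enough nodes exist, allocate) can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: nodes are modeled as coordinates in a 3D grid, distance is Manhattan hop count, the `pattern` labels `"all_to_all"` and `"hub"` are hypothetical, and skipping a job that does not fit in favor of the next one in the queue is an assumed policy. All function and class names are invented for this sketch.

```python
from dataclasses import dataclass
from itertools import product


def hops(a, b):
    """Manhattan distance between two grid coordinates (assumed hop metric)."""
    return sum(abs(p - q) for p, q in zip(a, b))


def cube_side(n):
    """Smallest integer side s such that s**3 >= n."""
    s = 1
    while s ** 3 < n:
        s += 1
    return s


def best_fit_cube(free, n):
    """Return n free nodes inside the first s*s*s cube holding enough, else None."""
    s = cube_side(n)
    for ox, oy, oz in sorted(free):
        cells = [(ox + dx, oy + dy, oz + dz)
                 for dx, dy, dz in product(range(s), repeat=3)]
        inside = [c for c in cells if c in free]
        if len(inside) >= n:
            return inside[:n]
    return None


def best_fit_sphere(free, n):
    """Return [center, *nearest nodes] minimizing total hops to the center."""
    if n < 1 or len(free) < n:
        return None
    best = None
    for center in sorted(free):
        others = sorted((c for c in free if c != center),
                        key=lambda c: (hops(center, c), c))
        ring = others[: n - 1]
        cost = sum(hops(center, c) for c in ring)
        if best is None or cost < best[0]:
            best = (cost, [center] + ring)
    return best[1]


@dataclass
class Job:
    name: str
    tasks: int
    pattern: str  # hypothetical labels: "all_to_all" -> cube, "hub" -> sphere


def allocate_first_fit(queue, free):
    """Walk the queue and allocate nodes to the first job whose shape fits."""
    for job in queue:
        if len(free) < job.tasks:  # not enough available nodes for this job
            continue
        fit = best_fit_cube if job.pattern == "all_to_all" else best_fit_sphere
        nodes = fit(free, job.tasks)
        if nodes is not None:
            free.difference_update(nodes)  # mark the chosen nodes as allocated
            return job, nodes
    return None


if __name__ == "__main__":
    free = set(product(range(3), repeat=3))  # a 3x3x3 grid, all nodes free
    queue = [Job("big", 30, "all_to_all"), Job("hub", 7, "hub")]
    print(allocate_first_fit(queue, free))
```

The cube keeps every pair of allocated nodes close together, which suits tasks that all exchange data with one another; the sphere only minimizes distance to the center node, which suits the case where the remaining tasks talk to the first task but not to each other.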
Specification