Environment based node selection for work scheduling in a parallel computing system
Abstract
A method, apparatus, and program product manage scheduling of a plurality of jobs in a parallel computing system of the type that includes a plurality of computing nodes and is disposed in a data center. The plurality of jobs are scheduled for execution on a group of computing nodes from the plurality of computing nodes based on the physical locations of the plurality of computing nodes in the data center. The group of computing nodes is further selected so as to distribute at least one of a heat load and an energy load within the data center. The plurality of jobs may be additionally scheduled based upon an estimated processing requirement for each job of the plurality of jobs.
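The abstract does not say how the group is chosen "so as to distribute at least one of a heat load and an energy load." A minimal sketch follows. It assumes node positions are available as (x, y, z) tuples and uses a greedy farthest-point heuristic, which is a swapped-in technique that the abstract does not name. Each new member of the group is the free node farthest from every node already chosen, so busy nodes are not packed into one hot spot. The function name `spread_select` and its seeding rule (start from the alphabetically first node) are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]  # assumed (x, y, z) position of a node


def spread_select(nodes: Dict[str, Coord], k: int) -> List[str]:
    """Greedily pick up to k nodes that are far apart (farthest-point heuristic).

    Hypothetical helper: spreading the selected nodes out is one way to avoid
    a local hot spot; it is not presented as the patent's own algorithm.
    """
    if k <= 0 or not nodes:
        return []
    names = sorted(nodes)          # deterministic iteration order
    chosen = [names[0]]            # assumed seed: first node by name
    while len(chosen) < min(k, len(names)):
        best, best_gap = None, -1.0
        for n in names:
            if n in chosen:
                continue
            # distance from candidate n to its closest already-chosen node
            gap = min(math.dist(nodes[n], nodes[c]) for c in chosen)
            if gap > best_gap:     # strict '>' keeps the first node on ties
                best, best_gap = n, gap
        chosen.append(best)
    return chosen
```

For four nodes at x = 0, 1, 2 and 10, asking for two nodes returns the two end nodes, and asking for three adds the node at x = 2, whose nearest chosen neighbour is farthest away.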
20 Claims
1. A method of managing a plurality of jobs throughout a plurality of computing nodes of a parallel computing system disposed in a data center, the method comprising:
in response to receiving a workload, dividing the workload into a plurality of jobs;
accessing a job information table including historical information associated with at least one job of the plurality of jobs, wherein the job information table includes a unique identifier associated with the at least one job;
accessing one or more tables including physical locations associated with each of the plurality of computing nodes and each of one or both of a plurality of power circuits or a plurality of cooling sources distributed throughout the data center, wherein the physical locations are defined by three-dimensional (x,y,z) coordinates stored in the one or more tables;
scheduling the plurality of jobs in the workload for execution on a group of computing nodes from among the plurality of computing nodes in the parallel computing system based upon the physical locations of the plurality of computing nodes in the data center, the physical locations of the one or both of the plurality of power circuits or the plurality of cooling sources, and the historical information in the job information table, including selecting the group of computing nodes by assigning individual computing nodes to form the group so as to distribute at least one of a heat load and an energy load within the data center; and
executing the plurality of jobs on the group of computing nodes.
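Claim 1 ties node selection to the stored (x,y,z) locations of the power circuits or cooling sources. One way to read that is sketched here under stated assumptions. Each node is mapped to its nearest source by Euclidean distance, and the group is then filled so that no single source ends up serving many more selected nodes than the others. `LocationTables`, `nearest_source`, and `select_group` are hypothetical names. Nearest-source assignment and count-based balancing are assumptions that the claim does not specify.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Coord = Tuple[float, float, float]  # (x, y, z) coordinates, as in the location tables


@dataclass
class LocationTables:
    """Hypothetical in-memory form of the location tables: name -> (x, y, z)."""
    nodes: Dict[str, Coord]    # computing nodes
    sources: Dict[str, Coord]  # power circuits or cooling sources

    def nearest_source(self, node: str) -> str:
        """Assumed mapping: a node is served by its closest source."""
        pos = self.nodes[node]
        # ties broken by source name so the result is deterministic
        return min(self.sources, key=lambda s: (math.dist(pos, self.sources[s]), s))


def select_group(tables: LocationTables, size: int) -> List[str]:
    """Pick up to `size` nodes, balancing how many selected nodes share each source."""
    load = {s: 0 for s in tables.sources}  # selected nodes per source so far
    free = sorted(tables.nodes)
    group: List[str] = []
    while free and len(group) < size:
        # the free node whose nearest source currently serves the fewest selected nodes
        node = min(free, key=lambda n: (load[tables.nearest_source(n)], n))
        load[tables.nearest_source(node)] += 1
        group.append(node)
        free.remove(node)
    return group
```

With two nodes near each of two cooling sources, a group of two takes one node from each source instead of two neighbouring nodes that share the same cooling.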
12. A method of managing a plurality of jobs throughout a plurality of computing nodes of a parallel computing system disposed in a data center, the method comprising:
in response to receiving a workload, dividing the workload into a plurality of jobs;
accessing a job information table including historical information associated with at least one job of the plurality of jobs, wherein the job information table includes a unique identifier associated with the at least one job;
accessing one or more tables including physical locations associated with each of the plurality of computing nodes and each of one or both of a plurality of power circuits or a plurality of cooling sources distributed throughout the data center, wherein the physical locations are defined by three-dimensional (x,y,z) coordinates stored in the one or more tables;
scheduling the plurality of jobs in the workload for execution on a group of computing nodes from among the plurality of computing nodes in the parallel computing system based upon an estimated processing requirement for each job of the plurality of jobs, the historical information in the job information table, the physical locations of the one or both of the plurality of power circuits or the plurality of cooling sources, and the physical locations of the plurality of computing nodes in the data center, including selecting the group of computing nodes by assigning individual computing nodes to form the group so as to distribute at least one of a heat load and an energy load within the data center; and
executing the plurality of jobs on the group of computing nodes.
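Claim 12 adds an estimated processing requirement for each job, and the job information table keys historical information by a unique identifier. The sketch assumes the history is a list of past runtimes per job identifier and that the estimate is their mean, with a default for jobs never seen before. It pairs that estimate with longest-processing-time-first assignment, a standard list-scheduling heuristic used here only as a stand-in. The heaviest jobs are placed first, each on the node with the least accumulated estimated load, which keeps per-node load (and so, roughly, per-node heat) even. `estimate_requirement`, `assign_jobs`, and the default value are hypothetical.

```python
import heapq
from statistics import mean
from typing import Dict, List, Tuple


def estimate_requirement(job_id: str, history: Dict[str, List[float]],
                         default: float = 1.0) -> float:
    """Estimate a job's requirement from its past runs (assumed: mean runtime)."""
    past = history.get(job_id)  # historical info looked up by unique job identifier
    return mean(past) if past else default


def assign_jobs(jobs: List[str], nodes: List[str],
                history: Dict[str, List[float]]) -> Dict[str, List[str]]:
    """Longest-processing-time-first: heaviest estimated job to least-loaded node."""
    heap: List[Tuple[float, str]] = [(0.0, n) for n in sorted(nodes)]
    heapq.heapify(heap)  # min-heap of (accumulated estimated load, node)
    plan: Dict[str, List[str]] = {n: [] for n in nodes}
    # sort by descending estimate; job id breaks ties deterministically
    ordered = sorted(jobs, key=lambda j: (-estimate_requirement(j, history), j))
    for job in ordered:
        load, node = heapq.heappop(heap)
        plan[node].append(job)
        heapq.heappush(heap, (load + estimate_requirement(job, history), node))
    return plan
```

For example, with past runtimes averaging 11, 6 and 4 for three known jobs and one unseen job, two nodes receive loads of 11 and 11 (6 + 4 + the default 1).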
15. A parallel computing system disposed in a data center, comprising:
a plurality of computing nodes, each computing node including at least one processing unit; and
program code configured to be executed by the parallel computing system to manage a workload of the parallel computing system, the program code further configured to divide the workload into a plurality of jobs,
access a job information table including historical information associated with at least one job of the plurality of jobs, wherein the job information table includes a unique identifier associated with at least one job,
access one or more tables including physical locations associated with each of the plurality of computing nodes and each of one or both of a plurality of power circuits or a plurality of cooling sources distributed throughout the data center, wherein the physical locations are defined by three-dimensional (x,y,z) coordinates stored in the one or more tables,
schedule the plurality of jobs in the workload for execution on a group of computing nodes from among the plurality of computing nodes in the parallel computing system based upon the physical locations of the plurality of computing nodes in the data center, the physical locations of the one or both of the plurality of power circuits or the plurality of cooling sources, and the historical information in the job information table, including selecting the group of computing nodes by assigning individual computing nodes to form the group so as to distribute at least one of a heat load and an energy load within the data center, and
execute the plurality of jobs on the group of computing nodes.
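Claim 15 recites program code that divides the workload into jobs and executes them on the selected group. The sketch rests on three assumptions. A workload is a sequence of integers split into fixed-size jobs. The group has already been chosen. Local worker threads from Python's `concurrent.futures.ThreadPoolExecutor` can stand in for computing nodes, with jobs dealt to the group round-robin. `divide_workload`, `execute_on_group`, and the chunking policy are hypothetical and only illustrate the divide and execute steps, not the claimed scheduling.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple


def divide_workload(items: Sequence[int], job_size: int) -> List[List[int]]:
    """Split a workload into fixed-size jobs (hypothetical chunking policy)."""
    if job_size <= 0:
        raise ValueError("job_size must be positive")
    return [list(items[i:i + job_size]) for i in range(0, len(items), job_size)]


def execute_on_group(jobs: List[List[int]], group: List[str],
                     work: Callable[[List[int]], int]) -> Dict[int, Tuple[str, int]]:
    """Run each job on a node of the group; threads stand in for real nodes.

    Returns {job index: (node name, job result)}.
    """
    if not group:
        raise ValueError("group must contain at least one node")
    # round-robin placement over the already-selected group
    placement = {i: group[i % len(group)] for i in range(len(jobs))}
    with ThreadPoolExecutor(max_workers=len(group)) as pool:
        futures = {i: pool.submit(work, job) for i, job in enumerate(jobs)}
        # f.result() blocks until each job finishes
        return {i: (placement[i], f.result()) for i, f in futures.items()}
```

Dividing the items 1 through 7 into jobs of three and running `sum` on a two-node group yields results 6 and 7 on the first node and 15 on the second.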
Specification