Support of non-trivial scheduling policies along with topological properties
First Claim
1. A computer system having a scheduling system to schedule a job having resource mapping requirements to resources in a computing architecture arranged at least in part on node boards in host computers, each node board having at least one central processor unit (CPU) and shared memory, said node boards being interconnected into groups of node boards providing access between the central processing units (CPUs) and shared memory on different node boards, said computer system comprising:
- a processor for executing computing instructions;
memory for storing said computing instructions;
a scheduling system associated with the processor and the memory, and comprising;
a scheduling unit for scheduling jobs to at least some of said resources, said scheduling unit generating a candidate host list representing the resources available to execute the job to be scheduled based on resource requirements of the job to be scheduled, wherein the candidate host list includes reserved resources which have been reserved for larger jobs that have been received in previous scheduling cycles but have not yet been scheduled, and which satisfy predetermined requirements for backfilling the job to be scheduled, but exclude resources which do not satisfy the predetermined requirements for backfilling for the job to be scheduled;
a topology library unit comprising a machine map M of the computer system, said machine map M indicative of the interconnections of the resources available to the scheduling system for scheduling the jobs;
a topology monitoring unit for monitoring a status of the resources and generating status information signals indicative of a status of the resources, wherein the status information signals include suspended status information signals indicative of resources which have been executing a previously scheduled job, but of which execution has been suspended;
wherein the topology library unit receives the status information signals and the candidate host list and determines a free map F of resources to execute the job to be scheduled, said free map F indicative of the interconnection of the resources available for scheduling to the job in a current scheduling cycle based on the status information signals, the candidate host list, and the machine map M, wherein the topology library unit determines the free map F by including in the free map F any reserved resources available to be backfilled as included in the candidate host list, and by including in the free map F any suspended resources which satisfy predetermined requirements for executing a job, and excluding from the free map F any suspended resources which do not satisfy predetermined requirements for executing a job; and
wherein the topology monitoring unit receives the resource requirements of the job to be scheduled from the scheduling unit, and receives the free map F from the topology library unit, and dispatches a job to the resources in the free map F which match the resource mapping requirements of the job.
5 Assignments
0 Petitions
Accused Products
Abstract
Systems and methods for scheduling jobs in a multiprocessor machine are disclosed. The status of resources in the multiprocessor machine is periodically determined. The status indicates the resources available to execute jobs. This information is accumulated by the topology-monitoring unit and provided to the topology library. The topology library also receives a candidate host list which lists all resources available to execute the job being scheduled based on non-trivial scheduling. The topology library unit generates a free map F indicating the interconnection of the resources available to execute the job. The topology monitoring unit matches jobs to the resources available to execute the jobs, based on resource requirements including shape requirements indicative of interconnections of resources required to execute the job. The topology monitoring unit dispatches the job to the portion of the free map F which matches the shape requirements of the job.
81 Citations
18 Claims
-
1. A computer system having a scheduling system to schedule a job having resource mapping requirements to resources in a computing architecture arranged at least in part on node boards in host computers, each node board having at least one central processor unit (CPU) and shared memory, said node boards being interconnected into groups of node boards providing access between the central processing units (CPUs) and shared memory on different node boards, said computer system comprising:
-
a processor for executing computing instructions; memory for storing said computing instructions; a scheduling system associated with the processor and the memory, and comprising; a scheduling unit for scheduling jobs to at least some of said resources, said scheduling unit generating a candidate host list representing the resources available to execute the job to be scheduled based on resource requirements of the job to be scheduled, wherein the candidate host list includes reserved resources which have been reserved for larger jobs that have been received in previous scheduling cycles but have not yet been scheduled, and which satisfy predetermined requirements for backfilling the job to be scheduled, but exclude resources which do not satisfy the predetermined requirements for backfilling for the job to be scheduled; a topology library unit comprising a machine map M of the computer system, said machine map M indicative of the interconnections of the resources available to the scheduling system for scheduling the jobs; a topology monitoring unit for monitoring a status of the resources and generating status information signals indicative of a status of the resources, wherein the status information signals include suspended status information signals indicative of resources which have been executing a previously scheduled job, but of which execution has been suspended; wherein the topology library unit receives the status information signals and the candidate host list and determines a free map F of resources to execute the job to be scheduled, said free map F indicative of the interconnection of the resources available for scheduling to the job in a current scheduling cycle based on the status information signals, the candidate host list, and the machine map M, wherein the topology library unit determines the free map F by including in the free map F any reserved resources available to be backfilled as included in the candidate host list, and by including in the free map F any suspended resources which satisfy predetermined requirements for executing a job, and excluding from the free map F any suspended resources which do not satisfy predetermined requirements for executing a job; and wherein the topology monitoring unit receives the resource requirements of the job to be scheduled from the scheduling unit, and receives the free map F from the topology library unit, and dispatches a job to the resources in the free map F which match the resource mapping requirements of the job. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
-
-
12. A computer system having a scheduling system to schedule a job having resource mapping requirements to resources in a computing architecture arranged at least in part on node boards in host computers, each node board having at least one central processor unit (CPU) and shared memory, said node boards being interconnected into groups of node boards providing access between the central processing units (CPUs) and shared memory on different node boards, said computer system comprising:
-
a processor for executing computing instructions; memory for storing said computing instructions; a scheduling system associated with the processor and the memory and comprising; a scheduling unit for scheduling jobs to at least some of said resources, said scheduling unit generating a candidate host list representing the resources available to execute the job to be scheduled based on resource requirements of the job to be scheduled; a topology library unit comprising a machine map M of the computer system, said machine map M indicative of the interconnections of the resources available to the scheduling system for scheduling the jobs; a topology monitoring unit for monitoring a status of the resources and generating status information signals indicative of a status of the resource; wherein the topology library unit receives the status information signals and the candidate host list, and determines a free map F of resources to execute the job to be scheduled, said free map F indicative of the interconnection of the resources available for scheduling to the job in a current scheduling cycle based on the status information signals, the candidate host list, and the machine map M; wherein the topology monitoring unit receives the resource requirements of the job to be scheduled from the scheduling unit, and receives the free map F from the topology library unit, and dispatches a job to the resources in the free map F which match the resource mapping requirements of the job; and wherein the topology library unit determines the free map F of resources based on the status information signals, the candidate host list, and the machine map M utilizing the following free map F formula;
Fij=Hj(F)\(S\Ui=1.N(Si)\So|policy=MIGRATE)\\(R\Ro|mode=BACKFILL|PREEMPT\Ui=1.N(ROi)|mode=BACKFILL|PREEMPT)++Ui=1.N(Ti)|mode=PREEMPT\(E\Eo)where Hj(F) host to set translation operator, architect and implementation specific, for host j F machine free map S non-volatile suspension map Ui=1.N(Si) union of all suspended jobs that are allowed to be taken over for this job R volatile reservation map, a union of all reservation decisions of the last scheduling cycle RO job'"'"'s own last reservation map Ui=1.N(Roi) union of all slot reserving jobs that are allowed to be preempted or backfilled by this job Ui=1.N(Ti) union of all running jobs that are allowed to be preempted by this job E non-volatile preemption reservation map, a union of all preemption reservation decisions Eo job'"'"'s own last preemption reservation map So job'"'"'s own map preserved over migration re-calculation cycle + “
full set union”
operator\ set difference operator. - View Dependent Claims (13)
-
-
14. A computer system having a scheduling system for implementing non-trivial scheduling requiring more than one scheduling cycle to schedule a job having resource mapping requirements to resources in a computing architecture arranged at least in part on node boards in host computers, each node board having at least one central processor unit (CPU) and shared memory, said node boards being interconnected into groups of node boards providing access between the central processing units (CPUs) and shared memory on different node boards, said computer system comprising:
-
a processor for executing computing instructions; memory for storing said computing instructions; a scheduling system associated with the processor and the memory, and comprising; a scheduling unit for scheduling jobs to at least some of said resources, said scheduling unit generating a candidate host list representing the resources available to execute the job to be scheduled based on resource requirements of the job to be scheduled; a topology library unit comprising a machine map M of the computer system, said machine map M indicative of the interconnections of the resources available to a scheduling system for scheduling the jobs, the topology library unit further comprising at least one global status map Yn, each said global status map Yn indicative of interconnections of resources available to the scheduling unit for scheduling jobs if non-trivial scheduling is utilized; a topology monitoring unit for monitoring a status of the resources and generating status information signals indicative of a status of the resources; wherein the topology library unit receives the status information signals and the candidate host lists, and determines a free map F of resources to execute the job to be scheduled, said free map F indicative of the interconnection of resources available for scheduling to the job in a current scheduling cycle based on the status information signals, the candidate host list, and the machine map M; wherein the scheduling unit also indicates resources which have a status other than free and which the scheduling unit has determined are available to the topology library unit for scheduling the job being scheduled if non-trivial scheduling is utilized; wherein the topology library unit initially determines a free map F of resources to execute the job being scheduled by removing from the machine map M all resources which fall within the at least one global status map Yn, and then re-introducing specific resources in the at least one global status map Yn to which the scheduling unit has indicated the job being scheduled can be scheduled; wherein the topology monitoring unit receives the resource requirements of the job to be scheduled from the scheduling unit and receives the free map F from the topology library unit, and dispatches a job to the resources in the free map F which match the resource mapping requirements of the job and which fall within the free map F as determined by the topology library unit based on the candidate host list, the machine map M, the status information signals, and the specific resources in the at least one global status map Yn to which the scheduling unit has indicated the job being scheduled can be scheduled; and wherein the topology library unit schedules a job to resources in the at least one global status map Yn to which the scheduling unit has indicated the job being scheduled can be scheduled only if the topology monitoring unit cannot match the mapping requirements of the job to the other resources in the free map F and not in the at least one global status maps Yn.
-
-
15. A computer system having a scheduling system for implementing non-trivial scheduling requiring more than one scheduling cycle to schedule a job having resource mapping requirements to resources in a computing architecture arranged at least in part on node boards in host computers, each node board having at least one central processor unit (CPU) and shared memory, said node boards being interconnected into groups of node boards providing access between the central processing units (CPUs) and shared memory on different node boards, said computer system comprising:
-
a processor for executing computing instructions; memory for storing said computing instructions; a scheduling system associated with the processor and the memory, and comprising; a scheduling unit for scheduling jobs to at least some of said resources, said scheduling unit generating a candidate host list representing the resources available to execute the job to be scheduled based on resource requirements of the job to be scheduled; a topology library unit comprising a machine map M of the computer system, said machine map M indicative of the interconnections of the resources available to a scheduling system for scheduling the jobs, the topology library unit further comprising at least one global status map Yn, each said global status map Yn indicative of interconnections of resources available to the scheduling unit for scheduling jobs if non-trivial scheduling is utilized, and each said global status map Yn comprising a suspension map S indicative of interconnections of resources which have been suspended; a topology monitoring unit for monitoring a status of the resources and generating status information signals indicative of a status of the resources; wherein the topology library unit receives the status information signals and the candidate host lists, and determines a free map F of resources to execute the job to be scheduled, said free map F indicative of the interconnection of resources available for scheduling to the job in a current scheduling cycle based on the status information signals, the candidate host list, and the machine map M; wherein the scheduling unit also indicates resources which have a status other than free and which the scheduling unit has determined are available to the topology library unit for scheduling the job being scheduled if non-trivial scheduling is utilized; wherein the topology library unit initially determines a free map F of resources to execute the job being scheduled by removing from the machine map M all resources which fall within the at least one global status map Yn, and then re-introducing specific resources in the at least one global status map Yn to which the scheduling unit has indicated the job being scheduled can be scheduled; wherein the topology monitoring unit receives the resource requirements of the job to be scheduled from the scheduling unit and receives the free map F from the topology library unit, and dispatches a job to the resources in the free map F which match the resource mapping requirements of the job and which fall within the free map F as determined by the topology library unit based on the candidate host list, the machine map M, the status information signals, and the specific resources in the at least one global status map Yn to which the scheduling unit has indicated the job being scheduled can be scheduled; and wherein the candidate host list for the job to be scheduled further indicates resources within the suspension map S which satisfy predetermined requirements for pre-empting a resource in the suspension map S that has been suspended, and wherein the candidate host list does not include resources in the suspension map S which do not satisfy the predetermined requirements for pre-empting a resource in the suspension map S that has been suspended.
-
-
16. A computer system having a scheduling system for implementing non-trivial scheduling requiring more than one scheduling cycle to schedule a job having resource mapping requirements to resources in a computing architecture arranged at least in part on node boards in host computers, each node board having at least one central processor unit (CPU) and shared memory, said node boards being interconnected into groups of node boards providing access between the central processing units (CPUs) and shared memory on different node boards, said computer system comprising:
-
a processor for executing computing instructions; memory for storing said computing instructions; a scheduling system associated with the processor and the memory, and comprising; a scheduling unit for scheduling jobs to at least some of said resources, said scheduling unit generating a candidate host list representing the resources available to execute the job to be scheduled based on resource requirements of the job to be scheduled; a topology library unit comprising a machine map M of the computer system, said machine map M indicative of the interconnections of the resources available to a scheduling system for scheduling the jobs, the topology library unit further comprising at least one global status map Yn, each said global status map Yn indicative of interconnections of resources available to the scheduling unit for scheduling jobs if non-trivial scheduling is utilized, and each said global status may Yn comprising a reservation map R indicative of interconnections of resources which have been reserved; a topology monitoring unit for monitoring a status of the resources and generating status information signals indicative of a status of the resources; wherein the topology library unit receives the status information signals and the candidate host lists, and determines a free map F of resources to execute the job to be scheduled, said free map F indicative of the interconnection of resources available for scheduling to the job in a current scheduling cycle based on the status information signals, the candidate host list, and the machine map M; wherein the scheduling unit also indicates resources which have a status other than free and which the scheduling unit has determined are available to the topology library unit for scheduling the job being scheduled if non-trivial scheduling is utilized; wherein the topology library unit initially determines a free map F of resources to execute the job being scheduled by removing from the machine map M all resources which fall within the at least one global status map Yn, and then re-introducing specific resources in the at least one global status map Yn to which the scheduling unit has indicated the job being scheduled can be scheduled; wherein the topology monitoring unit receives the resource requirements of the job to be scheduled from the scheduling unit and receives the free map F from the topology library unit, and dispatches a job to the resources in the free map F which match the resource mapping requirements of the job and which fall within the free map F as determined by the topology library unit based on the candidate host list, the machine map M, the status information signals, and the specific resources in the at least one global status map Yn to which the scheduling unit has indicated the job being scheduled can be scheduled; and wherein the candidate host list for the job to be scheduled indicates resources within the reservation map R which satisfy predetermined requirements for backfilling a resource in the reservation map R that has been reserved, and wherein the candidate host list does not include resources in the reservation map R which do not satisfy the predetermined requirements for backfilling a resource in the reservation map R that has been reserved.
-
-
17. A method of scheduling a job to resources in a computer system arranged at least in part on node boards, each node board having at least one central processing unit (CPU), said node boards being interconnected on host computers in the computer system to provide access between the central processing units (CPUs) on different boards, the method implementing non-trivial scheduling requiring more than one scheduling cycle to schedule a job, and comprising:
-
(a) determining a machine map M of the computer system indicative of all the interconnections of all the resources to which the scheduling system can schedule jobs, and storing the machine map M in a topology library unit; (b) assessing at a topology monitoring unit a free map F of resources indicative of the interconnections of all the resources available to the scheduling unit for scheduling a job in a current scheduling cycle without using non-trivial scheduling; (c) assessing at the topology monitoring unit at least one global status map Yn of resources indicative of the interconnections of all the resources available to the scheduling unit for scheduling a job utilizing non-trivial scheduling, wherein the at least one global status map Yn comprises a suspension map S indicative of interconnections of resources which have been suspended; (d) matching resource requirements to execute a job currently being scheduled, and generating a candidate host list indicative of the resources available for scheduling to the job in the current scheduling cycle, and also indicative of the resources available for scheduling to the job using non-trivial scheduling; (e) matching the job to be scheduled, including topological requirements of the job, to the map of resources in the free map F which match the topological resource requirements of the job, said free map F based on the machine map M but excluding the at least one global status maps Yn; (f) if the topological requirements of the job cannot be scheduled to the free map F, modifying the free map F by including in the free map F resources in the at least one global status maps Yn to which the candidate host list has indicated the job can be scheduled utilizing non-trivial scheduling; (g) matching the resource requirements, including the topological requirements of the job, to the map of resources in the free map F which has been modified to exclude the resources in the at least one global status map Yn, but to include resources which the candidate host list indicates are available for scheduling to the job utilizing non-trivial scheduling; and (h) receiving at the topology monitoring unit the resource requirements of the job to be scheduled from the scheduling unit and receiving the free map F from the topology library unit, the topology monitoring unit dispatching the job to the matched resources, wherein the candidate host list for the job being scheduled indicates resources within the suspension map S which satisfy predetermined requirements for pre-empting a resource in the suspension map S that has been suspended, and wherein the candidate host list does not include resources in the suspension map S which do not satisfy the predetermined requirements for pre-empting a resource in the suspension map S that has been suspended.
-
-
18. A method of scheduling a job to resources in a computer system arranged at least in part on node boards, each node board having at least one central processing unit (CPU), said node boards being interconnected on host computers in the computer system to provide access between the central processing units (CPUs) on different boards, the method implementing non-trivial scheduling requiring more than one scheduling cycle to schedule a job, and comprising:
-
(a) determining a machine map M of the computer system indicative of all the interconnections of all the resources to which the scheduling system can schedule jobs, and storing the machine map M in a topology library unit; (b) assessing at a topology monitoring unit a free map F of resources indicative of the interconnections of all the resources available to the scheduling unit for scheduling a job in a current scheduling cycle without using non-trivial scheduling; (c) assessing at the topology monitoring unit at least one global status map Yn of resources indicative of the interconnections of all the resources available to the scheduling unit for scheduling a job utilizing non-trivial scheduling, wherein the at least one global status map Yn comprises a reservation map R indicative of interconnections of resources which have been reserved; (d) matching resource requirements to execute a job currently being scheduled, and generating a candidate host list indicative of resources available for scheduling to the job in the current scheduling cycle, and also indicative of resources available for scheduling to the job using non-trivial scheduling; (e) matching the job to be scheduled, including topological requirements of the job, to the map of resources in the free map F which match the topological resource requirements of the job, said free map F based on the machine map M but excluding the at least one global status maps Yn; (f) if the topological requirements of the job cannot be scheduled to the free map F, modifying the free map F by including in the free map F resources in the at least one global status maps Yn which the candidate host list has indicated are available for scheduling to the job utilizing non-trivial scheduling; (g) matching the resource requirements, including the topological requirements of the job, to the map of resources in the free map F which has been modified to exclude the resources in the at least one global status map Yn, but to include resources to which the candidate host list indicates the job can be scheduled utilizing non-trivial scheduling; and (h) receiving at the topology monitoring unit the resource requirements of the job to be scheduled from the scheduling unit and receiving the free map F from the topology library unit, the topology monitoring unit dispatching the job to the matched resources, wherein the candidate host list for the job being scheduled indicates resources within the reservation map R which satisfy predetermined requirements for backfilling a resource in the reservation map R that has been reserved, and wherein the candidate host list does not include resources in the reservation map R which do not satisfy the predetermined requirements for backfilling a resource in the reservation map R that has been reserved.
-
Specification