Support of Non-Trivial Scheduling Policies Along with Topological Properties
First Claim
1. A computer system having a scheduling system to schedule a job having resource mapping requirements to resources in a computing architecture arranged at least in part on node boards in host computers, each node board having at least one central processor unit (CPU) and shared memory, said node boards being interconnected into groups of node boards providing access between the central processing units (CPUs) and shared memory on different node boards, said computer system comprising:
a processor for executing computing instructions;
memory for storing said computing instructions;
a scheduling system associated with the processor and the memory, and comprising;
a scheduling unit for scheduling jobs to at least some of said resources, said scheduling unit generating a candidate host list representing the resources available to execute the job to be scheduled based on resource requirements of the job to be scheduled;
a topology library unit comprising a machine map M of the computer system, said machine map M indicative of the interconnections of the resources to which the scheduling system can schedule the jobs;
a topology monitoring unit for monitoring a status of the resources and generating status information signals indicative of a status of the resources;
wherein the topology library unit receives the status information signals and the candidate host list and determines a free map F of resources to execute the job to be scheduled, said free map F indicative of the interconnection of the resources to which the job in a current scheduling cycle can be scheduled based on the status information signals, the candidate host list and the machine map M; and
wherein the topology monitoring unit dispatches a job to the resources in the free map F which match the resource mapping requirements of the job.
Abstract
A system and method for scheduling jobs in a multiprocessor machine is disclosed. The status of resources, including CPUs on node boards and associated shared memory in the multiprocessor machine, is periodically determined. The status can indicate the resources available to execute jobs. This information is accumulated by the topology monitoring unit and provided to the topology library unit. The topology library unit also receives a candidate host list from the scheduling unit, which lists all of the resources available to execute the job being scheduled based on non-trivial scheduling. The topology library unit then uses this information to generate a free map F indicative of the interconnection of the resources available to execute the job. The topology monitoring unit then matches the jobs to the resources available to execute the jobs, based on resource requirements including shape requirements indicative of interconnections of resources required to execute the job. The topology monitoring unit dispatches the job to the portion of the free map F which matches the shape requirements of the job. If the topology library unit determines that no resources are available to execute the job, the topology library unit will return the job to the scheduling unit, which will wait until the resources become available. The free map F may include resources which have been suspended or reserved in previous scheduling cycles, provided the job to be scheduled satisfies the predetermined criteria for execution of the job on resources that are suspended, have a lower priority, or are reserved.
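The derivation of the free map F described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. It assumes the machine map M is an adjacency dictionary over node-board names, that status is a simple string per board, and that the candidate host list is given at node-board granularity. The names `build_free_map`, `status`, and `candidate_hosts` are hypothetical.

```python
from typing import Dict, Set

# Hypothetical machine map M: node board -> directly interconnected boards.
MachineMap = Dict[str, Set[str]]


def build_free_map(machine_map: MachineMap,
                   status: Dict[str, str],
                   candidate_hosts: Set[str]) -> MachineMap:
    """Keep only boards that are free and on the candidate host list,
    together with the interconnections among those boards."""
    usable = {
        board for board in machine_map
        if status.get(board) == "free" and board in candidate_hosts
    }
    # Drop links that lead to boards outside the usable set.
    return {board: machine_map[board] & usable for board in usable}


# Four boards connected in a line: nb0 - nb1 - nb2 - nb3 (assumed topology).
machine_map = {
    "nb0": {"nb1"},
    "nb1": {"nb0", "nb2"},
    "nb2": {"nb1", "nb3"},
    "nb3": {"nb2"},
}
status = {"nb0": "free", "nb1": "free", "nb2": "busy", "nb3": "free"}
candidate_hosts = {"nb0", "nb1", "nb2"}

free_map = build_free_map(machine_map, status, candidate_hosts)
```

In this example nb2 is excluded because it is busy and nb3 because it is not on the candidate host list, so the free map contains only nb0 and nb1 and the link between them.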
16 Claims
1. A computer system having a scheduling system to schedule a job having resource mapping requirements to resources in a computing architecture arranged at least in part on node boards in host computers, each node board having at least one central processor unit (CPU) and shared memory, said node boards being interconnected into groups of node boards providing access between the central processing units (CPUs) and shared memory on different node boards, said computer system comprising:
a processor for executing computing instructions;
memory for storing said computing instructions;
a scheduling system associated with the processor and the memory, and comprising;
a scheduling unit for scheduling jobs to at least some of said resources, said scheduling unit generating a candidate host list representing the resources available to execute the job to be scheduled based on resource requirements of the job to be scheduled;
a topology library unit comprising a machine map M of the computer system, said machine map M indicative of the interconnections of the resources to which the scheduling system can schedule the jobs;
a topology monitoring unit for monitoring a status of the resources and generating status information signals indicative of a status of the resources;
wherein the topology library unit receives the status information signals and the candidate host list and determines a free map F of resources to execute the job to be scheduled, said free map F indicative of the interconnection of the resources to which the job in a current scheduling cycle can be scheduled based on the status information signals, the candidate host list and the machine map M; and
wherein the topology monitoring unit dispatches a job to the resources in the free map F which match the resource mapping requirements of the job.
14. In a computer system comprising resources arranged at least in part on node boards, each node board having at least one central processor unit (CPU) and shared memory, said node boards being interconnected to provide access between the central processing units (CPUs) and shared memory on different boards, a method of scheduling a job to said resources comprising:
(a) determining a machine map M of the computer system indicative of all of the interconnections of all of the resources in the computer system to which the scheduling system can schedule jobs and storing the machine map M in a topology library unit;
(b) periodically assessing a status of the resources and sending status information signals indicative of the status of the resources to the topology library unit;
(c) assessing at the topology monitoring unit a free map F of resources indicative of the interconnection of all of resources to which the scheduling unit can schedule a job in a current scheduling cycle;
(d) matching resource requirements, including topological requirements specifying at least one interconnection of the resources required to execute the job currently being scheduled, to resources in the free map F which match the resource requirements of the job; and
(e) dispatching the job to the matched resources.
15. A computer system having a scheduling system to schedule a job having resource mapping requirements to resources in a computing architecture arranged at least in part on node boards in host computers, each node board having at least one central processor unit (CPU) and shared memory, said node boards being interconnected into groups of node boards providing access between the central processing units (CPUs) and shared memory on different node boards, said computer system comprising:
a processor for executing computing instructions;
memory for storing said computing instructions;
a scheduling system associated with the processor and the memory, and comprising;
a scheduling unit for scheduling jobs to resources, said scheduling unit generating a candidate host list representing the resources in the host computers available to execute the job to be scheduled based on resource requirements of the job to be scheduled;
a topology library unit comprising a machine map M of the computer system, said machine map M indicative of the interconnections of the resources in the computer system to which a scheduling system can schedule the jobs, and, at least one global status map Yn, each said global status map Yn indicative of interconnections of resources in the computer system to which the scheduling unit can schedule jobs if non-trivial scheduling is utilized;
a topology monitoring unit for monitoring a status of the resources in the host computers and generating status information signals indicative of a status of the resources;
wherein the topology library unit receives the status information signals and the candidate host lists and determines a free map F of resources to execute the job to be scheduled, said free map indicative of the interconnection of resources to which the job in a current scheduling cycle can be scheduled based on the status information signals, the candidate host list and the machine map M;
wherein the scheduling unit also indicates resources which have a status other than free and to which the scheduling unit has determined the topology library unit may schedule the job being scheduled if non-trivial scheduling is utilized;
wherein the topology library unit initially determines a free map F of resources to execute the job being scheduled by removing from the machine map M all resources which fall within the at least one global status map Yn, and then re-introducing specific resources in the at least one global status map Yn which the scheduling unit has indicated the job being scheduled can be scheduled to; and
wherein the topology monitoring unit dispatches a job to the resources in the free map F which matches the resource mapping requirements of the job and falls within the free map F as determined by the topology library unit based on the candidate host list, the machine map M the status information signals and the specific resources in the at least one global status map Yn which the scheduling unit has indicated the job being scheduled can be scheduled to.
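The free map construction recited in claim 15 (remove from M everything covered by the global status maps Yn, then re-introduce the resources the scheduling unit has allowed) can be read as plain set arithmetic. The sketch below is an illustration under that reading only. It ignores interconnections and treats each map as a set of resource names. The function name `free_map_with_status_maps` and the example maps `suspended` and `reserved` are hypothetical.

```python
from typing import Iterable, Set


def free_map_with_status_maps(machine_map: Set[str],
                              status_maps: Iterable[Set[str]],
                              allowed: Set[str]) -> Set[str]:
    """F = (M minus the union of all Yn) plus those resources in Yn that
    the scheduling unit has indicated the job may be scheduled to."""
    excluded = set().union(*status_maps)
    free = (machine_map - excluded) | (allowed & excluded)
    # Guard against names that are not part of the machine map at all.
    return free & machine_map


machine_map = {"cpu0", "cpu1", "cpu2", "cpu3", "cpu4", "cpu5"}
suspended = {"cpu2", "cpu3"}  # hypothetical global status map Y1
reserved = {"cpu4"}           # hypothetical global status map Y2
allowed = {"cpu3"}            # resources the scheduling unit re-admits

free = free_map_with_status_maps(machine_map, [suspended, reserved], allowed)
print(sorted(free))  # ['cpu0', 'cpu1', 'cpu3', 'cpu5']
```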
16. In a computer system comprising resources arranged at least in part on node boards, each node board having at least one central processing unit (CPU), said node boards being interconnected on host computers in the computer system to provide access between the central processing units (CPUs) on different boards, a method of scheduling a job to the resources implementing non-trivial scheduling which comprises scheduling requiring more than one scheduling cycle to schedule a job, said method comprising:
(a) determining a machine map M of the computer system indicative of all the interconnections of all the resources in the computer system to which the scheduling system can schedule jobs and storing the machine map M in a topology library unit;
(b) assessing at a topology monitoring unit a free map F of resources indicative of the interconnection of all resources to which the scheduling unit can schedule a job in a current scheduling cycle without using non-trivial scheduling;
(c) assessing at the topology monitoring unit at least one global status map Yn of resources indicative of the interconnection of all resources to which the scheduling unit can schedule a job utilizing non-trivial scheduling;
(d) matching resource requirements to execute a job currently being scheduled and generating a candidate host list indicative of resources to which the job can be scheduled in the current scheduling cycle, and also indicative of resources to which the job can be scheduled using non-trivial scheduling;
(e) matching the job to be scheduled, including topological requirements of the job, to the free map F which match the topological resource requirements of the job, said free map F based on the machine map M but excluding the at least one global status maps Yn;
(f) if the topological requirements of the job cannot be scheduled to the free map F, modifying the free map F by including in the free map F resources in the at least one global status maps Yn which the candidate host list has indicated the job can be scheduled utilizing non-trivial scheduling;
(g) matching the resource requirements including the topological requirements of the job to the map of resources in the free map F which has been modified to exclude the resources in the at least one global status map Yn, but to include resources which the candidate host list indicates the job can be scheduled to utilizing non-trivial scheduling; and
(h) dispatching the job to the matched resources.
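Steps (e) through (h) of claim 16 describe a two-pass match: first against the free map F alone, then against F extended with the permitted resources from the global status maps Yn. The sketch below illustrates that control flow under simplifying assumptions that are not taken from the patent: the topological requirement is reduced to "a connected group of `size` node boards", the machine map is an adjacency dictionary, and all function and parameter names (`restrict`, `find_connected_group`, `schedule`, `reintroducible`) are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set

Graph = Dict[str, Set[str]]


def restrict(machine_map: Graph, boards: Set[str]) -> Graph:
    """Sub-map of the machine map limited to the given boards."""
    return {b: machine_map[b] & boards for b in machine_map if b in boards}


def find_connected_group(graph: Graph, size: int) -> Optional[List[str]]:
    """Return `size` mutually reachable boards, or None if no connected
    component is large enough. A breadth-first prefix is always connected."""
    seen: Set[str] = set()
    for start in sorted(graph):
        if start in seen:
            continue
        order: List[str] = []
        queue = deque([start])
        seen.add(start)
        while queue:
            board = queue.popleft()
            order.append(board)
            for neighbour in sorted(graph[board]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        if len(order) >= size:
            return order[:size]
    return None


def schedule(machine_map: Graph, free_boards: Set[str],
             reintroducible: Set[str], size: int) -> Optional[List[str]]:
    """First pass: match against the free map only (step e). Second pass:
    extend the free map with permitted Yn boards and retry (steps f, g)."""
    group = find_connected_group(restrict(machine_map, free_boards), size)
    if group is None:
        extended = free_boards | reintroducible
        group = find_connected_group(restrict(machine_map, extended), size)
    return group  # the caller would dispatch to this group (step h)


# Boards in a line: nb0 - nb1 - nb2 - nb3, with nb2 in a status map Yn.
machine_map = {"nb0": {"nb1"}, "nb1": {"nb0", "nb2"},
               "nb2": {"nb1", "nb3"}, "nb3": {"nb2"}}
placement = schedule(machine_map, {"nb0", "nb1", "nb3"}, {"nb2"}, size=3)
print(placement)  # ['nb0', 'nb1', 'nb2']
```

With nb2 excluded, the largest connected group of free boards has two members, so the first pass fails for a three-board job. Re-introducing nb2 joins the boards into one component and the second pass succeeds. If no Yn board may be re-introduced, `schedule` returns `None`, which corresponds to the job waiting for a later scheduling cycle.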
Specification