Topology aware scheduling for a multiprocessor system
Abstract
A system and method for scheduling jobs in a multiprocessor machine is disclosed. The status of the CPUs on the node boards of the multiprocessor machine is determined periodically. The status can indicate the number of CPUs available and the maximum radius of free CPUs available to execute jobs. Memory allocation is also monitored. This information is provided to a scheduler, which compares the resources available against the resource requirements of jobs. The node boards and CPUs, as well as other resources such as memory, are arranged in hosts. The scheduler schedules jobs to hosts that indicate they have the resources available to execute them. If no host indicates that it has the resources available, the scheduler waits until the resources become available. A best fit of jobs to resources is attained by scheduling jobs to the hosts that have the maximum number of free CPUs for the radius corresponding to a job's CPU radius requirement. Once a job is scheduled to a host, it is dispatched to that host, and the resources required to execute it are allocated to it there.
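The best-fit rule in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: each host is assumed to report a mapping from radius to the number of free CPUs available within that radius, plus its free memory, and `HostStatus`, `Job`, and `select_host` are hypothetical names. Returning `None` stands for the case where the scheduler waits for resources.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class HostStatus:
    name: str
    free_cpus_by_radius: Dict[int, int]  # hypothetical: radius -> free CPUs within that radius
    free_memory_mb: int


@dataclass
class Job:
    name: str
    cpus: int       # CPUs required
    radius: int     # CPU radius requirement
    memory_mb: int  # memory required


def select_host(job: Job, hosts: List[HostStatus]) -> Optional[HostStatus]:
    """Pick the host with the most free CPUs at the job's radius, or None if none fits."""
    candidates = [
        h for h in hosts
        if h.free_cpus_by_radius.get(job.radius, 0) >= job.cpus
        and h.free_memory_mb >= job.memory_mb
    ]
    if not candidates:
        return None  # the scheduler waits until resources become available
    return max(candidates, key=lambda h: h.free_cpus_by_radius[job.radius])


if __name__ == "__main__":
    hosts = [
        HostStatus("hostA", {1: 4, 2: 8}, 4096),
        HostStatus("hostB", {1: 2, 2: 12}, 8192),
    ]
    chosen = select_host(Job("sim", cpus=6, radius=2, memory_mb=2048), hosts)
    print(chosen.name if chosen else "wait")  # hostB
```

Choosing the host with the most free CPUs at the required radius follows the abstract's wording; a real scheduler could apply other tie-breaking or packing policies.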
21 Claims
1. In a computer system comprising a cluster of node boards, each node board having at least one central processor unit (CPU) and shared memory, said node boards being interconnected into groups of node boards providing access between the central processing units (CPUs) and shared memory on different node boards, a scheduling system to schedule a job to said node boards which have resources to execute the jobs, said scheduling system comprising:
a topology monitoring unit for monitoring a status of the CPUs and generating status information signals indicative of the status of each group of node boards;
a job scheduling unit for receiving said status information signals and said jobs, and scheduling the job to one group of node boards on the basis of which group of node boards have the resources required to execute the job as indicated by the status information signals.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
10. In a computer system comprising resources physically located in more than one module, said resources including a plurality of processors being interconnected by a number of interconnections in a physical topology providing non-uniform access to other resources of said computer system, a method of scheduling a job to said resources, said method comprising the steps of:
(a) periodically assessing a status of the resources and sending status information signals indicative of the status of the resources to a job scheduling unit;
(b) assessing, at the job scheduling unit, the resources required to execute a job;
(c) comparing, at the job scheduling unit, the resources required to execute the job and resources available based on the status information signals; and
(d) scheduling the job to the resources which are available to execute the job as based on the status information signals and the physical topology, and the resources required to execute the job.
Dependent claims: 11, 12.
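Steps (a) through (d) can be read as a simple polling loop. The sketch below is illustrative only and rests on assumptions not stated in the claim: resources are reported as free CPU counts per processor group, the group with the most free CPUs is chosen (mirroring the abstract's selection rule), and the physical topology enters only through how CPUs are grouped. `JobSchedulingUnit`, `receive_status`, `submit`, and `schedule` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Tuple


@dataclass
class Job:
    name: str
    cpus: int  # hypothetical: CPUs the job requires


class JobSchedulingUnit:
    """Hypothetical sketch of steps (a) to (d) of the method claim."""

    def __init__(self) -> None:
        self.free_cpus: Dict[str, int] = {}  # last reported free CPUs per processor group
        self.pending: Deque[Job] = deque()

    def receive_status(self, status: Dict[str, int]) -> None:
        # Step (a): a monitor periodically sends status information here.
        self.free_cpus = dict(status)

    def submit(self, job: Job) -> None:
        self.pending.append(job)

    def schedule(self) -> List[Tuple[str, str]]:
        placed: List[Tuple[str, str]] = []
        waiting: Deque[Job] = deque()
        while self.pending:
            job = self.pending.popleft()
            need = job.cpus  # step (b): resources the job requires
            # Step (c): compare the requirement with the reported free resources.
            fits = [g for g, free in self.free_cpus.items() if free >= need]
            if fits:
                # Step (d): pick the group with the most free CPUs and reserve them.
                group = max(fits, key=lambda g: self.free_cpus[g])
                self.free_cpus[group] -= need
                placed.append((job.name, group))
            else:
                waiting.append(job)  # no group fits yet; retry after the next status update
        self.pending = waiting
        return placed


if __name__ == "__main__":
    unit = JobSchedulingUnit()
    unit.receive_status({"g0": 4, "g1": 8})
    for job in (Job("a", 6), Job("b", 4), Job("c", 8)):
        unit.submit(job)
    print(unit.schedule())  # [('a', 'g1'), ('b', 'g0')]
    unit.receive_status({"g0": 8, "g1": 8})
    print(unit.schedule())  # [('c', 'g0')]
```

Jobs that do not fit stay in the pending queue and are retried after the next status update, which corresponds to the wait-until-available behavior described in the abstract.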
13. In a computer system comprising resources including a plurality of processors, said processors being interconnected by a number of interconnections in a physical topology providing non-uniform access to other resources of said computer system, a scheduling system to schedule jobs to said resources, said scheduling system comprising:
a topology monitoring unit for monitoring a status of the processors and generating status information signals indicative of the status of said processors;
a job scheduling unit for receiving said status information signals and said jobs, and scheduling the jobs to groups of processors on the basis of the physical topology and the status information signals.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21.
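Claim 13 schedules jobs to groups of processors on the basis of the physical topology. One way to form such groups, assuming that "radius" means the number of interconnect hops from a central node board (the text here does not define it), is a breadth-first search over a board adjacency map. This is a hypothetical sketch: the adjacency map, the free-CPU counts, and the names `boards_within_radius` and `pick_group` are illustrative, not taken from the patent.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def boards_within_radius(adj: Dict[str, List[str]], center: str, radius: int) -> List[str]:
    """Breadth-first search: node boards at most `radius` interconnect hops from `center`."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        board = queue.popleft()
        if dist[board] == radius:
            continue  # do not expand past the radius
        for nbr in adj[board]:
            if nbr not in dist:
                dist[nbr] = dist[board] + 1
                queue.append(nbr)
    return sorted(dist)


def pick_group(
    adj: Dict[str, List[str]],
    free_cpus: Dict[str, int],
    needed: int,
    max_radius: int,
) -> Optional[Tuple[str, int, List[str]]]:
    """Return (center, radius, boards) for the smallest radius that can hold the job.

    Among centers at that radius, prefer the group with the most free CPUs.
    Returns None if no group within max_radius has enough free CPUs (the job waits).
    """
    for radius in range(max_radius + 1):
        best: Optional[Tuple[int, str, List[str]]] = None
        for center in sorted(adj):
            boards = boards_within_radius(adj, center, radius)
            free = sum(free_cpus[b] for b in boards)
            if free >= needed and (best is None or free > best[0]):
                best = (free, center, boards)
        if best is not None:
            return best[1], radius, best[2]
    return None


if __name__ == "__main__":
    # Hypothetical ring of four node boards and their free CPU counts.
    ring = {"n0": ["n1", "n3"], "n1": ["n0", "n2"], "n2": ["n1", "n3"], "n3": ["n2", "n0"]}
    free = {"n0": 2, "n1": 0, "n2": 2, "n3": 1}
    print(pick_group(ring, free, needed=3, max_radius=2))  # ('n3', 1, ['n0', 'n2', 'n3'])
```

Searching the smallest radius first keeps a job's CPUs topologically close, which limits non-uniform memory access latency; within that radius, the group with the most free CPUs is preferred, in line with the abstract.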
Specification