SYSTEM AND METHOD FOR MULTI-LEVEL PREEMPTION SCHEDULING IN HIGH PERFORMANCE PROCESSING
Abstract
A computing system configured to handle preemption events in an environment having jobs with high and low priorities. The system includes a job queue configured to receive job requests from users, the job queue storing the jobs in an order based on the priority of the jobs, and indicating whether a job is a high priority job or a low priority job. The system also includes a plurality of node clusters, each including a plurality of nodes, and a scheduler coupled to the job queue and to the plurality of node clusters and configured to assign jobs from the job queue to the plurality of node clusters. The scheduler is configured to preempt a first low priority job running in a first node cluster with a high priority job that appears in the job queue after the low priority job has started. If a second low priority job from the job queue can run on a portion of the nodes in the first node cluster during the remaining processing time of the high priority job, the scheduler backfills the second low priority job into that portion of the nodes. If a second high priority job is then received in the job queue and can run on that portion of the nodes, the scheduler returns the second low priority job to the job queue.
15 Claims
1. A computing system configured to handle preemption events in an environment having jobs with high and low priorities, the system comprising:
a job queue configured to receive job requests from users, the job queue storing the jobs in an order based on the priority of the jobs, and indicating whether a job is a high priority job or a low priority job;
a plurality of node clusters, each node cluster including a plurality of nodes;
a scheduler coupled to the job queue and to the plurality of node clusters, the scheduler configured to assign jobs from the job queue to the plurality of node clusters, wherein the scheduler is configured to preempt a first low priority job running in a first node cluster with a high priority job that appears in the job queue after the low priority job has started and, in the event that a second low priority job from the job queue may run on a portion of the plurality of nodes in the first node cluster during a remaining processing time for the high priority job, backfill the second low priority job into the portion of the plurality of nodes and, in the event a second high priority job is received in the job queue and may run on the portion of the plurality of nodes, return the second low priority job to the job queue.
Dependent claims: 2, 3, 5, 6
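The backfill and reclaim conditions in claim 1 can be sketched as two small checks. This is a minimal illustration under stated assumptions, not the patented implementation: the `Job` fields `nodes` and `runtime`, the helper names `can_backfill`, `pick_backfill` and `jobs_to_return`, and the policy of returning the most recently backfilled job first are all hypothetical. "May run on a portion of the nodes during a remaining processing time" is read here as "fits on the idle nodes and has an estimated run time no longer than the high priority job's remaining time".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    name: str
    nodes: int      # hypothetical: number of nodes requested
    runtime: float  # hypothetical: user-supplied run-time estimate


def can_backfill(job: Job, idle_nodes: int, high_remaining: float) -> bool:
    """True if the low priority job fits on the idle portion of the cluster and is
    expected to finish within the high priority job's remaining processing time."""
    return job.nodes <= idle_nodes and job.runtime <= high_remaining


def pick_backfill(queue: List[Job], idle_nodes: int,
                  high_remaining: float) -> Optional[Job]:
    """Scan the queued low priority jobs in order; return the first one that fits."""
    return next((j for j in queue if can_backfill(j, idle_nodes, high_remaining)), None)


def jobs_to_return(backfilled: List[Job], idle_nodes: int,
                   high_nodes: int) -> Optional[List[Job]]:
    """Backfilled jobs to send back to the queue so a newly received high priority
    job can run, or None if it cannot run here even with all of them returned."""
    if high_nodes > idle_nodes + sum(j.nodes for j in backfilled):
        return None
    returned: List[Job] = []
    for job in reversed(backfilled):  # assumed policy: newest backfill returned first
        if idle_nodes >= high_nodes:
            break
        returned.append(job)
        idle_nodes += job.nodes
    return returned
```

Returning only as many backfilled jobs as the new high priority job needs is a choice made for this sketch; the claim itself only requires that the second low priority job be returned to the queue when the second high priority job can run on its nodes.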
4. The system of claim 4, wherein the scheduler is coupled to the machine list and is configured to scan the machine list to determine the node clusters on which the second high priority job is to be run.
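Claim 4 does not say what the machine list contains. Assuming, hypothetically, that each entry records a cluster's idle nodes and the nodes currently held by backfilled low priority jobs, the scan might look like the following sketch. `MachineEntry`, `clusters_for_high_job`, and the preference for clusters that require the fewest backfilled nodes to be reclaimed are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MachineEntry:
    cluster: str
    idle_nodes: int        # hypothetical: nodes with nothing running
    backfilled_nodes: int  # hypothetical: nodes running backfilled low priority jobs


def clusters_for_high_job(machine_list: List[MachineEntry], nodes_needed: int) -> List[str]:
    """Scan the machine list and return the clusters that could host a high priority
    job, ordered so that clusters needing the fewest reclaimed nodes come first."""
    candidates = []
    for entry in machine_list:
        if entry.idle_nodes + entry.backfilled_nodes < nodes_needed:
            continue  # cannot host the job even after returning every backfilled job
        to_reclaim = max(0, nodes_needed - entry.idle_nodes)
        candidates.append((to_reclaim, entry.cluster))
    return [name for _, name in sorted(candidates)]
```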
7. A method for managing preemption events in a backfill-enabled computing system, the method comprising:
suspending a first low priority job running on one or more nodes of a node cluster upon receipt of a first high priority job;
running the first high priority job on one or more nodes of the node cluster;
selecting a second low priority job from a job queue, the second low priority job having a position in the job queue;
running the second low priority job on available nodes of the node cluster while the first high priority job is running;
receiving a request for a second high priority job; and
returning, after receiving the request for the second high priority job, the second low priority job to the job queue in the position in the job queue.
Dependent claims: 8, 9, 10, 11
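Claim 7 requires the returned low priority job to go back to its original position in the job queue. One way to sketch this, assuming each job is given a fixed sort key of (priority rank, submission sequence number) when it is submitted, is to keep that key while the job is backfilled and reinsert it with Python's `bisect.insort`, so the job lands where it would have been had it never left. The `JobQueue` class and its `submit`, `take` and `restore` methods are hypothetical.

```python
import bisect
import itertools
from typing import List, Tuple

# (priority rank, submission sequence number, job name); rank 0 sorts first.
Entry = Tuple[int, int, str]


class JobQueue:
    """Hypothetical priority queue in which every job keeps a fixed sort key."""

    HIGH, LOW = 0, 1

    def __init__(self) -> None:
        self._entries: List[Entry] = []
        self._seq = itertools.count()

    def submit(self, name: str, high_priority: bool) -> Entry:
        entry = (self.HIGH if high_priority else self.LOW, next(self._seq), name)
        bisect.insort(self._entries, entry)
        return entry

    def take(self, name: str) -> Entry:
        """Remove a job (e.g. one selected for backfill) and hand back its key."""
        for i, entry in enumerate(self._entries):
            if entry[2] == name:
                return self._entries.pop(i)
        raise KeyError(name)

    def restore(self, entry: Entry) -> None:
        """Re-insert a returned job under its original key, i.e. at its old position."""
        bisect.insort(self._entries, entry)

    def names(self) -> List[str]:
        return [entry[2] for entry in self._entries]
```

For example, if `low_b` is taken for backfill while `low_d` and a high priority job arrive, restoring `low_b`'s key places it back between `low_a` and `low_c`, behind the new high priority job.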
12. A method of managing the operation of a computing system including a plurality of node clusters, each node cluster including a plurality of nodes, the method comprising:
allocating a first low priority job to run on a first set of the nodes in a first node cluster;
running the first low priority job on the first set of nodes;
receiving, at a job queue, a first high priority job;
suspending the first low priority job;
running the first high priority job on a second set of nodes in the first node cluster for a predetermined amount of time;
selecting a second low priority job from the job queue;
running the second low priority job on a third set of nodes in the first node cluster;
receiving a second high priority job at the job queue;
returning the second low priority job to the job queue; and
running the first low priority job after the first and second high priority jobs are complete.
Dependent claims: 13, 14, 15
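The full sequence of claim 12 can be replayed with a small time-step simulation of one node cluster. This is a hedged sketch, not the patented scheduler: run times in whole time steps (each job's `duration` stands in for the claim's predetermined amount of time), the `Job` and `simulate` names, restarting a returned backfilled job from the beginning, and resuming a suspended job with its progress intact are all assumptions. Ordinary (non-backfill) scheduling of queued low priority jobs is left out to keep the focus on the claimed steps.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(eq=False)  # compare jobs by identity
class Job:
    name: str
    high: bool
    nodes: int
    duration: int                      # hypothetical: run time in whole time steps
    remaining: int = field(init=False)

    def __post_init__(self) -> None:
        self.remaining = self.duration


def simulate(total: int, first_low: Job, arrivals: Dict[int, Job],
             queue: List[Job], steps: int) -> List[str]:
    """Replay the claimed sequence on one cluster and return an event log."""
    running: List[Job] = [first_low]
    backfilled: List[Job] = []
    suspended: List[Job] = []
    log: List[str] = []

    def free() -> int:
        return total - sum(j.nodes for j in running)

    for t in range(steps):
        high = arrivals.get(t)
        if high is not None:
            # Suspend running low priority jobs that were not backfilled.
            for job in [j for j in running if not j.high and j not in backfilled]:
                running.remove(job)
                suspended.append(job)
                log.append(f"t={t} suspend {job.name}")
            # Return backfilled jobs to the queue until the high priority job fits
            # (assumed: it always fits once they are returned). They restart later.
            while backfilled and free() < high.nodes:
                job = backfilled.pop()
                running.remove(job)
                job.remaining = job.duration
                queue.insert(0, job)
                log.append(f"t={t} requeue {job.name}")
            running.append(high)
            log.append(f"t={t} start {high.name}")
        highs = [j for j in running if j.high]
        if highs:
            # Backfill queued low priority jobs that fit the idle nodes and
            # finish within the shortest remaining high priority run time.
            window = min(j.remaining for j in highs)
            for job in list(queue):
                if not job.high and job.nodes <= free() and job.duration <= window:
                    queue.remove(job)
                    running.append(job)
                    backfilled.append(job)
                    log.append(f"t={t} backfill {job.name}")
        elif suspended:
            # No high priority work left: resume the suspended low priority job.
            job = suspended.pop()
            running.append(job)
            log.append(f"t={t} resume {job.name}")
        for job in list(running):  # advance every running job by one time step
            job.remaining -= 1
            if job.remaining == 0:
                running.remove(job)
                if job in backfilled:
                    backfilled.remove(job)
                log.append(f"t={t} finish {job.name}")
    return log
```

With an 8-node cluster, `low1` (4 nodes) running, `high1` (6 nodes) arriving at step 1, and `high2` (2 nodes) arriving at step 3, the log shows `low1` suspended, `low2` backfilled onto the two idle nodes, `low2` returned to the queue when `high2` needs those nodes, and `low1` resumed only after both high priority jobs finish.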
Specification