System and method for multi-level preemption scheduling in high performance processing

US 8,458,712 B2
Filed: 04/30/2008
Issued: 06/04/2013
Est. Priority Date: 04/30/2008
Status: Active Grant


Abstract

A computing system configured to handle preemption events in an environment having jobs with high and low priorities. The system includes a job queue configured to receive job requests from users; the job queue stores the jobs in an order based on their priority and indicates whether each job is a high priority job or a low priority job. The system also includes a plurality of node clusters, each including a plurality of nodes, and a scheduler coupled to the job queue and to the node clusters and configured to assign jobs from the job queue to the node clusters. The scheduler is configured to preempt a first low priority job running in a first node cluster with a high priority job that appears in the job queue after the low priority job has started. If a second low priority job from the job queue can run on a portion of the nodes in the first node cluster during the remaining processing time of the high priority job, the scheduler backfills the second low priority job into that portion of the nodes. If a second high priority job is then received in the job queue and can run on that portion of the nodes, the scheduler returns the second low priority job to the job queue.
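The abstract describes a job queue that orders jobs by priority and flags each one as high or low priority, and claim 1 requires that a preempted backfill job be returned to the position it held in that queue. Below is a minimal Python sketch of such a queue. The `Job` and `JobQueue` names, the arrival counter used as a tie-breaker, and first-in-first-out ordering within a priority level are assumptions for illustration, not details taken from the patent.

```python
import itertools
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    name: str
    high_priority: bool  # the queue flags each job as high or low priority
    seq: int = -1        # hypothetical arrival stamp, assigned on first submit


class JobQueue:
    """Hypothetical priority-ordered job queue: high priority jobs first,
    first-in-first-out within each priority level."""

    def __init__(self) -> None:
        self._jobs: List[Job] = []
        self._arrivals = itertools.count()

    @staticmethod
    def _key(job: Job):
        # False sorts before True, so "not high_priority" puts high jobs first;
        # the arrival stamp breaks ties within a priority level.
        return (not job.high_priority, job.seq)

    def _insert(self, job: Job) -> None:
        # Linear scan keeps the list sorted; adequate for a sketch.
        i = 0
        while i < len(self._jobs) and self._key(self._jobs[i]) <= self._key(job):
            i += 1
        self._jobs.insert(i, job)

    def submit(self, job: Job) -> None:
        job.seq = next(self._arrivals)
        self._insert(job)

    def pop_low(self) -> Optional[Job]:
        # Select the first low priority job in queue order (a backfill candidate).
        for i, job in enumerate(self._jobs):
            if not job.high_priority:
                return self._jobs.pop(i)
        return None

    def requeue(self, job: Job) -> None:
        # The job keeps its original arrival stamp, so it lands back in the
        # position it held before it was selected.
        self._insert(job)

    def names(self) -> List[str]:
        return [job.name for job in self._jobs]


q = JobQueue()
q.submit(Job("L2", False))
q.submit(Job("L3", False))
backfill = q.pop_low()      # L2 is selected and backfilled
print(q.names())            # ['L3']
q.submit(Job("H2", True))   # a second high priority job arrives and sorts first
q.requeue(backfill)         # L2 is returned ahead of L3, where it was before
print(q.names())            # ['H2', 'L2', 'L3']
```

Keeping the arrival stamp on the job, rather than recording a list index, means the job's original position survives even when other jobs are added or removed while it runs.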

7 Claims

1. A method for managing preemption events in a backfill enabled computing system, the method comprising:
- suspending a first low priority job running on one or more nodes of a node cluster upon receipt of a first high priority job until the nodes the first low priority job was running on become available;
- running the first high priority job on the one or more nodes of the node cluster;
- selecting a second low priority job from a job queue, the second low priority job having a position in the job queue;
- running the second low priority job on available nodes of the node cluster while the first high priority job is running;
- receiving a request for a second high priority job after the second low priority job has started running;
- determining a processing status for the second low priority job;
- determining that the processing status of the second low priority job exceeds a predetermined checkpoint threshold;
- saving processing performed on the second low priority job in the event the processing status exceeds the predetermined checkpoint threshold;
- returning, after receiving the request for the second high priority job, the second low priority job to a job queue in the position in the job queue; and
- running the first low priority job and the second low priority job after the first high priority job and the second high priority job are complete.

2. The method of claim 1, further comprising: running the second high priority job on available nodes of the node cluster.

3. The method of claim 1, wherein the predetermined checkpoint threshold is an indication of the portion of the total process performed on the second low priority job.
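Claims 1 and 3 make saving the second low priority job's work conditional on its processing status exceeding a predetermined checkpoint threshold, with claim 3 describing the threshold as a portion of the total processing. The Python sketch below is one minimal reading of that step, assuming status is measured as completed work divided by total work. `RunningJob`, `preempt_backfilled_job`, the in-memory `checkpoints` dictionary, and the discard-below-threshold behavior are hypothetical, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RunningJob:
    name: str
    work_done: float   # processing performed so far (hypothetical unit)
    total_work: float  # total processing the job needs


def processing_status(job: RunningJob) -> float:
    # Claim 3 reading: status is the portion of total processing already performed.
    if job.total_work <= 0:
        raise ValueError("total_work must be positive")
    return job.work_done / job.total_work


def preempt_backfilled_job(job: RunningJob, checkpoint_threshold: float,
                           checkpoints: Dict[str, float]) -> float:
    """Called when a second high priority job needs the nodes of `job`.

    Saves the job's progress only if its status exceeds the threshold and
    returns the amount of work the job will resume from when it is rerun.
    """
    if processing_status(job) > checkpoint_threshold:
        checkpoints[job.name] = job.work_done  # save processing performed
    else:
        # Assumption: below the threshold, partial work is discarded and the
        # job restarts from the beginning when it is rerun.
        checkpoints.pop(job.name, None)
    return checkpoints.get(job.name, 0.0)


checkpoints: Dict[str, float] = {}
print(preempt_backfilled_job(RunningJob("L2", 30.0, 100.0), 0.25, checkpoints))  # 30.0
print(preempt_backfilled_job(RunningJob("L3", 10.0, 100.0), 0.25, checkpoints))  # 0.0
```

A real scheduler would write the checkpoint to stable storage before returning the job to its original queue position; the dictionary only stands in for that storage here.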

4. A method of managing the operation of a computing system including a plurality of node clusters, each node cluster including a plurality of nodes, the method comprising:
- allocating a first low priority job to run on a first set of the nodes in a first node cluster;
- running the first low priority job on the first set of nodes;
- receiving, at a job queue, a first high priority job;
- suspending the first low priority job until the first set of nodes becomes available;
- running the first high priority job on a second set of nodes that includes at least one of the nodes in the first set of nodes in the first node cluster for a predetermined amount of time;
- selecting a second low priority job from the job queue;
- running the second low priority job on a third set of nodes in the first node cluster;
- receiving a second high priority job on the job queue after the second low priority job has started running;
- determining a processing status for the second low priority job;
- determining that the processing status of the second low priority job exceeds a predetermined checkpoint threshold;
- saving processing performed on the second low priority job in the event the processing status exceeds the predetermined checkpoint threshold;
- returning the second low priority job to the job queue; and
- running the first low priority job and the second low priority job after the first high priority job and second high priority job are complete.

5. The method of claim 4, wherein selecting the second low priority job includes determining a completion time for the second low priority job and determining a number of nodes required to run the second low priority job.

6. The method of claim 5, wherein determining includes determining that the completion time is shorter than a time remaining to complete the first high priority job.

7. The method of claim 6, wherein determining includes determining that the number of nodes required is less than a difference between a total number of nodes in the plurality of nodes and a number of nodes in the second set of nodes.
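Claims 5 through 7 spell out the backfill test: the second low priority job's completion time must be shorter than the time remaining for the first high priority job, and the number of nodes it requires must be less than the total node count minus the nodes in the high priority job's set. The following Python sketch applies that test to queued candidates. The `Candidate` type, the minute units, and first-fit selection in queue order are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Candidate:
    name: str
    completion_time: float  # estimated run time in minutes (hypothetical unit)
    nodes_required: int


def can_backfill(job: Candidate, time_remaining_high: float,
                 total_nodes: int, high_job_nodes: int) -> bool:
    # Claim 6: the job must finish before the running high priority job does.
    fits_in_time = job.completion_time < time_remaining_high
    # Claim 7: it must need fewer nodes than the cluster has left over
    # (total nodes minus the nodes in the high priority job's set).
    fits_in_nodes = job.nodes_required < total_nodes - high_job_nodes
    return fits_in_time and fits_in_nodes


def select_backfill_job(queue: Iterable[Candidate], time_remaining_high: float,
                        total_nodes: int, high_job_nodes: int) -> Optional[Candidate]:
    # First fit in queue order (assumption; the claims do not fix a search order).
    for job in queue:
        if can_backfill(job, time_remaining_high, total_nodes, high_job_nodes):
            return job
    return None


# 16-node cluster; the high priority job holds 12 nodes for another 60 minutes.
queue = [Candidate("L2", 30, 6), Candidate("L3", 90, 3), Candidate("L4", 45, 2)]
chosen = select_backfill_job(queue, time_remaining_high=60, total_nodes=16, high_job_nodes=12)
print(chosen.name if chosen else None)  # L4
```

L2 is rejected because it needs 6 of the 4 free nodes, and L3 because it would outlast the high priority job. The comparisons are strict to follow the "shorter than" and "less than" wording of claims 6 and 7.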

Current Assignee: International Business Machines Corporation
Original Assignee: International Business Machines Corporation
Inventors: Chan, Waiman; Skovira, Joseph F.
Primary Examiner(s): Puente, Emerson
Assistant Examiner(s): Paul, Hiren
Application Number: US12/112,462
Publication Number: US 20090276781A1
Time in Patent Office: 1,861 Days
Field of Search: None
US Class Current: 718/103
CPC Class Codes:
G06F 2209/483 Multiproc
G06F 9/4881 Scheduling strategies for d...