Method and system for preemptible coprocessing
Abstract
Methods, computer program products, and systems supporting preemptible coprocessing are disclosed. The method includes executing at least a portion of a first compute job, and executing at least a portion of a second compute job. The method further includes, prior to completing execution of the at least the portion of the second compute job, interrupting the execution of the second compute job, and scheduling at least a portion of a third compute job.
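The interrupt-and-schedule step summarized in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Job` and `Resource` classes, the `preempt` helper, and the state names are hypothetical, introduced only to show a second job being interrupted before completion and a third job taking over the same resource while the first job keeps running on a separate one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    # Hypothetical job record; the state names are illustrative only.
    name: str
    state: str = "pending"  # pending -> running -> interrupted


@dataclass
class Resource:
    # Hypothetical stand-in for a processor or coprocessor on one compute node.
    name: str
    current: Optional[Job] = None

    def start(self, job: Job) -> None:
        job.state = "running"
        self.current = job

    def interrupt(self) -> Optional[Job]:
        # Stop the current job before it completes and free the resource.
        job, self.current = self.current, None
        if job is not None:
            job.state = "interrupted"
        return job


def preempt(resource: Resource, incoming: Job) -> Optional[Job]:
    """Interrupt whatever `resource` is running, then schedule `incoming` on it."""
    displaced = resource.interrupt()
    resource.start(incoming)
    return displaced


cpu = Resource("processor")
coproc = Resource("coprocessor")
first, second, third = Job("first"), Job("second"), Job("third")

cpu.start(first)      # first job is serviced by the first resource
coproc.start(second)  # second job is serviced by the second resource
displaced = preempt(coproc, third)  # third job takes over the second resource
```

After `preempt` returns, `displaced` is the interrupted second job, which a scheduler could later resume or restart elsewhere.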
18 Claims
1. A method comprising:
executing at least a portion of a first compute job;
executing at least a portion of a second compute job, wherein
  the at least the portion of the first compute job and the at least the portion of the second compute job are configured to be executed at a first compute node comprising a first computing resource and a second computing resource,
  the first computing resource comprises a first hardware element,
  the second computing resource comprises a second hardware element,
  the first hardware element and the second hardware element are separate from one another,
  the at least the portion of the first compute job is serviced by the first computing resource, and
  the at least the portion of the second compute job is serviced by the second computing resource;
prior to completing execution of the at least the portion of the second compute job, interrupting the execution of the second compute job;
scheduling at least a portion of a third compute job, wherein
  the at least the portion of the third compute job is scheduled to be serviced by the second computing resource;
in response to the interrupting, detecting a failure during the execution of the at least the portion of the second compute job; and
restarting servicing of the at least the portion of the second compute job, wherein
  the restarting is accomplished by causing another computing resource to service the at least the portion of the second compute job, and
  the at least the portion of the second compute job is scheduled to be serviced by the another computing resource at a point in time at which the another computing resource becomes available.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computer system comprising:
one or more processors;
one or more coprocessors;
a system controller, coupled to the one or more processors and the one or more coprocessors;
a computer-readable storage medium coupled to the system controller; and
a plurality of instructions, encoded in the computer-readable storage medium and configured to
  cause at least one of the one or more processors to execute at least a portion of a first compute job,
  cause at least one of the one or more coprocessors to execute at least a portion of a second compute job, wherein
    the at least the portion of the first compute job and the at least the portion of the second compute job are configured to be executed at a first compute node comprising the one or more processors and the one or more coprocessors, and
    the one or more processors and the one or more coprocessors are separate from one another,
  prior to the at least one of the one or more coprocessors completing execution of the at least the portion of the second compute job, cause interruption of the execution of the second compute job,
  schedule execution of at least a portion of a third compute job, wherein
    the at least the portion of the third compute job is scheduled to be executed by the at least one of the one or more coprocessors,
  in response to interruption of the execution of the second compute job, detect a failure during the execution of the at least the portion of the second compute job; and
  restart the execution of the at least the portion of the second compute job, wherein
    the execution of the at least the portion of the second compute job is restarted by causing another coprocessor to execute the at least the portion of the second compute job, and
    the at least the portion of the second compute job is scheduled to be executed by the another coprocessor at a point in time at which the another coprocessor becomes available.
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17
18. A computer program product comprising:
a plurality of instructions, wherein
  the plurality of instructions are configured to cause execution of a plurality of compute jobs at a first compute node that comprises one or more processors, and one or more coprocessors,
  the one or more processors and the one or more coprocessors are separate from one another, and
  the plurality of instructions comprise
    a first set of instructions, executable on a computer system, configured to cause at least one of the one or more processors to execute at least a portion of a first compute job,
    a second set of instructions, executable on the computer system, configured to cause at least one of the one or more coprocessors to execute at least a portion of a second compute job, wherein, the at least the portion of the first compute job and the at least the portion of the second compute job are configured to be executed at the first compute node,
    a third set of instructions, executable on the computer system, configured to, prior to the at least one of the one or more coprocessors completing execution of the at least the portion of the second compute job, cause interruption of the execution of the second compute job,
    a fourth set of instructions, executable on the computer system, configured to schedule execution of at least a portion of a third compute job, wherein the at least the portion of the third compute job is scheduled to be executed by the at least one of the one or more coprocessors,
    a fifth set of instructions, executable on the computer system, configured to, in response to the interruption of the execution of the second compute job, detect a failure during the execution of the at least the portion of the second compute job, and
    a fifth set of instructions, executable on the computer system, configured to restart the execution of the at least the portion of the second compute job, wherein the execution of the at least the portion of the second compute job is restarted by causing another coprocessor to execute the at least the portion of the second compute job, and the at least the portion of the second compute job is scheduled to be executed by the another coprocessor at a point in time at which the another coprocessor becomes available; and
a non-transitory computer-readable storage medium, wherein the instructions are encoded in the non-transitory computer-readable storage medium.
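The restart limitation shared by the independent claims (a failed job is restarted on another coprocessor at the point in time that coprocessor becomes available) can be sketched as a small deferred-dispatch queue. This is an illustrative assumption about one way to model that behavior, not the claimed implementation; `RestartScheduler`, its method names, and the coprocessor labels "A" and "B" are all hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple


class RestartScheduler:
    """Hypothetical scheduler that restarts failed jobs on a different coprocessor."""

    def __init__(self, coprocessors):
        # Maps coprocessor name -> name of the job it is running (None if idle).
        self.running: Dict[str, Optional[str]] = {c: None for c in coprocessors}
        # Jobs awaiting restart, each paired with the coprocessor it failed on.
        self.waiting: Deque[Tuple[str, str]] = deque()

    def assign(self, coproc: str, job: str) -> None:
        self.running[coproc] = job

    def report_failure(self, job: str, failed_on: str) -> Optional[str]:
        """Queue a job whose failure was detected; restart it now if possible."""
        self.waiting.append((job, failed_on))
        return self._dispatch()

    def release(self, coproc: str) -> Optional[str]:
        """Mark `coproc` idle, then hand it a waiting job if one is eligible."""
        self.running[coproc] = None
        return self._dispatch()

    def _dispatch(self) -> Optional[str]:
        # Place the first waiting job that has an idle coprocessor other than
        # the one it failed on; return that coprocessor's name, else None.
        for index, (job, avoid) in enumerate(self.waiting):
            for coproc, current in self.running.items():
                if current is None and coproc != avoid:
                    del self.waiting[index]
                    self.running[coproc] = job
                    return coproc
        return None


sched = RestartScheduler(["A", "B"])
sched.assign("A", "second")
sched.assign("B", "other")
sched.assign("A", "third")  # "second" was interrupted; "third" now occupies A
placed_now = sched.report_failure("second", failed_on="A")  # B is still busy
placed_later = sched.release("B")  # B becomes available; "second" restarts there
```

In this sketch the restart is deferred rather than rejected: the failed job stays queued until a coprocessor other than the one it failed on reports itself idle.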
Specification