Computing on transient resources
First Claim
1. A computing system, the computing system comprising:
one or more hardware processors and computer storage media storing computer-executable instructions and components that, when executed by the one or more hardware processors, cause the one or more hardware processors to execute:
a task scheduler configured for:
accessing instability information of a transient resource and information of a stage of a computational job, the instability information associated with an estimated lifetime availability of the transient resource, and the stage having a plurality of parallel tasks; and
scheduling a task of the plurality of parallel tasks to use the transient resource based at least in part on a rate of data size reduction of the task; and
a checkpointing scheduler, coupled to the task scheduler, configured for:
determining a checkpointing plan for the task based at least in part on a recomputation cost associated with the instability information of the transient resource, wherein the instability information comprises the estimated lifetime availability of the transient resource.
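Claim 1 ties the checkpointing plan to a recomputation cost that depends on the resource's estimated lifetime, but it does not say how that cost is computed. The Python sketch below shows one hypothetical way to do it. It assumes that resource lifetimes are exponentially distributed with a known mean and that losing the resource forces the whole task to be rerun, which is a pessimistic loss model. The names `TaskState`, `loss_probability`, `expected_recomputation_cost`, and `plan_checkpoint` are illustrative and do not come from the patent.

```python
import math
from dataclasses import dataclass


@dataclass
class TaskState:
    # Hypothetical fields, all in seconds.
    elapsed_s: float      # compute time already spent on the task
    remaining_s: float    # estimated time still needed to finish it
    backup_cost_s: float  # estimated time to copy its output to stable storage


def loss_probability(horizon_s: float, mean_lifetime_s: float) -> float:
    """Probability that the resource is reclaimed within horizon_s,
    assuming exponentially distributed (memoryless) lifetimes."""
    return 1.0 - math.exp(-horizon_s / mean_lifetime_s)


def expected_recomputation_cost(task: TaskState, mean_lifetime_s: float) -> float:
    """Pessimistic expected loss: if the resource disappears before the
    task finishes, assume the entire task must be rerun elsewhere."""
    p_loss = loss_probability(task.remaining_s, mean_lifetime_s)
    return p_loss * (task.elapsed_s + task.remaining_s)


def plan_checkpoint(task: TaskState, mean_lifetime_s: float) -> bool:
    """Checkpoint only when the expected recomputation cost exceeds
    the cost of backing up the task's output."""
    return expected_recomputation_cost(task, mean_lifetime_s) > task.backup_cost_s


if __name__ == "__main__":
    task = TaskState(elapsed_s=120.0, remaining_s=60.0, backup_cost_s=10.0)
    print(plan_checkpoint(task, mean_lifetime_s=300.0))      # True
    print(plan_checkpoint(task, mean_lifetime_s=100_000.0))  # False
```

Consider a task with 120 s of completed work and 60 s remaining. On a resource whose mean lifetime is 300 s, these assumptions give an expected recomputation cost of about 33 s, so a 10 s backup is worthwhile. On a resource with a mean lifetime of 100,000 s, the same task is not checkpointed.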
Abstract
Aspects of the technology described herein can facilitate computing on transient resources. An exemplary computing device may use a task scheduler to access information of a computational task and instability information of a transient resource. Moreover, the task scheduler can schedule the computational task to use the transient resource based at least in part on the rate of data size reduction of the computational task. Further, a checkpointing scheduler in the exemplary computing device can determine a checkpointing plan for the computational task based at least in part on a recomputation cost associated with the instability information of the transient resource. As a result, the overall utilization rate of computing resources improves because transient resources are used effectively.
20 Claims
13. A computer-implemented method for transient resource computing, the method comprising:
accessing information of a plurality of parallel tasks;
determining a rate of data size reduction of a task of the plurality of parallel tasks based on an estimated execution time of the task, an input data size of the task, and an output data size of the task; and
scheduling the task to use a transient resource based at least in part on the rate of data size reduction of the task being greater than rates of data-size reduction of other tasks in the plurality of parallel tasks.
Dependent claims: 14, 15.
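Claim 13 names the inputs to the rate of data size reduction (estimated execution time, input data size, and output data size) but does not give a formula. The Python sketch below assumes the rate is the number of bytes eliminated per second of estimated execution, `(input - output) / time`, and places the task with the highest rate on the transient resource. The `Task` fields and the function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Task:
    # Hypothetical task description.
    name: str
    input_bytes: int
    output_bytes: int
    est_exec_s: float  # estimated execution time, seconds


def data_size_reduction_rate(task: Task) -> float:
    """Assumed formula: bytes removed from the data per second of execution."""
    return (task.input_bytes - task.output_bytes) / task.est_exec_s


def pick_task_for_transient(tasks: list[Task]) -> Task:
    """Choose the parallel task with the highest reduction rate.
    On a tie, max() keeps the first such task in the list."""
    return max(tasks, key=data_size_reduction_rate)


if __name__ == "__main__":
    stage = [
        Task("A", input_bytes=1000, output_bytes=100, est_exec_s=10.0),  # 90 B/s
        Task("B", input_bytes=1000, output_bytes=900, est_exec_s=2.0),   # 50 B/s
        Task("C", input_bytes=500, output_bytes=50, est_exec_s=3.0),     # 150 B/s
    ]
    print(pick_task_for_transient(stage).name)  # C
```

The claim requires a rate strictly greater than the other tasks' rates. Because `max` simply returns the first task on a tie, a real scheduler would need its own tie-breaking rule.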
16. One or more non-transient computer storage media comprising computer-implemented instructions that, when used by one or more computing devices, cause the one or more computing devices to:
access a task running on a transient resource and an output data block of the task; and
checkpoint the task running on the transient resource based on:
determining that a residual lifetime of the transient resource is shorter than a required remaining time to complete the task on the transient resource, wherein the residual lifetime indicates a remaining available usage time of the transient resource; and
determining that a recomputation cost to recompute the task is greater than a backup cost to back up the output data block of the task.
Dependent claims: 17, 18, 19, 20.
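Claim 16 states its two checkpointing conditions explicitly, so they translate almost directly into code. In this Python sketch, the caller is assumed to supply all four quantities in seconds, with costs expressed as time. The claim does not say how residual lifetime, remaining time, or the two costs are estimated, and the sketch leaves that open too. The function name `should_checkpoint` is hypothetical.

```python
def should_checkpoint(
    residual_lifetime_s: float,
    remaining_time_s: float,
    recompute_cost_s: float,
    backup_cost_s: float,
) -> bool:
    """Checkpoint a task on a transient resource only when both
    conditions hold (all quantities assumed to be in seconds)."""
    # The resource is expected to go away before the task can finish.
    will_be_reclaimed = residual_lifetime_s < remaining_time_s
    # Redoing the task would cost more than saving its output data block.
    backup_is_cheaper = recompute_cost_s > backup_cost_s
    return will_be_reclaimed and backup_is_cheaper


if __name__ == "__main__":
    print(should_checkpoint(30.0, 60.0, 100.0, 10.0))   # True
    print(should_checkpoint(120.0, 60.0, 100.0, 10.0))  # False: task finishes first
    print(should_checkpoint(30.0, 60.0, 5.0, 10.0))     # False: cheaper to recompute
```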
Specification