COMPUTING ON TRANSIENT RESOURCES
First Claim
1. A computing system, the computing system comprising:
a task scheduler configured for:
accessing instability information of a transient resource and information of a stage of a computational job, the instability information associated with an estimation of availability of the transient resource, and the stage having a plurality of parallel tasks; and
scheduling a task of the plurality of parallel tasks to use the transient resource based at least in part on a rate of data size reduction of the task; and
a checkpointing scheduler, coupled to the task scheduler, configured for:
determining a checkpointing plan for the task based at least in part on a recomputation cost associated with the instability information of the transient resource.
Abstract
Aspects of the technology described herein can facilitate computing on transient resources. An exemplary computing device may use a task scheduler to access information of a computational task and instability information of a transient resource. Moreover, the task scheduler can schedule the computational task to use the transient resource based at least in part on the rate of data size reduction of the computational task. Further, a checkpointing scheduler in the exemplary computing device can determine a checkpointing plan for the computational task based at least in part on a recomputation cost associated with the instability information of the transient resource. Resultantly, the overall utilization rate of computing resources is improved by effectively utilizing transient resources.
20 Claims
1. A computing system, the computing system comprising:
a task scheduler configured for:
accessing instability information of a transient resource and information of a stage of a computational job, the instability information associated with an estimation of availability of the transient resource, and the stage having a plurality of parallel tasks; and
scheduling a task of the plurality of parallel tasks to use the transient resource based at least in part on a rate of data size reduction of the task; and
a checkpointing scheduler, coupled to the task scheduler, configured for:
determining a checkpointing plan for the task based at least in part on a recomputation cost associated with the instability information of the transient resource.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
13. A computer-implemented method for transient resource computing, the method comprising:
accessing information of a plurality of parallel tasks;
determining a rate of data size reduction of a task of the plurality of parallel tasks based on an estimated execution time of the task, an input data size of the task, and an output data size of the task; and
scheduling the task to use a transient resource based at least in part on the rate of data size reduction of the task being greater than rates of data-size reduction of other tasks in the plurality of parallel tasks.
(Dependent claims: 14, 15)
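The selection step recited in claim 13 can be illustrated with a minimal sketch. The claim only states that the rate is based on the estimated execution time, the input data size, and the output data size; the specific formula used here, (input size - output size) / execution time, is an assumption made for illustration, and the Task record, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task record; field names are illustrative, not from the patent."""
    name: str
    input_bytes: float       # input data size of the task
    output_bytes: float      # output data size of the task
    est_exec_seconds: float  # estimated execution time of the task


def reduction_rate(task: Task) -> float:
    """Assumed formula: bytes of data eliminated per second of execution."""
    if task.est_exec_seconds <= 0:
        raise ValueError("estimated execution time must be positive")
    return (task.input_bytes - task.output_bytes) / task.est_exec_seconds


def pick_task_for_transient_resource(tasks: list[Task]) -> Task:
    """Return the task whose reduction rate is the largest among the parallel tasks."""
    if not tasks:
        raise ValueError("no tasks to schedule")
    return max(tasks, key=reduction_rate)


# Made-up sample stage with three parallel tasks.
stage = [
    Task("map-0", input_bytes=800e6, output_bytes=700e6, est_exec_seconds=50),
    Task("filter-1", input_bytes=600e6, output_bytes=60e6, est_exec_seconds=30),
    Task("join-2", input_bytes=400e6, output_bytes=900e6, est_exec_seconds=40),
]
chosen = pick_task_for_transient_resource(stage)
print(chosen.name, reduction_rate(chosen))
```

Under this assumed formula, a task that shrinks its data quickly is preferred, so less output would need to be saved or recomputed if the transient resource were lost. A task whose output is larger than its input gets a negative rate and is chosen last.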
16. One or more non-transient computer storage media comprising computer-implemented instructions that, when used by one or more computing devices, cause the one or more computing devices to:
access a task running on a transient resource and an output data block of the task;
determine to checkpoint the task based on (a) a residual lifetime of the transient resource is shorter than a required remaining time to complete the task, and (b) a recomputation cost to recompute the task is greater than a backup cost to back up the output data block of the task; and
checkpoint the task.
(Dependent claims: 17, 18, 19, 20)
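The two-part checkpoint test recited in claim 16 can be sketched as a simple predicate. The claim does not say how the residual lifetime or the costs are estimated or in what units they are measured; this sketch assumes all four quantities are already available in seconds, and the CheckpointInputs record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckpointInputs:
    """Hypothetical inputs; all values assumed to be estimates in seconds."""
    residual_lifetime_s: float    # expected time left on the transient resource
    remaining_task_time_s: float  # time still required to complete the task
    recomputation_cost_s: float   # cost to recompute the task if its output is lost
    backup_cost_s: float          # cost to back up the task's output data block


def should_checkpoint(x: CheckpointInputs) -> bool:
    """Checkpoint only when both conditions (a) and (b) of the claim hold."""
    # (a) the resource is expected to disappear before the task finishes
    resource_expires_first = x.residual_lifetime_s < x.remaining_task_time_s
    # (b) recomputing would cost more than backing up the output block
    backup_is_cheaper = x.recomputation_cost_s > x.backup_cost_s
    return resource_expires_first and backup_is_cheaper


# Made-up scenarios.
at_risk = CheckpointInputs(20.0, 60.0, 300.0, 15.0)         # (a) and (b) both hold
safe_resource = CheckpointInputs(120.0, 60.0, 300.0, 15.0)  # (a) fails
cheap_rerun = CheckpointInputs(20.0, 60.0, 10.0, 15.0)      # (b) fails

for case in (at_risk, safe_resource, cheap_rerun):
    print(should_checkpoint(case))
```

Requiring both conditions avoids paying the backup cost when the resource is likely to outlive the task, or when rerunning the task would be cheaper than saving its output.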
Specification