RESILIENT PROGRAMMING FRAMEWORKS FOR HANDLING FAILURES IN PARALLEL PROGRAMS
First Claim
1. A method for supporting resilient execution of computer programs comprising the steps of:
providing a resilient store wherein information in the resilient store can be accessed in the event of a failure;
periodically checkpointing application state in the resilient store;
providing a resilient executor which comprises software which executes applications by catching failures;
using the resilient executor to execute at least one application; and
in response to the resilient executor detecting a failure, restoring application state information from a checkpoint in the resilient store, the resilient executor resuming execution of the at least one application.
Abstract
An information processing system, computer readable storage medium, and method for supporting resilient execution of computer programs. The method provides a resilient store wherein information in the resilient store can be accessed in the event of a failure. The method periodically checkpoints application state in the resilient store. A resilient executor comprises software which executes applications by catching failures. The method uses the resilient executor to execute at least one application. In response to the resilient executor detecting a failure, the method restores application state information to the at least one application from a checkpoint stored in the resilient store, and the resilient executor resumes execution of the at least one application with the restored application state information.
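The checkpoint-and-resume flow described in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the class names `ResilientStore` and `ResilientExecutor`, the `checkpoint_every` and `max_restarts` parameters, the in-memory store, and the step-count checkpoint interval are all assumptions made for illustration. A real resilient store would need to live somewhere that survives the failure it protects against.

```python
import copy


class ResilientStore:
    """Hypothetical in-memory stand-in for a store that remains readable after a failure."""

    def __init__(self):
        self._checkpoint = None

    def save(self, state):
        # Deep-copy so later changes to the live state cannot alter the checkpoint.
        self._checkpoint = copy.deepcopy(state)

    def load(self):
        return copy.deepcopy(self._checkpoint)


class ResilientExecutor:
    """Hypothetical executor: runs an application step by step and catches failures."""

    def __init__(self, store, checkpoint_every=3, max_restarts=5):
        self.store = store
        self.checkpoint_every = checkpoint_every  # assumed policy: checkpoint every N steps
        self.max_restarts = max_restarts

    def run(self, step_fn, initial_state, total_steps):
        state = initial_state
        self.store.save(state)  # initial checkpoint so a restore is always possible
        restarts = 0
        while state["step"] < total_steps:
            try:
                state = step_fn(state)
                if state["step"] % self.checkpoint_every == 0:
                    self.store.save(state)  # periodic checkpoint
            except Exception:
                restarts += 1
                if restarts > self.max_restarts:
                    raise
                # Failure detected: restore the last checkpoint and resume from it.
                state = self.store.load()
        return state, restarts


def make_flaky_sum(fail_at_step):
    """Build a step function that sums step indices and fails once at fail_at_step."""
    failed = {"done": False}

    def step(state):
        if state["step"] == fail_at_step and not failed["done"]:
            failed["done"] = True
            raise RuntimeError("simulated failure")
        return {"step": state["step"] + 1, "total": state["total"] + state["step"]}

    return step


store = ResilientStore()
executor = ResilientExecutor(store, checkpoint_every=3)
final_state, restarts = executor.run(make_flaky_sum(4), {"step": 0, "total": 0}, 6)
print(final_state, restarts)
```

In this run the simulated failure occurs at step 4, after a checkpoint was taken at step 3, so the executor rolls back to step 3, repeats one step, and still finishes with the same total as a failure-free run.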
14 Claims
1. A method for supporting resilient execution of computer programs comprising the steps of:
providing a resilient store wherein information in the resilient store can be accessed in the event of a failure;
periodically checkpointing application state in the resilient store;
providing a resilient executor which comprises software which executes applications by catching failures;
using the resilient executor to execute at least one application; and
in response to the resilient executor detecting a failure, restoring application state information from a checkpoint in the resilient store, the resilient executor resuming execution of the at least one application.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
Specification