Resilient programming frameworks for handling failures in parallel programs
First Claim
1. An information processing system capable of supporting resilient execution of applications written in a programming language with exception handling, the information processing system comprising:
memory;
persistent memory for storing data and computer instructions;
a resilient executor, communicatively coupled with the memory and the persistent memory, for executing computations of applications while detecting failures in the execution of the computations; and
at least one processor, communicatively coupled with the resilient executor, the memory, and the persistent memory, wherein the at least one processor, responsive to executing computer instructions, performs operations comprising:
providing an interface allowing programs to explicitly reference a place to communicate with or execute at least one computation on the place, wherein each place comprises an entity executing a computation;
providing a virtual place abstraction layer which defines a mapping between virtual places and physical places;
providing an interface allowing an application to communicate with or execute at least one computation on a physical place p1 by referencing a virtual place p2 which is mapped to physical place p1; and
in response to a physical place p3 failing, wherein virtual place p4 maps to physical place p3, updating the mapping so that virtual place p4 maps to physical place p5 wherein physical place p5 is live.
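The virtual place abstraction recited in the claim can be illustrated with a minimal sketch. It is not taken from the specification: the names `VirtualPlaceMap`, `run_at`, `handle_failure`, and `PlaceFailedError` are hypothetical, places are modeled as plain integers, and the policy of preferring an unused live place as the replacement is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set, TypeVar

T = TypeVar("T")


class PlaceFailedError(Exception):
    """Raised when a computation targets a physical place that is not live."""


@dataclass
class VirtualPlaceMap:
    """Hypothetical mapping layer between virtual and physical places."""

    mapping: Dict[int, int]  # virtual place id -> physical place id
    live: Set[int]           # physical places currently believed to be live

    def resolve(self, virtual_id: int) -> int:
        """Return the physical place that a virtual place currently maps to."""
        return self.mapping[virtual_id]

    def run_at(self, virtual_id: int, computation: Callable[[int], T]) -> T:
        """Run a computation on the physical place behind a virtual place."""
        physical = self.resolve(virtual_id)
        if physical not in self.live:
            raise PlaceFailedError(f"physical place {physical} has failed")
        return computation(physical)

    def handle_failure(self, failed_physical: int) -> None:
        """Remap every virtual place on the failed place to a live one."""
        self.live.discard(failed_physical)
        in_use = set(self.mapping.values())
        for virtual_id, physical in list(self.mapping.items()):
            if physical != failed_physical:
                continue
            # Assumed policy: prefer a live place no virtual place uses yet,
            # otherwise share any live place.
            candidates = sorted(self.live - in_use) or sorted(self.live)
            if not candidates:
                raise PlaceFailedError("no live physical place available")
            self.mapping[virtual_id] = candidates[0]
            in_use.add(candidates[0])
```

In this sketch, application code only ever names virtual places, so after `handle_failure` is called the same `run_at` call keeps working against the replacement physical place.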
Abstract
An information processing system, computer readable storage medium, and method for supporting resilient execution of computer programs. A method provides a resilient store wherein information in the resilient store can be accessed in the event of a failure. The method periodically checkpoints application state in the resilient store. A resilient executor comprises software which executes applications by catching failures. The method uses the resilient executor to execute at least one application. In response to the resilient executor detecting a failure, the method restores application state information to the at least one application from a checkpoint stored in the resilient store, and the resilient executor resumes execution of the at least one application with the restored application state information.
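The checkpoint-and-restore cycle described in the abstract can be sketched as follows. This is an illustrative assumption, not the patented implementation: `ResilientStore` is a hypothetical in-memory stand-in for a store that survives failures, `run_resilient` is a hypothetical executor loop, and the `checkpoint_every` and `max_restarts` parameters are invented for the example.

```python
import copy
from typing import Any, Callable, Dict


class ResilientStore:
    """Hypothetical stand-in for a store that remains readable after a failure."""

    def __init__(self) -> None:
        self._snapshots: Dict[str, Any] = {}

    def save(self, key: str, state: Any) -> None:
        # Deep-copy so later changes to the live state cannot alter the checkpoint.
        self._snapshots[key] = copy.deepcopy(state)

    def load(self, key: str) -> Any:
        return copy.deepcopy(self._snapshots[key])


def run_resilient(
    step: Callable[[int, Any], Any],
    state: Any,
    num_steps: int,
    store: ResilientStore,
    checkpoint_every: int = 2,
    max_restarts: int = 3,
) -> Any:
    """Run `step` num_steps times, restoring from the last checkpoint on failure."""
    store.save("ckpt", {"i": 0, "state": state})
    i = 0
    restarts = 0
    while i < num_steps:
        try:
            state = step(i, state)
            i += 1
            if i % checkpoint_every == 0:
                # Periodic checkpoint of the application state.
                store.save("ckpt", {"i": i, "state": state})
        except Exception:
            # The executor catches the failure and rolls back.
            restarts += 1
            if restarts > max_restarts:
                raise
            snapshot = store.load("ckpt")
            i, state = snapshot["i"], snapshot["state"]
    return state
```

The sketch treats any exception as a failure signal, which matches the abstract's description of an executor that runs applications by catching failures. A real system would distinguish place failures from ordinary application errors.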
20 Claims
1. An information processing system capable of supporting resilient execution of applications written in a programming language with exception handling, the information processing system comprising:
memory;
persistent memory for storing data and computer instructions;
a resilient executor, communicatively coupled with the memory and the persistent memory, for executing computations of applications while detecting failures in the execution of the computations; and
at least one processor, communicatively coupled with the resilient executor, the memory, and the persistent memory, wherein the at least one processor, responsive to executing computer instructions, performs operations comprising:
providing an interface allowing programs to explicitly reference a place to communicate with or execute at least one computation on the place, wherein each place comprises an entity executing a computation;
providing a virtual place abstraction layer which defines a mapping between virtual places and physical places;
providing an interface allowing an application to communicate with or execute at least one computation on a physical place p1 by referencing a virtual place p2 which is mapped to physical place p1; and
in response to a physical place p3 failing, wherein virtual place p4 maps to physical place p3, updating the mapping so that virtual place p4 maps to physical place p5 wherein physical place p5 is live.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer readable storage medium, comprising computer instructions which, responsive to being executed by a processor of an information processing system, cause the processor to perform operations for supporting resilient execution of applications written in a programming language with exception handling, the information processing system comprising memory for storing data and computer instructions, and a resilient executor for executing computations of the applications while detecting failures in the execution of the computations, the operations comprising:
providing an interface allowing programs to explicitly reference a place to communicate with or execute at least one computation on the place, wherein each place comprises an entity executing a computation;
providing a virtual place abstraction layer which defines a mapping between virtual places and physical places;
providing an interface allowing an application to communicate with or execute at least one computation on a physical place p1 by referencing a virtual place p2 which is mapped to physical place p1; and
in response to a physical place p3 failing, wherein virtual place p4 maps to physical place p3, updating the mapping so that virtual place p4 maps to physical place p5 wherein physical place p5 is live.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20
Specification