Recovery segments
First Claim
1. A method for implementing recovery segments in a large scale computing application comprising:
sending an application message from a parent process executed by a first computing device to a child process executed by a second computing device, in which the recovery segment comprises the parent process and the child process;
identifying a dependency created by the application message;
including the identified dependency in a dependence set of the child process and saving the dependence set in memory of the second computing device;
generating, by the parent process, a first checkpoint and saving the first checkpoint in nonvolatile memory of the first computing device;
sending, from the parent process to a child process, a checkpoint message that includes dependency information;
receiving, by the child process, the checkpoint message and modifying the dependence set of the child process according to the dependency information;
generating, by the child process, a second checkpoint and saving the second checkpoint in nonvolatile memory of the second computing device;
upon occurrence of a failure of the parent process, reverting the child process to a most recent checkpoint generated by the child process that does not include effects of processing an orphan message.
Abstract
In one example, a method for implementing recovery segments includes sending an application message from a parent process executed by a first computing device to a child process executed by a second computing device and identifying a dependency created by the application message. This identified dependency is included in a dependence set of the child process and saved. A checkpoint is generated by the parent process and a checkpoint message that includes dependency information is sent from the parent process to the child process. The child process modifies the dependence set according to the dependency information and generates a second checkpoint that is saved in nonvolatile memory of the second computing device. Upon occurrence of a failure of the parent process, the child process reverts to a most recent checkpoint generated by the child process that does not include the effects of processing an orphan message.
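The flow described in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the class and method names (`Parent`, `Child`, `on_checkpoint_message`, `recover`) are hypothetical, in-memory lists stand in for nonvolatile memory, and a dependency is encoded as a `(sender, checkpoint interval)` pair, which the source does not specify. The child also takes one checkpoint of its own so that the example has a checkpoint that must be skipped during recovery.

```python
from dataclasses import dataclass, field

Dependency = tuple[str, int]  # assumed encoding: (sender name, sender's checkpoint interval)


@dataclass
class Checkpoint:
    state: list[str]
    dependence_set: frozenset[Dependency]


@dataclass
class Child:
    state: list[str] = field(default_factory=list)
    dependence_set: set[Dependency] = field(default_factory=set)
    checkpoints: list[Checkpoint] = field(default_factory=list)  # stand-in for nonvolatile memory

    def on_application_message(self, payload: str, dep: Dependency) -> None:
        # Processing the message makes the child's state depend on the sender.
        self.state.append(payload)
        self.dependence_set.add(dep)

    def checkpoint(self) -> None:
        self.checkpoints.append(
            Checkpoint(list(self.state), frozenset(self.dependence_set)))

    def on_checkpoint_message(self, add: set[Dependency], remove: set[Dependency]) -> None:
        # Modify the dependence set as instructed, then take a checkpoint.
        self.dependence_set |= add
        self.dependence_set -= remove
        self.checkpoint()

    def recover(self, orphaned: set[Dependency]) -> None:
        # Discard checkpoints that contain effects of orphan messages.
        while self.checkpoints and self.checkpoints[-1].dependence_set & orphaned:
            self.checkpoints.pop()
        if self.checkpoints:
            last = self.checkpoints[-1]
            self.state = list(last.state)
            self.dependence_set = set(last.dependence_set)
        else:
            self.state, self.dependence_set = [], set()


@dataclass
class Parent:
    name: str
    child: Child
    interval: int = 0
    state: list[str] = field(default_factory=list)
    checkpoints: list[list[str]] = field(default_factory=list)

    def send(self, payload: str) -> None:
        self.state.append(payload)
        self.child.on_application_message(payload, (self.name, self.interval))

    def checkpoint(self) -> None:
        self.checkpoints.append(list(self.state))
        committed = (self.name, self.interval)
        self.interval += 1
        # Messages sent in the committed interval can no longer become orphans.
        self.child.on_checkpoint_message(add=set(), remove={committed})

    def fail(self) -> None:
        # Restart from the last checkpoint; the unfinished interval is lost.
        self.state = list(self.checkpoints[-1]) if self.checkpoints else []
        orphaned = {(self.name, self.interval)}
        self.interval += 1  # assumption: a restarted parent uses a fresh interval
        self.child.recover(orphaned)


child = Child()
parent = Parent("P", child)
parent.send("m1")
parent.checkpoint()   # child checkpoints with an empty dependence set
parent.send("m2")
child.checkpoint()    # local checkpoint that depends on the uncommitted "m2"
parent.fail()         # "m2" becomes an orphan message
print(parent.state, child.state)
```

In this run the child discards the checkpoint that recorded `m2` and reverts to the one taken on receipt of the checkpoint message, so parent and child end in consistent states.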
21 Citations
14 Claims
1. A method for implementing recovery segments in a large scale computing application comprising:
sending an application message from a parent process executed by a first computing device to a child process executed by a second computing device, in which the recovery segment comprises the parent process and the child process;
identifying a dependency created by the application message;
including the identified dependency in a dependence set of the child process and saving the dependence set in memory of the second computing device;
generating, by the parent process, a first checkpoint and saving the first checkpoint in nonvolatile memory of the first computing device;
sending, from the parent process to a child process, a checkpoint message that includes dependency information;
receiving, by the child process, the checkpoint message and modifying the dependence set of the child process according to the dependency information;
generating, by the child process, a second checkpoint and saving the second checkpoint in nonvolatile memory of the second computing device;
upon occurrence of a failure of the parent process, reverting the child process to a most recent checkpoint generated by the child process that does not include effects of processing an orphan message.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. A method for implementing recovery segments in a coordinated application-unaware failure recovery for large scale computing application, the method comprising:
sending an application message from a parent process to a child process within the application, in which the parent process and child process are executed on at least one computing device in which the recovery segment comprises the parent process and the child process;
identifying a dependency of the child state on the parent process as a result of the application message sent from the parent process to the child process;
including the identified dependency in a dependence set of the child process, in which the dependence set comprises a list of all the current dependencies of the child process and is recorded in memory of the at least one computing device;
generating, by the parent process, a checkpoint by recording state information of the parent process at the time of the checkpoint on nonvolatile memory of at least one computing device such that the parent process can retrieve the state information and revert back to a state at the time of the checkpoint;
sending, from the parent process to a child process, a checkpoint message that includes dependency information comprising dependencies to add to the dependence set of the child process and dependencies to remove from the dependence set of the child process for the parent;
receiving, by the child process, the checkpoint message and modifying the dependence set of the child process for the parent according to the dependency information;
generating, by the child process, a checkpoint by recording state information of the child process at the time of the checkpoint such that the child process can retrieve the state information and revert back to a state at the time of the checkpoint, in which the checkpoint generated by the child process is triggered by receipt of the checkpoint message;
sending an additional checkpoint message from the child process to processes that are downstream from the child process;
if the dependence set of the child process does not comprise any dependencies, releasing outside variables of the child process;
when the parent process fails, sending a recovery message to the child process;
reverting the child process to a most recent checkpoint generated by the child process that does not include effects of processing an orphan message; and
sending an additional recovery message from the child process to processes that are downstream from the child process.
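The cascading behavior recited in claim 14 (checkpoint and recovery messages forwarded to downstream processes, and release of outside variables once the dependence set is empty) can be illustrated with a small simulation. This is a hedged sketch and not the claimed method itself: the `Process` class and its methods are hypothetical, "outside variables" are interpreted here as buffered outputs that become externally visible on release, dependencies are assumed to be `(sender, interval)` pairs, and Python lists stand in for nonvolatile memory.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    name: str
    downstream: list["Process"] = field(default_factory=list)
    interval: int = 0                                   # current checkpoint interval
    state: list[str] = field(default_factory=list)
    dependence_set: set[tuple[str, int]] = field(default_factory=set)
    held: list[str] = field(default_factory=list)       # outputs not yet released
    released: list[str] = field(default_factory=list)   # outputs visible outside
    checkpoints: list[tuple] = field(default_factory=list)

    def handle(self, payload, dep=None):
        """Process an input, buffer its output, and forward it downstream."""
        self.state.append(payload)
        if dep is not None:
            self.dependence_set.add(dep)
        self.held.append(payload)
        for p in self.downstream:
            p.handle(payload, (self.name, self.interval))

    def checkpoint(self, add=frozenset(), remove=frozenset()):
        """Take a checkpoint; for a child this is triggered by a checkpoint message."""
        self.dependence_set |= add
        self.dependence_set -= remove
        committed = {(self.name, self.interval)}
        self.interval += 1
        if not self.dependence_set:
            # No remaining dependencies: release buffered outputs. Done before
            # the snapshot so a later rollback cannot release them twice.
            self.released += self.held
            self.held = []
        self.checkpoints.append(
            (list(self.state), set(self.dependence_set), list(self.held), self.interval))
        for p in self.downstream:
            p.checkpoint(remove=committed)  # additional checkpoint message

    def recover(self, orphaned):
        """Recovery message: roll back only if current state depends on an orphan."""
        if self.dependence_set & orphaned:
            self.rollback(orphaned)

    def rollback(self, orphaned=frozenset()):
        first_lost = 0
        # Skip checkpoints that include effects of orphan messages.
        while self.checkpoints and self.checkpoints[-1][1] & orphaned:
            self.checkpoints.pop()
        if self.checkpoints:
            state, deps, held, first_lost = self.checkpoints[-1]
            self.state, self.dependence_set, self.held = list(state), set(deps), list(held)
        else:
            self.state, self.dependence_set, self.held = [], set(), []
        # Everything this process sent since the restored checkpoint is now orphaned.
        lost = {(self.name, i) for i in range(first_lost, self.interval + 1)}
        self.interval += 1
        for p in self.downstream:
            p.recover(lost)  # additional recovery message

    def fail(self):
        """Simulate a crash followed by restart from the last checkpoint."""
        self.rollback()


g = Process("G")
c = Process("C", [g])
p = Process("P", [c])
p.handle("a")
p.checkpoint()   # cascades P -> C -> G; each releases "a"
p.handle("b")    # uncommitted work flows down the chain
p.fail()         # recovery cascades; "b" is undone everywhere
print(p.state, c.state, g.state, g.released)
```

The sketch tracks only direct dependencies; a downstream process rolls back because its immediate upstream neighbor reports lost intervals, not because it tracks the original failed process.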
Specification