Method and system for providing checkpointing to Windows application groups
First Claim
1. A computer system for performing checkpointing of an application group, the system comprising:
one or more central processing units (CPUs); and
a memory storing computer-executable instructions which, when executed by the one or more CPUs, cause the system to perform a method of checkpointing an application group, the method comprising:
loading a checkpoint kernel module and registering a coordinator process with the checkpoint kernel module;
launching a plurality of independent applications, each including a plurality of threads, as the application group via the coordinator, wherein launching an application via the coordinator causes the launched application to load a user-space checkpoint library;
wherein an application loading the checkpoint library comprises installing a plurality of function interceptors including at least wait function interceptors which together are configured to intercept system calls made by the threads of the loading application during execution, and, in response to intercepting a system call from a calling thread and determining the coordinator has initiated a group checkpoint, are configured to cause the calling thread to block and wait in an alertable state;
initiating a group checkpoint of the application group via the coordinator including causing each of the independent applications to perform an application checkpoint, wherein performing an application checkpoint includes sending a kernel-mode checkpoint signal to each thread of the application; and
responsive to receiving the kernel-mode checkpoint signal, the receiving thread performs steps comprising:
entering a checkpoint signal handler in the kernel module;
determining the receiving thread was active in kernel-space at the time the kernel-mode signal was received utilizing kernel attributes directly accessible from the checkpoint signal handler; and
inserting a user-mode checkpoint signal at the front of a signal queue of the receiving thread and returning from the checkpoint signal handler in response to the determining that the receiving thread was active in kernel-space, wherein a user-mode signal is only processed by a signaled thread when the signaled thread waits in the alertable state, and wherein the user-mode checkpoint signal is configured to cause the thread to enter a user-mode signal handler in the checkpoint library when processed.
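The signal routing recited in the claim above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the patented implementation: `SimThread`, `kernel_checkpoint_handler`, `intercepted_wait`, and the string signal names are all hypothetical, the signal queue is modeled as a simple FIFO, and no real Windows kernel or APC API is called.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical marker for the user-mode checkpoint signal.
USER_MODE_CHECKPOINT = "user-mode-checkpoint"


@dataclass
class SimThread:
    """Toy model of a thread; every field is a hypothetical stand-in."""
    name: str
    # Stands in for the kernel attributes the handler inspects.
    in_kernel_space: bool = False
    signal_queue: deque = field(default_factory=deque)
    handled: list = field(default_factory=list)

    def alertable_wait(self):
        """User-mode signals are processed only while waiting alertably."""
        while self.signal_queue:
            signal = self.signal_queue.popleft()
            if signal == USER_MODE_CHECKPOINT:
                # Models entering the handler in the checkpoint library.
                self.handled.append("checkpoint-library-handler")
            else:
                self.handled.append(signal)


def kernel_checkpoint_handler(thread):
    """Models the kernel-module handler for the kernel-mode signal."""
    if thread.in_kernel_space:
        # Front of the queue: in this FIFO model the checkpoint signal
        # is processed before any other pending signal.
        thread.signal_queue.appendleft(USER_MODE_CHECKPOINT)
        return True
    # The claim text above only covers the kernel-space branch,
    # so the other branch is left as a no-op in this sketch.
    return False


def intercepted_wait(thread, checkpoint_initiated):
    """Models a wait-function interceptor from the checkpoint library."""
    if checkpoint_initiated:
        thread.alertable_wait()


worker = SimThread("worker", in_kernel_space=True)
worker.signal_queue.append("other-signal")
kernel_checkpoint_handler(worker)
intercepted_wait(worker, checkpoint_initiated=True)
print(worker.handled)  # ['checkpoint-library-handler', 'other-signal']
```

The queued user-mode signal stays pending until the thread reaches an intercepted call and waits alertably, which is why the claim pairs the front-of-queue insertion with the wait function interceptors.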
Abstract
A method and system for checkpointing multi-threaded applications and multi-process application groups on Windows operating systems. In an exemplary embodiment, the method may include creating at least one full checkpoint for each application in an application group, and creating at least one incremental application checkpoint for each application in the application group. Each incremental application checkpoint may be automatically merged against a corresponding full application checkpoint. Checkpointing may be synchronized across all applications in the application group and may be configured to perform live migration. In the exemplary embodiment, checkpoints are triggered asynchronously using Asynchronous Procedure Calls (APCs).
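As an illustration of merging an incremental checkpoint against a full one, the following minimal sketch assumes a page-granular representation, which the abstract does not specify: a checkpoint is modeled as a mapping from page number to page contents, and an incremental checkpoint holds only the pages changed since the full checkpoint. The function name `merge_incremental` and the data layout are hypothetical.

```python
def merge_incremental(full, incremental):
    """Return a new full checkpoint with the changed pages overlaid.

    Both arguments map page number -> page bytes (an assumed layout).
    The inputs are left unmodified.
    """
    merged = dict(full)         # start from the last full checkpoint
    merged.update(incremental)  # changed pages replace the older copies
    return merged


full = {0: b"AAAA", 1: b"BBBB", 2: b"CCCC"}
incremental = {1: b"bbbb"}      # only page 1 changed since the full checkpoint
merged = merge_incremental(full, incremental)
print(merged)  # {0: b'AAAA', 1: b'bbbb', 2: b'CCCC'}
```

After such a merge, the result can serve as the new full checkpoint, so a restore never needs to replay a chain of incremental checkpoints.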
87 Citations
10 Claims
Specification