Method and apparatus for providing process pair protection for complex applications
Abstract
A method and apparatus for providing process-pair protection to complex applications are provided. The apparatus of the present invention includes a process-pair manager (PPM). The PPM is replicated so that a respective PPM is deployed on each of two computer systems. Each computer system also hosts a watchdog process that monitors the PPM and restarts it in case of PPM failure. Each PPM communicates with a respective instance of an application. The application instances may include one or more processes along with associated resources. During normal operation, the primary application provides service and periodically checkpoints its state to the backup application. The backup application functions in a standby mode. The two PPMs communicate with each other and exchange messages as state changes occur. The apparatus also includes in each computer system a node watcher that notifies the PPM of failures of the remote computer system. In this way, each PPM monitors the state of the other application instance and the health of the computer system on which that instance runs. If a failure of the primary application or of the computer system where it runs is detected, the PPM managing the backup application takes steps to cause its instance of the application to become primary. The failover operation is faster (between 5 and 20 seconds) than corresponding operations provided by other existing methods (between one and 40 minutes, depending on the application initialization time) because the backup application does not need to be started and initialized to become primary. The failover is stateful because the backup application receives periodic updates of the state of the primary application.
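The watchdog role mentioned in the abstract (monitor the PPM and restart it when it fails) can be illustrated with a minimal Python sketch. This is not the patented implementation: the `Watchdog` class, its `poll_once` method, and the use of an operating-system child process to stand in for the PPM are all assumptions made for illustration.

```python
import subprocess
import sys
from typing import List, Optional


class Watchdog:
    """Hypothetical watchdog: restarts a monitored process whenever it exits."""

    def __init__(self, argv: List[str]) -> None:
        self.argv = argv  # command line of the monitored process (stand-in for a PPM)
        self.proc: Optional[subprocess.Popen] = None
        self.restarts = 0

    def start(self) -> None:
        """Launch the monitored process."""
        self.proc = subprocess.Popen(self.argv)

    def poll_once(self) -> bool:
        """Check the process once; return True if it had exited and was restarted."""
        if self.proc is not None and self.proc.poll() is None:
            return False  # still running, nothing to do
        already_started = self.proc is not None
        self.start()
        if already_started:
            self.restarts += 1
        return already_started

    def stop(self) -> None:
        """Terminate the monitored process if it is still running."""
        if self.proc is not None and self.proc.poll() is None:
            self.proc.terminate()
        if self.proc is not None:
            self.proc.wait()
```

In a deployment along these lines, `poll_once` would be called in a loop with a short sleep between checks; for example, `Watchdog([sys.executable, "-c", "pass"])` monitors a child process that exits immediately and would therefore be restarted on the next check.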
20 Claims
-
1. An apparatus for providing continuous availability to complex applications through the use of process-pair protection to allow fast and stateful application failover, the apparatus comprising:
-
a primary process-pair manager located on a primary computer system, the primary process-pair manager configured to start up and manage a primary instance of a complex application;
a backup process-pair manager located on a backup computer system, the backup process-pair manager configured to start up and manage a backup instance of the complex application, the backup process-pair manager and the backup instance of the complex application configured to replace the primary process-pair manager and the primary instance of the complex application in the event of failure of the primary computer system or failure of the primary instance of the complex application;
wherein the primary process-pair manager and the backup process-pair manager each include an application state model, and each application state model comprises:
two or more states, with one state being designated as a current state, with states grouped in main states;
one or more transitions, each transition interconnecting two states, each transition defining the conditions under which a process-pair manager will change the current state to a state interconnected with the current state; and
one or more actions, each action associated with a respective transition, each action being a sequence of steps executed by a process-pair manager when traversing the transition associated with the action.
-
3. An apparatus as recited in claim 1 which further comprises:
a watchdog process running on the primary computer system to monitor the primary process-pair manager and restart the primary process-pair manager in case of failure.
-
4. An apparatus as recited in claim 1 which further comprises:
a watchdog process running on the backup computer system to monitor the backup process-pair manager and restart the backup process-pair manager in case of failure.
-
5. An apparatus as recited in claim 1 which further comprises:
a node watcher running on the primary computer system, the node watcher configured to exchange a heartbeat signal with the backup computer system to detect failure of the backup computer system.
-
6. An apparatus as recited in claim 1 which further comprises:
a node watcher running on the backup computer system, the node watcher configured to exchange a heartbeat signal with the primary computer system to detect failure of the primary computer system.
-
7. An apparatus as recited in claim 1 wherein the primary instance of the complex application is programmed to periodically perform a checkpointing operation by sending internal state information to the backup instance of the complex application.
-
8. An apparatus as recited in claim 1 wherein the primary process-pair manager includes an application administration module configured to provide a single interface between the primary process-pair manager and components of its respective complex application.
-
9. An apparatus as recited in claim 1 wherein the backup process-pair manager includes an application administration module configured to provide a single interface between the backup process-pair manager and components of its respective complex application.
-
10. An apparatus as recited in claim 1 wherein the primary process-pair manager includes an interapplication communication module configured to facilitate communication between the process-pair managers.
-
11. An apparatus as recited in claim 1 wherein the backup process-pair manager includes an interapplication communication module configured to facilitate communication between the process-pair managers.
-
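The node watcher of claims 5 and 6 detects failure of the remote computer system through a heartbeat. A minimal sketch of the detection side follows, assuming a simple timeout rule: the remote system is declared failed when no heartbeat has arrived within `timeout_s` seconds. The class name, the timeout rule, and the callback are illustrative assumptions; the claims do not specify them, and the network transport for the heartbeat is omitted.

```python
import time
from typing import Callable


class NodeWatcher:
    """Hypothetical node watcher: flags the remote system as failed on heartbeat timeout."""

    def __init__(
        self,
        timeout_s: float,
        on_remote_failure: Callable[[], None],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.timeout_s = timeout_s
        self.on_remote_failure = on_remote_failure  # e.g. notify the local PPM
        self.clock = clock  # injectable so the logic can be exercised without waiting
        self.last_seen = clock()
        self.remote_alive = True

    def heartbeat_received(self) -> None:
        """Record that a heartbeat arrived from the remote system."""
        self.last_seen = self.clock()
        self.remote_alive = True

    def check(self) -> bool:
        """Return True while the remote system is considered alive.

        The failure callback fires once, when the timeout is first exceeded.
        """
        if self.remote_alive and self.clock() - self.last_seen > self.timeout_s:
            self.remote_alive = False
            self.on_remote_failure()
        return self.remote_alive
```

The callback is where a process-pair manager would be told about the remote failure; on the backup system, that notification is what could trigger the takeover described in claim 1.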
2. A computer program product comprising a computer usable medium having computer readable code embodied therein for providing high availability to a complex application through the use of process-pair protection to allow fast and stateful application failover, the computer program product comprising:
-
first computer readable program code devices configured to cause a primary computer system to provide a primary process-pair manager to start and manage a primary instance of a complex application;
second computer readable program code devices configured to cause a backup computer system to provide a backup process-pair manager to start up and manage a backup instance of the complex application, the backup process-pair manager and the backup instance of the complex application configured to replace the primary process-pair manager and the primary instance of the complex application in the event of failure of the primary computer system or failure of the primary instance of the complex application;
wherein the primary process-pair manager and the backup process-pair manager each include an application state model, and each application state model comprises:
two or more states, with one state being designated as a current state, with states grouped in main states;
one or more transitions, each transition interconnecting two states, each transition defining the conditions under which a process-pair manager will change the current state to a state interconnected with the current state; and
one or more actions, each action associated with a respective transition, each action being a sequence of steps executed by a process-pair manager when traversing the transition associated with the action.
-
12. A computer program product as recited in claim 2 which further comprises:
computer readable program code devices configured to cause the primary computer system to provide a watchdog process to monitor the primary process-pair manager and restart the primary process-pair manager in case of failure.
-
13. A computer program product as recited in claim 2 which further comprises:
computer readable program code devices configured to cause the backup computer system to provide a watchdog process to monitor the backup process-pair manager and restart the backup process-pair manager in case of failure.
-
14. A computer program product as recited in claim 2 which further comprises:
computer readable program code devices configured to cause the primary computer system to provide a node watcher, the node watcher configured to exchange a heartbeat signal with the backup computer system to detect failure of the backup computer system.
-
15. A computer program product as recited in claim 2 which further comprises:
computer readable program code devices configured to cause the backup computer system to provide a node watcher, the node watcher configured to exchange a heartbeat signal with the primary computer system to detect failure of the primary computer system.
-
16. A computer program product as recited in claim 2 wherein the primary instance of the complex application is programmed to periodically perform a checkpointing operation by sending internal state information to the backup instance of the complex application.
-
17. A computer program product as recited in claim 2 wherein the primary process-pair manager includes an application administration module configured to provide a single interface between the primary process-pair manager and components of its respective complex application.
-
18. A computer program product as recited in claim 2 wherein the backup process-pair manager includes an application administration module configured to provide a single interface between the backup process-pair manager and components of its respective complex application.
-
19. A computer program product as recited in claim 2 wherein at least one of the process-pair managers includes an interapplication communication module configured to facilitate communication between the process-pair managers.
-
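The checkpointing operation of claims 7 and 16, in which the primary instance periodically sends internal state information to the backup instance, can be sketched as follows. The JSON encoding, the sequence number used to reject stale checkpoints, and the class and method names are assumptions made for illustration; the claims do not prescribe a format or an ordering rule.

```python
import json
from typing import Any, Dict


class PrimaryInstance:
    """Hypothetical primary application instance that emits checkpoints."""

    def __init__(self) -> None:
        self.state: Dict[str, Any] = {}
        self.seq = 0

    def make_checkpoint(self) -> bytes:
        """Serialize the current internal state with an increasing sequence number."""
        self.seq += 1
        return json.dumps({"seq": self.seq, "state": self.state}).encode("utf-8")


class BackupInstance:
    """Hypothetical backup application instance that absorbs checkpoints."""

    def __init__(self) -> None:
        self.state: Dict[str, Any] = {}
        self.seq = 0

    def apply_checkpoint(self, payload: bytes) -> bool:
        """Apply a checkpoint if it is newer than the last one; return True if applied."""
        msg = json.loads(payload.decode("utf-8"))
        if msg["seq"] <= self.seq:
            return False  # stale or duplicate checkpoint, ignore it
        self.seq = msg["seq"]
        self.state = msg["state"]
        return True
```

Because the backup already holds the most recent checkpointed state, promoting it to primary does not require reinitializing the application, which is the basis of the stateful failover described in the abstract.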
20. A method for providing high availability to complex applications through the use of process-pair protection to allow fast and stateful application failover, the method comprising:
-
using a primary process-pair manager to start and manage a primary instance of a complex application on a primary computer system;
using a backup process-pair manager to start and manage a backup instance of a complex application on a backup computer system, wherein the backup process-pair manager and the backup instance of the complex application are configured to replace the primary process-pair manager and the primary instance of the complex application in the event of failure of the primary computer system or failure of the primary instance of the complex application;
wherein the primary process-pair manager and the backup process-pair manager each include an application state model, and each application state model comprises:
two or more states, with one state being designated as a current state, with states grouped in main states;
one or more transitions, each transition interconnecting two states, each transition defining the conditions under which a process-pair manager will change the current state to a state interconnected with the current state; and
one or more actions, each action associated with a respective transition, each action being a sequence of steps executed by a process-pair manager when traversing the transition associated with the action.
-
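The application state model recited in the independent claims (states grouped in main states, transitions that define when the current state changes, and actions executed as a sequence of steps while traversing a transition) can be pictured as a small state machine. The sketch below is one possible reading, not the claimed implementation. The state names (`standby`, `taking_over`, `serving`), the event names, and the choice to key transitions by a named event are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Step = Callable[[], None]


@dataclass(frozen=True)
class State:
    name: str
    main_state: str  # the main state (group) this state belongs to


@dataclass
class Transition:
    source: str
    target: str
    event: str  # condition under which the transition is taken
    action: List[Step] = field(default_factory=list)  # steps run while traversing


class ApplicationStateModel:
    def __init__(self, states: List[State], transitions: List[Transition], initial: str) -> None:
        self.states: Dict[str, State] = {s.name: s for s in states}
        if len(self.states) < 2:
            raise ValueError("the model needs two or more states")
        if initial not in self.states:
            raise ValueError("unknown initial state")
        self.transitions: Dict[Tuple[str, str], Transition] = {
            (t.source, t.event): t for t in transitions
        }
        self.current = initial  # the state designated as the current state

    @property
    def main_state(self) -> str:
        return self.states[self.current].main_state

    def handle(self, event: str) -> bool:
        """Traverse the matching transition, if any; return True if the state changed."""
        transition = self.transitions.get((self.current, event))
        if transition is None:
            return False
        for step in transition.action:
            step()
        self.current = transition.target
        return True


def build_backup_model(log: List[str]) -> ApplicationStateModel:
    """Hypothetical model for a backup process-pair manager."""
    states = [
        State("standby", "BACKUP"),
        State("taking_over", "BACKUP"),
        State("serving", "PRIMARY"),
    ]
    transitions = [
        Transition("standby", "taking_over", "primary_failed",
                   [lambda: log.append("stop accepting checkpoints")]),
        Transition("taking_over", "serving", "takeover_done",
                   [lambda: log.append("start serving clients")]),
    ]
    return ApplicationStateModel(states, transitions, "standby")
```

Feeding this model the event `primary_failed` followed by `takeover_done` moves it from the `BACKUP` main state to the `PRIMARY` main state and runs each action's steps in order, which mirrors the takeover behavior the claims assign to the backup process-pair manager.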
Specification