METHOD AND SYSTEM FOR PROVIDING HIGH AVAILABILITY TO DISTRIBUTED COMPUTER APPLICATIONS
Abstract
Method, system, apparatus and/or computer program for achieving transparent integration of high-availability services for distributed application programs. Loss-less migration of sub-programs from their respective primary nodes to backup nodes is performed transparently to a client which is connected to the primary node. Migration is performed by high-availability services which are configured for injecting registration codes, registering distributed applications, detecting execution failures, executing from backup nodes in response to failure, and other services. High-availability application services can be utilized by distributed applications having any desired number of sub-programs without the need of modifying or recompiling the application program and without the need of a custom loader. In one example embodiment, a transport driver is responsible for receiving messages, halting and flushing messages, and issuing messages that direct sub-programs to continue after checkpointing.
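The transport driver described in the abstract can be illustrated with a minimal, single-threaded Python sketch. The class `TransportDriver` and its methods `send`, `halt`, `flush` and `resume` are hypothetical names; the sketch assumes that halting means holding back new traffic, flushing means delivering everything already in transit, and resuming stands in for the message directing sub-programs to continue after the checkpoint.

```python
from collections import deque


class TransportDriver:
    """Hypothetical transport driver for one connection (illustrative only)."""

    def __init__(self):
        self.halted = False
        self.in_flight = deque()  # accepted by the driver, not yet delivered
        self.held = deque()       # sent while halted; released on resume
        self.delivered = []       # messages the receiving sub-program has seen

    def send(self, msg):
        # While halted, new traffic is held back instead of entering the wire.
        if self.halted:
            self.held.append(msg)
        else:
            self.in_flight.append(msg)

    def halt(self):
        self.halted = True

    def flush(self):
        # Deliver everything already in transit, so that nothing is in flight
        # when the checkpoint is taken.
        while self.in_flight:
            self.delivered.append(self.in_flight.popleft())

    def resume(self):
        # Stand-in for the "continue" message issued after checkpointing:
        # held traffic is released onto the wire in its original order.
        self.halted = False
        while self.held:
            self.in_flight.append(self.held.popleft())
```

For example, sending "a" and "b", halting, sending "c" and flushing leaves only "a" and "b" delivered; "c" reaches the receiver only after `resume()` and a further flush.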
42 Claims
1. A method of achieving transparent integration of a distributed application program with a high availability protection program, comprising:
injecting registration code, transparently and automatically, into all sub-programs during launch, without the need of modifying or recompiling the application program and without the need of a custom loader;
registering the distributed application automatically with a high-availability protection program;
detecting a failure in the execution of the distributed application program by said high-availability protection program; and
executing the distributed application, subject to the detected failure, with one or more sub-programs being executed from their respective backup nodes automatically by said high-availability protection program in response to the failure.
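A rough Python sketch of the registration, failure handling and backup-node execution steps of claim 1 follows. It assumes a registry that maps each sub-program to a primary node and an ordered list of backup nodes; `HAProtection`, `register` and `on_node_failure` are hypothetical names, and the injected registration code is represented only by the call to `register`.

```python
from dataclasses import dataclass, field


@dataclass
class HAProtection:
    """Hypothetical high-availability protection program (illustrative only)."""

    placement: dict = field(default_factory=dict)  # sub-program -> current node
    backups: dict = field(default_factory=dict)    # sub-program -> backup nodes, in order
    failed: set = field(default_factory=set)       # nodes reported as failed

    def register(self, subprogram, primary, backups):
        # Stand-in for what the injected registration code would report at launch.
        self.placement[subprogram] = primary
        self.backups[subprogram] = list(backups)

    def on_node_failure(self, node):
        """Move every sub-program on `node` to its first healthy backup."""
        self.failed.add(node)
        moved = {}
        for sub, current in list(self.placement.items()):
            if current != node:
                continue
            target = next((b for b in self.backups[sub] if b not in self.failed), None)
            if target is None:
                raise RuntimeError(f"no healthy backup node for {sub!r}")
            # Restarting the sub-program on `target` (e.g. from a checkpoint) is not modelled.
            self.placement[sub] = target
            moved[sub] = target
        return moved
```

Only the sub-programs placed on the failed node are moved; the others keep running where they are, which matches the claim's "one or more sub-programs" rather than a restart of the whole application.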
5. A method of performing loss-less migration of a distributed application, comprising:
migrating one or more sub-programs within an application, without loss, from their respective primary nodes to at least one backup node;
maintaining transparency to a client connected to the primary node over a transport connection;
flushing and halting said transport connection during the taking of checkpoints; and
restoring said one or more sub-programs from said checkpoints in response to initiating recovery of the application.
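The migration of claim 5 can be sketched for a toy sub-program whose whole state is a running total. This is an illustration under stated assumptions, not the claimed implementation: taking a checkpoint flushes in-transit messages before snapshotting, and, as an assumption not stated in the claim, messages accepted after the last checkpoint are logged and replayed on the backup so that none are lost. `RunningTotal` and `MigratableLink` are hypothetical names.

```python
import copy
from collections import deque


class RunningTotal:
    """Toy sub-program whose entire state is a running total."""

    def __init__(self):
        self.total = 0

    def handle(self, n):
        self.total += n


class MigratableLink:
    """Hypothetical client connection that survives migration of its sub-program."""

    def __init__(self, sub):
        self.sub = sub
        self.in_transit = deque()  # sent by the client, not yet processed
        self.since_ckpt = []       # assumption: messages logged since the last checkpoint
        self.checkpoint = copy.deepcopy(sub.__dict__)  # initial state acts as checkpoint 0

    def send(self, n):
        self.in_transit.append(n)
        self.since_ckpt.append(n)

    def process(self):
        while self.in_transit:
            self.sub.handle(self.in_transit.popleft())

    def take_checkpoint(self):
        # Flush: drain in-transit messages so none are in flight, then snapshot.
        # In this single-threaded sketch no new sends can arrive meanwhile (the "halt").
        self.process()
        self.checkpoint = copy.deepcopy(self.sub.__dict__)
        self.since_ckpt.clear()

    def migrate(self):
        # Primary failed: rebuild the sub-program on the backup from the checkpoint,
        # then replay the logged messages. The client keeps using the same link.
        backup = RunningTotal()
        backup.__dict__.update(copy.deepcopy(self.checkpoint))
        for n in self.since_ckpt:
            backup.handle(n)
        self.sub = backup
        self.in_transit.clear()  # everything in transit was just replayed
```

If the client sends 1, 2 and 3, a checkpoint is taken, and 10 and 20 follow before the primary fails (with 20 still in transit), `migrate()` yields a backup whose total is 36, the same as if no failure had occurred.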
11. A method of fault protection for applications distributed across multiple computer nodes, comprising:
-
providing high-availability application services for transparently loading applications, registering applications for protection, detecting faults in applications, and initiating recovery of applications;
taking checkpoints, by said high-availability application services, of one or more sub-programs within applications executing across multiple computer nodes;
restoring said one or more sub-programs from said checkpoints in response to initiating recovery of one or more said applications by said high-availability application services;
wherein said high-availability application services are provided to said one or more sub-programs running on a primary node, while at least one backup node stands ready in the event of a fault and subsequent recovery; and
coordinating execution of individual sub-programs within a coordinator program which is executed on a node accessible to the multiple computer nodes.
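One way to picture the coordinator program of claim 11 is a three-phase checkpoint: halt every sub-program, flush every channel, record the snapshot, then direct everyone to continue. The toy `Node` (a balance plus an inbox of transfers in transit) and the `Coordinator` class are hypothetical; the sketch only shows why flushing before recording keeps value that is in transit from vanishing from the checkpoint.

```python
from collections import deque


class Node:
    """Toy sub-program: a balance plus an inbox of transfers still in transit."""

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.inbox = deque()
        self.halted = False

    def send(self, other, amount):
        if self.halted:
            raise RuntimeError("sends are blocked while a checkpoint is in progress")
        self.balance -= amount
        other.inbox.append(amount)

    def drain(self):
        while self.inbox:
            self.balance += self.inbox.popleft()


class Coordinator:
    """Hypothetical coordinator on a node reachable by every sub-program."""

    def __init__(self, nodes):
        self.nodes = nodes

    def checkpoint(self):
        for n in self.nodes:  # phase 1: halt every sub-program
            n.halted = True
        for n in self.nodes:  # phase 2: flush every channel
            n.drain()
        snapshot = {n.name: n.balance for n in self.nodes}
        for n in self.nodes:  # phase 3: direct every sub-program to continue
            n.halted = False
        return snapshot
```

With nodes holding 100 and 50 and a transfer of 30 in transit, the snapshot records 70 and 80; recording without the flush would have captured 70 and 50 and lost the 30.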
32. A computer executable program for loss-less migration of a distributed application program, comprising:
a high-availability services module configured for execution in conjunction with an operating system upon which at least one application can be executed on one or more computer nodes of a distributed system; and
programming within said high-availability services module executable on said computer nodes for loss-less migration of sub-programs within said at least one application for checkpointing of all states in the transport connection, coordinating checkpointing of the state of the transport connection across the distributed system, restoring all states in the transport connection to the state they were in at the last checkpoint, and coordinating recovery within a restore procedure that is coupled to the transport connection.
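For claim 32, a simplified, one-directional, TCP-like model of connection state helps show what checkpointing and restoring "all states in the transport connection" might involve. The `ConnState` fields (sequence numbers, unacknowledged payloads, delivered payloads) and the helper functions are assumptions for illustration: both endpoints are captured in a single coordinated checkpoint, and the restore procedure retransmits whatever was unacknowledged at checkpoint time.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ConnState:
    """Hypothetical per-endpoint transport state (simplified, TCP-like)."""

    next_send_seq: int = 0
    next_expected_seq: int = 0
    unacked: dict = field(default_factory=dict)   # seq -> payload awaiting acknowledgement
    received: list = field(default_factory=list)  # payloads delivered in order


def send(sender, payload):
    seq = sender.next_send_seq
    sender.unacked[seq] = payload
    sender.next_send_seq += 1
    return seq, payload


def deliver(receiver, sender, seq, payload):
    # Accept only the next in-order segment; duplicates and gaps are ignored.
    if seq == receiver.next_expected_seq:
        receiver.received.append(payload)
        receiver.next_expected_seq += 1
    # Instantaneous acknowledgement (a simplification): clear what has been delivered.
    if seq < receiver.next_expected_seq:
        sender.unacked.pop(seq, None)


def checkpoint_connection(sender, receiver):
    # Coordinated checkpoint: both endpoints are captured at the same moment.
    return copy.deepcopy((sender, receiver))


def restore_connection(ckpt):
    # Roll both endpoints back to the checkpoint, then retransmit whatever the
    # sender had not seen acknowledged at that time.
    sender, receiver = copy.deepcopy(ckpt)
    for seq in sorted(sender.unacked):
        deliver(receiver, sender, seq, sender.unacked[seq])
    return sender, receiver
```

Sequence numbers let the receiver discard duplicates, so retransmitting after a restore cannot deliver a payload twice. Anything sent after the checkpoint is discarded by the rollback; the application, restored to the same point, would send it again.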
37. A system of multiple computer nodes over which distributed applications are protected against faults, comprising:
a plurality of computer nodes upon which applications can be executed;
an operating system configured for execution on each said computer node and upon which said applications are executed;
a high-availability services module configured for protecting said applications from faults, and for executing in combination with said operating system; and
programming within said high-availability services module configured for execution on each said computer node for providing transparent application functions for loading applications, registering applications for protection, detecting faults in applications, and initiating recovery of applications, checkpointing of one or more sub-programs to create checkpoints for the application executing on at least one said computer node, restoring said one or more sub-programs from said checkpoints during said initiating of recovery of the application, executing said one or more sub-programs on a primary node while at least one backup node stands ready for executing the sub-programs in the event of a fault and subsequent recovery, and coordinating execution of individual sub-programs within a coordinator program which runs on a node accessible to said plurality of computer nodes.
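The claims do not specify how faults are detected; a common approach, assumed here purely for illustration, is heartbeat monitoring with a timeout. In this sketch `HeartbeatMonitor` is a hypothetical name, time is passed in explicitly so the results are deterministic, and restoring the sub-program from its checkpoint on the backup node is left out.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical heartbeat-based fault detector (illustrative only)."""

    timeout: float
    last_seen: dict = field(default_factory=dict)  # node -> time of last heartbeat
    active: dict = field(default_factory=dict)     # sub-program -> node executing it
    standby: dict = field(default_factory=dict)    # sub-program -> backup node

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def failed_nodes(self, now):
        return {n for n, t in self.last_seen.items() if now - t > self.timeout}

    def check(self, now):
        """Fail over sub-programs whose node missed its heartbeat; return the moves."""
        failed = self.failed_nodes(now)
        moves = {}
        for sub, node in list(self.active.items()):
            backup = self.standby.get(sub)
            if node in failed and backup is not None and backup not in failed:
                # Restoring from the last checkpoint on `backup` is not modelled here.
                # Roles are swapped so the old primary becomes the standby once it recovers.
                self.active[sub], self.standby[sub] = backup, node
                moves[sub] = backup
        return moves
```

With a 3-second timeout and both nodes last heard from at time 0, a check at time 2 moves nothing; if only the backup keeps sending heartbeats, a check at time 5 moves the sub-program to the backup.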
Specification