High availability platform with fast recovery from failure by reducing non-response time-outs
Abstract
A high availability platform runs a fault-tolerant controller process (FTC) and at least one monitored process that indicates its live state by periodically sending a heart-beat message to the FTC. The FTC responds to the heart-beat message by modifying the frequency at which it expects the heart-beat message according to information contained therein.
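The adaptive time-out described above can be illustrated with a minimal sketch. It assumes the time information carried in each heart-beat is the number of seconds until the sender expects to send its next one; the names `Heartbeat`, `FaultTolerantController`, `next_interval`, and the `grace` slack factor are hypothetical and do not come from the patent text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Heartbeat:
    sender: str            # name of the monitored process
    next_interval: float   # assumed meaning: seconds until the next heart-beat


class FaultTolerantController:
    """Keeps one deadline per monitored process, derived from its last heart-beat."""

    def __init__(self, grace: float = 1.5) -> None:
        self.grace = grace                    # hypothetical slack factor
        self.deadline: Dict[str, float] = {}  # process name -> latest acceptable arrival time

    def on_heartbeat(self, hb: Heartbeat, now: float) -> None:
        # React to the time information: the expected arrival of the next
        # heart-beat follows whatever interval the sender just announced.
        self.deadline[hb.sender] = now + hb.next_interval * self.grace

    def dead_processes(self, now: float) -> List[str]:
        # A process is presumed dead once its current deadline has passed.
        return sorted(name for name, due in self.deadline.items() if now > due)


ftc = FaultTolerantController(grace=1.5)
ftc.on_heartbeat(Heartbeat("db", next_interval=10.0), now=0.0)  # slow phase
ftc.on_heartbeat(Heartbeat("db", next_interval=1.0), now=5.0)   # fast phase
```

Because the sender announces a short interval when it is able to beat quickly, the controller can use a correspondingly short non-response time-out instead of a single worst-case value, which is the effect the title refers to.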
The platform may run an additional process, the monitored process being adapted to regularly send the additional process a message and to notify the FTC that the additional process is dead when it receives an error code from an operating system after sending a message to the additional process.
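One way to picture this peer-death detection is sketched below. The transport is abstracted as a `send` callable that raises `OSError` when the operating system rejects the write (as a socket send would); `send_and_check`, `notify_ftc`, and the particular set of error codes treated as "peer is dead" are assumptions made for illustration, not details taken from the patent.

```python
import errno
from typing import Callable

# Error codes interpreted as "the peer no longer exists". This set is an assumption.
DEAD_PEER_ERRNOS = {errno.EPIPE, errno.ECONNRESET, errno.ECONNREFUSED}


def send_and_check(send: Callable[[bytes], None],
                   notify_ftc: Callable[[str], None],
                   peer_name: str,
                   payload: bytes = b"ping") -> bool:
    """Send one regular message to the peer.

    Returns True if the send succeeded. If the operating system reports an
    error code that indicates a dead peer, the FTC is notified and False is
    returned. Any other OS error is re-raised unchanged.
    """
    try:
        send(payload)
        return True
    except OSError as exc:
        if exc.errno in DEAD_PEER_ERRNOS:
            notify_ftc(peer_name)
            return False
        raise
```

The point of the design is that the failure is learned immediately from the failed send, so the FTC does not have to wait for the dead process's own heart-beat time-out to expire.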
8 Claims
1. A method for providing high availability of a computer platform, comprising the steps of:
running at least one monitored process indicating its live state by periodically sending a heart-beat message and indicating a heart-beat frequency by inserting time information in the heart-beat message;
running a fault-tolerant controller process (FTC) receiving the heart-beat message and reacting to said time information by modifying the frequency at which it expects the heart-beat message; and
running an additional process and wherein the monitored process regularly sends the additional process a message and notifies the FTC that the additional process is dead when it receives an error code from an operating system after sending a message to the additional process.
running, on each system, an FTC and associated monitored processes, wherein a plurality of processes running on at least a first system are in standby mode and correspond to respective active processes on the second system, such that, if the second system fails, the standby processes of the first system become active and take over the tasks of the processes that were active on the second system; and
when it is necessary to force shut down of one system, sending the processes a switch-over signal, causing said active processes to die and the respective standby processes on the other system to become active through a transition phase in which the processes do not perform input/output operations.
7. A high availability platform arranged, in operation, to carry out the method of claim 1.
8. A method for providing high availability of a computer platform, comprising the steps of:
running at least one monitored process indicating its live state by periodically sending a heart-beat message and indicating a heart-beat frequency by inserting time information in the heart-beat message;
running a fault-tolerant controller process (FTC) receiving the heart-beat message and reacting to said time information by modifying the frequency at which it expects the heart-beat message;
running, on each system, an FTC and associated monitored processes, wherein a plurality of processes running on at least a first system are in standby mode and correspond to respective active processes on the second system, such that, if the second system fails, the standby processes of the first system become active and take over the tasks of the processes that were active on the second system; and
when it is necessary to force shut down of one system, sending the processes a switch-over signal, causing said active processes to die and the respective standby processes on the other system to become active through a transition phase in which the processes do not perform input/output operations.
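The forced switch-over recited in the last two steps can be sketched as a small state machine. This is an illustrative model only: the names `Mode`, `Process`, `switch_over`, and `finish_transition` are hypothetical, and the sketch additionally assumes that a standby process performs no input/output, which the claim states only for the transition phase.

```python
from enum import Enum
from typing import List, Tuple


class Mode(Enum):
    STANDBY = "standby"
    TRANSITION = "transition"   # between standby and active: no I/O allowed
    ACTIVE = "active"
    DEAD = "dead"


class Process:
    def __init__(self, name: str, mode: Mode) -> None:
        self.name = name
        self.mode = mode

    def can_do_io(self) -> bool:
        # Assumption: only a fully active process performs input/output.
        return self.mode is Mode.ACTIVE

    def on_switch_over(self) -> None:
        # Active processes die; standby processes enter the transition phase.
        if self.mode is Mode.ACTIVE:
            self.mode = Mode.DEAD
        elif self.mode is Mode.STANDBY:
            self.mode = Mode.TRANSITION

    def finish_transition(self) -> None:
        # Leaving the transition phase makes the former standby process active.
        if self.mode is Mode.TRANSITION:
            self.mode = Mode.ACTIVE


def switch_over(pairs: List[Tuple[Process, Process]]) -> None:
    """Deliver the switch-over signal to every (active, standby) pair."""
    for active, standby in pairs:
        active.on_switch_over()
        standby.on_switch_over()


active = Process("billing@system2", Mode.ACTIVE)
standby = Process("billing@system1", Mode.STANDBY)
switch_over([(active, standby)])
```

After `switch_over`, the process on the system being shut down is dead and its counterpart sits in the transition phase, where `can_do_io()` is false; calling `finish_transition()` then makes it active.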
Specification