Generic fault tolerant platform
Abstract
A fault tolerant platform is provided that comprises two systems running pairs of processes in the active and standby states, one process of each pair running on each system. Each system comprises a fault-tolerance controlling process, first communication channels between the fault-tolerance controlling process and the processes in the active or standby state running on its system, and second communication channels between the fault-tolerance controlling processes. Management of fault tolerance (that is, promoting a process in a standby state to the active state, and making a process in an active state exit from the active state) is handled by the fault-tolerance controlling processes. Fault-tolerant processes are thus managed generically: fault detection and switchover are carried out independently of the applications. This ensures efficient and coherent switchover between active and standby processes.
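The abstract places promotion and exit from the active state in the fault-tolerance controlling process instead of in the applications. The following is a minimal sketch of what that separation might look like. It assumes the first communication channel can be reduced to two application callbacks. The class name `GenericFTController`, the callback names and the string states are hypothetical and do not come from the patent.

```python
from typing import Callable, Dict


class GenericFTController:
    """Hypothetical controller that drives state changes for any registered application.

    The application supplies only two callbacks. Deciding *when* to promote or
    demote stays in the controller, so the same logic serves every application.
    """

    def __init__(self) -> None:
        self._hooks: Dict[str, Dict[str, Callable[[], None]]] = {}
        self.state: Dict[str, str] = {}  # process name -> "active" | "standby"

    def register(self, name: str, on_activate: Callable[[], None],
                 on_deactivate: Callable[[], None], initial: str = "standby") -> None:
        # Stands in for opening a first communication channel to a local process.
        self._hooks[name] = {"activate": on_activate, "deactivate": on_deactivate}
        self.state[name] = initial

    def promote(self, name: str) -> None:
        # Promote a standby process; a no-op if it is already active.
        if self.state[name] == "standby":
            self._hooks[name]["activate"]()
            self.state[name] = "active"

    def exit_active(self, name: str) -> None:
        # Make an active process leave the active state; a no-op otherwise.
        if self.state[name] == "active":
            self._hooks[name]["deactivate"]()
            self.state[name] = "standby"
```

In this sketch the application never inspects its peer or detects faults itself. It only reacts when the controller invokes its callbacks. This is one way to read "independently of the applications".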
22 Claims
1. A method for preventing split brain syndrome in a fault tolerant platform having at least one process in an active state of execution and at least one process in a standby state, switchover capabilities for promoting a process in the standby state to an active state, and commonly available resources dedicated to the processes in the active and standby states, said method comprising the steps of:
limiting simultaneous access to the commonly available resources to a maximum number of processes in the active state;
giving priority of access to the commonly available resources to processes last promoted to the active state; and
terminating an active state of a process that is in an active state of healthy execution when said process in an active state of healthy execution is not allowed access to the commonly available resources.
Dependent claims: 2, 3.
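The three steps of claim 1 can be illustrated with a small arbiter that guards the commonly available resources. This is a minimal sketch under assumptions the claim does not state:

- The arbiter is a single in-memory object. In a real platform it would itself have to survive failures, for example as a reservation on shared storage.
- `max_active` is a hypothetical parameter.
- "Last promoted" is tracked with a monotonically increasing promotion counter.

The names `ResourceArbiter` and `ActiveProcess` are illustrative only.

```python
import itertools


class ResourceArbiter:
    """Hypothetical guard for the commonly available resources.

    At most `max_active` processes hold access at any time. When a new
    promotion exceeds that limit, the earliest-promoted holder loses access.
    """

    def __init__(self, max_active: int = 1):
        self.max_active = max_active
        self._counter = itertools.count()
        self._holders = {}  # process name -> promotion sequence number

    def request_access(self, name: str) -> None:
        """Called when `name` is promoted to the active state."""
        self._holders[name] = next(self._counter)
        # Step 1: limit simultaneous access to max_active processes.
        # Step 2: the most recently promoted processes win; revoke the oldest.
        while len(self._holders) > self.max_active:
            oldest = min(self._holders, key=self._holders.get)
            del self._holders[oldest]

    def has_access(self, name: str) -> bool:
        return name in self._holders


class ActiveProcess:
    def __init__(self, name: str, arbiter: ResourceArbiter):
        self.name = name
        self.arbiter = arbiter
        self.active = False

    def promote(self) -> None:
        self.active = True
        self.arbiter.request_access(self.name)

    def check_access(self) -> None:
        # Step 3: a healthy active process that has lost access leaves the active state.
        if self.active and not self.arbiter.has_access(self.name):
            self.active = False
```

A typical split-brain case arises when the standby side loses contact with a healthy active process and is promoted. The newly promoted process takes priority, and the old active process steps down the next time it checks its access. The result is a single active holder of the resources instead of two.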
4. A fault tolerant platform comprising:
at least two systems;
multiple processes in an active state running on said systems;
each process in an active state having at least one corresponding replicate process in a standby state concurrently running on a different one of said systems to the system running said process in an active state;
fault-tolerance manager means for monitoring said processes and responsive to an apparent fault in one said process for individually promoting a corresponding replicate process in a standby state to the active state; and
first communication channels between said processes in an active state and said fault-tolerance manager means and between said processes in a standby state and said fault-tolerance manager means;
the fault-tolerance manager means comprising a respective fault-tolerance controlling process running on each system, and second communication channels between said fault-tolerance controlling processes for exchanging health monitoring and status messages, said first communication channels comprising a communication channel between a process in an active or standby state running on a system and the fault-tolerance controlling process running on a same said system.
Dependent claims: 5, 6, 7, 8, 9, 11, 12, 13, 14, 15, 16, 17, 18.
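Claim 4 has the controlling processes exchange health monitoring and status messages over the second communication channels. When a fault appears, a replicate is promoted individually. A minimal sketch of one possible detection policy follows. It assumes:

- The platform has exactly two systems.
- Health monitoring is done with heartbeats and a fixed timeout.
- Times are passed in explicitly so the logic is deterministic.

The message types `Heartbeat` and `ProcessFault`, the `tick` method and the timeout rule are hypothetical. They are not described in the claim.

```python
from dataclasses import dataclass


@dataclass
class Heartbeat:
    sender: str  # system whose controller sent the heartbeat


@dataclass
class ProcessFault:
    sender: str   # system reporting the fault
    pair_id: str  # pair whose active process appears faulty


class Controller:
    """Hypothetical fault-tolerance controlling process on one of two systems."""

    def __init__(self, system: str, timeout: float):
        self.system = system
        self.timeout = timeout
        self.last_heard = {}  # peer system -> local time its last heartbeat arrived
        self.standby = set()  # pair ids whose local replica is in standby
        self.active = set()   # pair ids whose local replica is active

    def on_message(self, msg: object, now: float) -> None:
        """Handle a message received on the second communication channel."""
        if isinstance(msg, Heartbeat):
            self.last_heard[msg.sender] = now
        elif isinstance(msg, ProcessFault):
            # Individual switchover: only the affected pair is promoted.
            self._promote(msg.pair_id)

    def tick(self, now: float) -> None:
        """Periodic check: a silent peer is an apparent fault of all its processes."""
        for seen in self.last_heard.values():
            if now - seen > self.timeout:
                # With two systems, every local standby has its active replica on the peer.
                for pair_id in list(self.standby):
                    self._promote(pair_id)

    def _promote(self, pair_id: str) -> None:
        if pair_id in self.standby:
            self.standby.discard(pair_id)
            self.active.add(pair_id)
```

A timeout can only detect an *apparent* fault. If the link between the systems fails while the peer is still healthy, this rule promotes replicas whose active counterparts are still running. That is the split-brain situation which the access-limiting steps of claim 1 are meant to resolve.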
10. An apparatus for managing fault tolerance in a platform comprising:
at least two systems;
at least a first process in an active state running on one system;
at least one corresponding replicate process in a standby state running on another system;
fault-tolerance controlling means for monitoring said processes and responsive to an apparent fault in one said process for individually promoting a corresponding replicate process in said standby state to the active state;
first communication channels comprising a communication channel between a process in an active or standby state running on a system and a fault-tolerance controlling process running on a same said system; and
wherein the fault-tolerance controlling means comprises a respective fault-tolerance controlling process running on each system and second communication channels between fault-tolerance controlling processes, and wherein the fault-tolerance controlling process running on each said system is responsible for promoting processes running on that system to the active state, and for making processes running on that system exit from the active state.
Dependent claims: 19, 20.
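Claim 10 makes each controlling process responsible only for the processes on its own system. An orderly switchover therefore has to pass through both controllers. The sketch below is one way this could work. It assumes:

- The second communication channel is modelled as an in-memory `collections.deque` outbox.
- Messages are hypothetical `("release", pair_id)` and `("released", pair_id)` tuples.

The class `LocalController` and its methods are illustrative and are not taken from the patent.

```python
from collections import deque


class LocalController:
    """Hypothetical controller that changes the state of local processes only."""

    def __init__(self, system: str):
        self.system = system
        self.states = {}       # pair id -> "active" | "standby" for local replicas
        self.outbox = deque()  # second communication channel towards the peer
        self.peer = None

    def connect(self, peer: "LocalController") -> None:
        self.peer, peer.peer = peer, self

    def request_takeover(self, pair_id: str) -> None:
        # The remote replica cannot be demoted directly: ask its own controller.
        self.outbox.append(("release", pair_id))

    def deliver(self) -> None:
        """Drain this controller's outbox into the peer controller."""
        while self.outbox:
            self.peer.handle(*self.outbox.popleft())

    def handle(self, kind: str, pair_id: str) -> None:
        if kind == "release":
            self.states[pair_id] = "standby"  # local exit from the active state
            self.outbox.append(("released", pair_id))
        elif kind == "released":
            self.states[pair_id] = "active"   # local promotion after peer confirms
```

Because the peer demotes its replica before confirming, and promotion happens only after that confirmation, this orderly path never has both replicas active at once.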
21. A fault tolerant platform having at least one process in an active state of execution and at least one process in a standby state, switchover capabilities for promoting a process in the standby state to an active state, and commonly available resources dedicated to the processes in the active and standby states, and means for preventing split brain syndrome, said means comprising:
a) means for limiting simultaneous access to the commonly available resources to a maximum number of processes in the active state;
b) means for giving priority of access to the commonly available resources to processes last promoted to the active state; and
c) means for terminating an active state of a process that is in an active state of healthy execution when said process in an active state of healthy execution is not allowed access to the commonly available resources.
Dependent claims: 22.
Specification