Failure recovery for process relationships in a single system image environment
Abstract
A system for recovery of process relationships following node failure within a computer cluster is provided. For relationship recovery, each node maintains a set of care relationships. Each relationship is of the form "carer cares about care target". Care relationships describe process relations such as parent-child or group leader-group member. Care relationships are stored at the origin node of their care targets. Following node failure, a surrogate origin node is selected. The surviving nodes then cooperate to rebuild, at the surrogate origin node, the vproc structures and care relationships for the processes that originated at the failed node. The surviving nodes then determine which of their own care targets were terminated by the node failure. For each terminated care target, notifications are sent to the appropriate carers. This allows surviving processes to recover correctly from severed process relationships.
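The notification step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Process` and `CareRelationship` types, the `current_node` field, and the function name `notifications_for_failure` are all hypothetical, and the sketch assumes that a care target counts as terminated when it was executing on the failed node.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    pid: int
    origin_node: str   # node where the process was created
    current_node: str  # node where it is executing (assumed field)


@dataclass(frozen=True)
class CareRelationship:
    carer: Process     # the process that cares
    target: Process    # the care target
    kind: str          # e.g. "parent-child"


def notifications_for_failure(stored, failed_node):
    """Map each surviving carer's pid to the care targets it lost.

    `stored` is the set of care relationships one surviving node holds
    for the care targets that originated there.
    """
    notes = defaultdict(list)
    for rel in stored:
        target_lost = rel.target.current_node == failed_node
        carer_alive = rel.carer.current_node != failed_node
        if target_lost and carer_alive:
            notes[rel.carer.pid].append((rel.kind, rel.target.pid))
    return dict(notes)


parent = Process(100, "A", "A")
child_on_b = Process(200, "A", "B")
child_on_c = Process(300, "A", "C")
stored_at_a = [
    CareRelationship(parent, child_on_b, "parent-child"),
    CareRelationship(parent, child_on_c, "parent-child"),
]
lost = notifications_for_failure(stored_at_a, "B")
```

Here node A stores both relationships because both care targets originated at A. When node B fails, only the child that was running on B is reported to the parent.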
15 Claims
1. A method for recovery of process relationships after failure of a node within a computer cluster, the method comprising the steps of:
selecting, by a cluster management service process, a surrogate origin node;
generating, by a slave daemon, a list of care relationships, each care relationship involving a process that originated at the failed node;
sending by the slave daemon, the list of care relationships to the surrogate origin node;
receiving by the surrogate origin node, the list of care relationships; and
reconstructing, by the surrogate origin node, a complete set of care relationships for processes that originated at the failed node.
Dependent claims: 2, 3, 4, 5
6. A system for recovery of process relationships after failure of a node within a computer cluster, the system comprising:
a cluster management service process configured to select a surrogate origin node to substitute for the failed node;
a slave daemon configured to generate a list of care relationships, each care relationship involving a process that originated at the failed node, the slave daemon also configured to send the list of care relationships to the surrogate origin node; and
a rebuilding process configured to receive the list of care relationships and to reconstruct a complete set of care relationships for processes that originated at the failed node.
Dependent claims: 7, 8, 9, 10
11. A computer program product comprising:
a computer usable medium having computer readable code embodied therein for recovery of process relationships after failure of a node within a computer cluster, the computer program product comprising:
first computer readable program code devices configured to cause a computer system to select a surrogate origin node to substitute for the failed node;
second computer readable program code devices configured to cause a computer system to generate a list of care relationships, each care relationship involving a process that originated at the failed node;
third computer readable program code devices configured to cause a computer system to send the list of care relationships to the surrogate origin node;
fourth computer readable program code devices configured to cause a computer system to receive the list of care relationships; and
fifth computer readable program code devices configured to cause a computer system to reconstruct a complete set of care relationships for processes that originated at the failed node.
Dependent claims: 12, 13, 14, 15
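The five steps shared by the independent claims (select, generate, send, receive, reconstruct) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the function names, the tuple layout `(carer_pid, target_pid, kind)`, the `origin_of` lookup table, and the "lowest node name" selection policy are all hypothetical, and message passing between nodes is modeled as an in-memory list.

```python
def select_surrogate(surviving_nodes):
    # Assumption: pick the lowest node name; the claims do not fix a policy.
    return min(surviving_nodes)


def generate_list(local_relationships, origin_of, failed_node):
    """Slave daemon step: keep the relationships in which the carer or
    the care target originated at the failed node."""
    return [
        (carer, target, kind)
        for carer, target, kind in local_relationships
        if origin_of[carer] == failed_node or origin_of[target] == failed_node
    ]


def reconstruct(received_lists):
    """Surrogate step: merge the received lists, dropping duplicates."""
    complete = set()
    for relationships in received_lists:
        complete.update(relationships)
    return complete


def recover(failed_node, node_views, origin_of):
    survivors = [n for n in node_views if n != failed_node]
    surrogate = select_surrogate(survivors)
    # "Sending" and "receiving" are modeled as collecting one list per survivor.
    inbox = [generate_list(node_views[n], origin_of, failed_node) for n in survivors]
    return surrogate, reconstruct(inbox)


# pid -> origin node (hypothetical cluster state)
origin_of = {1: "n1", 2: "n1", 3: "n2", 4: "n3"}
# node -> care relationships that node knows about locally
node_views = {
    "n1": [],
    "n2": [(3, 1, "parent-child"), (3, 4, "parent-child")],
    "n3": [(4, 2, "group leader-group member"), (3, 1, "parent-child")],
}
surrogate, complete = recover("n1", node_views, origin_of)
```

Using a set for the merged result means a relationship reported by more than one surviving node is recorded only once at the surrogate.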
Specification