Apparatus and accompanying method for use in a sysplex environment for performing escalated isolation of a sysplex component in the event of a failure
Abstract
Apparatus and accompanying methods, preferably for use in a multi-system shared data (sysplex (5)) environment, which quickly and efficiently isolate (fence), through a pre-defined hierarchical order, failed sysplex components from accessing shared data in order to protect data integrity. Specifically, by dividing a sysplex workload into specified fence groups (FG A, FG B) and providing appropriate software and hardware fence support, fencing can occur at various distinct levels: a member-to-member level, i.e. to allow any member (220, 225, 230, 233, 237) of a fence group to fully isolate any other ("target") member of that same group; a fence group level, i.e. to isolate all members of a fence group that execute on a "target" system (200_1, 200_2, 200_3); and a system level, i.e. to fully isolate an entire "target" system. Through pre-defined escalation rules (630), fencing can be escalated from the lower, member level to a higher, group or system, level in the event a lower level fence cannot be successfully imposed. Member level fencing is accomplished in software (1300, 1500); group and system level fencing is accomplished through dedicated hardware fencing facilities (44). An identifier (444, 464) uniquely designates each different fence group existing on a computer processing complex (CPC) (40_1, 40_2, 40_3, 40_M) in the sysplex over the life of that CPC. Advantageously, this technique eliminates erroneous back level fencing, significantly expedites fence processing and also greatly reduces the need for human intervention.
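The role of the fence group identifier in avoiding back level fencing can be illustrated with a minimal Python sketch. The abstract does not say how the identifier is generated or checked, so everything here is an assumption made for illustration: the class `FencingFacility`, its methods, and the use of a never-reset counter are hypothetical and are not taken from the patent.

```python
import itertools


class FencingFacility:
    """Hypothetical stand-in for the fencing facility of one CPC."""

    def __init__(self):
        # Assumption: identifiers come from a counter that is never reset,
        # so no value is reused over the life of the CPC.
        self._next_id = itertools.count(1)
        self._active = {}    # fence group name -> identifier currently in use
        self.fenced = set()  # identifiers that have been fenced

    def register_group(self, name):
        """Create a new incarnation of a fence group and return its identifier."""
        ident = next(self._next_id)
        self._active[name] = ident
        return ident

    def fence_group(self, name, ident):
        """Fence a group only if the request names its current identifier."""
        if self._active.get(name) != ident:
            # Back level request: it targets an earlier incarnation of the
            # group, so it is ignored rather than applied to the new one.
            return False
        self.fenced.add(ident)
        del self._active[name]
        return True
```

In this sketch a delayed request that still carries the identifier of an earlier incarnation of "FG A" is rejected, so a group that has since been re-created under the same name is not fenced by mistake.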
28 Claims
1. In a computer having a resource shared by a plurality of processes, said processes executing on a plurality of different systems, a method for isolating a failed component in said computer from said resource comprising the steps of:
in a first one of said plurality of different systems that executes a process in a fence group, wherein the fence group is defined as all processes, in said plurality of processes, that collectively and exclusively utilize said resource wherein each of said processes in the fence group executes on a corresponding one of said systems and is a different member of the group, the steps of:
issuing, whenever any one member in the group fails to properly execute on a second one of said systems, a fence request against the failed one member;
in said second one of the systems:
attempting, in response to said fence request, to impose a fence around said failed one member to completely isolate, through software, said member from utilizing the resource; and
if, as a result of said attempting step, the fence could not be imposed against the failed one member, escalating said fence, as specified in accordance with pre-defined escalation rules, to completely isolate either all members of said fence group which execute on said second system or said second system in its entirety from utilizing said resource.
Dependent claims: 2-15
16. In a computer having a resource shared by a plurality of processes, said processes executing on a plurality of different systems, apparatus for isolating a failed component in said computer from said resource comprising:
in a first one of said plurality of different systems that executes a process in a fence group, wherein the fence group is defined as all processes, in said plurality of processes, that collectively and exclusively utilize said resource wherein each of said processes in the fence group executes on a corresponding one of said systems and is a different member of the group:
means for issuing, whenever any one member in the group fails to properly execute on a second one of said systems, a fence request against the failed one member;
in said second one of the systems:
means for attempting, in response to said fence request, to impose a fence around said failed one member to completely isolate, through software, said member from utilizing the resource; and
means for escalating the fence, as specified in accordance with pre-defined escalation rules, and if said attempting means could not impose the fence against the failed one member, to completely isolate either all members of said fence group which execute on said second system or said second system in its entirety from utilizing said resource.
Dependent claims: 17-28
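The flow recited in the independent claims (a fence request issued against a failed member, a software fence attempted on the second system, and escalation to the group or the whole system if that attempt fails) can be sketched roughly as follows. This is an illustrative Python model only: the `System` class, the `software_fence_works` flag used to simulate a failed attempt, and the `escalate_to` parameter standing in for the pre-defined escalation rules are all hypothetical names, and the hardware fencing is merely simulated by setting flags.

```python
from dataclasses import dataclass, field


@dataclass
class System:
    """Hypothetical model of one system in the sysplex."""
    name: str
    members: dict = field(default_factory=dict)   # member name -> fence group
    isolated_members: set = field(default_factory=set)
    system_isolated: bool = False
    software_fence_works: bool = True             # simulation knob only

    def attempt_member_fence(self, member):
        """Member level: try to isolate one member through software."""
        if not self.software_fence_works:
            return False
        self.isolated_members.add(member)
        return True

    def fence_group(self, group):
        """Group level: isolate every member of the group on this system."""
        for member, member_group in self.members.items():
            if member_group == group:
                self.isolated_members.add(member)

    def fence_system(self):
        """System level: isolate the entire system from the resource."""
        self.system_isolated = True
        self.isolated_members.update(self.members)


def handle_fence_request(target, member, escalate_to="group"):
    """Process a fence request against `member` on the `target` system.

    `escalate_to` stands in for the pre-defined escalation rules and selects
    the level used when the member level fence cannot be imposed.
    Returns the level at which isolation was achieved.
    """
    if target.attempt_member_fence(member):
        return "member"
    if escalate_to == "group":
        target.fence_group(target.members[member])
        return "group"
    target.fence_system()
    return "system"
```

For example, with `software_fence_works=False` a request against one member of "FG A" escalates to the group level and isolates every "FG A" member on that system, while members of other fence groups on the same system are left untouched.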
Specification