Multiple Node/Virtual Input/Output (I/O) Server (VIOS) Failure Recovery in Clustered Partition Mobility
First Claim
1. In a data processing system having a processor, a memory coupled to the processor, at least one input/output (I/O) adapter that enables connection to an external network with a shared storage repository, and a plurality of virtual I/O servers (VIOSes) that form a VIOS cluster with a shared database, where each VIOS is cluster aware, a method comprising:
activating a first monitoring thread on a first VIOS of a first server to track a status of a live partition mobility (LPM) event;
recording information about the LPM event within the shared database by using said first monitoring thread;
in response to the first VIOS sustaining a failure condition, identifying one or more functioning monitoring threads that continue to function on a source server, wherein the failure condition results in a loss of LPM event monitoring by the first monitoring thread;
determining whether said one or more functioning monitoring threads is a single, last monitoring thread; and
in response to a first VIOS on the first server being in failed state, performing, via said last monitoring thread, cleanup and update operations within the shared database, wherein the cleanup and update are performed responsive to receipt of an indication that there are one or more nodes on the first server that are in the failed state.
Abstract
A method utilizes cluster-awareness to effectively support a live partition mobility (LPM) event and provide recovery from node failure within a Virtual Input/Output (I/O) Server (VIOS) cluster. An LPM utility creates a monitoring thread on a first VIOS on initiation of a corresponding LPM event. The monitoring thread tracks a status of an LPM and records status information in the mobility table of a database. The LPM utility creates other monitoring threads on other VIOSes running on the (same) source server. If the first VIOS sustains one of multiple failures, the LPM utility provides notification to other functioning nodes/VIOSes. The LPM utility enables a functioning monitoring thread to update the LPM status. In particular, a last monitoring thread may perform cleanup/update operations within the database based on an indication that there are nodes on the first server that are in failed state.
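The recovery flow summarized in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `MobilityTable` class, the `handle_vios_failure` function, the status strings, and the VIOS names are all hypothetical, and an in-memory dictionary stands in for the mobility table of the shared database. The sketch only shows the decision the abstract describes: after a failure, find the monitoring threads that still function, and let a single remaining (last) thread clean up entries for failed nodes and update the LPM status.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeState(Enum):
    ACTIVE = "active"
    FAILED = "failed"


@dataclass
class MobilityTable:
    """Hypothetical in-memory stand-in for the mobility table of the shared database."""
    lpm_status: dict = field(default_factory=dict)  # event id -> status string
    monitors: dict = field(default_factory=dict)    # event id -> {VIOS name: NodeState}

    def register_monitor(self, event_id, vios):
        # A monitoring thread on `vios` starts tracking the LPM event.
        self.monitors.setdefault(event_id, {})[vios] = NodeState.ACTIVE
        self.lpm_status.setdefault(event_id, "in_progress")

    def mark_failed(self, event_id, vios):
        self.monitors[event_id][vios] = NodeState.FAILED

    def nodes_in_state(self, event_id, state):
        return [v for v, s in self.monitors[event_id].items() if s is state]


def handle_vios_failure(table, event_id, failed_vios):
    """Record a VIOS failure; return the last monitor's name if it ran cleanup, else None."""
    table.mark_failed(event_id, failed_vios)
    survivors = table.nodes_in_state(event_id, NodeState.ACTIVE)
    if len(survivors) != 1:
        # Several monitors still function (or none do): no last-thread cleanup here.
        return None
    last = survivors[0]
    # The last monitoring thread removes entries for failed nodes (cleanup) ...
    for vios in table.nodes_in_state(event_id, NodeState.FAILED):
        del table.monitors[event_id][vios]
    # ... and records an updated status (the status string is an assumption).
    table.lpm_status[event_id] = "in_progress_single_monitor"
    return last
```

With three hypothetical monitors `vios1`, `vios2`, and `vios3` registered for one event, the first failure leaves two functioning threads and triggers no cleanup; the second failure leaves `vios3` as the last monitoring thread, which then performs the cleanup and status update.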
6 Claims
1. In a data processing system having a processor, a memory coupled to the processor, at least one input/output (I/O) adapter that enables connection to an external network with a shared storage repository, and a plurality of virtual I/O servers (VIOSes) that form a VIOS cluster with a shared database, where each VIOS is cluster aware, a method comprising:
activating a first monitoring thread on a first VIOS of a first server to track a status of a live partition mobility (LPM) event;
recording information about the LPM event within the shared database by using said first monitoring thread;
in response to the first VIOS sustaining a failure condition, identifying one or more functioning monitoring threads that continue to function on a source server, wherein the failure condition results in a loss of LPM event monitoring by the first monitoring thread;
determining whether said one or more functioning monitoring threads is a single, last monitoring thread; and
in response to a first VIOS on the first server being in failed state, performing, via said last monitoring thread, cleanup and update operations within the shared database, wherein the cleanup and update are performed responsive to receipt of an indication that there are one or more nodes on the first server that are in the failed state.
Specification