Enhancing reliability and robustness of a cluster
Abstract
Reliability and robustness of a cluster having a host connected thereto via a cluster interconnection fabric may be enhanced by determining whether an error condition exists in an I/O controller connected to the host via the cluster interconnection fabric. An attempt is made to communicate with the I/O controller a first predetermined time period after an inquiry by the operating system as to whether an I/O controller driver stack should be unloaded, and the operating system is commanded to unload the I/O controller driver stack upon a determination that the error condition still exists. This determination may be repeated a predetermined number of times before the unloading of the I/O controller driver stack is commanded. Furthermore, a predetermined period of time after the operating system has unloaded the I/O controller driver stack in response to the command, a further attempt to communicate with the I/O controller may be made to determine whether the error condition still exists, and the operating system is commanded to reload the I/O controller driver stack upon a determination that the error condition no longer exists. This additional determination may also be repeated a predetermined number of times while the error condition still exists.
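The wait, probe, and repeat sequence described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the names `error_cleared`, `on_unload_query`, `after_unload`, the `probe` callback, and the default delay and attempt count are all assumptions introduced here for illustration.

```python
import time
from typing import Callable

# Hypothetical callback: returns True if the I/O controller answers.
Probe = Callable[[], bool]


def error_cleared(probe: Probe, delay_s: float, max_attempts: int,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Wait, then probe; repeat up to max_attempts times.

    Returns True as soon as one probe succeeds, False if every probe fails.
    """
    for _ in range(max_attempts):
        sleep(delay_s)  # the "predetermined time period"
        if probe():
            return True
    return False


def on_unload_query(probe: Probe, unload_stack: Callable[[], None],
                    delay_s: float = 1.0, max_attempts: int = 3,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Runs after the OS asks whether the driver stack should be unloaded."""
    if error_cleared(probe, delay_s, max_attempts, sleep):
        return "kept"
    unload_stack()  # error still present after all attempts
    return "unloaded"


def after_unload(probe: Probe, reload_stack: Callable[[], None],
                 delay_s: float = 1.0, max_attempts: int = 3,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Runs after the stack was unloaded; reloads it once the error is gone."""
    if error_cleared(probe, delay_s, max_attempts, sleep):
        reload_stack()
        return "reloaded"
    return "still unloaded"
```

Passing `sleep` as a parameter is a design choice made for this sketch so that the timing can be replaced in tests; a real driver would rely on operating system timers instead.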
6 Claims
1. A host in a cluster comprising a cluster fabric and one or more I/O controllers attached to the cluster fabric, the host comprising:
an operating system (OS); and
a fabric control driver included in the operating system (OS) and configured:
to create, when each cluster adapter is installed to access one or more fabric-attached I/O controllers, a virtual fabric control device object representing connectivity to the cluster fabric, and one or more I/O controller device objects for the corresponding fabric-attached I/O controllers that constitute “child” device objects of the virtual fabric control device object;
to destroy affected I/O controller device objects, while maintaining the virtual fabric control device object, when one cluster adapter malfunctions but another cluster adapter remains available; and
alternatively, to destroy the affected I/O controller device objects and the virtual fabric control device object, when all connectivity to the cluster fabric is lost.
Dependent claims: 2, 3, 4
5. A method of operating a host in a cluster comprising a cluster fabric and one or more I/O controllers attached to the cluster fabric to eliminate a cluster adapter as a single point of failure in the cluster host, the method comprising:
creating, when each cluster adapter is installed to access one or more fabric-attached I/O controllers, a virtual fabric control device object representing connectivity to the cluster fabric, and one or more I/O controller device objects for the corresponding fabric-attached I/O controllers that constitute “child” device objects of the virtual fabric control device object;
destroying affected I/O controller device objects, while maintaining the virtual fabric control device object, when one cluster adapter malfunctions but another cluster adapter remains available; and
alternatively, destroying the affected I/O controller device objects and the virtual fabric control device object, when all connectivity to the cluster fabric is lost.
Dependent claim: 6
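The create and destroy behavior recited in claims 1 and 5 can be illustrated with a small model. This is a sketch under stated assumptions, not the claimed driver: the class `FabricDeviceTree`, its method names, and the adapter and controller names are hypothetical. It also assumes that an "affected" I/O controller device object is one whose controller is no longer reachable through any remaining adapter, which is one possible reading of the claim language.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class FabricDeviceTree:
    """Toy model of a virtual fabric control device object and its children."""
    # Working adapter name -> I/O controllers reachable through that adapter.
    adapters: Dict[str, Set[str]] = field(default_factory=dict)
    # Whether the virtual fabric control device object currently exists.
    fabric_object_exists: bool = False

    def install_adapter(self, adapter: str, controllers: Set[str]) -> None:
        """Create the fabric object (if needed) and register the adapter."""
        self.fabric_object_exists = True
        self.adapters[adapter] = set(controllers)

    def child_objects(self) -> Set[str]:
        """Child device objects: controllers reachable via any working adapter."""
        return set().union(*self.adapters.values())

    def adapter_failed(self, adapter: str) -> Set[str]:
        """Remove a failed adapter and return the child objects destroyed.

        The fabric object survives while another adapter remains; it is
        destroyed only when no adapter (no fabric connectivity) is left.
        """
        before = self.child_objects()
        self.adapters.pop(adapter, None)
        destroyed = before - self.child_objects()
        if not self.adapters:
            self.fabric_object_exists = False
        return destroyed
```

For example, with two adapters that both reach a controller `ioc_b`, a failure of the first adapter destroys only the child objects reachable solely through it, and the fabric object is kept until the second adapter also fails.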
Specification