Filesystem failover in a single system image environment
Abstract
A method and apparatus for transparent failover of a filesystem within a computer cluster is provided. For failover protection, a filesystem is physically connected to an active server node and a standby server node. A cluster file system provides distributed access to the filesystem throughout the computer cluster. The cluster file system monitors the progress of each operation performed on the failover protected filesystem. If the active server node should fail during an operation, all processes performing operations on the failover protected filesystem are caused to sleep. The filesystem is then relocated to the standby server node. The cluster file system then awakens each sleeping process and retries each pending operation.
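The sleep, relocate, awaken, and retry cycle described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: `FailoverGate`, `ServerFailed`, `run_with_failover`, and `demo` are hypothetical names, and a `threading.Event` stands in for whatever mechanism the cluster file system uses to put processes to sleep.

```python
import threading


class ServerFailed(Exception):
    """Hypothetical signal that the active server node failed mid-operation."""


class FailoverGate:
    """Blocks callers while the filesystem is being relocated to the standby."""

    def __init__(self):
        self._available = threading.Event()
        self._available.set()            # filesystem starts out available

    def begin_failover(self):
        self._available.clear()          # callers now sleep in wait()

    def end_failover(self):
        self._available.set()            # awaken every sleeping caller

    def wait(self):
        self._available.wait()


def run_with_failover(gate, operation, max_retries=3):
    """Run operation(), sleeping through a failover and retrying afterwards."""
    for _ in range(max_retries + 1):
        gate.wait()                      # sleep here while relocation is in progress
        try:
            return operation()
        except ServerFailed:
            continue                     # retry once the gate reopens
    raise ServerFailed("operation did not complete after retries")


def demo():
    gate = FailoverGate()
    attempts = []

    def operation():
        attempts.append(1)
        if len(attempts) == 1:
            # Simulate the active node dying and the standby taking over shortly after.
            gate.begin_failover()
            threading.Timer(0.05, gate.end_failover).start()
            raise ServerFailed("active node died")
        return "ok"

    return run_with_failover(gate, operation), len(attempts)
```

A blind retry like this is only safe for idempotent operations; claims 4 and 5 address the non-idempotent case.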
8 Claims
1. A method for preserving open-unlinked files during node failure within a computer cluster, the method comprising the steps, performed by a server node, of:
receiving a request to unlink a file;
determining if the file is open on any of the nodes within the computer cluster;
establishing a durable link to the file if the file is open on any node within the computer cluster;
informing each node where the file is open that the file is subject to delayed unlinking; and
unlinking the file.
2. The method of claim 1, further comprising the steps of:
receiving a request to close a file;
determining if the request indicates that the file is subject to delayed unlinking;
determining if the file is open on any of the nodes within the computer cluster;
removing the durable link to the file if the file is open and delayed unlinking has been requested; and
closing the file.
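The unlink and close steps of claims 1 and 2 can be sketched with an in-memory model. Everything here is an assumption made for illustration: the `Server` class, its dictionaries, and the `.durable-<id>` naming are hypothetical, and the sketch takes one reading of claim 2 in which the durable link is removed once the last node holding the file open closes it.

```python
class Server:
    """In-memory stand-in for the server node; all names are hypothetical."""

    def __init__(self):
        self.names = {}      # visible name -> file id
        self.durable = {}    # file id -> hidden durable link name
        self.open_on = {}    # file id -> set of nodes that have the file open
        self.notified = {}   # file id -> nodes told about delayed unlinking
        self.data = {}       # file id -> contents (removed when storage is reclaimed)
        self._next_id = 0

    def create(self, name, contents):
        fid = self._next_id
        self._next_id += 1
        self.names[name] = fid
        self.data[fid] = contents
        self.open_on[fid] = set()
        return fid

    def open(self, node, name):
        fid = self.names[name]
        self.open_on[fid].add(node)
        return fid

    def unlink(self, name):
        fid = self.names[name]
        openers = self.open_on[fid]
        if openers:
            # File is open somewhere: keep it reachable through a durable link
            # and tell each opener that unlinking is delayed.
            self.durable[fid] = f".durable-{fid}"
            self.notified[fid] = set(openers)
        del self.names[name]             # the visible name goes away either way
        self._reclaim(fid)

    def close(self, node, fid, delayed_unlink=False):
        self.open_on[fid].discard(node)
        if delayed_unlink and not self.open_on[fid]:
            self.durable.pop(fid, None)  # last opener gone: drop the durable link
        self._reclaim(fid)

    def _reclaim(self, fid):
        linked = fid in self.names.values() or fid in self.durable
        if not linked and not self.open_on[fid]:
            self.data.pop(fid, None)
```

In this model the file contents outlive the visible name for as long as any node has the file open, which is the property a standby server would rely on after a failure.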
3. The method of claim 1, wherein the step of determining if the file is open on any of the nodes within the computer cluster comprises the substeps of:
requesting that all nodes within the computer cluster unlink the file; and
determining which nodes did not unlink the file.
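The two substeps of claim 3 amount to a broadcast followed by a tally of refusals. A minimal sketch, assuming a hypothetical `Node` class whose `try_unlink` method refuses when the file is open locally:

```python
class Node:
    """Hypothetical cluster node that tracks cached and locally open files."""

    def __init__(self, name, cached=(), open_files=()):
        self.name = name
        self.cached = set(cached) | set(open_files)
        self.open_files = set(open_files)

    def try_unlink(self, file_id):
        """Drop local state for file_id; refuse if the file is open here."""
        if file_id in self.open_files:
            return False
        self.cached.discard(file_id)
        return True


def nodes_holding_open(nodes, file_id):
    """Ask every node to unlink file_id and report the ones that did not."""
    return [node.name for node in nodes if not node.try_unlink(file_id)]
```

The nodes returned by `nodes_holding_open` are the ones that would then be informed of delayed unlinking under claim 1.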
4. A method for preserving non-idempotent operations during node failure within a computer cluster, the method comprising the steps, performed by a client node, of:
registering an operation to be performed;
sending the operation to a server node;
receiving a predicted result of performing the operation from the server node;
recording the predicted result of performing the operation from the server node;
receiving an actual result of performing the operation from the server node;
replacing the predicted result of performing the operation from the server node with the actual result of performing the operation from the server node; and
sending a completion message to the server node.
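The client-side sequence of claim 4 can be sketched as below. The `Client` and `StubServer` classes, the method names, and the use of a directory-creation request as the non-idempotent operation are all assumptions chosen for illustration; in a real cluster the three server calls would be separate messages.

```python
class StubServer:
    """Hypothetical server stub that creates directories held in a set."""

    def __init__(self):
        self.dirs = set()
        self.staged = {}                 # op id -> path awaiting completion

    def submit(self, op_id, path):
        self.staged[op_id] = path
        return "EEXIST" if path in self.dirs else "OK"   # predicted result

    def execute(self, op_id):
        path = self.staged[op_id]
        if path in self.dirs:
            return "EEXIST"
        self.dirs.add(path)
        return "OK"                      # actual result

    def complete(self, op_id):
        del self.staged[op_id]


class Client:
    def __init__(self, server):
        self.server = server
        self.pending = {}                # op id -> recorded state of the operation
        self.history = []                # (op id, stage, result), kept for inspection
        self._next_id = 0

    def perform(self, path):
        op_id = self._next_id
        self._next_id += 1
        self.pending[op_id] = {"op": path, "result": None}    # register the operation
        predicted = self.server.submit(op_id, path)           # send; receive prediction
        self.pending[op_id]["result"] = predicted             # record predicted result
        self.history.append((op_id, "predicted", predicted))
        actual = self.server.execute(op_id)                   # receive actual result
        self.pending[op_id]["result"] = actual                # replace the prediction
        self.history.append((op_id, "actual", actual))
        self.server.complete(op_id)                           # send completion message
        del self.pending[op_id]
        return actual
```

If the server failed between the prediction and the actual result, the entry left in `pending` would still hold the predicted outcome, so a retry against the standby would not need to rely on re-running the operation to learn what it should have returned.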
5. A method for preserving non-idempotent operations during node failure within a computer cluster, the method comprising the steps, performed by the server system, of:
receiving an operation from a client node;
locking each of the resources required to perform the operation;
evaluating the predicted result of performing the operation;
sending the predicted result to the client node;
performing the operation;
sending the actual result of performing the operation to the client node;
receiving a completion message from a client node; and
unlocking the resources required to perform the operation.
Dependent claims 7 and 8:
rebuilding token state within the server instance;
rebuilding record locks within the server instance; and
rebuilding the state of open-unlinked files within the server instance.
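The server-side steps of claim 5 (lock, predict, perform, await completion, unlock) can be sketched as follows. The `Server` class, its method names, and the choice to lock the target path itself are hypothetical simplifications; a real filesystem would lock whatever resources the operation actually touches.

```python
import threading


class Server:
    """Hypothetical server that creates directories held in a set."""

    def __init__(self):
        self.dirs = set()
        self.locks = {}                  # resource -> threading.Lock
        self.in_flight = {}              # op id -> (path, locks held)

    def lock_for(self, resource):
        return self.locks.setdefault(resource, threading.Lock())

    def receive(self, op_id, path):
        """Lock the resources, then evaluate and return the predicted result."""
        lock = self.lock_for(path)
        lock.acquire()
        self.in_flight[op_id] = (path, [lock])
        return "EEXIST" if path in self.dirs else "OK"

    def perform(self, op_id):
        """Carry out the operation and return the actual result."""
        path, _ = self.in_flight[op_id]
        if path in self.dirs:
            return "EEXIST"
        self.dirs.add(path)
        return "OK"

    def complete(self, op_id):
        """On the client's completion message, release the held resources."""
        _, held = self.in_flight.pop(op_id)
        for lock in held:
            lock.release()
```

Because the resources stay locked from prediction until the completion message arrives, nothing else can change the outcome in between, so the predicted and actual results agree in this sketch.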
6. A method for transparent failover of a filesystem within a computer cluster, the method comprising the steps, performed by a standby server node, of:
checking the integrity of a filesystem made unavailable by the failure of an active server node on which the filesystem is resident;
making the filesystem available within the standby server node by mounting the filesystem within the standby server node and creating a server instance associated with the filesystem;
completing operations interrupted by the failure of the active server node; and
having the standby server node reassociate the server instance with each client instance within the computer cluster.
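The four steps of claim 6 can be laid out as a single takeover routine. This is a sketch under stated assumptions: `ServerInstance`, `ClientInstance`, and `fail_over` are hypothetical names, and the integrity check and mount are passed in as callables because the source does not say how they are performed.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ServerInstance:
    filesystem: str
    clients: list = field(default_factory=list)


@dataclass
class ClientInstance:
    node: str
    server: Optional[ServerInstance] = None


def fail_over(filesystem, check, mount, interrupted_ops, clients):
    """Take over a filesystem on the standby node after the active node fails."""
    # Step 1: check the integrity of the filesystem left behind by the failed node.
    if not check(filesystem):
        raise RuntimeError(f"integrity check failed for {filesystem}")
    # Step 2: mount it locally and create the server instance for it.
    mount(filesystem)
    server = ServerInstance(filesystem)
    # Step 3: complete the operations the failure interrupted.
    results = [op() for op in interrupted_ops]
    # Step 4: reassociate every client instance with the new server instance.
    for client in clients:
        client.server = server
        server.clients.append(client)
    return server, results
```

Only after the last step would sleeping client processes be awakened to retry against the new server instance.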
Specification