Coordinating persistent status information with multiple file servers
Abstract
The invention provides a storage system, and a method for operating a storage system, that provides relatively rapid and reliable takeover among a plurality of independent file servers. Each file server maintains a reliable communication path to the others, keeps its own state in reliable, persistent memory, and regularly confirms the state of the other file servers. Each file server labels the messages it sends on the redundant communication paths, so that the other file servers can combine those paths into a single ordered stream of messages. Each file server compares the state held in its persistent memory with that ordered stream, so as to determine whether other file servers have progressed beyond the file server's own last known state. Each file server uses the shared resources themselves (such as magnetic disks) as part of the redundant communication paths, so as to prevent mutual attempts at takeover of resources when each file server believes the other to have failed. Each file server provides a status report to the others when recovering from an error, so as to prevent multiple file servers from each repeatedly failing and attempting to seize the resources of the others.
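The abstract's central safety rule can be made concrete. The following is a minimal Python sketch, not the patented implementation: it assumes each communication path reports whether it is usable and when the partner was last heard on it, and the names `PathStatus` and `may_take_over` and the 10-second timeout are hypothetical. A server takes over only if the partner has been silent on every path, including the shared disks, and only if the server can itself still reach those disks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PathStatus:
    """Hypothetical view of one communication path to the partner server."""
    name: str             # e.g. "network" or "disk-mailbox" (illustrative labels)
    is_shared_disk: bool  # True if the path runs through the shared disks
    reachable: bool       # can this server still use the path?
    last_heard: float     # time (seconds) of the partner's last message on it


def may_take_over(paths: List[PathStatus], now: float, timeout: float = 10.0) -> bool:
    """Return True only when seizing the partner's disks looks safe."""
    # The partner counts as alive if it was heard recently on ANY usable path.
    if any(p.reachable and now - p.last_heard <= timeout for p in paths):
        return False
    # If this server cannot reach the shared disks, it may be the isolated
    # one; taking over could leave both servers writing to the same disks.
    if not any(p.reachable for p in paths if p.is_shared_disk):
        return False
    return True
```

For example, if only the network link fails, each server still sees the other's recent records through the disks, so `may_take_over` returns False on both sides and neither attempts a takeover.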
12 Claims
1. A memory, including a set of instructions, wherein said set of instructions are executable by a processor to operate a file server, said set of instructions comprising:
controlling a subset of a set of shared storage devices;
receiving and transmitting messages with a second file server, said steps for receiving and transmitting using a communication path including said shared storage devices;
monitoring said communication path and said shared storage devices;
storing state information about said file server in a persistent memory; and
performing a takeover operation of said shared storage device in response to said instructions for monitoring and a state of said persistent memory, wherein said instructions for receiving and transmitting prevent both of said first server and said second server from concurrently performing said takeover operation.
managing at a first server at least part of a shared resource; receiving and transmitting a sequence of messages between said first server and a second server, using said shared resource; and
performing a takeover operation at a first server of at least part of said shared resource in response to said sequence of messages;
whereby said instructions for receiving and transmitting prevent both of said first server and said second server from concurrently performing said takeover operation.
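Claims 1 and 2 use the shared storage itself as a communication path. One common way to realize that, assumed here rather than taken from the patent, is a disk mailbox: each server periodically writes a small record into its own reserved slot on every shared disk and reads its partner's slot. In the sketch below a `bytearray` stands in for a disk, and the slot size, record layout, and function names are hypothetical.

```python
import struct
from typing import Tuple

SLOT_SIZE = 512                     # assumed: one 512-byte sector per mailbox slot
RECORD = struct.Struct("<QQ16s")    # generation, sequence number, state label


def write_slot(disk: bytearray, slot: int, generation: int, seq: int, state: str) -> None:
    """Write one server's record into its own slot; a server never writes the other's slot."""
    offset = slot * SLOT_SIZE
    disk[offset:offset + RECORD.size] = RECORD.pack(generation, seq, state.encode("ascii"))


def read_slot(disk: bytearray, slot: int) -> Tuple[int, int, str]:
    """Read a record; the "16s" field comes back NUL-padded, so strip the padding."""
    generation, seq, raw = RECORD.unpack_from(disk, slot * SLOT_SIZE)
    return generation, seq, raw.rstrip(b"\0").decode("ascii")
```

Server A would write slot 0 and poll slot 1, and server B the reverse. Because each slot has a single writer, the two servers never race on the same bytes, and a server that loses access to the disks also loses this path.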
3. A memory as in claim 1, including instructions for
determining, at said first server, a state for itself and for said second server in response to said communication path; and
determining, at said second server, a state for itself and for said first server in response to said communication path;
whereby said first server and said second server concurrently each determine state for each other, such that said first server and said second server never both consider the other to be inoperative.
4. A memory as in claim 1, also including an instruction for storing state information about said first server in a persistent memory, wherein said first server determines a state for itself in response to a state of said persistent memory.
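Claim 4 has the first server determine its own state from its persistent memory, and the abstract adds that this state is compared with the partner's messages. A minimal sketch, assuming a JSON file stands in for the persistent memory and a dict stands in for the partner's latest message; the file layout, the `role` and `has_taken_over_partner` fields, and the function names are all hypothetical.

```python
import json
import os
import tempfile
from typing import Optional


def save_state(path: str, state: dict) -> None:
    """Persist state atomically: write a temporary file, then rename it into place."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to stable storage before renaming
    os.replace(tmp, path)     # atomic when both names are on the same filesystem


def load_state(path: str) -> dict:
    with open(path) as f:
        return json.load(f)


def startup_role(persisted: dict, peer_report: Optional[dict]) -> str:
    """Choose this server's role after a restart (hypothetical policy)."""
    if peer_report and peer_report.get("has_taken_over_partner"):
        # The partner has progressed beyond our last known state: it now
        # serves our disks, so we wait for a giveback instead of resuming.
        return "awaiting_giveback"
    # Otherwise trust the role recorded in persistent memory.
    return persisted["role"]
```

The write-then-rename pattern means a crash during `save_state` leaves either the old or the new state file in place, never a half-written one, which is the property a persistent status record needs.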
5. A memory as in claim 1, including instructions for
transmitting, from said first server, recovery information relating to a status of said first server on recovery from a service interruption; and
performing a giveback operation of at least part of said shared resource in response to said recovery information.
6. A memory as in claim 1, including instructions for
transmitting, from said first server, recovery information relating to a status of said first server on recovery from a service interruption; wherein said instructions for performing said takeover operation are responsive to said recovery information.
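Claims 5 and 6, together with the last sentence of the abstract, have a recovering server send a status report that its partner uses both to decide on giveback and to avoid starting a new takeover. A minimal sketch; the `RecoveryReport` fields, the `Action` values, and the decision table are assumptions for illustration, not taken from the patent.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    GIVEBACK = "giveback"   # return the partner's disks to it
    KEEP = "keep"           # keep serving the partner's disks for now
    HOLD_OFF = "hold_off"   # partner is alive and recovering: do not take over


@dataclass
class RecoveryReport:
    booting: bool   # partner is still coming up after the interruption
    healthy: bool   # partner passed its own post-recovery checks


def on_recovery_report(report: RecoveryReport, i_own_partner_disks: bool) -> Action:
    """Decide how to react to a partner's recovery report (hypothetical policy)."""
    if i_own_partner_disks:
        # Give the disks back only to a partner that has finished booting
        # and reports itself healthy; otherwise keep serving them.
        if report.healthy and not report.booting:
            return Action.GIVEBACK
        return Action.KEEP
    # A partner that is sending reports is alive, even if slow; seizing its
    # disks now would only restart the fail-and-seize cycle.
    return Action.HOLD_OFF
```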
7. A memory as in claim 1, wherein
said shared storage device includes a plurality of storage devices; and
said communication path includes at least part of said storage devices;
whereby loss of access to said part of said storage devices breaks said communication path.
8. A memory as in claim 1, including instructions for
transmitting at least one message from a first said server to a second said server, said message indicating that said first server is attempting said takeover;
altering a state of said second server in response to said message; and
in said altered state refraining from writing to said shared resource.
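Claim 8 describes a takeover message that moves the receiving server into a state in which it stops writing to the shared resource. The toy class below models only that reaction; the class name, the message format, and the `fenced` state label are hypothetical.

```python
from typing import List


class FileServerNode:
    """Toy model of the second server's reaction in claim 8 (hypothetical names)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = "serving"
        self.writes: List[str] = []   # stands in for writes to the shared disks

    def on_message(self, msg: dict) -> None:
        if msg.get("type") == "takeover_intent":
            # The partner is attempting takeover: alter our state so that
            # only one server ever modifies the shared storage.
            self.state = "fenced"

    def write_block(self, data: str) -> bool:
        if self.state == "fenced":
            return False   # refrain from writing in the altered state
        self.writes.append(data)
        return True
```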
9. A memory as in claim 1, wherein said communication path includes a plurality of independent communication paths between said pair;
and including instructions for
numbering said sequence of messages;
determining, at each recipient, a unified order for messages delivered using different ones of said plurality of independent communication paths; and
determining, at said first server, a state for itself and for said second server in response to a state of said shared resource and in response to a state of a persistent memory at said first server.
10. A memory as in claim 1, wherein said communication path includes a plurality of independent communication paths between said pair;
and including instructions for
numbering said sequence of messages;
determining, at each recipient, a unified order for messages delivered using different ones of said plurality of independent communication paths;
transmitting substantially each message in said sequence on at least two of said plurality of independent communication paths, whereby there is no single point of failure for communication between said pair.
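Claims 9 and 10 number the messages, send each one on at least two independent paths, and have the recipient rebuild a single order. A minimal sketch, assuming Python lists stand in for the paths and that sequence numbers start at 0; `Sender`, `Receiver`, and the message tuple layout are hypothetical.

```python
from typing import Dict, List, Tuple

Message = Tuple[int, str]   # (sequence number, payload)


class Sender:
    def __init__(self, paths: Dict[str, List[Message]]) -> None:
        self.paths = paths      # path name -> queue standing in for a link
        self.next_seq = 0

    def send(self, payload: str) -> None:
        msg = (self.next_seq, payload)
        self.next_seq += 1
        for queue in self.paths.values():   # same numbered message on every path
            queue.append(msg)


class Receiver:
    def __init__(self) -> None:
        self.expected = 0                   # next sequence number to deliver
        self.pending: Dict[int, str] = {}   # arrived early, waiting for a gap to fill
        self.delivered: List[str] = []      # the unified, ordered stream

    def receive(self, msg: Message) -> None:
        seq, payload = msg
        if seq < self.expected or seq in self.pending:
            return   # duplicate copy that arrived on another path
        self.pending[seq] = payload
        # Deliver in order as long as the next expected message is present.
        while self.expected in self.pending:
            self.delivered.append(self.pending.pop(self.expected))
            self.expected += 1
```

Because every message travels on both paths, losing one copy costs nothing: the receiver fills the gap from the other path and discards the second copy of anything it has already delivered.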
11. A memory as in claim 1, wherein said communication path includes a plurality of independent communication paths between said pair;
and including instructions for
numbering said sequence of messages;
determining, at each recipient, a unified order for messages delivered using different ones of said plurality of independent communication paths;
wherein said plurality of independent communication paths includes at least two of the following: a packet network, a shared element, a system area network.
12. A memory as in claim 1, wherein said communication path includes a plurality of independent communication paths between said pair;
and including instructions for
numbering said sequence of messages;
determining, at each recipient, a unified order for messages delivered using different ones of said plurality of independent communication paths;
wherein said instructions for numbering include (a) determining a generation number in response to a service interruption and persistent memory for a sender of said message, and (b) providing said generation number in substantially each message in said sequence.
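Claim 12 adds a generation number, derived from persistent memory and changed on each service interruption, to substantially every message. The sketch below assumes the generation is simply incremented on restart and that messages from older generations are discarded as stale; both rules, and the names `restart_generation` and `unify`, are assumptions for illustration.

```python
from typing import Iterable, List, Tuple

Msg = Tuple[int, int, str]   # (generation, sequence number, payload)


def restart_generation(persisted: int) -> int:
    """Generation to use after a service interruption; written back to persistent memory."""
    return persisted + 1


def unify(messages: Iterable[Msg]) -> List[str]:
    """Merge copies from several paths into one order, keeping only the newest generation."""
    unique = set(messages)   # drop duplicate copies received on different paths
    if not unique:
        return []
    newest = max(g for g, _, _ in unique)
    # Tuples sort by generation, then sequence number.
    current = sorted(m for m in unique if m[0] == newest)
    return [payload for _, _, payload in current]
```

Sequence numbers typically restart after a reboot, so without the generation a receiver could not tell the sender's new message 0 from its old message 0; pairing each number with the generation keeps the two apart.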
Specification