Enhanced coordinated cluster recovery
Abstract
An apparatus and a method that prevent a split-brain problem by preventing a cluster partner from accessing and serving data when the cluster partner is taken over by a storage server, while allowing early release of reservations on the cluster partner's storage devices before control is given back to the cluster partner.
62 Citations
18 Claims
1. A method comprising:
- operating a storage server in a storage server cluster; and
- preventing a cluster partner in the storage server cluster from accessing and serving data from taken-over storage devices associated with the cluster partner when the cluster partner is taken over by the storage server while allowing early release of reservations on the taken-over storage devices associated with the cluster partner before control is given back to the cluster partner,
  wherein said preventing comprises sending a takeover state signal representative of the takeover state of the storage server to notify the cluster partner that the cluster partner has been taken over,
  wherein the takeover state signal stops the cluster partner from serving the data when the cluster partner does not discover contents of a clustering disk area including clustering information and takeover state information stored on the taken-over storage devices,
  wherein the storage server and cluster partner are each physically connected to the taken-over storage devices and the reservations are configured to indicate that the storage server owns the control of the taken-over storage devices until the release of the reservations.
Dependent claims: 2, 3
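The ordering in claim 1 (take over, notify the partner, release reservations early, then give back) can be illustrated with a small model. This is only a sketch under stated assumptions: the class names `PartnerNode` and `TakeoverNode`, the boolean flags, and the method names are hypothetical and do not come from the patent, and the takeover state signal is modeled as a plain flag rather than a message on a real interconnect.

```python
from dataclasses import dataclass


@dataclass
class PartnerNode:
    """Hypothetical model of the cluster partner that was taken over."""
    taken_over_signal: bool = False  # last takeover state signal received

    def may_serve(self, reservations_held: bool) -> bool:
        # Reservations block disk access outright. Once they are released
        # early, the takeover state signal is what keeps the partner idle.
        return not reservations_held and not self.taken_over_signal


@dataclass
class TakeoverNode:
    """Hypothetical model of the storage server performing the takeover."""
    partner: PartnerNode
    reservations_held: bool = False

    def take_over(self) -> None:
        self.reservations_held = True          # reserve the partner's disks
        self.partner.taken_over_signal = True  # notify the partner

    def release_reservations_early(self) -> None:
        self.reservations_held = False  # the signal stays asserted

    def give_back(self) -> None:
        self.reservations_held = False
        self.partner.taken_over_signal = False  # control returns to partner
```

In this model the partner stays blocked between `release_reservations_early()` and `give_back()`, which is the window where a split-brain could otherwise occur.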
4. A method comprising:
- initializing a boot process of a storage server in a failover or reboot context;
- discovering disk reservations on an array of storage devices, the disk reservations are configured to indicate that a cluster partner owns the control of the array of storage devices until the release of the reservations;
- in response to the cluster partner releasing the disk reservations early, searching for contents of a clustering disk area including clustering information and takeover state information;
- receiving, at the storage server, takeover data to indicate that the storage server is in a taken-over state; and
- determining that the storage server has been taken over by the cluster partner using the contents of the clustering disk area when discovered; and
- determining that the storage server has been taken over by the cluster partner using the received takeover data when the contents of the clustering disk area are not discovered,
  wherein said determining stops the storage server from serving the data from the array of storage devices taken over by the cluster partner while allowing the early release of disk reservations before control is given back to the storage server.
Dependent claims: 5, 6, 7
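The two determining steps of claim 4 amount to a fallback rule: use the clustering disk area when its contents are discovered, otherwise rely on the takeover data received from the cluster partner. A minimal sketch follows, assuming the disk area is represented as a dictionary with a hypothetical `taken_over` field and that an undiscovered area is represented as `None`; neither representation is specified in the claims.

```python
from typing import Optional


def is_taken_over(disk_area: Optional[dict], takeover_data_received: bool) -> bool:
    """Decide whether the booting server is in a taken-over state.

    disk_area: contents of the clustering disk area, or None when the
        contents were not discovered (hypothetical representation).
    takeover_data_received: True when the cluster partner has sent
        takeover data indicating the taken-over state.
    """
    if disk_area is not None:
        # Contents discovered: the on-disk takeover state decides.
        return bool(disk_area.get("taken_over", False))
    # Contents not discovered: fall back to the received takeover data.
    return takeover_data_received
```

A server for which this returns `True` would not proceed to serve data from the array, even though the disk reservations have already been released.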
8. A system, comprising:
- a first storage server physically coupled to communicate with a first array of storage devices and a client; and
- a second storage server coupled to the first storage server by a cluster interconnect, wherein the second storage server is physically coupled to communicate with a second array of storage devices and is physically coupled to the first array of storage devices associated with the first storage server,
  wherein the first storage server is configured to:
  - initialize a boot process in a failover or reboot context;
  - discover disk reservations on the first array of storage devices, the disk reservations are configured to indicate that the second server owns the control of the first array of storage devices until the release of the reservations;
  - in response to the second storage server releasing the disk reservations early, search for contents of a clustering disk area including clustering information and takeover state information;
  - receive takeover data to indicate that the first storage server is in a taken-over state;
  - determine that the first storage server is in a taken-over state by the second storage server using the contents of the clustering disk area when discovered; and
  - determine that the first storage server is in the taken-over state by the second storage server using the received takeover data when the contents of the clustering disk area are not discovered,
  wherein the takeover data is configured to stop the first storage server from serving the data from the first array storage taken over by the second storage server while allowing the early release of disk reservations before control is given back to the first storage server.
Dependent claims: 9, 10
11. A server, comprising:
- a processor;
- a communication interface through which to communicate with a client of the server;
- a second communication interface through which to communicate with taken-over storage devices associated with a cluster partner;
- a cluster interconnect adapter to enable the server to communicate with the cluster partner over a cluster interconnect; and
- a memory storing instructions which configure the processor to put the server in a takeover state when the cluster partner has failed and to prevent the cluster partner from booting to a point at which the cluster partner serves data, by sending takeover data to the cluster partner to indicate that the server is in the takeover state, and
  wherein the instructions further configure the processor to allow early release of reservations on the taken-over storage devices associated with the cluster partner before control is given back to the cluster partner,
  wherein the server and cluster partner are each physically connected to the taken-over storage devices and the reservations are configured to indicate that the server owns the control of the taken-over storage devices until the release of the reservations, and
  wherein the takeover data stops the cluster partner from serving the data when the cluster partner does not discover contents of a clustering disk area including clustering information and takeover state information.
Dependent claims: 12, 13
14. A server, comprising:
- a processor;
- a network adapter coupled to the processor, through which to receive client requests from a client over a network;
- a cluster interconnect adapter to enable the server to communicate with a cluster partner; and
- a memory storing instructions which configure the processor to respond to receiving takeover data indicating that the server is in a taken-over state to prevent the server from booting to a point at which the server services the client requests in a failover or reboot context,
  wherein the instructions further configure the processor to initialize a boot process in a failover or reboot context, to discover disk reservations on taken-over resources associated with the server, wherein the reservations are configured to indicate that the server owns the control of the taken-over resources until the release of the reservations, and
  wherein the instructions further configure the processor to receive the control of the taken-over resources from the cluster partner after the release of reservations, to search for contents of a clustering disk area including clustering information and takeover state information, to determine that the storage server has been taken over by the cluster partner using the contents of the clustering disk area when discovered and using the received takeover data when the contents of the clustering disk area are not discovered, and to continue to boot as part of the boot process in response to receiving the control of the taken-over resources without user intervention at the server.
Dependent claims: 15
16. A method, comprising:
- initializing a boot process of a server;
- discovering disk reservations on taken-over resources associated with the server, wherein the disk reservations are configured to indicate that the server owns the control of the taken-over resources until the release of the reservations;
- receiving, at the server, takeover data to indicate that the server is in a taken-over state by the cluster partner;
- searching for contents of a clustering disk area including clustering information and takeover state information;
- determining that the server is in the taken-over state by the cluster partner using the contents of the clustering disk area when discovered;
- determining that the server is in the taken-over state by the cluster partner using the received takeover data when the contents of the clustering disk area are not discovered;
- preventing the server from continuing to boot as part of the boot process in response to determining that the server is in the taken-over state;
- receiving the control of the taken-over resources associated with the server from the cluster partner; and
- continuing to boot as part of the boot process in response to receiving the control of the taken-over resources without user intervention at the server.
Dependent claims: 17, 18
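The last three steps of claim 16 describe a boot that pauses while the server is taken over and resumes on its own once control is given back. The sketch below models that as a simple loop over polled giveback events. It is an illustration under assumptions: `BootState`, `run_boot`, and the idea of polling a sequence of booleans are hypothetical and are not taken from the claims, which do not say how the server learns that control has been returned.

```python
from enum import Enum, auto
from typing import Iterable, List


class BootState(Enum):
    WAITING_FOR_GIVEBACK = auto()  # boot is held; no data is served
    SERVING = auto()               # boot completed; requests are served


def run_boot(taken_over: bool, giveback_events: Iterable[bool]) -> List[BootState]:
    """Return the sequence of boot states the server passes through.

    taken_over: result of the taken-over determination at boot time.
    giveback_events: hypothetical poll results; True means control of the
        taken-over resources has been received from the cluster partner.
    """
    if not taken_over:
        return [BootState.SERVING]  # nothing holds the boot back
    history: List[BootState] = []
    for control_received in giveback_events:
        if control_received:
            # Continue booting automatically, with no user intervention.
            history.append(BootState.SERVING)
            return history
        history.append(BootState.WAITING_FOR_GIVEBACK)
    return history  # control never came back during the observed events
```

For example, `run_boot(True, [False, False, True])` passes through two waiting states before reaching `BootState.SERVING`.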
Specification