Fast primary cluster recovery
Abstract
A cluster recovery process is implemented across a set of distributed archives, where each individual archive is a storage cluster of preferably symmetric nodes. Each node of a cluster typically executes an instance of an application that provides object-based storage of fixed content data and associated metadata. According to the storage method, an association or “link” between a first cluster and a second cluster is first established to facilitate replication. The first cluster is sometimes referred to as a “primary” whereas the “second” cluster is sometimes referred to as a “replica.” Once the link is made, the first cluster's fixed content data and metadata are then replicated from the first cluster to the second cluster, preferably in a continuous manner. Upon a failure of the first cluster, however, a failover operation occurs, and clients of the first cluster are redirected to the second cluster. Upon repair or replacement of the first cluster (a “restore”), the repaired or replaced first cluster resumes authority for servicing the clients of the first cluster. This restore operation preferably occurs in two stages: a “fast recovery” stage that involves preferably “bulk” transfer of the first cluster metadata, followed by a “fail back” stage that involves the transfer of the fixed content data. Upon receipt of the metadata from the second cluster, the repaired or replaced first cluster resumes authority for the clients irrespective of whether the fail back stage has completed or even begun.
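The sequence described in the abstract (link, replicate, fail over, fast recovery, fail back) can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the `Cluster` and `Link` classes, their method names, and the dictionary-based stores are all hypothetical, chosen only to show that authority returns to the restored primary once the metadata has arrived, before any fixed content data is transferred back.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for one archive cluster (not a real API)."""
    name: str
    metadata: dict = field(default_factory=dict)  # object id -> metadata record
    content: dict = field(default_factory=dict)   # object id -> fixed content bytes

    def put(self, oid, data, meta):
        # Store an object's fixed content and its metadata together.
        self.content[oid] = data
        self.metadata[oid] = meta


class Link:
    """Hypothetical association between a primary and a replica cluster."""

    def __init__(self, primary, replica):
        self.primary = primary
        self.replica = replica
        self.authority = primary  # cluster currently servicing clients

    def replicate(self):
        # Copy both content and metadata from the primary to the replica.
        self.replica.content.update(self.primary.content)
        self.replica.metadata.update(self.primary.metadata)

    def fail_over(self):
        # On primary failure, redirect clients to the replica.
        self.authority = self.replica

    def fast_recovery(self, restored):
        # Stage 1: bulk-transfer metadata only, then hand authority back.
        restored.metadata = dict(self.replica.metadata)
        self.primary = restored
        self.authority = restored

    def fail_back(self):
        # Stage 2: transfer the fixed content data; authority is unchanged.
        self.primary.content.update(self.replica.content)


def demo():
    primary, replica = Cluster("primary"), Cluster("replica")
    link = Link(primary, replica)
    primary.put("obj-1", b"payload", {"size": 7})
    link.replicate()
    link.fail_over()                   # primary fails; replica serves clients
    restored = Cluster("primary")      # repaired or replaced primary, empty
    link.fast_recovery(restored)
    resumed_before_content = (
        link.authority is restored and "obj-1" not in restored.content
    )
    link.fail_back()
    return resumed_before_content, restored.content["obj-1"]
```

In this sketch `demo()` reports whether the restored primary held authority while its content store was still empty, which is the property the abstract emphasizes. How reads of not-yet-transferred content are served in that window is not covered by the abstract and is deliberately left out.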
11 Claims
1. A storage method operating across a set of distributed locations, wherein at each location a redundant array of independent nodes are networked together to provide a cluster, and wherein each node of a cluster executes an instance of an application that provides object-based storage of fixed content data and associated metadata, comprising:
configuring an association between a first cluster and a second cluster;
replicating the first cluster's fixed content data and metadata from the first cluster to the second cluster;
upon a failure associated with the first cluster, redirecting clients of the first cluster to the second cluster; and
upon repair or replacement of the first cluster, having the repaired or replaced first cluster resume authority for servicing the clients of the first cluster upon receipt, from the second cluster, of the metadata, wherein the repaired or replaced first cluster resumes authority for the clients irrespective of whether the fixed content data has been transferred back from the second cluster.
Dependent claims: 2, 3, 4, 5
6. In a distributed storage system wherein a first cluster replicates metadata and content data to a second cluster, a method, comprising:
upon a failure associated with the first cluster, redirecting clients of the first cluster to the second cluster; and
bulk transferring the metadata to a repaired or replaced first cluster in advance of any transfer of the content data, the repaired or replaced first cluster resuming authority for servicing the clients of the first cluster as soon as the bulk transfer of the metadata is completed.
Dependent claims: 7, 8, 9, 10, 11
Specification