Storage system having error detection and recovery
Abstract
A massively scalable architecture for providing a self-monitoring and self-correcting storage system that is capable of handling hundreds of millions of users and tens of billions of files. The system includes one or more clusters storing data elements that are received from a plurality of clients. Each cluster comprises a plurality of storage servers. The storage system facilitates the addition of new storage servers, and the fast recovery of failed storage servers, by logging system transactions in multiple journals of different lengths. When a storage server fails, a cluster backup determines the time of failure and replays one of the journals in order to bring the failed storage server up to date.
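The journal-based recovery described in the abstract can be illustrated with a minimal Python sketch. It assumes that every journal receives the same write transactions but retains a different time window, and that a recovering server replays every write logged since its failure time from the shortest journal that still reaches back that far. The names Journal and recover, the window values, and the shortest-journal selection rule are illustrative assumptions, not the patented implementation.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional, Tuple

# (timestamp, key, value) for one logged write transaction.
Entry = Tuple[float, str, bytes]


@dataclass(eq=False)
class Journal:
    """A log that keeps only the last `window` seconds of transactions."""
    window: float  # hypothetical retention length in seconds
    entries: Deque[Entry] = field(default_factory=deque)
    # Entries older than `horizon` may already have been discarded.
    horizon: float = float("-inf")

    def append(self, ts: float, key: str, value: bytes) -> None:
        # Assumes timestamps are appended in non-decreasing order.
        self.entries.append((ts, key, value))
        self.horizon = max(self.horizon, ts - self.window)
        while self.entries and self.entries[0][0] < self.horizon:
            self.entries.popleft()

    def covers(self, since: float) -> bool:
        # True if no entry at or after `since` has been discarded.
        return since >= self.horizon


def recover(store: Dict[str, bytes], journals: List[Journal],
            failed_at: float) -> Optional[Journal]:
    """Bring a failed server's `store` up to date from the shortest
    journal that still reaches back to `failed_at` (hypothetical policy)."""
    usable = [j for j in journals if j.covers(failed_at)]
    if not usable:
        return None  # outage outlived every journal; a full copy is needed
    chosen = min(usable, key=lambda j: j.window)
    for ts, key, value in chosen.entries:
        if ts >= failed_at:
            store[key] = value  # idempotent overwrite, safe to replay
    return chosen
```

Choosing the shortest covering journal keeps replay short after a brief outage, while the longer journals remain available for servers that were down longer.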
13 Claims
1. A storage system comprising:
a plurality of clusters storing data elements from a plurality of clients, wherein each cluster comprises a plurality of storage servers;
a storage monitor communicatively coupled to the clusters for detecting when one of the storage servers fails;
a storage manager communicatively coupled to the storage monitor, wherein the storage monitor informs the storage manager when one of the storage servers fails; and
a partition master communicatively coupled to the plurality of clusters to assign each client a storage partition within one of the clusters, wherein the storage manager commands one of the storage servers to operate as the partition master in the event that the partition master fails.
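The failover arrangement of claim 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed system: failures are modeled as an alive flag that the storage monitor polls, the partition master places each new client in the least-loaded cluster, and the storage manager promotes the first live storage server when the server hosting the partition master fails. All class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(eq=False)
class StorageServer:
    name: str
    alive: bool = True


@dataclass(eq=False)
class Cluster:
    name: str
    servers: List[StorageServer]


class PartitionMaster:
    """Assigns each client a storage partition within one cluster.
    Assumption: new clients go to the cluster with the fewest clients."""

    def __init__(self, host: StorageServer, clusters: List[Cluster]) -> None:
        self.host = host  # the storage server acting as partition master
        self.clusters = clusters
        self.assignments: Dict[str, str] = {}  # client id -> cluster name

    def assign(self, client: str) -> str:
        if client not in self.assignments:
            load = {c.name: 0 for c in self.clusters}
            for cluster_name in self.assignments.values():
                load[cluster_name] += 1
            # Least-loaded cluster; ties broken by name for determinism.
            self.assignments[client] = min(load, key=lambda n: (load[n], n))
        return self.assignments[client]


class StorageManager:
    def __init__(self, clusters: List[Cluster], pm: PartitionMaster) -> None:
        self.clusters = clusters
        self.partition_master = pm

    def on_server_failed(self, server: StorageServer) -> None:
        if server is not self.partition_master.host:
            return
        # Command a live storage server to take over as partition master.
        replacement = next(s for c in self.clusters for s in c.servers if s.alive)
        new_pm = PartitionMaster(replacement, self.clusters)
        # Assumption: the assignment table survives the failure (e.g. it is
        # replicated), so existing clients keep their partitions.
        new_pm.assignments = dict(self.partition_master.assignments)
        self.partition_master = new_pm


class StorageMonitor:
    def __init__(self, clusters: List[Cluster], manager: StorageManager) -> None:
        self.clusters = clusters
        self.manager = manager
        self.reported: Set[str] = set()

    def poll(self) -> None:
        # Report each newly failed server to the storage manager once.
        for c in self.clusters:
            for s in c.servers:
                if not s.alive and s.name not in self.reported:
                    self.reported.add(s.name)
                    self.manager.on_server_failed(s)
```

Copying the assignment table into the new partition master is what keeps existing clients on their original partitions; how the real system preserves that table is not stated in the claim.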
2. A storage system comprising:
a plurality of clusters storing data elements from a plurality of clients, wherein each cluster comprises a plurality of storage servers;
a storage monitor communicatively coupled to the clusters for detecting when one of the storage servers fails;
a storage manager communicatively coupled to the storage monitor, wherein the storage monitor informs the storage manager when one of the storage servers fails; and
a write master included in each cluster to receive the data elements from the clients and to direct the storage servers to store the received data elements, wherein the storage manager commands one of the storage servers to operate as the write master in the event that the write master fails.
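Claim 2's per-cluster write master can be sketched the same way. The sketch assumes, as claim 3 recites, that each data element is stored on every server in the cluster; that only live servers receive a write; and that the storage manager promotes the first live server in the cluster when the write master fails. Cluster, promote_write_master, and handle_failure are hypothetical names, not part of the claimed system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class StorageServer:
    name: str
    alive: bool = True
    data: Dict[str, bytes] = field(default_factory=dict)


@dataclass(eq=False)
class Cluster:
    servers: List[StorageServer]
    write_master: Optional[StorageServer] = None

    def promote_write_master(self) -> StorageServer:
        # Assumption: the first live server in the cluster is promoted.
        self.write_master = next(s for s in self.servers if s.alive)
        return self.write_master

    def write(self, key: str, value: bytes) -> int:
        """Write-master path: accept a data element from a client and direct
        every live server in the cluster to store a copy.
        Returns the number of replicas written."""
        if self.write_master is None or not self.write_master.alive:
            raise RuntimeError("no live write master; one must be promoted")
        replicas = [s for s in self.servers if s.alive]
        for s in replicas:
            s.data[key] = value
        return len(replicas)


def handle_failure(cluster: Cluster, failed: StorageServer) -> None:
    """Hypothetical storage-manager reaction once the monitor reports `failed`."""
    failed.alive = False
    if failed is cluster.write_master:
        cluster.promote_write_master()
```

A server that was down misses the writes made during its outage; in the system described in the abstract, replaying a journal is what brings such a server back up to date.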
3. A storage system comprising:
a plurality of clusters storing data elements from a plurality of clients, wherein each cluster comprises a plurality of storage servers, and wherein the data elements are replicated on each of the plurality of storage servers in a given cluster;
a storage monitor communicatively coupled to the clusters for detecting when one of the storage servers fails;
a storage manager communicatively coupled to the storage monitor, wherein the storage monitor informs the storage manager when one of the storage servers fails; and
wherein each cluster includes a cluster backup that records requests to store the data elements stored by the storage servers of the respective cluster.
8. A computing method comprising:
receiving client requests to store data elements in a storage system having a plurality of storage clusters, wherein each storage cluster has a plurality of storage servers;
storing the data elements in each of the storage servers of one of the storage clusters, wherein storing the data elements includes assigning each client a storage partition within one of the clusters;
monitoring the storage servers to detect when a storage server fails; and
promoting one of the storage servers to perform services of the failed storage server, wherein promoting one of the storage servers includes promoting one of the storage servers to assign each client a storage partition within one of the clusters.
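The monitoring step of claim 8 does not say how failures are detected. One common technique, shown here purely as an assumption, is a heartbeat timeout: each storage server reports periodically, and a server that has been silent for longer than the timeout is declared failed and handed to a callback that performs the promotion step. HeartbeatMonitor, the timeout parameter, and the explicit now argument (used to keep the sketch deterministic) are hypothetical.

```python
from typing import Callable, Dict, List, Set


class HeartbeatMonitor:
    """Declares a server failed if no heartbeat arrives within `timeout`
    seconds. Heartbeats are an assumed detection mechanism."""

    def __init__(self, timeout: float, on_failure: Callable[[str], None]) -> None:
        self.timeout = timeout
        self.on_failure = on_failure  # e.g. the promotion step
        self.last_seen: Dict[str, float] = {}
        self.failed: Set[str] = set()

    def heartbeat(self, server: str, now: float) -> None:
        self.last_seen[server] = now
        self.failed.discard(server)  # a recovered server rejoins

    def check(self, now: float) -> List[str]:
        newly_failed = []
        # Iterate over a snapshot in case the callback sends heartbeats.
        for server, seen in list(self.last_seen.items()):
            if server not in self.failed and now - seen > self.timeout:
                self.failed.add(server)
                newly_failed.append(server)
                self.on_failure(server)
        return newly_failed
```

The timeout trades detection speed against false alarms: a short timeout promotes a replacement quickly but may misclassify a slow server as failed.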
9. A computing method comprising:
receiving client requests to store data elements in a storage system having a plurality of storage clusters, wherein each storage cluster has a plurality of storage servers, and wherein the data elements are replicated on each of the plurality of storage servers in a given storage cluster;
storing the data elements in each of the storage servers of one of the storage clusters;
monitoring the storage servers to detect when a storage server fails; and
wherein receiving a client request includes recording the request in at least one journal.
12. A computer-readable medium having computer-executable instructions for storing information in a storage system having error detection and recovery comprising:
receiving client requests to store data elements in a storage system having a plurality of storage clusters, wherein each storage cluster has a plurality of storage servers;
storing the data elements in each of the storage servers of one of the storage clusters, wherein storing the data elements includes assigning each client a storage partition within one of the clusters;
monitoring the storage servers to detect when a storage server fails; and
promoting one of the storage servers to perform services of the failed storage server, wherein promoting one of the storage servers includes promoting one of the storage servers to assign each client a storage partition within one of the clusters.
13. A computer-readable medium having computer-executable instructions for storing information in a storage system having error detection and recovery comprising:
receiving client requests to store data elements in a storage system having a plurality of storage clusters, wherein each storage cluster has a plurality of storage servers, and wherein the data elements are replicated on each of the plurality of storage servers in a given storage cluster;
storing the data elements in each of the storage servers of one of the storage clusters;
monitoring the storage servers to detect when a storage server fails; and
wherein receiving a client request includes recording the request in at least one journal.
Specification