Distributed storage system capable of restoring data in case of a storage failure
First Claim
1. A collective data storage system comprising:
a plurality of storage nodes connected by a network, each node storing data as extents;
a data service (DS) agent in each node for managing the extents in the node;
a plurality of metadata service (MDS) agents for managing metadata relating to the nodes and the extents, the MDS agents operating in a subset of the nodes;
a cluster manager (CM) agent in each node for detecting a failure in the system and notifying a subset of the DS or MDS agents of the failure, wherein upon notification of the failure, the subset of DS or MDS agents independently generates a plan to restore the extents affected by the failure and collectively restores the affected extents based on the plan; and
a persistent map that correlates the data extents with the nodes; and
wherein each MDS agent manages a subset of the map.
Abstract
A collective storage system and a method for restoring data in the system after a failure. The system includes multiple storage nodes that are interconnected by a network and store data as extents. The system also includes a set of Data Service (DS) agents for managing the extents, a set of Metadata Service (MDS) agents for managing metadata relating to the nodes and the extents, and a Cluster Manager (CM) agent in each node. After a failure is detected by one of the CM agents, the agents responsible for coordinating the data restoration are notified of the failure. These agents generate a plan to restore the data extents affected by the failure, and then collectively restore the affected extents based on the generated plan. The coordinating agents may be the MDS agents or the DS agents. The failure may be a node failure or a disk failure.
36 Claims
1. A collective data storage system comprising:
a plurality of storage nodes connected by a network, each node storing data as extents;
a data service (DS) agent in each node for managing the extents in the node;
a plurality of metadata service (MDS) agents for managing metadata relating to the nodes and the extents, the MDS agents operating in a subset of the nodes;
a cluster manager (CM) agent in each node for detecting a failure in the system and notifying a subset of the DS or MDS agents of the failure, wherein upon notification of the failure, the subset of DS or MDS agents independently generates a plan to restore the extents affected by the failure and collectively restores the affected extents based on the plan; and
a persistent map that correlates the data extents with the nodes; and
wherein each MDS agent manages a subset of the map.
Dependent claims: 2-18
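To make the last two elements of claim 1 more concrete, here is a minimal Python sketch of an extent-to-node map split across several MDS agents. The claim does not say how the map is divided, so the CRC32-based partitioning, the `MDSAgent` class, and the `owner` helper are all assumptions made for illustration. Persistence is also omitted: each partition is a plain in-memory dictionary.

```python
import zlib


class MDSAgent:
    """Hypothetical MDS agent holding one partition of the extent-to-node map."""

    def __init__(self, name):
        self.name = name
        self.partition = {}  # extent_id -> set of node names holding the extent

    def record(self, extent_id, node):
        # Note that `node` stores a copy of `extent_id`.
        self.partition.setdefault(extent_id, set()).add(node)

    def extents_on(self, node):
        # Extents in this agent's partition that have a copy on `node`.
        return sorted(e for e, nodes in self.partition.items() if node in nodes)


def owner(extent_id, agents):
    """Pick the agent responsible for an extent (assumed: stable hash partitioning)."""
    return agents[zlib.crc32(extent_id.encode()) % len(agents)]


agents = [MDSAgent("mds-0"), MDSAgent("mds-1")]
placements = [("ext-a", "node1"), ("ext-a", "node2"),
              ("ext-b", "node2"), ("ext-c", "node3")]
for extent_id, node in placements:
    owner(extent_id, agents).record(extent_id, node)

# After a failure of node2, each agent reports the affected extents it manages.
affected = sorted(e for agent in agents for e in agent.extents_on("node2"))
print(affected)  # ['ext-a', 'ext-b']
```

Because every extent belongs to exactly one partition, each agent can answer "which of my extents lived on the failed node?" without consulting the others, which is one way a subset of the map per agent could support independent planning.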
19. A method for restoring data in a collective storage system having a plurality of storage nodes interconnected by a network and storing data as extents, Data Service (DS) agents for managing the extents, Metadata Service (MDS) agents for managing metadata relating to the nodes and the extents, and Cluster Manager (CM) agents, the DS agents managing the extents in each node, the method comprising the steps of:
detecting a failure in the system by one of the CM agents;
notifying a subset of the DS or MDS agents of the failure;
generating, by the notified DS or MDS agents independently, a plan to restore the data extents affected by the failure; and
collectively restoring the affected extents based on the generated plan;
wherein each node includes a plurality of disk drives;
the detected failure is a disk failure; and
the disk failure is detected by a DS agent based on error rates of the disks.
Dependent claims: 20-35
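The final limitation of claim 19, detecting a disk failure from error rates, can be sketched as follows. The claim gives no window size, threshold, or data structure, so the `DiskErrorMonitor` class, the sliding window of recent I/O results, and the numbers used below are hypothetical choices, not details from the patent.

```python
from collections import deque


class DiskErrorMonitor:
    """Hypothetical per-node monitor that a DS agent might keep for its disks."""

    def __init__(self, window=100, threshold=0.05):
        self.window = window        # number of recent I/O results kept per disk (assumed)
        self.threshold = threshold  # error rate above which a disk is flagged (assumed)
        self.results = {}           # disk_id -> deque of booleans (True = I/O error)

    def record(self, disk_id, error):
        history = self.results.setdefault(disk_id, deque(maxlen=self.window))
        history.append(bool(error))

    def error_rate(self, disk_id):
        history = self.results.get(disk_id)
        if not history:
            return 0.0
        return sum(history) / len(history)

    def failed_disks(self):
        # Flag a disk only once a full window of results has been observed.
        return sorted(
            disk_id for disk_id, history in self.results.items()
            if len(history) == self.window and self.error_rate(disk_id) > self.threshold
        )


monitor = DiskErrorMonitor(window=10, threshold=0.2)
for i in range(10):
    monitor.record("disk0", False)
    monitor.record("disk1", i >= 7)  # the last three operations on disk1 fail

suspect = monitor.failed_disks()
print(suspect)  # ['disk1']
```

Requiring a full window before flagging a disk is a design choice in this sketch that avoids declaring a failure after a single early error; a real agent could equally use time-based windows or device-reported counters.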
36. A computer-program product for restoring data in a collective storage system having a plurality of storage nodes interconnected by a network and storing data as extents, Data Service (DS) agents for managing the extents, Metadata Service (MDS) agents for managing metadata relating to the nodes and the extents, and Cluster Manager (CM) agents, the DS agents managing the extents in each node, the computer-program product comprising a computer readable medium comprising program code for:
detecting a failure in the system by one of the CM agents;
notifying a subset of the DS or MDS agents of the failure;
generating, by the notified DS or MDS agents independently, a plan to restore the extents affected by the failure;
collectively restoring the affected extents based on the generated plan;
wherein generating the restore plan includes the steps of:
determining the extents affected by the failure;
allocating space in the nodes that are still operational to replace the affected extents; and
wherein the restoring includes the steps of:
determining data in the affected extents; and
transferring the determined data to the allocated space.
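The plan-and-restore steps recited in claim 36 can be illustrated with a short Python sketch. It assumes that each extent is replicated on more than one node, so the data of an affected extent can be determined from a surviving copy; the claim itself does not state the redundancy scheme. The function names, the slot-based notion of free space, and the tie-breaking rule are likewise hypothetical.

```python
def generate_restore_plan(extent_map, free_space, failed_node):
    """Return a list of (extent_id, source_node, target_node) transfers.

    extent_map: extent_id -> list of nodes holding a copy (assumed replication).
    free_space: node -> number of free extent slots (assumed unit of allocation).
    """
    # Only nodes that are still operational can receive replacement extents.
    free = {n: s for n, s in free_space.items() if n != failed_node}
    plan = []
    for extent_id in sorted(extent_map):  # fixed order keeps the plan deterministic
        holders = extent_map[extent_id]
        if failed_node not in holders:
            continue  # this extent is not affected by the failure
        survivors = sorted(n for n in holders if n != failed_node)
        if not survivors:
            raise RuntimeError(f"no surviving copy of {extent_id}")
        candidates = [n for n in free if n not in holders and free[n] > 0]
        if not candidates:
            raise RuntimeError(f"no space to restore {extent_id}")
        # Allocate on the node with the most free slots; break ties by name.
        target = min(candidates, key=lambda n: (-free[n], n))
        free[target] -= 1
        plan.append((extent_id, survivors[0], target))
    return plan


def restore(plan, extent_map, failed_node):
    """Apply a plan; a real system would copy extent data from source to target."""
    for extent_id, source, target in plan:
        holders = [n for n in extent_map[extent_id] if n != failed_node]
        holders.append(target)
        extent_map[extent_id] = holders


extent_map = {"ext-a": ["node1", "node2"],
              "ext-b": ["node2", "node3"],
              "ext-c": ["node1", "node3"]}
free_space = {"node1": 2, "node2": 5, "node3": 1, "node4": 3}

# Two agents computing the plan separately from the same inputs agree on it.
plan_a = generate_restore_plan(extent_map, free_space, "node2")
plan_b = generate_restore_plan(dict(extent_map), dict(free_space), "node2")
print(plan_a == plan_b)  # True
print(plan_a)  # [('ext-a', 'node1', 'node4'), ('ext-b', 'node3', 'node1')]

restore(plan_a, extent_map, "node2")
```

Making the planner a deterministic function of the map and the failure is one way the notified agents could each generate the plan independently and still carry it out collectively, since no agent needs to ask another what it decided.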
Specification