Map-reduce ready distributed file system
Abstract
A map-reduce compatible distributed file system, consisting of successive component layers that each provide the basis on which the next layer is built, provides transactional read-write-update semantics with file chunk replication and huge file-create rates. Containers provide the fundamental basis for data replication, relocation, and transactional updates. A container location database allows containers to be found among all file servers, and defines precedence among replicas of containers to organize transactional updates of container contents. Volumes facilitate control of data placement, creation of snapshots and mirrors, and retention of a variety of control and policy information. Also addressed are the use of distributed transactions in a map-reduce system; the use of local and distributed snapshots; replication, including techniques for reconciling the divergence of replicated data after a crash; and mirroring.
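The role the abstract gives the container location database (finding containers among file servers and ordering their replicas) can be illustrated with a minimal sketch. The class name, method names, and node names below are hypothetical and are not taken from the patent; the sketch assumes an in-memory map and treats the first entry in each list as the highest-precedence replica.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ContainerLocationDB:
    """Hypothetical in-memory stand-in for a container location database."""

    # container id -> node names in precedence order (index 0 = master)
    replicas: Dict[int, List[str]] = field(default_factory=dict)

    def register(self, container_id: int, node: str) -> None:
        """Append a node to the end of the container's precedence order."""
        chain = self.replicas.setdefault(container_id, [])
        if node not in chain:
            chain.append(node)

    def locate(self, container_id: int) -> List[str]:
        """Return every node holding the container, master first."""
        return list(self.replicas.get(container_id, []))

    def master(self, container_id: int) -> Optional[str]:
        """Return the highest-precedence node, or None if unknown."""
        chain = self.replicas.get(container_id)
        return chain[0] if chain else None


cldb = ContainerLocationDB()
cldb.register(7, "node-a")
cldb.register(7, "node-b")
print(cldb.locate(7))
print(cldb.master(7))
```

Keeping the replicas as an ordered list is one simple way to express precedence; an actual implementation would likely persist this state and handle concurrent updates.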
18 Claims
1. A system to avoid data loss comprising:
a plurality of cluster nodes storing a plurality of containers comprising file system objects;
a plurality of replicated containers, each replicated container in the plurality of replicated containers comprising a copy of a container in the plurality of containers, the replicated container stored on a first cluster node in the plurality of cluster nodes different from a second cluster node in the plurality of cluster nodes storing the container; and
a replication chain associated with each container in the plurality of containers to avoid data loss, the replication chain comprising a master container and a slave container, the master container being an initial container in the replication chain, wherein the replication chain for the container is changed if a cluster node holding the replicated container is taken out of service, or if the cluster node holding the replicated container returns to service, wherein the master container receives an update to the slave container and propagates the update to the replication chain;
a storage pool including;
a block allocation bitmap indicating which blocks in a disk are in use;
a transaction log comprising a list of pointers to disk regions that hold log data;
a container map comprising a mapping from a container identification (id) to a container specification of the container in the storage pool; and
a super-block containing offsets to starting points of the block allocation bitmap, the transaction log, and the container map.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
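The storage pool elements recited in claim 1 can be pictured with a short sketch. The byte layout (three little-endian 64-bit offsets), the field names, and the bit ordering in the bitmap are assumptions made for illustration only; the claim does not specify an encoding.

```python
import struct
from dataclasses import dataclass

# Hypothetical on-disk layout: three little-endian unsigned 64-bit offsets.
SUPERBLOCK_FORMAT = "<QQQ"


@dataclass(frozen=True)
class SuperBlock:
    bitmap_offset: int         # start of the block allocation bitmap
    log_offset: int            # start of the transaction log pointer list
    container_map_offset: int  # start of the container id -> spec map

    def pack(self) -> bytes:
        """Serialize the three offsets into 24 bytes."""
        return struct.pack(SUPERBLOCK_FORMAT, self.bitmap_offset,
                           self.log_offset, self.container_map_offset)

    @classmethod
    def unpack(cls, raw: bytes) -> "SuperBlock":
        """Rebuild a SuperBlock from the leading bytes of a buffer."""
        size = struct.calcsize(SUPERBLOCK_FORMAT)
        return cls(*struct.unpack(SUPERBLOCK_FORMAT, raw[:size]))


def block_in_use(bitmap: bytes, block: int) -> bool:
    """Test one bit of the bitmap (assumed: bit block % 8 of byte block // 8)."""
    return bool(bitmap[block // 8] & (1 << (block % 8)))


sb = SuperBlock(bitmap_offset=4096, log_offset=8192, container_map_offset=16384)
restored = SuperBlock.unpack(sb.pack())
print(restored)
```

Reading the super-block first and following its offsets is what lets the other three structures sit anywhere in the pool; the offsets used here are arbitrary example values.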
13. A method comprising:
configuring a plurality of cluster nodes to store a plurality of containers including file system objects;
configuring a plurality of replicated containers, wherein each replicated container in the plurality of replicated containers includes a copy of a container in the plurality of containers, the replicated container stored on a first cluster node in the plurality of cluster nodes different from a second cluster node in the plurality of cluster nodes storing the container; and
configuring a replication chain associated with each container in the plurality of containers to avoid data loss and to include a master container and a slave container, the master container being an initial container in the replication chain, wherein the replication chain for the container is changed if a cluster node holding the replicated container is taken out of service, or if the cluster node holding the replicated container returns to service, wherein the master container receives an update to the slave container and propagates the update to the replication chain;
contacting a container location database (CLDB) when the cluster node containing an out-of-date replicated container joins an existing replication chain;
when the CLDB determines that a number of replicated containers in the replication chain is sufficient, instructing the cluster node to discard the out-of-date replicated container;
when the CLDB determines that the number of replicated containers in the replication chain is insufficient, assigning the out-of-date replicated container to the replication chain; and
when the CLDB determines that the number of replicated containers in the replication chain is insufficient, in response to assigning the out-of-date replicated container to the replication chain, resynchronizing the out-of-date replicated container up to a current state.
Dependent claims: 14, 15, 16, 17, 18
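The rejoin steps of claim 13 amount to a small decision procedure, sketched below under stated assumptions. The `desired_replicas` threshold, the `version` counter, and all class and method names are hypothetical. Resynchronization is reduced here to copying the master's version number, whereas a real system would transfer the missing updates.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Replica:
    node: str
    version: int  # hypothetical update counter for this container copy


@dataclass
class CLDB:
    desired_replicas: int  # hypothetical policy: chain length deemed sufficient
    chains: Dict[int, List[Replica]] = field(default_factory=dict)

    def rejoin(self, container_id: int, stale: Replica) -> str:
        """Decide what a returning node does with its out-of-date replica."""
        chain = self.chains.setdefault(container_id, [])
        if len(chain) >= self.desired_replicas:
            # Enough replicas already exist: the node discards its copy.
            return "discard"
        # Too few replicas: assign the stale copy to the chain, then
        # bring it up to the master's state (chain[0] is the master).
        chain.append(stale)
        stale.version = chain[0].version
        return "resynchronized"


cldb = CLDB(desired_replicas=2)
cldb.chains[1] = [Replica("node-a", 5)]
returning = Replica("node-b", 3)
print(cldb.rejoin(1, returning), returning.version)
late = Replica("node-c", 1)
print(cldb.rejoin(1, late), late.version)
```

In this sketch the first returning node is resynchronized because the chain is one replica short, while the second is told to discard its copy because the chain is already full.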
Specification