Map-reduce ready distributed file system
First Claim
1. A computer implemented map reduce method comprising:
maintaining information about where each of a plurality of containers is located in a container location database (CLDB), wherein a storage pool contains zero or more containers, wherein a cluster node in a plurality of cluster nodes contains one or more storage pools;
structuring data within said containers using a plurality of inodes;
replicating said containers to other cluster nodes with one container designated as master for each replication chain;
storing data in said CLDB as inodes in well known containers;
maintaining a database in CLDB nodes, wherein said database contains at least the following information about all of said containers:
nodes that have replicas of a container, an ordering of a replication chain for each container, wherein updates to said container are sent to a master container for said updated container, and wherein changes to content of said container are propagated to said replicas of said container by said master container;
storing in said CLDB a location of all replicas of said container, a structure of a replication for said container, and an epoch number for each container, wherein said epoch number is incremented each time said structure of said replication for said container is changed, and wherein an epoch's changes are noted in a transaction history for each version of said container and gaps are inserted whenever said master container is noted in said replication chain; and
tracing back through transactions that have been applied to each copy when examining said master container and target copies of a same container to determine a point in a history of two containers when the two containers were identical.
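As an illustration of the bookkeeping the claim recites, the following is a minimal Python sketch, not the patented implementation. It models a CLDB entry that holds a container's replication chain and epoch number, and a helper that traces back through two transaction histories to the last point at which the copies were identical. The names `ContainerRecord`, `restructure`, and `last_common_point` are hypothetical, as is the representation of a transaction as an `(epoch, sequence)` pair. The sketch further assumes that histories are append-only and that a matching pair at the same position implies all earlier entries also match.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical representation: a transaction is identified by (epoch, sequence).
Txn = Tuple[int, int]


@dataclass
class ContainerRecord:
    """Hypothetical CLDB entry for a single container."""
    container_id: int
    chain: List[str]   # node names in replication order; chain[0] holds the master
    epoch: int = 0     # incremented whenever the replication structure changes

    @property
    def master(self) -> str:
        return self.chain[0]

    def restructure(self, new_chain: List[str]) -> None:
        """Install a new replication chain and bump the epoch number."""
        self.chain = list(new_chain)
        self.epoch += 1


def last_common_point(master_log: List[Txn], replica_log: List[Txn]) -> int:
    """Trace back from the newest comparable transaction and return how many
    leading transactions the two histories share.

    Assumes that equal entries at the same index imply equal prefixes.
    """
    i = min(len(master_log), len(replica_log))
    while i > 0 and master_log[i - 1] != replica_log[i - 1]:
        i -= 1
    return i


# Usage: the replica diverged after two shared transactions.
record = ContainerRecord(container_id=7, chain=["nodeA", "nodeB", "nodeC"])
record.restructure(["nodeB", "nodeC"])  # nodeA dropped; nodeB becomes master

master_log = [(1, 1), (1, 2), (2, 3), (2, 4)]
replica_log = [(1, 1), (1, 2), (1, 3)]
common = last_common_point(master_log, replica_log)
```

In this sketch a replica would discard everything after position `common` and then replay the master's later transactions; that resynchronization step is an assumption about how such a point might be used and is not spelled out in the claim.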
Abstract
A map-reduce compatible distributed file system, consisting of successive component layers that each provide the basis on which the next layer is built, provides transactional read-write-update semantics with file chunk replication and huge file-create rates. Containers provide the fundamental basis for data replication, relocation, and transactional updates. A container location database allows containers to be found among all file servers and defines precedence among replicas of containers to organize transactional updates of container contents. Volumes facilitate control of data placement, creation of snapshots and mirrors, and retention of a variety of control and policy information. Also addressed are the use of distributed transactions in a map-reduce system; the use of local and distributed snapshots; replication, including techniques for reconciling the divergence of replicated data after a crash; and mirroring.
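The precedence among replicas described above can be pictured with a small Python sketch. It is an in-process illustration only, with no networking, failure handling, or persistence, and it is not drawn from the specification. The class names `Replica` and `ReplicationChain` and the key-value form of an update are hypothetical. An update is handed to the master (the first entry in the chain), which assigns a transaction number, and the change is then applied to the remaining replicas in chain order.

```python
from typing import Dict, List


class Replica:
    """Hypothetical copy of a container held on one node."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        self.applied: List[int] = []  # transaction ids, in the order applied

    def apply(self, txn_id: int, key: str, value: str) -> None:
        self.data[key] = value
        self.applied.append(txn_id)


class ReplicationChain:
    """Hypothetical chain: replicas[0] is the master, the rest follow in order."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.replicas = list(replicas)
        self.next_txn = 1

    def update(self, key: str, value: str) -> int:
        """Send an update to the master, then propagate it down the chain."""
        txn_id = self.next_txn
        self.next_txn += 1
        for replica in self.replicas:  # master first, then the others in order
            replica.apply(txn_id, key, value)
        return txn_id


# Usage: two updates reach every replica in the same order.
chain = ReplicationChain([Replica("nodeA"), Replica("nodeB"), Replica("nodeC")])
chain.update("chunk-0", "v1")
chain.update("chunk-0", "v2")
```

Because every replica applies the same transaction numbers in the same order, any two copies can later be compared by their transaction histories.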
4 Claims
Specification