Map-reduce ready distributed file system
First Claim
1. A map-reduce compatible distributed file system comprising replicated containers preventing data loss comprising:
a container location database (CLDB) configured to maintain information about where each of a plurality of containers is located;
a plurality of cluster nodes, each cluster node containing one or more storage pools, each storage pool containing zero or more containers; and
a plurality of inodes for structuring said file system objects within said containers;
wherein said containers comprise file system objects, and said replicated containers preventing data loss comprise said containers replicated to other cluster nodes with one container designated as master container for each replication chain controlling transactions for said replication chain, said replication chain arranged in a linear pattern, a star pattern, or any combination of said linear and said star pattern, wherein said replication chain for said container is changed if a node holding any replica fails or is taken out of service, or if a node that previously contained a replica returns to service;
wherein said maintained information about where each of said plurality of containers is located that is maintained in said CLDB is stored as inodes in containers;
wherein said CLDB inodes are configured to maintain a database that contains at least following information about all of said containers:
nodes that have replicas of a container; and
an ordering of said replication chain for said container;
wherein updates to said container are sent to said master container for said updated container;
wherein changes to content of said container are propagated to said replicas of said container by said master container;
wherein some file system objects are larger than a single container; and
wherein some file system objects are spread over a larger number of nodes than a set represented by said replication chain of a single container.
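As an illustration only, the relationship the claim describes between the CLDB, the replication chain, and the master container can be sketched in a few lines of Python. Everything named here (`ContainerRecord`, `CLDB`, `write`, the node names) is hypothetical and not taken from the patent. The sketch assumes a linear chain, assumes the next replica in order takes over when the master's node fails, and omits the resynchronization a returning node would need.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ContainerRecord:
    """Hypothetical CLDB entry: replica nodes in chain order, master first."""
    container_id: int
    chain: list[str] = field(default_factory=list)

    @property
    def master(self) -> str:
        return self.chain[0]


class CLDB:
    """Toy in-memory stand-in for the container location database."""

    def __init__(self) -> None:
        self.records: dict[int, ContainerRecord] = {}

    def register(self, container_id: int, nodes: list[str]) -> None:
        self.records[container_id] = ContainerRecord(container_id, list(nodes))

    def locate(self, container_id: int) -> list[str]:
        # Returns the nodes holding replicas, in replication-chain order.
        return list(self.records[container_id].chain)

    def node_failed(self, node: str) -> None:
        # The chain changes when a node holding any replica fails.
        # Assumption: if the master's node fails, the next replica becomes master.
        for rec in self.records.values():
            if node in rec.chain:
                rec.chain.remove(node)

    def node_returned(self, node: str, container_id: int) -> None:
        # The chain also changes when a former replica holder returns.
        # Assumption: it rejoins at the tail; resynchronization is omitted.
        rec = self.records[container_id]
        if node not in rec.chain:
            rec.chain.append(node)


def write(cldb: CLDB, stores: dict[str, dict], container_id: int,
          key: str, value: str) -> str:
    """Apply an update at the master, then propagate it down a linear chain."""
    chain = cldb.locate(container_id)
    for node in chain:  # master first, then each replica in chain order
        stores[node].setdefault(container_id, {})[key] = value
    return chain[0]  # the node whose copy acted as master for this update
```

A star pattern would differ only in that the master sends the update to every replica directly rather than passing it along the chain.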
Abstract
A map-reduce compatible distributed file system, consisting of successive component layers that each provide the basis on which the next layer is built, provides transactional read-write-update semantics with file chunk replication and huge file-create rates. Containers provide the fundamental basis for data replication, relocation, and transactional updates. A container location database allows containers to be found among all file servers and defines precedence among replicas of containers to organize transactional updates of container contents. Volumes facilitate control of data placement, creation of snapshots and mirrors, and retention of a variety of control and policy information. Also addressed are the use of distributed transactions in a map-reduce system; the use of local and distributed snapshots; replication, including techniques for reconciling the divergence of replicated data after a crash; and mirroring.
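The file chunk replication mentioned in the abstract, together with the claim limitation that some file system objects are larger than a single container and span more nodes than one replication chain, can be pictured with a small Python sketch. The chunk size, the round-robin placement policy, and all function names are assumptions made for illustration; the source does not specify them.

```python
from __future__ import annotations

CHUNK_SIZE = 4  # bytes; a deliberately tiny illustrative value, not from the source


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut a file's bytes into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_chunks(num_chunks: int, container_ids: list[int]) -> list[int]:
    """Assign each chunk to a container. Round-robin is an assumed policy."""
    return [container_ids[i % len(container_ids)] for i in range(num_chunks)]


def nodes_for_file(placement: list[int], chains: dict[int, list[str]]) -> set[str]:
    """Collect every node that holds a replica of any chunk of the file."""
    nodes: set[str] = set()
    for container_id in placement:
        nodes.update(chains[container_id])
    return nodes


# Two containers, each with its own three-node replication chain (hypothetical names).
chains = {1: ["A", "B", "C"], 2: ["D", "E", "F"]}
chunks = split_into_chunks(b"abcdefghij")
placement = place_chunks(len(chunks), [1, 2])
file_nodes = nodes_for_file(placement, chains)
```

In this sketch the file's chunks land in both containers, so the file touches six nodes even though each individual container's chain covers only three.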
15 Claims
Specification