Map-reduce ready distributed file system
Abstract
A map-reduce compatible distributed file system that consists of successive component layers that each provide the basis on which the next layer is built provides transactional read-write-update semantics with file chunk replication and huge file-create rates. A primitive storage layer (storage pools) knits together raw block stores and provides a storage mechanism for containers and transaction logs. Storage pools are manipulated by individual file servers. Containers provide the fundamental basis for data replication, relocation, and transactional updates. A container location database allows containers to be found among all file servers, as well as defining precedence among replicas of containers to organize transactional updates of container contents. Volumes facilitate control of data placement, creation of snapshots and mirrors, and retention of a variety of control and policy information. Key-value stores relate keys to data for such purposes as directories, container location maps, and offset maps in compressed files.
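The role the abstract gives the container location database can be illustrated with a minimal sketch. It assumes, purely for illustration, that replica precedence is expressed as list order and that the highest-precedence replica receives updates first; the class and method names are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class ContainerLocationDB:
    """Toy container location database (hypothetical names throughout)."""

    # container id -> file servers holding a replica, highest precedence first.
    # Using list order to encode precedence is an assumption of this sketch.
    replicas: dict[int, list[str]] = field(default_factory=dict)

    def register(self, cid: int, servers: list[str]) -> None:
        """Record which file servers hold replicas of container `cid`."""
        if not servers:
            raise ValueError("a container needs at least one replica")
        self.replicas[cid] = list(servers)

    def locate(self, cid: int) -> list[str]:
        """Return every file server that holds a replica of the container."""
        return list(self.replicas[cid])

    def primary(self, cid: int) -> str:
        """Return the replica that would receive a transactional update first."""
        return self.replicas[cid][0]
```

A client would call `locate` to find a container among the file servers, and an updater would start at `primary` and proceed down the list.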
67 Claims
1. A distributed file system comprising:
a processor, said processor implementing a plurality of storage pools that bind raw block stores together and that provide a storage mechanism for containers and transaction logs;
said processor implementing a plurality of containers configured for any of data replication, relocation, and transactional updates; and
a container location database configured to locate specific containers within a plurality of file servers, and with which precedence among replicas of containers is defined to organize transactional updates of container contents;
wherein each said storage pool comprises a plurality of bitmap extents, a plurality of log extents, and a map of container id (CID) to container disk offset, each of which is stored in a super block that is replicated to several well-known locations in the storage pool;
wherein said bitmap extents comprise pointers to multiple block allocation bitmaps for the storage pool;
wherein said log extents comprise pointers to portions of the storage pool that are used to store transaction logs for the storage pool; and
wherein said map of container id (CID) to disk offsets comprises a mechanism for looking up container IDs to find disk offsets in the storage pool.
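The super block recited in claim 1 could be modeled roughly as follows. This is a sketch under stated assumptions, not the patented on-disk layout: the JSON encoding, the 4-byte length prefix, the slot size, and the three offsets in `WELL_KNOWN_OFFSETS` are all hypothetical choices made so the example can run.

```python
import json
from dataclasses import asdict, dataclass, field

SLOT = 4096                              # hypothetical size of one super block slot
WELL_KNOWN_OFFSETS = (0, SLOT, 2 * SLOT)  # hypothetical replica locations


@dataclass
class SuperBlock:
    bitmap_extents: list[int] = field(default_factory=list)  # pointers to allocation bitmaps
    log_extents: list[int] = field(default_factory=list)     # pointers to transaction-log regions
    cid_to_offset: dict[int, int] = field(default_factory=dict)  # container id -> disk offset

    def container_offset(self, cid: int) -> int:
        """Look up a container id to find its disk offset in the pool."""
        return self.cid_to_offset[cid]

    def to_bytes(self) -> bytes:
        # JSON is used only for readability; it turns integer keys into strings.
        return json.dumps(asdict(self)).encode()

    @classmethod
    def from_bytes(cls, raw: bytes) -> "SuperBlock":
        d = json.loads(raw.decode())
        d["cid_to_offset"] = {int(k): v for k, v in d["cid_to_offset"].items()}
        return cls(**d)


def write_replicated(pool: bytearray, sb: SuperBlock) -> None:
    """Write the same super block to every well-known location."""
    raw = sb.to_bytes()
    if len(raw) + 4 > SLOT:
        raise ValueError("super block does not fit in one slot")
    record = len(raw).to_bytes(4, "big") + raw
    for off in WELL_KNOWN_OFFSETS:
        pool[off:off + len(record)] = record


def read_first_valid(pool: bytearray) -> SuperBlock:
    """Return the first replica that still decodes, skipping damaged ones."""
    for off in WELL_KNOWN_OFFSETS:
        n = int.from_bytes(pool[off:off + 4], "big")
        try:
            return SuperBlock.from_bytes(bytes(pool[off + 4:off + 4 + n]))
        except (ValueError, KeyError, TypeError):
            continue
    raise OSError("no valid super block replica found")
```

Replicating the super block means a reader can fall back to another copy when the first one is damaged, which is what `read_first_valid` demonstrates.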
2. A distributed file system, comprising:
a processor, said processor implementing a plurality of containers in which each container stores file and directory meta-data as well as file content data;
wherein references to file content data are stored on a subset of nodes on which container meta-data and data are stored;
a container location database (CLDB) configured to maintain information about where each of said plurality of containers is located;
a plurality of cluster nodes, each cluster node containing one or more storage pools, each storage pool containing zero or more containers;
a plurality of inodes for structuring data within said containers;
wherein said CLDB is configured to assign nodes as replicas of data in a container to meet policy constraints;
wherein each said storage pool comprises a plurality of bitmap extents, a plurality of log extents, and a map of container id (CID) to container disk offset, each of which is stored in a super block that is replicated to several well-known locations in the storage pool;
wherein said bitmap extents comprise pointers to multiple block allocation bitmaps for the storage pool;
wherein said log extents comprise pointers to portions of the storage pool that are used to store transaction logs for the storage pool; and
wherein said map of container id (CID) to disk offsets comprises a mechanism for looking up container IDs to find disk offsets in the storage pool.
Dependent claims: 3-21
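Claim 2 has the CLDB assign nodes as replicas "to meet policy constraints" without saying here what those constraints are. The sketch below assumes one plausible policy, a replication factor combined with at most one replica per rack, only to make the idea concrete; the function name, the rack model, and the policy itself are hypothetical.

```python
def assign_replicas(nodes: dict[str, str], factor: int) -> list[str]:
    """Pick `factor` nodes for a container, no two on the same rack.

    `nodes` maps node name -> rack name. Both the rack-diversity rule and
    the deterministic name ordering are assumptions of this sketch.
    """
    if factor < 1:
        raise ValueError("replication factor must be at least 1")
    chosen: list[str] = []
    used_racks: set[str] = set()
    for node in sorted(nodes):
        rack = nodes[node]
        if rack in used_racks:
            continue  # policy: at most one replica per rack
        chosen.append(node)
        used_racks.add(rack)
        if len(chosen) == factor:
            return chosen
    raise ValueError("not enough racks to satisfy the policy")
```

A real placement policy would also weigh free space and load; the point here is only that assignment is a constrained selection made centrally by the CLDB.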
22. A distributed file system, comprising:
a processor, said processor implementing a plurality of containers in which each container stores file and directory meta-data as well as file content data;
wherein references to file content data are stored on a subset of nodes on which container meta-data and data are stored;
a container location database (CLDB) configured to maintain information about where each of said plurality of containers is located;
a plurality of cluster nodes, each cluster node containing one or more storage pools, each storage pool containing zero or more containers; and
a plurality of inodes for structuring data within said containers;
wherein said CLDB is configured to assign nodes as replicas of data in a container to meet policy constraints;
each inode further comprising a composite data structure that contains attributes that describe various aspects of each object including any of owner, permissions, parent file identifier (FID), object type, and size;
wherein object type comprises any of a local file, chunked file, directory, key-value store, symbolic link, or volume mount point;
wherein said inode further comprises pointers to disk blocks that contain a first set of bytes of data in the object;
wherein each of said pointers comprises an associated copy-on-write bit stored with said pointers;
wherein said inode further comprises references to indirect data which, in the case of local files can also comprise a pointer to a B+ tree that contains the object data, along with a copy-on-write bit for that tree and, in the case of a chunked file, a pointer to a local file, referred to as a FID map, that contains FIDs that refer to local files in other containers containing content of the file;
wherein said inode further comprises a cache of a latest version number for any structure referenced from the inode; and
wherein said version number is configured for use in replication and mirroring.
Dependent claims: 23-44
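The inode described in claim 22 can be pictured as a small record whose block pointers each carry a copy-on-write bit. The sketch below assumes the conventional meaning of that bit (a write through a flagged pointer goes to a fresh block and leaves the shared block untouched) and assumes the cached version number is bumped on each write; the allocator, the field names, and the in-memory `disk` dictionary are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class ObjectType(Enum):
    LOCAL_FILE = "local file"
    CHUNKED_FILE = "chunked file"
    DIRECTORY = "directory"
    KEY_VALUE_STORE = "key-value store"
    SYMBOLIC_LINK = "symbolic link"
    VOLUME_MOUNT_POINT = "volume mount point"


@dataclass
class BlockPointer:
    block: int          # disk block number
    cow: bool = False   # copy-on-write bit stored with the pointer


@dataclass
class Inode:
    owner: str
    permissions: int
    parent_fid: str
    object_type: ObjectType
    size: int = 0
    direct: list[BlockPointer] = field(default_factory=list)
    version: int = 0    # cached latest version number


def write_block(disk: dict[int, bytes], inode: Inode, index: int, data: bytes) -> None:
    """Overwrite one direct block, honoring the pointer's copy-on-write bit."""
    ptr = inode.direct[index]
    if ptr.cow:
        # Block is shared (for example with a snapshot): redirect the write
        # to a newly allocated block. The allocator is a toy assumption.
        ptr.block = max(disk, default=0) + 1
        ptr.cow = False
    disk[ptr.block] = data
    inode.version += 1  # assumed: every change advances the cached version
```

After a write through a flagged pointer, the old block still holds the earlier contents, so anything else that references it continues to see unchanged data.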
45. A distributed file system, comprising:
a processor, said processor implementing a plurality of containers in which each container stores file and directory meta-data as well as file content data;
wherein references to file content data are stored on a subset of nodes on which container meta-data and data are stored;
a container location database (CLDB) configured to maintain information about where each of said plurality of containers is located;
a plurality of cluster nodes, each cluster node containing one or more storage pools, each storage pool containing zero or more containers;
a plurality of inodes for structuring data within said containers;
wherein said CLDB is configured to assign nodes as replicas of data in a container to meet policy constraints; and
wherein said distributed file system is configured for stateless access;
a plurality of NFS gateways;
wherein said distributed file system is configured for access via NFS network protocols; and
a coordination server by which said NFS gateways cooperatively decide which of said NFS gateways host which IP addresses;
wherein all file names accessed via the distributed file system start with a common prefix followed by a cluster name and a name of a file within said cluster; and
wherein said NFS gateways are configured to populate a top-level virtual directory associated with said common prefix with virtual files corresponding to each accessible cluster.
Dependent claims: 46-65
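The naming scheme in claim 45 (common prefix, then cluster name, then a file name within the cluster) can be sketched as a small path parser. The prefix value `/dfs` and the function names are hypothetical; the claim does not say what the prefix is.

```python
from pathlib import PurePosixPath

COMMON_PREFIX = "/dfs"  # hypothetical value; the claim only requires a common prefix


def split_path(path: str) -> tuple[str, str]:
    """Split '<prefix>/<cluster>/<file...>' into (cluster, path within cluster)."""
    parts = PurePosixPath(path).parts
    prefix = PurePosixPath(COMMON_PREFIX).parts
    if parts[:len(prefix)] != prefix or len(parts) <= len(prefix):
        raise ValueError(f"{path!r} does not name a cluster under {COMMON_PREFIX}")
    cluster = parts[len(prefix)]
    inner = "/" + "/".join(parts[len(prefix) + 1:])
    return cluster, inner


def list_top_level(accessible_clusters: set[str]) -> list[str]:
    """Entries a gateway would show in the virtual directory at the prefix."""
    return sorted(accessible_clusters)
```

With this layout a gateway can route any request by looking only at the path, which fits the stateless access the claim recites.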
66. A distributed file system comprising:
a processor, said processor implementing a plurality of storage pools that bind raw block stores together and that provide a storage mechanism for containers and transaction logs;
said processor implementing a plurality of containers configured for any of data replication, relocation, and transactional updates; and
a container location database configured to locate specific containers within a plurality of file servers, and with which precedence among replicas of containers is defined to organize transactional updates of container contents;
a plurality of inodes for structuring data within said containers, each inode further comprising a composite data structure that contains attributes that describe various aspects of each object including any of owner, permissions, parent container id (CID), object type, and size;
wherein object type comprises any of a local file, chunked file, directory, key-value store, symbolic link, or volume mount point;
wherein said inode further comprises pointers to disk blocks that contain a first set of bytes of data in the object;
wherein each of said pointers comprises an associated copy-on-write bit stored with said pointers;
wherein said inode further comprises references to indirect data which, in the case of local files can also comprise a pointer to a B+ tree that contains the object data, along with a copy-on-write bit for that tree and, in the case of a chunked file, a pointer to a local file, referred to as a FID map, that contains FIDs that refer to local files in other containers containing content of the file;
wherein said inode further comprises a cache of a latest version number for any structure referenced from the inode; and
wherein said version number is configured for use in replication and mirroring.
67. A distributed file system comprising:
a processor, said processor implementing a plurality of storage pools that bind raw block stores together and that provide a storage mechanism for containers and transaction logs;
said processor implementing a plurality of containers configured for any of data replication, relocation, and transactional updates; and
a container location database configured to locate specific containers within a plurality of file servers, and with which precedence among replicas of containers is defined to organize transactional updates of container contents;
wherein said distributed file system is configured for stateless access;
a plurality of NFS gateways;
wherein said distributed file system is configured for access via NFS network protocols; and
a coordination server by which said NFS gateways cooperatively decide which of said NFS gateways host which IP addresses;
wherein all file names accessed via the distributed file system start with a common prefix followed by a cluster name and a name of a file within said cluster; and
wherein said NFS gateways are configured to populate a top-level virtual directory associated with said common prefix with virtual files corresponding to each accessible cluster.
Specification