Geographically-distributed file system using coordinated namespace replication over a wide area network
First Claim
1. A cluster of nodes comprising computing devices configured to implement a single geographically-distributed file system, the cluster comprising:
- a first data center, comprising;
a plurality of first DataNode computing devices, each configured to store data blocks of client files;
a plurality of first local persistent storages;
a plurality of first NameNode computing devices, each configured to update a state of a namespace of the cluster and each configured to store the updated state of the namespace in a first local persistent storage of the plurality of first local persistent storages;
a second data center that is geographically remote from and coupled to the first data center by a wide area network, the second data center comprising;
a plurality of second DataNode computing devices, each configured to store data blocks of client files;
a plurality of second local persistent storages;
a plurality of second NameNode computing devices, each configured to update the state of the namespace of the cluster and each configured to store the updated state of the namespace in a second local persistent storage of the plurality of second local persistent storages;
wherein the plurality of first and second NameNode computing devices are configured to update the state of the namespace responsive to data blocks being written to the plurality of first and second DataNode computing devices; and
a coordination engine process spanning the plurality of first NameNode computing devices and the plurality of second NameNode computing devices, the coordination engine process being configured to coordinate updates to the state of the namespace stored by the plurality of first and second NameNode computing devices such that the state of the namespace is maintained consistent across the first and second data centers of the cluster,wherein the coordination engine process is configured to receive proposals from the first and second plurality of NameNode computing devices to update the state of the namespace and to generate, in response, an ordered set of agreements that specifies an order in which the plurality of first and second NameNode computing devices are to update their respective stored state of the namespace, and wherein the plurality of first and second NameNode computing devices are configured to delay updates to the state of the namespace until the ordered set of agreements is received from the coordination engine process.
3 Assignments
0 Petitions
Accused Products
Abstract
A cluster of nodes implements a single distributed file system comprises at least first and second data centers and a coordination engine process. The first data center may comprise first DataNodes configured to store data blocks of client files, and first NameNodes configured to update a state of a namespace of the cluster. The second data center, geographically remote from and coupled to the first data center by a wide area network, may comprise second DataNodes configured to store data blocks of client files, and second NameNodes configured to update the state of the namespace. The first and second NameNodes are configured to update the state of the namespace responsive to data blocks being written to the DataNodes. The coordination engine process spans the first and second NameNodes and coordinates updates to the namespace stored such that the state thereof is maintained consistent across the first and second data centers.
-
Citations
39 Claims
-
1. A cluster of nodes comprising computing devices configured to implement a single geographically-distributed file system, the cluster comprising:
-
a first data center, comprising; a plurality of first DataNode computing devices, each configured to store data blocks of client files; a plurality of first local persistent storages; a plurality of first NameNode computing devices, each configured to update a state of a namespace of the cluster and each configured to store the updated state of the namespace in a first local persistent storage of the plurality of first local persistent storages; a second data center that is geographically remote from and coupled to the first data center by a wide area network, the second data center comprising; a plurality of second DataNode computing devices, each configured to store data blocks of client files; a plurality of second local persistent storages; a plurality of second NameNode computing devices, each configured to update the state of the namespace of the cluster and each configured to store the updated state of the namespace in a second local persistent storage of the plurality of second local persistent storages; wherein the plurality of first and second NameNode computing devices are configured to update the state of the namespace responsive to data blocks being written to the plurality of first and second DataNode computing devices; and a coordination engine process spanning the plurality of first NameNode computing devices and the plurality of second NameNode computing devices, the coordination engine process being configured to coordinate updates to the state of the namespace stored by the plurality of first and second NameNode computing devices such that the state of the namespace is maintained consistent across the first and second data centers of the cluster, wherein the coordination engine process is configured to receive proposals from the first and second plurality of NameNode computing devices to update the state of the namespace and to generate, in response, an ordered set of agreements that specifies an order in which the plurality of first and second NameNode computing devices are to update their respective stored state of the namespace, and wherein the plurality of first and second NameNode computing devices are configured to delay updates to the state of the namespace until the ordered set of agreements is received from the coordination engine process. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20)
-
-
21. A computer-implemented method, comprising:
-
establishing a single distributed file system that spans, over a wide area network, a cluster comprising a first data center and a geographically remote second data center, the first data center comprising a plurality of first NameNode computing devices and a plurality of first DataNode computing devices configured to store data blocks of client files, the second data center comprising a plurality of second NameNode computing devices and a plurality of second DataNode computing devices configured to store data blocks of client files; storing, in local persistent storage accessible to each of the plurality of first NameNode computing devices and in each of the plurality of second NameNode computing devices, a state of a namespace of the cluster; receiving proposals from the plurality of first and second NameNode computing devices to update the state of the namespace and generating, in response, an ordered set of agreements that specifies an order in which the plurality of first and second NameNode computing devices are to update their respective stored state of the namespace; delaying, by the plurality of first and second NameNode computing devices, making updates to the state of the namespace until the ordered set of agreements is received; updating, in the local persistent storage, the state of the namespace stored in the plurality of first NameNode computing devices and in the plurality of second NameNode computing devices, responsive to data blocks being written to the plurality of first and second DataNode computing devices; and coordinating updates to the state of the namespace stored in the plurality of first NameNode computing devices and stored in the plurality of second NameNode computing devices to maintain the state of the namespace consistent across the first and second data centers of the cluster. - View Dependent Claims (22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39)
-
Specification