Consistent data storage in distributed computing systems
Abstract
Methods and apparatus for providing consistent data storage in distributed computing systems. A consistent distributed computing file system (consistent DCFS) may be backed by an object storage service that guarantees only eventual consistency. The consistent DCFS may leverage a data storage service (e.g., a database service) to store and maintain a file system/directory structure (a consistent DCFS directory). Compute nodes may access this directory for file/directory information relevant to the data objects in the consistent DCFS, rather than relying on the information maintained by the object storage service. The compute nodes may reference the consistent DCFS directory to, for example, store and retrieve strongly consistent metadata referencing data objects in the consistent DCFS. The compute nodes may, for example, retrieve metadata from the consistent DCFS directory to determine whether the object storage service is presenting all of the data that it is supposed to have.
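The last check described in the abstract, determining whether the object storage service is presenting all of the data it is supposed to have, can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the `missing_objects` helper, the dictionary layout of the directory metadata (path mapped to expected size), and the sample paths are hypothetical and are not taken from the patent.

```python
from typing import Dict, Iterable, List


def missing_objects(directory: Dict[str, int],
                    store_listing: Iterable[str],
                    prefix: str) -> List[str]:
    """Return paths under `prefix` that the directory records
    but the object store listing does not (yet) show."""
    expected = {path for path in directory if path.startswith(prefix)}
    return sorted(expected - set(store_listing))


# Hypothetical directory metadata: path -> expected size in bytes.
directory = {
    "job1/part-0": 128,
    "job1/part-1": 256,
    "job1/part-2": 64,
    "job2/part-0": 32,
}

# A lagging listing from the object store: part-2 was written
# but is not visible in the listing yet.
store_listing = ["job1/part-0", "job1/part-1"]

print(missing_objects(directory, store_listing, "job1/"))
# ['job1/part-2']
```

Because the directory is treated as the authoritative listing, a non-empty result tells the compute node that the object store is still catching up, rather than that the data does not exist.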
24 Claims
1. A distributed computing system, comprising:
- one or more hardware compute nodes configured to process a data set of data objects, wherein the data objects are stored using a first hardware computer system and a second hardware computer system that are accessible over one or more networks, wherein the processing of the data set includes generating a request to modify at least one of the data objects stored using the first and second hardware computer systems;
- the first hardware computer system, configured to implement a distributed computing file system (DCFS) using an unstructured object storage service, wherein the DCFS stores the data objects of the data set as files using the unstructured object storage service, wherein the unstructured object storage service implements a first client-facing interface accessible to a plurality of clients, and wherein the unstructured data storage service is not guaranteed to return a latest version of the data objects produced by the request via the first client-facing interface;
- the second hardware computer system, configured to implement a DCFS directory for the DCFS using a multi-tenant database service, wherein the DCFS directory stores metadata for the data objects of the DCFS, wherein the multi-tenant database service implements a second client-facing interface accessible to the plurality of clients, and wherein the multi-tenant database service is guaranteed to return a latest version of the metadata of the data objects produced by the request via the second client-facing interface;
- wherein the DCFS directory is an authoritative store and source for directory information of the DCFS during processing of the data set by the one or more hardware compute nodes.
Dependent claims: 2, 3, 4, 5, 6
7. A method, comprising:
- establishing a distributed computing file system (DCFS) on one or more storage devices to store data objects in an unstructured object storage service, wherein the unstructured object storage service is implemented on a first hardware computer system accessible over one or more networks and implements a first client-facing interface accessible to a plurality of clients;
- establishing a DCFS directory to store metadata for the data objects in the DCFS in a multi-tenant database service different from the unstructured object storage service, wherein the multi-tenant database service is implemented on a second hardware computer system accessible over the one or more networks and implements a second client-facing interface accessible to the plurality of clients;
- responsive to a request to modify a data object of the data objects, modifying the data object in the unstructured object storage service and metadata of the data object in the multi-tenant database service, wherein the unstructured data storage service is not guaranteed to return a latest version of the data object produced by the request via the first client-facing interface, and the multi-tenant database service is guaranteed to return a latest version of the metadata of the data object produced by the request via the second client-facing interface; and
- accessing, by one or more compute nodes distinct from the first and second hardware computer systems, the DCFS directory to obtain the metadata for accessing the data objects stored in the DCFS.
Dependent claims: 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20
21. A non-transitory computer-accessible storage medium storing program instructions computer-executable to implement:
- establishing a distributed computing file system (DCFS) to store data objects via one or more application program interface calls to an eventually consistent unstructured object storage service, wherein the unstructured object storage service is implemented on a first hardware computer system accessible over one or more networks and implements a first client-facing interface accessible to a plurality of clients;
- establishing a DCFS directory to store metadata for the data objects in a multi-tenant database service different from the unstructured object storage service, wherein the multi-tenant database service is implemented on a second hardware computer system accessible over the one or more networks and implements a second client-facing interface accessible to the plurality of clients;
- responsive to a request to modify a data object of the data objects, modifying the data object in the unstructured object storage service and metadata of the data object in the multi-tenant database service, wherein the unstructured data storage service is not guaranteed to return a latest version of the data object produced by the request via the first client-facing interface, and the multi-tenant database service is guaranteed to return a latest version of the metadata of the data object produced by the request via the second client-facing interface; and
- accessing the DCFS directory to obtain consistent metadata for accessing the data objects stored in the DCFS.
Dependent claims: 22, 23, 24
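The modify-then-access flow recited in the method claims can be sketched with two in-memory stand-ins. This is an illustrative sketch under stated assumptions, not the claimed implementation: the class names `Directory` and `LaggingObjectStore`, the `modify` and `consistent_read` helpers, the use of a SHA-256 checksum as the metadata, and the "visible after N stale reads" lag model are all hypothetical choices made to keep the example deterministic.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Entry:
    """Hypothetical directory metadata for one data object."""
    size: int
    sha256: str


class Directory:
    """Stand-in for the strongly consistent DCFS directory."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def put(self, path: str, data: bytes) -> None:
        self._entries[path] = Entry(len(data), hashlib.sha256(data).hexdigest())

    def get(self, path: str) -> Optional[Entry]:
        return self._entries.get(path)


class LaggingObjectStore:
    """Stand-in for an eventually consistent object store.

    Assumption: a write becomes visible only after `lag` stale reads.
    """

    def __init__(self, lag: int = 2) -> None:
        self._lag = lag
        self._visible: Dict[str, bytes] = {}
        self._pending: Dict[str, List] = {}  # key -> [data, stale reads left]

    def put(self, key: str, data: bytes) -> None:
        self._pending[key] = [data, self._lag]

    def get(self, key: str) -> Optional[bytes]:
        pending = self._pending.get(key)
        if pending is not None:
            if pending[1] == 0:
                # The write has propagated; expose it from now on.
                self._visible[key] = pending[0]
                del self._pending[key]
            else:
                pending[1] -= 1
        return self._visible.get(key)


def modify(directory: Directory, store: LaggingObjectStore,
           path: str, data: bytes) -> None:
    """Modify the object in the store and its metadata in the directory."""
    store.put(path, data)
    directory.put(path, data)


def consistent_read(directory: Directory, store: LaggingObjectStore,
                    path: str, max_attempts: int = 5) -> Tuple[bytes, int]:
    """Read `path`, accepting only data that matches the directory metadata."""
    entry = directory.get(path)
    if entry is None:
        raise FileNotFoundError(path)
    for attempt in range(1, max_attempts + 1):
        data = store.get(path)
        if data is not None and hashlib.sha256(data).hexdigest() == entry.sha256:
            return data, attempt
        # A real client would wait or back off here before retrying.
    raise TimeoutError(f"{path} still stale after {max_attempts} attempts")


directory, store = Directory(), LaggingObjectStore(lag=2)
modify(directory, store, "out/part-0", b"v1")
data, attempts = consistent_read(directory, store, "out/part-0")
print(data, attempts)  # b'v1' 3 (two stale reads, then the fresh one)
```

The design point the sketch tries to capture is that the reader never trusts the object store's answer on its own: the directory metadata decides whether a returned object is the latest version, and a mismatch is treated as "not yet visible" rather than as the current state.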
Specification