Managing metadata and data storage for a cloud controller in a distributed filesystem
Abstract
The disclosed embodiments disclose techniques for managing metadata and data storage for a cloud controller in a distributed filesystem. Two or more cloud controllers collectively manage distributed filesystem data that is stored in one or more cloud storage systems. More specifically, the cloud controllers cache and ensure data consistency for the data stored in the cloud storage systems, with each cloud controller maintaining (e.g., storing) in a local storage device: (1) one or more metadata regions containing a metadata hierarchy that reflects the current state of the distributed filesystem; and (2) cached data for the distributed filesystem. During operation, the cloud controller receives an incremental metadata snapshot that references new data written to the distributed filesystem. The cloud controller stores updated metadata from this incremental metadata snapshot in one of the metadata regions on the local storage device.
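The placement idea summarized above (and elaborated in the claims below) can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Region` and `LocalDisk` classes, the `start` offsets, and the `fetch` callback are all hypothetical names introduced here, and the sketch assumes that a lower starting offset corresponds to a region nearer the outer edge of the platter.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """A contiguous span of a disk platter (hypothetical model)."""
    name: str
    kind: str    # "metadata" or "data"
    start: int   # assumed: lower offset = nearer the outer edge
    entries: dict = field(default_factory=dict)


class LocalDisk:
    """Toy model of a cloud controller's local rotating disk."""

    def __init__(self, regions):
        self.regions = regions

    def outermost(self, kind):
        # Pick the region of the given kind with the lowest offset.
        candidates = [r for r in self.regions if r.kind == kind]
        return min(candidates, key=lambda r: r.start)

    def apply_snapshot(self, snapshot):
        """Store metadata entries from an incremental snapshot.

        `snapshot` maps a file path to the id of its new data block.
        Entries go to the outermost metadata region.
        """
        region = self.outermost("metadata")
        region.entries.update(snapshot)
        return region

    def cache_block(self, path, fetch):
        """On a client request, cache the block near its metadata."""
        meta = next(
            (r for r in self.regions
             if r.kind == "metadata" and path in r.entries),
            None,
        )
        if meta is None:
            raise KeyError(path)
        data_regions = [r for r in self.regions if r.kind == "data"]
        # Choose the data region closest to the metadata region.
        nearest = min(data_regions, key=lambda r: abs(r.start - meta.start))
        block_id = meta.entries[path]
        nearest.entries[block_id] = fetch(block_id)  # download, then cache
        return nearest


disk = LocalDisk([
    Region("M0", "metadata", 0),
    Region("D0", "data", 100),
    Region("M1", "metadata", 5000),
    Region("D1", "data", 5100),
])
disk.apply_snapshot({"/a.txt": "blk-1"})
chosen = disk.cache_block("/a.txt", lambda block_id: b"payload")
```

In this toy layout the metadata entry lands in `M0` and the block is cached in `D0`, the data region adjacent to it. A real controller would also need to account for region capacity, eviction, and access-frequency prediction, none of which are modeled here.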
18 Claims
1. A computer-implemented method for managing metadata and data storage for a cloud controller in a distributed filesystem, the method comprising:
collectively managing the data of the distributed filesystem using two or more cloud controllers, wherein collectively managing the data comprises;
storing the data for the distributed filesystem in a remote cloud storage system, wherein the cloud controllers cache and ensure data consistency for data stored in the remote cloud storage system, wherein the cloud controller includes a local storage device, wherein the local storage device comprises a rotating disk drive that comprises one or more disk platters;
maintaining in each cloud controller a metadata hierarchy that reflects the current state of the distributed filesystem, wherein the metadata hierarchy is stored in the local storage device, wherein changes to the metadata for the distributed filesystem are synchronized across the cloud controllers for the distributed filesystem to ensure that the clients of the distributed filesystem share a consistent view of the files in the distributed filesystem; and
collectively presenting a unified namespace for the distributed filesystem to the clients of the distributed filesystem via the two or more cloud controllers, wherein the clients access the distributed filesystem via the cloud controllers, wherein the file data for the distributed filesystem is stored in the remote cloud storage system, wherein cloud controllers cache in their local storage devices a subset of the file data from the remote cloud storage system that is being actively accessed by each respective cloud controller's clients, wherein new file data received by each cloud controller from its clients is written to the remote cloud storage system, wherein the metadata hierarchy in the cloud controller tracks the location of distributed filesystem data blocks in the remote cloud storage system and cached distributed filesystem data blocks in the cloud controller, wherein the cloud controller uses the metadata hierarchy to locate and download requested, uncached data blocks in the distributed filesystem from the remote cloud storage system;
defining in a disk platter of the rotating disk drive two or more metadata regions in which the cloud controller stores metadata for the distributed filesystem, wherein the metadata regions are distinct from two or more allocated data regions that are defined in the disk platter that cache distributed filesystem data, wherein different regions of the disk platter in the local storage device have different levels of performance, wherein a metadata region is defined in an outer region of the disk platter that supports the highest access bandwidth and lower access latency;
receiving an incremental metadata snapshot that references new data written to the distributed filesystem;
storing a new metadata entry for the distributed filesystem from the incremental metadata snapshot in the metadata region on the disk platter; and
upon receiving a client request to access a new data block referenced in the incremental metadata snapshot, selecting a data region that is in near proximity to the metadata region and caching the new data block in that data region to ensure that the new metadata entry and the new data block are in relative proximity on the disk platter, thereby ensuring that associated metadata and data can be read without substantially degrading access performance, wherein the data region is distinct from the metadata region;
wherein the cloud controller predicts that the new metadata entry and the new data block are likely to be accessed frequently, wherein the cloud controller selects the metadata region and the data region for the new metadata entry and the new data block respectively because they are on an outer region of the disk platter and hence more favorable for frequent accesses, wherein outer regions of the disk platter have higher spatial density and hence higher effective data bandwidth that improves access rates for frequently accessed data stored in such regions.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for managing metadata and data storage for a cloud controller in a distributed filesystem, the method comprising:
collectively managing the data of the distributed filesystem using two or more cloud controllers, wherein collectively managing the data comprises;
storing the data for the distributed filesystem in a remote cloud storage system, wherein the cloud controllers cache and ensure data consistency for data stored in the remote cloud storage system, wherein the cloud controller includes a local storage device, wherein the local storage device comprises a rotating disk drive that comprises one or more disk platters;
maintaining in each cloud controller a metadata hierarchy that reflects the current state of the distributed filesystem, wherein the metadata hierarchy is stored in the local storage device, wherein changes to the metadata for the distributed filesystem are synchronized across the cloud controllers for the distributed filesystem to ensure that the clients of the distributed filesystem share a consistent view of the files in the distributed filesystem; and
collectively presenting a unified namespace for the distributed filesystem to the clients of the distributed filesystem via the two or more cloud controllers, wherein the clients access the distributed filesystem via the cloud controllers, wherein the file data for the distributed filesystem is stored in the remote cloud storage system, wherein cloud controllers cache in their local storage devices a subset of the file data from the remote cloud storage system that is being actively accessed by each respective cloud controller's clients, wherein new file data received by each cloud controller from its clients is written to the remote cloud storage system, wherein the metadata hierarchy in the cloud controller tracks the location of distributed filesystem data blocks in the remote cloud storage system and cached distributed filesystem data blocks in the cloud controller, wherein the cloud controller uses the metadata hierarchy to locate and download requested, uncached data blocks in the distributed filesystem from the remote cloud storage system;
defining in a disk platter of the rotating disk drive two or more metadata regions in which the cloud controller stores metadata for the distributed filesystem, wherein the metadata regions are distinct from two or more allocated data regions that are defined in the disk platter that cache distributed filesystem data, wherein different regions of the disk platter in the local storage device have different levels of performance, wherein a metadata region is defined in an outer region of the disk platter that supports the highest access bandwidth and lowest access latency;
receiving an incremental metadata snapshot that references new data written to the distributed filesystem;
storing a new metadata entry for the distributed filesystem from the incremental metadata snapshot in the metadata region on the disk platter; and
upon receiving a client request to access a new data block referenced in the incremental metadata snapshot, selecting a data region that is in near proximity to the metadata region and caching the new data block in that data region to ensure that the new metadata entry and the new data block are in relative proximity on the disk platter, thereby ensuring that associated metadata and data can be read without substantially degrading access performance, wherein the data region is distinct from the metadata region;
wherein the cloud controller predicts that the new metadata entry and the new data block are likely to be accessed frequently, wherein the cloud controller selects the metadata region and the data region for the new metadata entry and the new data block respectively because they are on an outer region of the disk platter and hence more favorable for frequent accesses, wherein outer regions of the disk platter have higher spatial density and hence higher effective data bandwidth that improves access rates for frequently accessed data stored in such regions.
Dependent claims: 13, 14, 15, 16, 17
18. A cloud controller manages metadata and data storage for a distributed filesystem, comprising:
a processor;
a storage mechanism that stores metadata for the distributed filesystem, wherein the storage mechanism comprises a rotating disk drive that comprises one or more disk platters; and
a storage management mechanism;
wherein two or more cloud controllers collectively manage the data of the distributed filesystem, wherein collectively managing the data comprises;
storing the data for the distributed filesystem in a remote cloud storage system, wherein the cloud controllers cache and ensure data consistency for data stored in the remote cloud storage system, wherein each cloud controller includes a local storage device;
maintaining in each cloud controller a metadata hierarchy that reflects the current state of the distributed filesystem, wherein the metadata hierarchy is stored in the local storage device, wherein changes to the metadata for the distributed filesystem are synchronized across the cloud controllers for the distributed filesystem to ensure that the clients of the distributed filesystem share a consistent view of the files in the distributed filesystem; and
collectively presenting a unified namespace for the distributed filesystem to the clients of the distributed filesystem via the two or more cloud controllers, wherein the clients access the distributed filesystem via the cloud controllers, wherein the file data for the distributed filesystem is stored in the remote cloud storage system, wherein cloud controllers cache in their local storage devices a subset of the file data from the remote cloud storage system that is being actively accessed by each respective cloud controller's clients, wherein new file data received by each cloud controller from its clients is written to the remote cloud storage system, wherein the metadata hierarchy in the cloud controller tracks the location of distributed filesystem data blocks in the remote cloud storage system and cached distributed filesystem data blocks in the cloud controller, wherein the cloud controller uses the metadata hierarchy to locate and download requested, uncached data blocks in the distributed filesystem from the remote cloud storage system; and
wherein the storage management mechanism is further configured to;
define in a disk platter of the rotating disk drive two or more metadata regions in which the cloud controller stores metadata for the distributed filesystem, wherein the metadata regions are distinct from two or more allocated data regions that are defined in the disk platter that cache distributed filesystem data, wherein different regions of the disk platter in the local storage device have different levels of performance, wherein a metadata region is defined in an outer region of the disk platter that supports the highest access bandwidth and lowest access latency;
receive an incremental metadata snapshot that references new data written to the distributed filesystem;
store a new metadata entry for the distributed filesystem from the incremental metadata snapshot in the metadata region on the disk platter; and
upon receiving a client request to access a new data block referenced in the incremental metadata snapshot, select a data region that is in near proximity to the metadata region and cache the new data block in that data region to ensure that the new metadata entry and the new data block are in relative proximity on the disk platter, thereby ensuring that associated metadata and data can be read without substantially degrading access performance, wherein the data region is distinct from the metadata region;
wherein the storage mechanism predicts that the new metadata entry and the new data block are likely to be accessed frequently, wherein the storage mechanism selects the metadata region and the data region for the new metadata entry and the new data block respectively because they are on an outer region of the disk platter and hence more favorable for frequent accesses, wherein outer regions of the disk platter have higher spatial density and hence higher effective data bandwidth that improves access rates for frequently accessed data stored in such regions.
Specification