Pre-fetching data for a distributed filesystem
First Claim
1. A computer-implemented method for pre-fetching data in a distributed filesystem, the method comprising:
collectively managing data coherency for the data of the distributed filesystem using two or more cloud controllers by:
collectively presenting a unified namespace for the distributed filesystem to the clients of the distributed filesystem via the two or more cloud controllers, wherein the clients can only access the distributed filesystem via the cloud controllers, wherein the file data for the distributed filesystem is stored in a remote cloud storage system using encrypted cloud files, wherein each cloud controller caches a subset of the file data from the remote cloud storage system that is being actively accessed by that cloud controller's respective clients, wherein all new file data received by each cloud controller from its clients is written to the remote cloud storage system via the receiving cloud controller;
maintaining at each cloud controller a copy of the complete metadata for all of the files stored in the distributed filesystem, wherein each cloud controller communicates any changes to the metadata for the distributed filesystem to the full set of cloud controllers for the distributed filesystem to ensure that the clients of the distributed filesystem share a consistent view of each file in the distributed filesystem;
upon receiving in a cloud controller new file data from a client, storing the new file data for the distributed filesystem as cloud files in the remote cloud storage system;
upon receiving confirmation that the new cloud files have been successfully stored in the remote cloud storage system, sending from the cloud controller an incremental metadata snapshot that includes new metadata for the distributed filesystem that describes the new file data and links to the new cloud files, wherein the incremental metadata snapshot is received by the other cloud controllers of the distributed filesystem and used to ensure data coherency for the distributed filesystem;
receiving at the cloud controller a request from a client to access a data block for a file;
traversing the cloud controller's copy of the metadata for the distributed filesystem to identify a metadata entry associated with the data block;
using the metadata entry to download a cloud file containing the data block from the remote cloud storage system to the cloud controller;
determining that an additional cloud file in the remote cloud storage system includes data that is likely to be accessed in conjunction with the data block; and
pre-fetching the additional cloud file from the remote cloud storage system to the cloud controller.
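The write-path steps of the claim (store new file data as cloud files, then send an incremental metadata snapshot once storage is confirmed) can be illustrated with a minimal Python sketch. Everything named here is hypothetical: `CloudStore`, `CloudController`, `write`, and `apply_snapshot` are illustrative names, the remote cloud storage system is replaced by an in-memory dictionary, each block is stored as its own cloud file, and encryption is omitted. It is a sketch under those assumptions, not the patented implementation.

```python
class CloudStore:
    """In-memory stand-in for the remote cloud storage system (assumption)."""

    def __init__(self):
        self.objects = {}

    def put(self, key, blob):
        self.objects[key] = blob
        return True  # plays the role of the storage confirmation


class CloudController:
    """Hypothetical controller holding a full copy of the filesystem metadata."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.metadata = {}  # (path, block_index) -> cloud file key
        self.peers = []     # the other cloud controllers
        self.next_id = 0

    def write(self, path, blocks):
        """Store new file data, then publish an incremental metadata snapshot."""
        snapshot = {}
        for index, data in enumerate(blocks):
            key = f"{self.name}-{self.next_id}"
            self.next_id += 1
            # Simplification: one block per cloud file, no encryption.
            if not self.store.put(key, data):
                raise IOError(f"cloud file {key} was not stored")
            snapshot[(path, index)] = key
        # Only after every cloud file is confirmed is the metadata shared.
        self.metadata.update(snapshot)
        for peer in self.peers:
            peer.apply_snapshot(snapshot)
        return snapshot

    def apply_snapshot(self, snapshot):
        """Merge a snapshot received from another controller."""
        self.metadata.update(snapshot)
```

In this sketch a peer never sees metadata that points at a cloud file that has not yet been stored, which is the ordering the claim describes.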
Abstract
The disclosed embodiments provide a system that facilitates pre-fetching data for a distributed filesystem. During operation, a cloud controller (e.g., a computing device that caches data from the distributed filesystem) that maintains a set of metadata for the distributed filesystem receives a request to access a data block for a file. The cloud controller traverses the metadata to identify a metadata entry that is associated with the block, and then uses this metadata entry to download a cloud file containing the data block from a cloud storage system. While performing these operations, the cloud controller additionally determines that an additional cloud file in the cloud storage system includes data that is likely to be accessed in conjunction with the data block, and proceeds to pre-fetch this additional cloud file from the cloud storage system.
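The read path described in the abstract can be sketched in Python as follows. The abstract does not say how the controller decides which additional cloud file is "likely to be accessed"; this sketch assumes a simple sequential heuristic (the cloud file holding the next block of the same file). The class `PrefetchingController`, its methods, and the dictionary-based metadata and cloud store are all hypothetical, each cloud file holds a single block, and the pre-fetch runs synchronously for simplicity where a real system would presumably do it in the background.

```python
class PrefetchingController:
    """Hypothetical cloud controller that pre-fetches on block reads."""

    def __init__(self, metadata, cloud_files):
        self.metadata = metadata   # (path, block_index) -> cloud file key
        self.cloud = cloud_files   # key -> bytes; stands in for cloud storage
        self.cache = {}            # cloud files cached on the controller
        self.downloads = []        # download order, kept for illustration

    def _download(self, key):
        """Fetch a cloud file into the local cache unless already cached."""
        if key not in self.cache:
            self.cache[key] = self.cloud[key]
            self.downloads.append(key)
        return self.cache[key]

    def read_block(self, path, index):
        # Look up the metadata entry associated with the requested block.
        key = self.metadata[(path, index)]
        # Use the entry to download the cloud file containing the block.
        data = self._download(key)
        # Assumed heuristic: the next block of the same file is likely to be
        # accessed soon, so pre-fetch the cloud file that holds it.
        next_key = self.metadata.get((path, index + 1))
        if next_key is not None and next_key != key:
            self._download(next_key)
        return data


metadata = {("/a", 0): "cf0", ("/a", 1): "cf1", ("/a", 2): "cf2"}
cloud = {"cf0": b"A", "cf1": b"B", "cf2": b"C"}
controller = PrefetchingController(metadata, cloud)
first = controller.read_block("/a", 0)  # downloads cf0, pre-fetches cf1
```

After the first read, a later request for block 1 is served from the cache and only triggers a pre-fetch of `cf2`.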
20 Claims
13. A non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for pre-fetching data in a distributed filesystem, the method comprising:
collectively managing data coherency for the data of the distributed filesystem using two or more cloud controllers by:
collectively presenting a unified namespace for the distributed filesystem to the clients of the distributed filesystem via the two or more cloud controllers, wherein the clients can only access the distributed filesystem via the cloud controllers, wherein the file data for the distributed filesystem is stored in a remote cloud storage system using encrypted cloud files, wherein each cloud controller caches a subset of the file data from the remote cloud storage system that is being actively accessed by that cloud controller's respective clients, wherein all new file data received by each cloud controller from its clients is written to the remote cloud storage system via the receiving cloud controller;
maintaining at each cloud controller a copy of the complete metadata for all of the files stored in the distributed filesystem, wherein each cloud controller communicates any changes to the metadata for the distributed filesystem to the full set of cloud controllers for the distributed filesystem to ensure that the clients of the distributed filesystem share a consistent view of each file in the distributed filesystem;
upon receiving in a cloud controller new file data from a client, storing the new file data for the distributed filesystem as cloud files in the remote cloud storage system;
upon receiving confirmation that the new cloud files have been successfully stored in the remote cloud storage system, sending from the cloud controller an incremental metadata snapshot that includes new metadata for the distributed filesystem that describes the new file data and links to the new cloud files, wherein the incremental metadata snapshot is received by the other cloud controllers of the distributed filesystem and used to ensure data coherency for the distributed filesystem;
receiving at the cloud controller a request from a client to access a data block for a file;
traversing the cloud controller's copy of the metadata for the distributed filesystem to identify a metadata entry associated with the data block;
using the metadata entry to download a cloud file containing the data block from the remote cloud storage system to the cloud controller;
determining that an additional cloud file in the remote cloud storage system includes data that is likely to be accessed in conjunction with the data block; and
pre-fetching the additional cloud file from the remote cloud storage system to the cloud controller.
20. A cloud controller that facilitates pre-fetching data in a distributed filesystem, comprising:
a processor;
a storage mechanism that stores metadata for the distributed filesystem;
a receiving mechanism configured to receive a request from a client to access a data block for a file; and
a storage management mechanism;
wherein two or more cloud controllers are configured to collectively manage data coherency for the data of the distributed filesystem by:
collectively presenting a unified namespace for the distributed filesystem to the clients of the distributed filesystem via the two or more cloud controllers, wherein the clients can only access the distributed filesystem via the cloud controllers, wherein the file data for the distributed filesystem is stored in a remote cloud storage system using encrypted cloud files, wherein each cloud controller caches a subset of the file data from the remote cloud storage system that is being actively accessed by that cloud controller's respective clients, wherein all new file data received by each cloud controller from its clients is written to the remote cloud storage system via the receiving cloud controller;
maintaining in the storage mechanism of each cloud controller a copy of the complete metadata for all of the files stored in the distributed filesystem, wherein each cloud controller communicates any changes to the metadata for the distributed filesystem to the full set of cloud controllers for the distributed filesystem to ensure that the clients of the distributed filesystem share a consistent view of each file in the distributed filesystem;
upon receiving in a given cloud controller new file data from a client, storing the new file data for the distributed filesystem as cloud files in the remote cloud storage system;
upon receiving confirmation that the new cloud files have been successfully stored in the remote cloud storage system, sending from the given cloud controller an incremental metadata snapshot that includes new metadata for the distributed filesystem that describes the new file data and links to the new cloud files, wherein the incremental metadata snapshot is received by the other cloud controllers of the distributed filesystem and used to ensure data coherency for the distributed filesystem;
wherein the storage management mechanism is configured to:
traverse the stored metadata in the storage mechanism to identify a metadata entry associated with the data block;
download a cloud file containing the data block from the remote storage system;
determine that an additional cloud file in the remote cloud storage system includes data that is likely to be accessed in conjunction with the data block; and
pre-fetch the additional cloud file from the remote cloud storage system.
Specification