Transferring and caching a cloud file in a distributed filesystem

  • US 9,852,149 B1
  • Filed: 02/15/2013
  • Issued: 12/26/2017
  • Est. Priority Date: 05/03/2010
  • Status: Active Grant
First Claim

1. A computer-implemented method for transferring and caching a cloud file in a distributed filesystem, the method comprising:

  • collectively managing the data of the distributed filesystem using two or more cloud controllers, wherein the cloud controllers cache and ensure data consistency for data stored in a cloud storage system, wherein each cloud controller maintains a metadata hierarchy that reflects the current state of the distributed filesystem, wherein changes to the metadata for the distributed filesystem are communicated to the set of cloud controllers for the distributed filesystem to ensure that the clients of the distributed filesystem share a consistent view of the files in the distributed filesystem, wherein the cloud storage system stores groups of multiple data blocks for the distributed filesystem in cloud files, wherein a cloud file comprises a set of data blocks from multiple distinct distributed filesystem files and a metadata index that describes the set of data blocks stored in the cloud file;

    receiving at a cloud controller a request from a client for a data block of a target file in the distributed filesystem, wherein the requested data block is not currently cached in the cloud controller;

    initiating a transfer for a cloud file containing the requested data block from the cloud storage system to the cloud controller, wherein the metadata hierarchy facilitates identifying which cloud file contains the requested data block of the target file but the metadata hierarchy does not include a reverse mapping that facilitates quickly determining the location of other data blocks in the cloud file in the metadata hierarchy of the distributed filesystem;

    while the cloud file has not yet completely been downloaded from the cloud storage system to the cloud controller, already on the cloud controller extracting the metadata index for the cloud file from an initial portion of the cloud file that has already been downloaded to the cloud controller; and

    while portions of the cloud file are still being downloaded to the cloud controller:

    using the metadata index and the metadata hierarchy to determine whether other data blocks in the cloud file are likely to be accessed in a substantially similar timeframe as the requested data block; and

    downloading from the cloud storage system to the cloud controller a limited subset of data blocks from the cloud file that include (1) the requested data block; (2) the other data blocks from the cloud file that have been determined to be likely to be accessed; and (3) any blocks of the cloud file that are needed to decrypt the portions of the cloud file containing (1) and (2), wherein not downloading and caching the entire cloud file reduces bandwidth usage and improves cache access performance for the distributed filesystem.
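
The flow recited in the claim (read the metadata index from the first bytes of a cloud file that is still downloading, then fetch only the blocks worth caching) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the on-disk layout (a 4-byte length prefix followed by a JSON index), the function names, and the "same target file" rule used as a stand-in for the claim's likely-to-be-accessed determination. The decryption-related blocks of item (3) are omitted.

```python
import json
import struct


def build_cloud_file(blocks):
    """Pack (block_id, file_path, payload) tuples into one hypothetical cloud file.

    Assumed layout: 4-byte big-endian index length, JSON metadata index,
    then the block payloads back to back.
    """
    index, body, offset = [], b"", 0
    for block_id, path, payload in blocks:
        index.append({"id": block_id, "file": path, "off": offset, "len": len(payload)})
        body += payload
        offset += len(payload)
    raw_index = json.dumps(index).encode()
    return struct.pack(">I", len(raw_index)) + raw_index + body


def read_index_from_prefix(prefix):
    """Return (index, data_start) once the prefix holds the whole index, else None."""
    if len(prefix) < 4:
        return None
    (index_len,) = struct.unpack(">I", prefix[:4])
    if len(prefix) < 4 + index_len:
        return None
    return json.loads(prefix[4:4 + index_len]), 4 + index_len


def select_blocks(index, requested_id):
    """Pick the requested block plus blocks of the same file (stand-in heuristic)."""
    target = next(e["file"] for e in index if e["id"] == requested_id)
    return [e for e in index if e["file"] == target]


def fetch_partial(cloud_file, requested_id, chunk_size=16):
    """Simulate a streamed download that stops once the index is readable."""
    prefix = b""
    for start in range(0, len(cloud_file), chunk_size):
        prefix += cloud_file[start:start + chunk_size]
        parsed = read_index_from_prefix(prefix)
        if parsed is not None:
            break
    else:
        raise ValueError("metadata index never became available")
    index, data_start = parsed
    # Slices stand in for byte-range reads against the cloud storage system.
    cache = {
        e["id"]: cloud_file[data_start + e["off"]:data_start + e["off"] + e["len"]]
        for e in select_blocks(index, requested_id)
    }
    return cache, len(prefix)


blob = build_cloud_file([
    ("b1", "/docs/a.txt", b"alpha"),
    ("b2", "/img/x.png", b"unrelated-bytes"),
    ("b3", "/docs/a.txt", b"omega"),
])
cache, bytes_read_for_index = fetch_partial(blob, "b1")
```

In this toy run the cache ends up holding blocks `b1` and `b3` but not `b2`, which belongs to a different file. A real cloud controller would consult its metadata hierarchy rather than a path comparison, and would issue ranged reads against the storage service instead of slicing an in-memory buffer.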

  • 9 Assignments