
Using overlay metadata in a cloud controller to generate incremental snapshots for a distributed filesystem

  • US 9,824,095 B1
  • Filed: 02/15/2013
  • Issued: 11/21/2017
  • Est. Priority Date: 05/03/2010
  • Status: Active Grant
First Claim

1. A computer-implemented method for using a set of overlay metadata in a cloud controller to generate incremental snapshots for a distributed filesystem, the method comprising:

  • collectively managing the data of the distributed filesystem using two or more cloud controllers, wherein collectively managing the data comprises:

    storing the data for the distributed filesystem in a remote cloud storage system, wherein the cloud controllers cache and ensure data consistency for data stored in the remote cloud storage system;

    maintaining in each cloud controller a metadata hierarchy that reflects the current state of the distributed filesystem, wherein changes to the metadata for the distributed filesystem are synchronized across the cloud controllers for the distributed filesystem to ensure that the clients of the distributed filesystem share a consistent view of the files in the distributed filesystem; and

    collectively presenting a unified namespace for the distributed filesystem to the clients of the distributed filesystem via the two or more cloud controllers, wherein the clients access the distributed filesystem via the cloud controllers, wherein the file data for the distributed filesystem is stored in the remote cloud storage system, wherein cloud controllers cache a subset of the file data from the remote cloud storage system that is being actively accessed by each respective cloud controller's clients, wherein new file data received by each cloud controller from its clients is written to the remote cloud storage system, wherein the metadata hierarchy in the cloud controller tracks the location of distributed filesystem data blocks in the remote cloud storage system and cached distributed filesystem data blocks in the cloud controller, wherein the cloud controller uses the metadata hierarchy to locate and download requested, uncached data blocks in the distributed filesystem from the remote cloud storage system;

    receiving in the cloud controller new data for the distributed filesystem from a client, wherein receiving the new data comprises:

    caching the new data in the cloud controller;

    creating a metadata entry for the new data in the metadata hierarchy maintained by the cloud controller for the distributed filesystem; and

    updating the overlay metadata to point to the metadata entry and the new data stored in the cloud controller, wherein the overlay metadata identifies the set of new data that has been received by the cloud controller but not yet written to the cloud storage systems;

    traversing the overlay metadata to determine the set of new data that will be used to generate an incremental snapshot for the distributed filesystem, wherein the incremental snapshot comprises one or more fixed-size cloud files that contain data blocks for one or more files of the distributed filesystem, wherein cloud files serve as containers that are distinct from distributed filesystem files, wherein one cloud file can store portions or all of one or more distributed filesystem files in the remote cloud storage system, wherein the incremental snapshot is stored to the remote cloud storage system and distributed to one or more other cloud controllers for the distributed filesystem to ensure that the distributed filesystem is consistent and that the cloud controllers can access the new data received from the client; and

    upon determining from the overlay metadata that the new data to be distributed via the incremental snapshot will span multiple cloud files, grouping the new data into multiple new cloud files in a manner that optimizes splitting the new data blocks for each given updated distributed filesystem file across multiple different cloud files, wherein grouping the complete set of data for each updated distributed filesystem file into a single cloud file where possible and as few cloud files as possible facilitates reducing future access overhead associated with having to download multiple cloud files from the remote cloud storage system to cache a given distributed filesystem file;

    wherein grouping the new data into multiple new cloud files further comprises grouping the new data blocks for two or more distributed filesystem files across the multiple cloud files based on anticipated file access patterns and file types;

    wherein optimizing the grouping of multiple associated distributed filesystem files into a single cloud file reduces future access overhead for distributed filesystem files that are likely to be accessed together; and

    wherein accessing multiple related distributed filesystem files via a single cloud file reduces the network bandwidth and network latency involved in retrieving the distributed filesystem files from the remote cloud storage system.
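
Illustrative sketch (not part of the claim). The write path and snapshot step recited above can be pictured with a small Python model. Everything in it is an assumption made for illustration: the class name CloudController, the constant CLOUD_FILE_SIZE (counted in blocks rather than bytes), the flattening of the metadata hierarchy into a dictionary, and the first-fit-decreasing packing heuristic, which merely stands in for whatever grouping policy an actual implementation uses. It does not model grouping by anticipated access patterns or file types, the upload to cloud storage, or distribution of the snapshot to other controllers.

```python
from dataclasses import dataclass, field

CLOUD_FILE_SIZE = 8  # hypothetical fixed cloud-file capacity, in blocks


@dataclass
class CloudController:
    # Metadata hierarchy, flattened for illustration: path -> list of block ids
    metadata: dict = field(default_factory=dict)
    # Local cache: block id -> bytes
    cache: dict = field(default_factory=dict)
    # Overlay metadata: blocks cached locally but not yet written to the cloud
    overlay: dict = field(default_factory=dict)
    _next_block: int = 0

    def write(self, path, blocks):
        """Cache new blocks, record them in the metadata, mark them in the overlay."""
        ids = []
        for data in blocks:
            block_id = self._next_block
            self._next_block += 1
            self.cache[block_id] = data
            ids.append(block_id)
        self.metadata.setdefault(path, []).extend(ids)
        self.overlay.setdefault(path, []).extend(ids)

    def build_snapshot(self):
        """Traverse the overlay and pack pending blocks into fixed-size cloud files."""
        cloud_files = []  # each cloud file is a list of (path, block_id) pairs
        free = []         # remaining capacity of each cloud file, in blocks
        # Visit the largest files first (first-fit decreasing, an assumed heuristic).
        pending = sorted(self.overlay.items(), key=lambda kv: -len(kv[1]))
        for path, ids in pending:
            remaining = list(ids)
            if not remaining:
                continue
            # A file larger than one cloud file must be split: fill whole
            # cloud files first so it spans as few of them as possible.
            while len(remaining) > CLOUD_FILE_SIZE:
                chunk = remaining[:CLOUD_FILE_SIZE]
                remaining = remaining[CLOUD_FILE_SIZE:]
                cloud_files.append([(path, b) for b in chunk])
                free.append(0)
            # Place the rest, unsplit, in the first cloud file with enough room.
            for i, room in enumerate(free):
                if room >= len(remaining):
                    break
            else:
                cloud_files.append([])
                free.append(CLOUD_FILE_SIZE)
                i = len(free) - 1
            cloud_files[i].extend((path, b) for b in remaining)
            free[i] -= len(remaining)
        self.overlay.clear()  # these blocks are now part of the snapshot
        return cloud_files
```

In this model, writing three files of 5, 4, and 3 blocks and then calling build_snapshot() yields two cloud files, and every file's blocks land together in a single cloud file, which is the property the claim ties to reduced download overhead.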
