Using overlay metadata in a cloud controller to generate incremental snapshots for a distributed filesystem

US 9,824,095 B1
Filed: 02/15/2013
Issued: 11/21/2017
Est. Priority Date: 05/03/2010
Status: Active Grant

Abstract

The disclosed embodiments provide a system that uses overlay metadata in a cloud controller to generate incremental snapshots for a distributed filesystem. Two or more cloud controllers collectively manage distributed filesystem data that is stored in one or more cloud storage systems. More specifically, the cloud controllers cache and ensure data consistency for the data stored in the cloud storage systems, with each cloud controller maintaining a metadata hierarchy that reflects the current state of the distributed filesystem. During operation, a cloud controller receiving new data from a client: (1) stores the new data in the cloud controller; (2) creates a metadata entry for the new data in the locally maintained metadata hierarchy; (3) updates the overlay metadata to point to the metadata entry and the new data stored in the cloud controller; and (4) then uses the overlay metadata to generate an incremental snapshot for the new data.
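
The four numbered steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names, the in-memory dictionaries, and the `path#block` cache key are all assumptions made for illustration, and the upload to cloud storage is left out.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataEntry:
    """Hypothetical metadata-hierarchy entry describing one cached block."""
    path: str
    block_index: int
    cache_key: str  # where the block lives in the local cache


@dataclass
class CloudController:
    """Toy controller; all three structures are assumed in-memory stand-ins."""
    cache: dict = field(default_factory=dict)      # cache_key -> bytes
    hierarchy: dict = field(default_factory=dict)  # (path, block) -> MetadataEntry
    overlay: dict = field(default_factory=dict)    # entries not yet written to the cloud

    def receive(self, path: str, block_index: int, data: bytes) -> None:
        key = f"{path}#{block_index}"
        self.cache[key] = data                           # (1) store the new data
        entry = MetadataEntry(path, block_index, key)
        self.hierarchy[(path, block_index)] = entry      # (2) create the metadata entry
        self.overlay[(path, block_index)] = entry        # (3) overlay points at entry and data

    def incremental_snapshot(self) -> list:
        """(4) Traverse the overlay and return the (entry, data) pairs to upload."""
        snapshot = [(e, self.cache[e.cache_key]) for e in self.overlay.values()]
        self.overlay = {}  # reset so the next snapshot only covers newer data
        return snapshot
```

Keying the overlay by `(path, block_index)` means a block rewritten twice before a snapshot appears only once, with its latest contents.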

19 Claims

1. A computer-implemented method for using a set of overlay metadata in a cloud controller to generate incremental snapshots for a distributed filesystem, the method comprising:
- collectively managing the data of the distributed filesystem using two or more cloud controllers, wherein collectively managing the data comprises:
  
  storing the data for the distributed filesystem in a remote cloud storage system, wherein the cloud controllers cache and ensure data consistency for data stored in the remote cloud storage system;
  
  maintaining in each cloud controller a metadata hierarchy that reflects the current state of the distributed filesystem, wherein changes to the metadata for the distributed filesystem are synchronized across the cloud controllers for the distributed filesystem to ensure that the clients of the distributed filesystem share a consistent view of the files in the distributed filesystem; and
  
  collectively presenting a unified namespace for the distributed filesystem to the clients of the distributed filesystem via the two or more cloud controllers, wherein the clients access the distributed filesystem via the cloud controllers, wherein the file data for the distributed filesystem is stored in the remote cloud storage system, wherein cloud controllers cache a subset of the file data from the remote cloud storage system that is being actively accessed by each respective cloud controller's clients, wherein new file data received by each cloud controller from its clients is written to the remote cloud storage system, wherein the metadata hierarchy in the cloud controller tracks the location of distributed filesystem data blocks in the remote cloud storage system and cached distributed filesystem data blocks in the cloud controller, wherein the cloud controller uses the metadata hierarchy to locate and download requested, uncached data blocks in the distributed filesystem from the remote cloud storage system;
  
  receiving in the cloud controller new data for the distributed filesystem from a client, wherein receiving the new data comprises:
  
  caching the new data in the cloud controller;
  
  creating a metadata entry for the new data in the metadata hierarchy maintained by the cloud controller for the distributed filesystem; and
  
  updating the overlay metadata to point to the metadata entry and the new data stored in the cloud controller, wherein the overlay metadata identifies the set of new data that has been received by the cloud controller but not yet written to the cloud storage systems;
  
  traversing the overlay metadata to determine the set of new data that will be used to generate an incremental snapshot for the distributed filesystem, wherein the incremental snapshot comprises one or more fixed-size cloud files that contain data blocks for one or more files of the distributed filesystem, wherein cloud files serve as containers that are distinct from distributed filesystem files, wherein one cloud file can store portions or all of one or more distributed filesystem files in the remote cloud storage system, wherein the incremental snapshot is stored to the remote cloud storage system and distributed to one or more other cloud controllers for the distributed filesystem to ensure that the distributed filesystem is consistent and that the cloud controllers can access the new data received from the client; and
  
  upon determining from the overlay metadata that the new data to be distributed via the incremental snapshot will span multiple cloud files, grouping the new data into multiple new cloud files in a manner that optimizes splitting the new data blocks for each given updated distributed filesystem file across multiple different cloud files, wherein grouping the complete set of data for each updated distributed filesystem file into a single cloud file where possible and as few cloud files as possible facilitates reducing future access overhead associated with having to download multiple cloud files from the remote cloud storage system to cache a given distributed filesystem file;
  
  wherein grouping the new data into multiple new cloud files further comprises grouping the new data blocks for two or more distributed filesystem files across the multiple cloud files based on anticipated file access patterns and file types;
  
  wherein optimizing the grouping of multiple associated distributed filesystem files into a single cloud file reduces future access overhead for distributed filesystem files that are likely to be accessed together; and
  
  wherein accessing multiple related distributed filesystem files via a single cloud file reduces the network bandwidth and network latency involved in retrieving the distributed filesystem files from the remote cloud storage system.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9):
  - 2. The computer-implemented method of claim 1, wherein using the overlay metadata to generate the incremental snapshot facilitates creating cloud files in place without allocating additional memory buffers or performing additional copy operations.
  - 3. The computer-implemented method of claim 2, wherein the overlay metadata comprises:
    - a first set of pointers that reference a set of new metadata entries created subsequent to a preceding incremental metadata snapshot; and
      
      a second set of pointers that reference a set of new data stored subsequent to a preceding incremental data snapshot.
  - 4. The computer-implemented method of claim 3, wherein using the overlay metadata to generate the incremental snapshot further comprises:
    - traversing the first set of pointers to generate a first cloud file containing an incremental metadata snapshot for the new data;
      
      traversing the second set of pointers to generate a second cloud file containing an incremental data snapshot for the new data;
      
      wherein the incremental metadata snapshot is received by the other cloud controllers of the distributed filesystem and used to update the other cloud controllers' metadata hierarchies to reflect the new data;
      
      wherein the second cloud file is stored in a cloud storage system; and
      
      wherein the other cloud controllers can use metadata in the incremental metadata snapshot to access the new data in the second cloud file via the cloud storage system.
  - 5. The computer-implemented method of claim 4, wherein updating the overlay metadata in the cloud controller comprises updating the overlay metadata at the time that the cloud controller receives new data from the client.
  - 6. The computer-implemented method of claim 4, wherein updating the overlay metadata in the cloud controller comprises traversing the metadata hierarchy at a specified snapshot time interval, updating the overlay metadata to include metadata entries that have been created and new data that has been written in a specified snapshot interval, and ensuring that all of the new metadata and data for the interval is written to the cloud storage systems and the other cloud controllers for the distributed filesystem, thereby maintaining consistency across the distributed filesystem.
  - 7. The computer-implemented method of claim 4, wherein generating the incremental snapshot for the new data involves:
    - using the overlay metadata to present new data as a virtual cloud file that presents a view of a disparate set of new data blocks as a single file that can be read;
      
      reading and encrypting the virtual cloud file to create a container file that comprises new data blocks for two or more distinct files in the distributed filesystem; and
      
      transferring the encrypted virtual cloud file to the cloud storage system as the second cloud file.
  - 8. The computer-implemented method of claim 7, wherein the method further comprises:
    - resetting the overlay metadata after generating the incremental metadata snapshot and the incremental data snapshot; and
      
      updating the overlay metadata to track subsequent new data and associated metadata entries for subsequent incremental snapshots.
  - 9. The computer-implemented method of claim 4, wherein the cloud controller is configured to simultaneously generate separate cloud files to logically group different classes and types of new data;
    - and wherein using the overlay metadata to generate the incremental data snapshot comprises using one or more additional set of pointers to track the different classes and types of new data and generate separate cloud files.
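
The "virtual cloud file" of claim 7 presents a disparate set of new data blocks as a single readable file. The following Python sketch shows one way such a view could behave. The class name and `read` interface are assumptions, not the patent's design. Encryption is omitted, and each `read` call still copies the requested bytes into its return value, so this does not reproduce the copy-free behavior described in claim 2.

```python
class VirtualCloudFile:
    """Read-only view that presents scattered blocks as one sequential file."""

    def __init__(self, blocks):
        # Hold memoryview references to the cached blocks; nothing is copied here.
        self._blocks = [memoryview(b) for b in blocks]
        self._block = 0   # index of the block currently being read
        self._offset = 0  # read position inside that block

    def read(self, size: int) -> bytes:
        """Return up to `size` bytes, crossing block boundaries as needed."""
        out = bytearray()
        while size > 0 and self._block < len(self._blocks):
            current = self._blocks[self._block]
            chunk = current[self._offset:self._offset + size]
            out += chunk
            self._offset += len(chunk)
            size -= len(chunk)
            if self._offset == len(current):
                # Current block exhausted; continue with the next one.
                self._block += 1
                self._offset = 0
        return bytes(out)
```

A reader (for example, an encrypt-and-upload stage) can then consume the container in fixed-size chunks without ever assembling the whole cloud file in memory.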

10. A non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for using a set of overlay metadata in a cloud controller to generate incremental snapshots for a distributed filesystem, the method comprising:
- collectively managing the data of the distributed filesystem using two or more cloud controllers, wherein collectively managing the data comprises:
  
  storing the data for the distributed filesystem in a remote cloud storage system, wherein the cloud controllers cache and ensure data consistency for data stored in the remote cloud storage system;
  
  maintaining in each cloud controller a metadata hierarchy that reflects the current state of the distributed filesystem, wherein changes to the metadata for the distributed filesystem are synchronized across the cloud controllers for the distributed filesystem to ensure that the clients of the distributed filesystem share a consistent view of the files in the distributed filesystem; and
  
  collectively presenting a unified namespace for the distributed filesystem to the clients of the distributed filesystem via the two or more cloud controllers, wherein the clients access the distributed filesystem via the cloud controllers, wherein the file data for the distributed filesystem is stored in the remote cloud storage system, wherein cloud controllers cache a subset of the file data from the remote cloud storage system that is being actively accessed by each respective cloud controller's clients, wherein new file data received by each cloud controller from its clients is written to the remote cloud storage system, wherein the metadata hierarchy in the cloud controller tracks the location of distributed filesystem data blocks in the remote cloud storage system and cached distributed filesystem data blocks in the cloud controller, wherein the cloud controller uses the metadata hierarchy to locate and download requested, uncached data blocks in the distributed filesystem from the remote cloud storage system;
  
  receiving in the cloud controller new data for the distributed filesystem from a client, wherein receiving the new data comprises:
  
  caching the new data in the cloud controller;
  
  creating a metadata entry for the new data in the metadata hierarchy maintained by the cloud controller for the distributed filesystem; and
  
  updating the overlay metadata to point to the metadata entry and the new data stored in the cloud controller, wherein the overlay metadata identifies the set of new data that has been received by the cloud controller but not yet written to the cloud storage systems; and
  
  traversing the overlay metadata to determine the set of new data that will be used to generate an incremental snapshot for the distributed filesystem, wherein the incremental snapshot comprises one or more fixed-size cloud files that contain data blocks for one or more files of the distributed filesystem, wherein cloud files serve as containers that are distinct from distributed filesystem files, wherein one cloud file can store portions or all of one or more distributed filesystem files in the remote cloud storage system, wherein the incremental snapshot is stored to the remote cloud storage system and distributed to one or more other cloud controllers for the distributed filesystem to ensure that the distributed filesystem is consistent and that the cloud controllers can access the new data received from the client; and
  
  upon determining from the overlay metadata that the new data to be distributed via the incremental snapshot will span multiple cloud files, grouping the new data into multiple new cloud files in a manner that optimizes splitting the new data blocks for each given updated distributed filesystem file across multiple different cloud files, wherein grouping the complete set of data for each updated distributed filesystem file into a single cloud file where possible and as few cloud files as possible facilitates reducing future access overhead associated with having to download multiple cloud files from the remote cloud storage system to cache a given distributed filesystem file;
  
  wherein grouping the new data into multiple new cloud files further comprises grouping the new data blocks for two or more distributed filesystem files across the multiple cloud files based on anticipated file access patterns and file types;
  
  wherein optimizing the grouping of multiple associated distributed filesystem files into a single cloud file reduces future access overhead for distributed filesystem files that are likely to be accessed together; and
  
  wherein accessing multiple related distributed filesystem files via a single cloud file reduces the network bandwidth and network latency involved in retrieving the distributed filesystem files from the remote cloud storage system.
- Dependent claims (11, 12, 13, 14, 15, 16, 17, 18):
  - 11. The non-transitory computer-readable storage medium of claim 10, wherein using the overlay metadata to generate the incremental snapshot facilitates creating cloud files in place without allocating additional memory buffers or performing additional copy operations.
  - 12. The non-transitory computer-readable storage medium of claim 11, wherein the overlay metadata comprises:
    - a first set of pointers that reference a set of new metadata entries created subsequent to a preceding incremental metadata snapshot; and
      
      a second set of pointers that reference a set of new data stored subsequent to a preceding incremental data snapshot.
  - 13. The non-transitory computer-readable storage medium of claim 12, wherein using the overlay metadata to generate the incremental snapshot further comprises:
    - traversing the first set of pointers to generate a first cloud file containing an incremental metadata snapshot for the new data;
      
      traversing the second set of pointers to generate a second cloud file containing an incremental data snapshot for the new data;
      
      wherein the incremental metadata snapshot is received by the other cloud controllers of the distributed filesystem and used to update the other cloud controllers' metadata hierarchies to reflect the new data;
      
      wherein the second cloud file is stored in a cloud storage system; and
      
      wherein the other cloud controllers can use metadata in the incremental metadata snapshot to access the new data in the second cloud file via the cloud storage system.
  - 14. The non-transitory computer-readable storage medium of claim 13, wherein updating the overlay metadata in the cloud controller comprises updating the overlay metadata at the time that the cloud controller receives new data from the client.
  - 15. The non-transitory computer-readable storage medium of claim 13, wherein updating the overlay metadata in the cloud controller comprises traversing the metadata hierarchy at a specified snapshot timeframe and updating the overlay metadata to include metadata entries that have been created and new data that has been written in a specified snapshot interval.
  - 16. The non-transitory computer-readable storage medium of claim 13, wherein writing a cloud file to the cloud storage system involves:
    - using the overlay metadata to present new data as a virtual cloud file that presents a view of a disparate set of new data blocks as a single file that can be read;
      
      reading and encrypting the virtual cloud file; and
      
      transferring the encrypted virtual cloud file to the cloud storage system as the second cloud file.
  - 17. The non-transitory computer-readable storage medium of claim 16, wherein the method further comprises:
    - resetting the overlay metadata after generating the incremental metadata snapshot and the incremental data snapshot; and
      
      updating the overlay metadata to track subsequent new data and associated metadata entries for subsequent incremental snapshots.
  - 18. The non-transitory computer-readable storage medium of claim 13, wherein the cloud controller is configured to simultaneously generate separate cloud files to logically group different classes and types of new data;
    - and wherein using the overlay metadata to generate the incremental data snapshot comprises using one or more additional set of pointers to track the different classes and types of new data and generate separate cloud files.
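
Claims 6 and 15 describe an alternative to updating the overlay at write time: traversing the metadata hierarchy at a snapshot interval and collecting what changed in that interval. A minimal Python sketch of such a traversal follows. The `Node` type, its `modified_at` timestamp field, and the half-open interval convention are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node in the metadata hierarchy (directory or file)."""
    name: str
    modified_at: float  # assumed timestamp of creation or last write
    children: list = field(default_factory=list)


def collect_overlay(root: Node, interval_start: float, interval_end: float) -> list:
    """Walk the hierarchy and return paths of entries changed in [start, end)."""
    changed = []
    stack = [("", root)]
    while stack:
        prefix, node = stack.pop()
        path = f"{prefix}/{node.name}"
        if interval_start <= node.modified_at < interval_end:
            changed.append(path)
        for child in node.children:
            stack.append((path, child))
    return sorted(changed)
```

This walks every node on each interval, which is the cost the write-time approach of claims 5 and 14 avoids.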

19. A cloud controller that uses a set of overlay metadata to generate incremental snapshots for a distributed filesystem, comprising:
- a processor;
  
  a storage mechanism that stores metadata for the distributed filesystem; and
  
  a storage management mechanism;
  
  wherein two or more cloud controllers collectively manage the data of the distributed filesystem, wherein collectively managing the data comprises:
  
  storing the data for the distributed filesystem in a remote cloud storage system, wherein the storage management mechanisms of the cloud controllers are configured to cache and ensure data consistency for data stored in the remote cloud storage system;
  
  maintaining in the storage management mechanism of each cloud controller a metadata hierarchy that reflects the current state of the distributed filesystem, wherein changes to the metadata for the distributed filesystem are synchronized across the cloud controllers for the distributed filesystem to ensure that the clients of the distributed filesystem share a consistent view of the files in the distributed filesystem; and
  
  collectively presenting a unified namespace for the distributed filesystem to the clients of the distributed filesystem via the two or more cloud controllers, wherein the clients access the distributed filesystem via the cloud controllers, wherein the file data for the distributed filesystem is stored in the remote cloud storage system, wherein cloud controllers cache a subset of the file data from the remote cloud storage system that is being actively accessed by each respective cloud controller's clients, wherein new file data received by each cloud controller from its clients is written to the remote cloud storage system, wherein the metadata hierarchy in the cloud controller tracks the location of distributed filesystem data blocks in the remote cloud storage system and cached distributed filesystem data blocks in the cloud controller, wherein the cloud controller uses the metadata hierarchy to locate and download requested, uncached data blocks in the distributed filesystem from the remote cloud storage system;
  
  wherein the storage management mechanism is further configured to:
  
  cache new data received from a client;
  
  create a metadata entry for the new data in the metadata hierarchy maintained by the cloud controller for the distributed filesystem;
  
  update the overlay metadata to point to the metadata entry and the new data stored in the cloud controller, wherein the overlay metadata identifies the set of new data that has been received by the cloud controller but not yet written to the cloud storage systems;
  
  traverse the overlay metadata to determine the set of new data that will be used to generate an incremental snapshot for the distributed filesystem, wherein the incremental snapshot comprises one or more fixed-size cloud files that contain data blocks for one or more files of the distributed filesystem, wherein cloud files serve as containers that are distinct from distributed filesystem files, wherein one cloud file can store portions or all of one or more distributed filesystem files in the remote cloud storage system, wherein the incremental snapshot is stored to the remote cloud storage system and distributed to one or more other cloud controllers for the distributed filesystem to ensure that the distributed filesystem is consistent and that the cloud controllers can access the new data received from the client; and
  
  upon determining from the overlay metadata that the new data to be distributed via the incremental snapshot will span multiple cloud files, group the new data into multiple new cloud files in a manner that optimizes splitting the new data blocks for each given updated distributed filesystem file across multiple different cloud files, wherein grouping the complete set of data for each updated distributed filesystem file into a single cloud file where possible and as few cloud files as possible facilitates reducing future access overhead associated with having to download multiple cloud files from the remote cloud storage system to cache a given distributed filesystem file;
  
  wherein grouping the new data into multiple new cloud files further comprises grouping the new data blocks for two or more distributed filesystem files across the multiple cloud files based on anticipated file access patterns and file types;
  
  wherein optimizing the grouping of multiple associated distributed filesystem files into a single cloud file reduces future access overhead for distributed filesystem files that are likely to be accessed together; and
  
  wherein accessing multiple related distributed filesystem files via a single cloud file reduces the network bandwidth and network latency involved in retrieving the distributed filesystem files from the remote cloud storage system.
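
The grouping step shared by the independent claims (keep each updated file in a single fixed-size cloud file where possible, otherwise in as few as possible) resembles a bin-packing problem. The Python sketch below uses a first-fit-decreasing heuristic as a stand-in. The patent does not name an algorithm, so the heuristic, the function name, and the `(path, byte_count)` representation are assumptions. Grouping by anticipated access patterns and file types is not modeled.

```python
def group_into_cloud_files(file_sizes: dict, cloud_file_size: int) -> list:
    """Assign new file data to fixed-size cloud files, splitting a file only when needed.

    Returns a list of cloud files, each a list of (path, byte_count) pieces.
    """
    if cloud_file_size <= 0:
        raise ValueError("cloud_file_size must be positive")
    cloud_files = []  # pieces placed in each cloud file
    free = []         # remaining capacity of each cloud file

    def place(path, size):
        # First fit: reuse the first cloud file with enough room, else open a new one.
        for i, room in enumerate(free):
            if size <= room:
                cloud_files[i].append((path, size))
                free[i] -= size
                return
        cloud_files.append([(path, size)])
        free.append(cloud_file_size - size)

    # Largest files first, so big files claim containers before small ones fill gaps.
    for path, size in sorted(file_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        # A file larger than one cloud file must be split; use full-size pieces.
        while size > cloud_file_size:
            place(path, cloud_file_size)
            size -= cloud_file_size
        if size > 0:
            place(path, size)
    return cloud_files
```

With this layout, a controller that later needs one file downloads a single cloud file in the common case, which is the access-overhead benefit the claims describe.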

Current Assignee
Panzura, Inc.
Original Assignee
Panzura, Inc.
Inventors
Taylor, John Richard; Chou, Randy Yen-pang; Davis, Andrew P.
Primary Examiner(s)
Rahman, S M

Application Number

US13/769,206
Time in Patent Office

1,740 Days
Field of Search

709/219
US Class Current
CPC Class Codes

G06F 11/1451   by selection of backup cont...

G06F 11/1458   Management of the backup or...

G06F 16/11   File system administration,...

G06F 16/128   Details of file system snap...

G06F 16/1844   Management specifically ada...

G06F 2201/82   Solving problems relating t...

G06F 2201/84   Using snapshots, i.e. a log...