Executing a cloud command for a distributed filesystem

US 9,811,532 B2
Filed: 09/05/2013
Issued: 11/07/2017
Est. Priority Date: 05/03/2010
Status: Active Grant

9 Assignments

0 Petitions

Abstract

The disclosed embodiments describe techniques for executing a cloud command for a distributed filesystem. Two or more cloud controllers collectively manage distributed filesystem data that is stored in one or more cloud storage systems; the cloud controllers ensure data consistency for the stored data, and each cloud controller caches portions of the distributed filesystem. During operation, a cloud controller presents a distributed-filesystem-specific capability to a client system as a file in the distributed filesystem (e.g., using a file abstraction). Upon receiving a request from the client system to access and/or operate upon this file, the cloud controller executes an associated cloud command. More specifically, the cloud controller initiates a specially-defined operation that accesses additional functionality for the distributed filesystem that exceeds the scope of individual reads and writes to a typical data file.
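
The file-abstraction idea in the abstract can be sketched as a small dispatcher: reads and writes on a reserved path are routed to a handler instead of to ordinary file data. This is a minimal illustration under stated assumptions, not Panzura's implementation. The `/.cloud/` prefix, the `CloudControllerFrontEnd` class and the `filesize` command are hypothetical names chosen for the example, and an in-memory dictionary stands in for the real distributed filesystem.

```python
from typing import Callable, Dict

# Hypothetical reserved directory under which capabilities are exposed as files.
COMMAND_PREFIX = "/.cloud/"


class CloudControllerFrontEnd:
    """Hypothetical front end: routes each file access to normal data or to a cloud command."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}  # stand-in for ordinary file data
        self.commands: Dict[str, Callable[[bytes], bytes]] = {}  # command path -> handler
        self.results: Dict[str, bytes] = {}  # last output produced by each command file

    def register_command(self, name: str, handler: Callable[[bytes], bytes]) -> str:
        """Expose a capability to clients as the file COMMAND_PREFIX + name."""
        path = COMMAND_PREFIX + name
        self.commands[path] = handler
        return path

    def write(self, path: str, payload: bytes) -> None:
        handler = self.commands.get(path)
        if handler is None:
            self.files[path] = payload  # ordinary data file: store the bytes
            return
        # Command file: the write is the request; run the action and keep its output
        # so the client can read the result back through the same file.
        self.results[path] = handler(payload)

    def read(self, path: str) -> bytes:
        if path in self.commands:
            return self.results.get(path, b"")
        return self.files[path]


if __name__ == "__main__":
    fe = CloudControllerFrontEnd()
    fe.write("/docs/a.txt", b"hello")
    size_cmd = fe.register_command(
        "filesize", lambda req: str(len(fe.files[req.decode()])).encode()
    )
    fe.write(size_cmd, b"/docs/a.txt")  # client "runs" the command by writing to it
    print(fe.read(size_cmd))  # b'5'
```

Because the request travels over the ordinary read/write path, the client needs no new protocol; in this sketch, adding a command is just another `register_command` call on the controller side, which is in the spirit of claim 12.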

18 Claims

1. A computer-implemented method for performing a distributed-filesystem-specific action, the method comprising:
- collectively managing the data of the distributed filesystem using two or more cloud controllers, wherein collectively managing the data comprises:
  
  storing the data for the distributed filesystem in a cloud storage system, wherein the cloud controllers cache and ensure data consistency for data stored in the cloud storage system;
  
  maintaining in each cloud controller a metadata hierarchy that reflects the current state of the distributed filesystem, wherein the metadata hierarchy is stored in the local storage device, wherein changes to the metadata for the distributed filesystem are synchronized across the cloud controllers for the distributed filesystem to ensure that the clients of the distributed filesystem share a consistent view of the files in the distributed filesystem; and
  
  collectively presenting a unified namespace for the distributed filesystem to the clients of the distributed filesystem via the two or more cloud controllers, wherein the clients access the distributed filesystem via the cloud controllers, wherein the file data for the distributed filesystem is stored in the cloud storage system, wherein cloud controllers cache in their local storage devices a subset of the file data from the cloud storage system that is being actively accessed by each respective cloud controller's clients, wherein new file data received by each cloud controller from its clients is written to the cloud storage system;
  
  presenting a distributed-filesystem-specific capability to a client system as a file in the distributed filesystem using a file abstraction;
  
  receiving at a cloud controller a request to perform a distributed-filesystem-specific action in response to the file access, wherein the request comprises a cloud-aware copy operation and specifies a source file and a destination file in the distributed filesystem, wherein the cloud controller is currently not caching the data blocks for the source file;
  
  using the metadata hierarchy on the cloud controller to create the destination file on the cloud controller, wherein the metadata for the destination file references the same data blocks that are associated with the source file in a set of cloud files that are stored in the cloud storage system, wherein the cloud controller creates the destination file without accessing any of the data blocks or cloud files that are associated with the source file, wherein the cloud controller updates the deduplication reference counts for the data blocks of the source file to account for the newly created destination file; and
  
  distributing a metadata snapshot that includes the metadata for the destination file and the updated deduplication information to the other cloud controllers that collectively manage the distributed filesystem to notify of the creation of the destination file;
  
  wherein no data blocks for the source file need to be transmitted from the cloud storage system or the cloud controller for the cloud-aware copy operation, thereby substantially reducing the network bandwidth and latency associated with copying large files on the cloud controller and substantially reducing the perceived command execution time for the client system.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14):
  - 2. The computer-implemented method of claim 1, wherein the distributed management of the distributed filesystem is typically transparent to the client system; and wherein the distributed-filesystem-specific action initiates a specially-defined operation in a cloud controller that accesses additional functionality for the distributed filesystem that exceeds the scope of individual reads and writes to a file.
  - 3. The computer-implemented method of claim 1, wherein the method further comprises:
    - determining that one or more data blocks for the destination file are not currently being cached by the cloud controller; and
      
      pre-fetching one or more uncached data blocks for the destination file from the cloud storage system.
  - 4. The computer-implemented method of claim 2, wherein the distributed-filesystem-specific action further comprises a user-initiated distributed snapshot operation that comprises:
    - determining from the request a portion of the distributed filesystem to be snapshotted;
      
      wherein the cloud controller initiates a distributed snapshot operation for the portion of the distributed filesystem in every cloud controller associated with the distributed filesystem;
      
      wherein each cloud controller associated with the distributed filesystem subsequently distributes an incremental metadata snapshot to the other cloud controllers for the distributed filesystem to share the resulting snapshot state and create a global snapshot of the entire state of the distributed filesystem.
  - 5. The computer-implemented method of claim 4, wherein the method further comprises exposing snapshot information to the client system using a snapshot directory;
    - wherein a user can access a snapshotted version of a file that was created during the user-initiated snapshot operation via the snapshot directory; and
      
      wherein facilitating user access to snapshotted files facilitates reducing the administrative burden of system administrators for the distributed filesystem.
  - 6. The computer-implemented method of claim 4, wherein the user-initiated snapshot operation further comprises:
    - executing and initiating a virtual machine in the distributed filesystem environment, wherein the virtual machine executes on a client system that accesses the distributed filesystem via the cloud controller and stores virtual machine data and state to the distributed filesystem; and
      
      snapshotting the state and the data of the virtual machine in the distributed filesystem;
      
      wherein the distributed-filesystem-specific action further comprises cloning the snapshotted virtual machine; and
      
      wherein cloning the snapshotted virtual machine facilitates reducing the overhead associated with instantiating a second virtual machine.
  - 7. The computer-implemented method of claim 6, wherein cloning the snapshotted virtual machine further comprises performing a cloud-aware copy operation for the state and the data of the virtual machine.
  - 8. The computer-implemented method of claim 4, wherein the distributed-filesystem-specific action further comprises a database-backup operation that comprises:
    - synchronizing all of the in-memory data for a database application to the distributed filesystem, wherein synchronizing the in-memory data to the distributed filesystem ensures that all of the data for the database application is consistently stored in the distributed filesystem; and
      
      performing the user-initiated snapshot operation for the database application to ensure that all updated data blocks for the database application are propagated to a cloud storage system;
      
      wherein using the distributed filesystem to perform the database-backup operation facilitates backing up the data for the database application while avoiding writing the database data to a separate database dump file.
  - 9. The computer-implemented method of claim 2, wherein the distributed-filesystem-specific action further comprises a cloud-aware archive operation that comprises:
    - determining from the request a file to be archived in an archival cloud storage system; and
      
      transferring one or more cloud files containing data associated with the file from a first cloud storage system to the archival cloud storage system.
  - 10. The computer-implemented method of claim 2, wherein the distributed-filesystem-specific action further comprises a cloud-aware restore operation that comprises:
    - determining from the request a file that has been archived in an archival cloud storage system;
      
      transferring one or more cloud files containing data associated with the file from an archival cloud storage system to a second cloud storage system; and
      
      transferring the data blocks for the file to the cloud controller.
  - 11. The computer-implemented method of claim 2, wherein the method further comprises determining a set of distributed-filesystem-specific capabilities that are presented to the client system based on a set of permissions associated with a user accessing the distributed filesystem via the client system.
  - 12. The computer-implemented method of claim 11, wherein the method further comprises changing the set of cloud commands supported by the cloud controller without modifying the client system or the interface between the client system and the cloud controller.
  - 13. The computer-implemented method of claim 11, wherein the method further comprises:
    - detecting that the file access is associated with a cloud command instead of a normal data file in the distributed filesystem; and
      
      initiating an event handler and a set of program instructions that are associated with distributed-filesystem-specific actions.
  - 14. The computer-implemented method of claim 13, wherein the distributed-filesystem-specific action provides status information for a data file in the distributed filesystem;
    - wherein the status information for the data file is dynamically generated by the cloud controller; and
      
      wherein the status information comprises:
      
      a timestamp for the most recent snapshot that included the data file;
      
      replication status for the data file;
      
      the percentage of the data file's data that has been written to a cloud storage system;
      
      the portions of the data file that are currently being cached in the cloud controller;
      
      an estimated time interval needed to retrieve any uncached data blocks for the data file given the load of the cloud controller and an associated network; and
      
      an indication of whether the data file has been archived and, if so, restore information for the data file.
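
The cloud-aware copy in claim 1 amounts to a metadata-only clone: the destination entry reuses the source's block references and the deduplication reference counts are incremented, so no block data is read. The sketch below models that under simplifying assumptions. `ControllerMetadata`, its fields and its method names are hypothetical; block IDs stand in for content hashes; and the model deliberately holds no block data at all, which makes the point that the copy needs only metadata. `uncached_blocks` illustrates the kind of check that would precede the pre-fetching described in claim 3.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ControllerMetadata:
    """Hypothetical per-controller metadata; block IDs stand in for content hashes."""

    files: Dict[str, List[str]] = field(default_factory=dict)  # path -> ordered block IDs
    refcounts: Counter = field(default_factory=Counter)  # block ID -> dedup reference count
    cached: Set[str] = field(default_factory=set)  # block IDs held in the local cache
    pending_delta: List[dict] = field(default_factory=list)  # changes for the next metadata snapshot

    def add_file(self, path: str, block_ids: List[str]) -> None:
        """Setup helper: register an existing file and count its block references."""
        self.files[path] = list(block_ids)
        self.refcounts.update(block_ids)

    def cloud_aware_copy(self, src: str, dst: str) -> None:
        """Create dst from src's metadata alone; no block data is read or transferred."""
        if dst in self.files:
            raise FileExistsError(dst)
        blocks = self.files[src]  # raises KeyError if src does not exist
        self.files[dst] = list(blocks)  # destination references the same blocks
        self.refcounts.update(blocks)  # one extra reference per block occurrence
        self.pending_delta.append(
            {
                "op": "create",
                "path": dst,
                "blocks": list(blocks),
                "refcount_increments": dict(Counter(blocks)),
            }
        )

    def take_snapshot_delta(self) -> List[dict]:
        """Return and clear the queued changes to ship to peer controllers."""
        delta, self.pending_delta = self.pending_delta, []
        return delta

    def uncached_blocks(self, path: str) -> List[str]:
        """Blocks of path that are not in the local cache (pre-fetch candidates)."""
        return [b for b in self.files[path] if b not in self.cached]
```

For example, after `m.add_file("/vm/base.img", ["h1", "h2"])`, calling `m.cloud_aware_copy("/vm/base.img", "/vm/clone.img")` raises each block's count to 2 and queues one delta entry; at no point does the controller need the bytes behind `h1` or `h2`, which is the bandwidth saving the claim describes.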

15. A non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for performing a distributed-filesystem-specific action, the method comprising:
- collectively managing the data of the distributed filesystem using two or more cloud controllers, wherein collectively managing the data comprises:
  
  storing the data for the distributed filesystem in a cloud storage system, wherein the cloud controllers cache and ensure data consistency for data stored in the cloud storage system;
  
  maintaining in each cloud controller a metadata hierarchy that reflects the current state of the distributed filesystem, wherein the metadata hierarchy is stored in the local storage device, wherein changes to the metadata for the distributed filesystem are synchronized across the cloud controllers for the distributed filesystem to ensure that the clients of the distributed filesystem share a consistent view of the files in the distributed filesystem; and
  
  collectively presenting a unified namespace for the distributed filesystem to the clients of the distributed filesystem via the two or more cloud controllers, wherein the clients access the distributed filesystem via the cloud controllers, wherein the file data for the distributed filesystem is stored in the cloud storage system, wherein cloud controllers cache in their local storage devices a subset of the file data from the cloud storage system that is being actively accessed by each respective cloud controller's clients, wherein new file data received by each cloud controller from its clients is written to the cloud storage system;
  
  presenting a distributed-filesystem-specific capability to a client system as a file in the distributed filesystem using a file abstraction;
  
  receiving at a cloud controller a request to perform a distributed-filesystem-specific action in response to the file access, wherein the request comprises a cloud-aware copy operation and specifies a source file and a destination file in the distributed filesystem, wherein the cloud controller is currently not caching the data blocks for the source file;
  
  using the metadata hierarchy on the cloud controller to create the destination file on the cloud controller, wherein the metadata for the destination file references the same data blocks that are associated with the source file in a set of cloud files that are stored in the cloud storage system, wherein the cloud controller creates the destination file without accessing any of the data blocks or cloud files that are associated with the source file, wherein the cloud controller updates the deduplication reference counts for the data blocks of the source file to account for the newly created destination file; and
  
  distributing a metadata snapshot that includes the metadata for the destination file and the updated deduplication information to the other cloud controllers that collectively manage the distributed filesystem to notify of the creation of the destination file;
  
  wherein no data blocks for the source file need to be transmitted from the cloud storage system or the cloud controller for the cloud-aware copy operation, thereby substantially reducing the network bandwidth and latency associated with copying large files on the cloud controller and substantially reducing the perceived command execution time for the client system.
- Dependent claims (16, 17):
  - 16. The non-transitory computer-readable storage medium of claim 15, wherein the distributed-filesystem-specific action further comprises a user-initiated snapshot operation that comprises:
    - determining from the request a portion of the distributed filesystem to be snapshotted;
      
      initiating a distributed snapshot operation for the portion of the distributed filesystem in every cloud controller associated with the distributed filesystem;
      
      triggering an incremental metadata snapshot for each cloud controller associated with the distributed filesystem to share the resulting snapshot state between the cloud controllers, thereby snapshotting the entire state of the distributed filesystem at the time of the request.
  - 17. The non-transitory computer-readable storage medium of claim 16, wherein the user-initiated snapshot operation further comprises:
    - executing and initiating a virtual machine in the distributed filesystem environment; and
      
      snapshotting the state and the data of the virtual machine;
      
      wherein the distributed-filesystem-specific action further comprises cloning the snapshotted virtual machine; and
      
      wherein cloning the snapshotted virtual machine facilitates reducing the overhead associated with instantiating a second virtual machine.
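
The distributed snapshot in claim 16 can be pictured as a flush-and-exchange round: every controller emits its not-yet-shared metadata changes for the requested portion as an incremental snapshot, every controller applies all of those deltas, and the now-agreeing views can be frozen as one global snapshot. This is a single-process sketch under loud assumptions: `Controller`, `incremental_snapshot` and `distributed_snapshot` are hypothetical names, updates are ordered by hypothetical per-path version numbers, and the actual protocol (including how concurrent writers are ordered) is not described here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (version, block IDs); the per-path version number is an assumption of this sketch.
Entry = Tuple[int, Tuple[str, ...]]


@dataclass
class Controller:
    name: str
    view: Dict[str, Entry] = field(default_factory=dict)  # this controller's metadata view
    unshared: Dict[str, Entry] = field(default_factory=dict)  # local updates not yet sent to peers

    def local_write(self, path: str, version: int, blocks: List[str]) -> None:
        entry = (version, tuple(blocks))
        self.view[path] = entry
        self.unshared[path] = entry

    def incremental_snapshot(self, prefix: str) -> Dict[str, Entry]:
        """Flush unshared updates under prefix as an incremental metadata snapshot."""
        delta = {p: e for p, e in self.unshared.items() if p.startswith(prefix)}
        for p in delta:
            del self.unshared[p]
        return delta

    def apply(self, delta: Dict[str, Entry]) -> None:
        """Merge a peer's delta, keeping the higher version for each path."""
        for path, entry in delta.items():
            current = self.view.get(path)
            if current is None or entry[0] > current[0]:
                self.view[path] = entry


def distributed_snapshot(controllers: List[Controller], prefix: str) -> Dict[str, Entry]:
    # Every controller flushes its pending changes for the requested portion ...
    deltas = [c.incremental_snapshot(prefix) for c in controllers]
    # ... and every controller applies every delta (its own is a no-op).
    for c in controllers:
        for d in deltas:
            c.apply(d)
    # The views of the portion now agree; freeze one copy as the global snapshot.
    return {p: e for p, e in controllers[0].view.items() if p.startswith(prefix)}
```

Changes outside the requested portion stay in each controller's `unshared` set, so in this model a snapshot of one directory does not force unrelated metadata to be exchanged.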

18. A cloud controller that performs a distributed-filesystem-specific action, comprising:
- a processor;
  
  a storage mechanism that stores metadata for the distributed filesystem; and
  
  a storage management mechanism;
  
  wherein two or more cloud controllers collectively manage the data of the distributed filesystem, wherein collectively managing the data comprises:
  
  storing the data for the distributed filesystem in a cloud storage system, wherein the cloud controllers cache and ensure data consistency for data stored in the cloud storage system;
  
  maintaining in each cloud controller a metadata hierarchy that reflects the current state of the distributed filesystem, wherein the metadata hierarchy is stored in the local storage device, wherein changes to the metadata for the distributed filesystem are synchronized across the cloud controllers for the distributed filesystem to ensure that the clients of the distributed filesystem share a consistent view of the files in the distributed filesystem; and
  
  collectively presenting a unified namespace for the distributed filesystem to the clients of the distributed filesystem via the two or more cloud controllers, wherein the clients access the distributed filesystem via the cloud controllers, wherein the file data for the distributed filesystem is stored in the cloud storage system, wherein cloud controllers cache in their local storage devices a subset of the file data from the cloud storage system that is being actively accessed by each respective cloud controller's clients, wherein new file data received by each cloud controller from its clients is written to the cloud storage system;
  
  wherein the cloud controller is configured to present a distributed-filesystem-specific capability to a client system as a file in the distributed filesystem using a file abstraction;
  
  wherein the cloud controller is further configured to receive a request to perform a distributed-filesystem-specific action in response to the file access, wherein the request comprises a cloud-aware copy operation and specifies a source file and a destination file in the distributed filesystem, wherein the cloud controller is currently not caching the data blocks for the source file;
  
  wherein the cloud controller is further configured to:
  
  use the metadata hierarchy on the cloud controller to create the destination file on the cloud controller, wherein the metadata for the destination file references the same data blocks that are associated with the source file in a set of cloud files that are stored in the cloud storage system, wherein the cloud controller creates the destination file without accessing any of the data blocks or cloud files that are associated with the source file, wherein the cloud controller updates the deduplication reference counts for the data blocks of the source file to account for the newly created destination file; and
  
  distribute a metadata snapshot that includes the metadata for the destination file and the updated deduplication information to the other cloud controllers that collectively manage the distributed filesystem to notify of the creation of the destination file; and
  
  wherein no data blocks for the source file need to be transmitted from the cloud storage system or the cloud controller for the cloud-aware copy operation, thereby substantially reducing the network bandwidth and latency associated with copying large files on the cloud controller and substantially reducing the perceived command execution time for the client system.

Current Assignee
Panzura, Inc.
Original Assignee
Panzura, Inc.
Inventors
Parkison, Brian Christopher, Davis, Andrew P., Taylor, John Richard, Chou, Randy Yen-pang
Primary Examiner(s)
Rahman, S M

Application Number
US14/019,247
Publication Number
US 20140006354A1
Time in Patent Office
1,524 Days
Field of Search
709219
CPC Class Codes

G06F 16/137   Hash-based content-based in...

G06F 16/172   Caching, prefetching or hoa...

G06F 16/1752   based on file chunks

G06F 16/182   Distributed file systems

G06F 16/183   Provision of network file s...

G06F 3/0611   in relation to response time

G06F 3/0635   by changing the path, e.g. ...

G06F 3/065   Replication mechanisms

G06F 3/067   Distributed or networked st...
