Providing disaster recovery for a distributed filesystem

US 8,805,967 B2
Filed: 12/21/2012
Issued: 08/12/2014
Est. Priority Date: 05/03/2010
Status: Active Grant

Abstract

The disclosed embodiments provide a system that distributes data for a distributed filesystem across multiple cloud storage systems. Two or more cloud controllers collectively manage distributed filesystem data that is stored in one or more cloud storage systems; the cloud controllers cache and ensure data consistency for the stored data. Whenever a cloud controller receives new data from a client, it outputs an incremental metadata snapshot for the new data that is propagated to the other cloud controllers and an incremental data snapshot containing the new data that is sent to a cloud storage system. During operation, a backup cloud controller associated with the distributed filesystem is also configured to receive each incremental metadata snapshot, such that, upon determining the failure of a cloud controller, the backup cloud controller can immediately begin receiving data requests from clients associated with the failed cloud controller.
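
The write path described in the abstract, combined with the ordering recited in claim 1 (the metadata snapshot is sent only after the cloud storage system confirms the upload), can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `CloudStore`, `Controller`, `put`, `write`, and `apply_metadata_snapshot` are hypothetical names, the in-memory dictionary stands in for the remote cloud storage system, and fixed-size cloud files are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class CloudStore:
    """Hypothetical in-memory stand-in for the remote cloud storage system."""
    files: dict = field(default_factory=dict)

    def put(self, cloud_file_id, payload):
        self.files[cloud_file_id] = payload
        return True  # models the confirmation that the cloud file was stored


@dataclass
class Controller:
    """Hypothetical cloud controller that keeps a full copy of the metadata."""
    name: str
    metadata: dict = field(default_factory=dict)
    peers: list = field(default_factory=list)  # other controllers, backup included
    uploads: int = 0

    def apply_metadata_snapshot(self, snapshot):
        self.metadata.update(snapshot)

    def write(self, store, path, payload):
        cloud_file_id = f"{self.name}-{self.uploads}"
        # Step 1: the incremental data snapshot goes to cloud storage.
        if not store.put(cloud_file_id, payload):
            return False  # no confirmation, so no metadata is published
        self.uploads += 1
        # Step 2: only after confirmation, publish the metadata snapshot
        # to every other controller, including the backup.
        snapshot = {path: {"cloud_file": cloud_file_id, "size": len(payload)}}
        self.apply_metadata_snapshot(snapshot)
        for peer in self.peers:
            peer.apply_metadata_snapshot(snapshot)
        return True


store = CloudStore()
a, b, backup = Controller("a"), Controller("b"), Controller("backup")
a.peers = [b, backup]
a.write(store, "/docs/report.txt", b"hello")
```

Publishing metadata only after the upload is confirmed means no controller ever holds metadata that points at a cloud file that does not yet exist, which is what lets the backup controller serve requests using its own metadata copy.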

19 Claims

1. A computer-implemented method for providing disaster recovery for a distributed filesystem, the method comprising:
- collectively managing the data of the distributed filesystem using two or more cloud controllers, wherein collectively managing the data comprises:
  
  collectively presenting a unified namespace for the distributed filesystem to the clients of the distributed filesystem via the two or more cloud controllers, wherein the clients can only access the distributed filesystem via the cloud controllers, wherein the file data for the distributed filesystem is stored in a remote cloud storage system using fixed-size cloud files, wherein each cloud controller caches a subset of the file data from the remote cloud storage system that is being actively accessed by that cloud controller's respective clients, wherein all new file data received by each cloud controller from its clients is written to the remote cloud storage system via the receiving cloud controller;
  
  maintaining at each cloud controller a copy of the complete metadata for all of the files stored in the distributed filesystem, wherein each cloud controller communicates any changes to the metadata for the distributed filesystem to the full set of cloud controllers for the distributed filesystem to ensure that the clients of the distributed filesystem share a consistent view of each file in the distributed filesystem;
  
  upon receiving in a cloud controller new data from a client, storing the new file data for the distributed filesystem in the remote cloud storage system, wherein the cloud file is sent from the cloud controller to the remote cloud storage system as part of an incremental data snapshot; and
  
  upon receiving confirmation that the cloud file has been successfully stored in the remote cloud storage system, sending from the cloud controller an incremental metadata snapshot that includes new metadata for the distributed filesystem that describes the new data, wherein the incremental metadata snapshot is received by the other cloud controllers of the distributed filesystem;
  
  allocating a backup cloud controller associated with the distributed filesystem, wherein the backup cloud controller maintains a copy of the complete metadata for all of the files stored in the distributed filesystem and receives all of the metadata changes for the distributed filesystem, including the incremental metadata snapshot sent by the cloud controller;
  
  detecting the failure of the cloud controller; and
  
  rerouting data requests from clients associated with the failed cloud controller to the backup cloud controller.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
  - 2. The computer-implemented method of claim 1, wherein the backup cloud controller and all of the two or more cloud controllers receive all of the incremental metadata snapshots sent by the two or more cloud controllers;
    - wherein the backup cloud controller is distinct from the two or more cloud controllers;
      
      wherein storing the complete set of metadata for the distributed filesystem in each cloud controller facilitates accessing data from a cloud file in the cloud storage system; and
      
      wherein storing the complete set of metadata for the distributed filesystem in the backup cloud controller facilitates responding to client requests received from clients previously associated with the cloud controller.
  - 3. The computer-implemented method of claim 2, wherein receiving the incremental metadata snapshot at the backup cloud controller further comprises:
    - using the received incremental metadata snapshot to identify the new data that was uploaded to the cloud storage system by the cloud controller; and
      
      downloading and caching the new data in the backup cloud controller.
  - 4. The computer-implemented method of claim 3, wherein prior to the failure of the cloud controller, the method further comprises downloading and caching in the backup cloud controller the data most recently written to the cloud storage system by all of the two or more cloud controllers, thereby facilitating efficiently and seamlessly taking over the role of a failed cache controller.
  - 5. The computer-implemented method of claim 3, wherein subsequent to the failure of the cloud controller the backup cloud controller is configured to manage its local cache to optimize the performance of the data requests received from its clients.
  - 6. The computer-implemented method of claim 3, wherein the two or more cloud controllers are configured to track client data usage to determine a set of data that is most commonly accessed by the clients of the distributed filesystem;
    - and wherein prior to the failure of the cloud controller, the backup cloud controller is configured to download the determined set of data from the cloud storage provider and cache the determined set of data locally.
  - 7. The computer-implemented method of claim 4, wherein the backup cloud controller only receives data requests from clients subsequent to the failure of the cloud controller.
  - 8. The computer-implemented method of claim 1, wherein two or more backup cloud controllers are associated with the distributed filesystem;
    - and wherein choosing one of the two or more backup cloud controllers to take over for the failed cloud controller involves considering at least one of network characteristics and network proximity to the failed cloud controller.
  - 9. The computer-implemented method of claim 8, wherein the cloud storage system is unaware of the organization and structure of the distributed filesystem;
    - wherein data stored in the distributed filesystem is indexed using a global address space;
      
      wherein data is stored in the cloud storage system as cloud files, wherein each cloud file is uniquely indexed in the global address space; and
      
      wherein using a metadata entry to download a cloud file that contains a desired data block from the cloud storage system comprises:
      
      determining from the metadata entry that the desired data block is not presently stored in the requesting cloud controller;
      
      using a global address stored in the metadata entry to identify the cloud file that includes the data block;
      
      downloading the identified cloud file to the requesting cloud controller; and
      
      using an offset stored in the metadata entry to determine the location of the data block in the cloud file.
  - 10. The computer-implemented method of claim 9, wherein a second cloud controller receives the incremental metadata snapshot;
    - and wherein the second cloud controller uses the incremental metadata snapshot to retrieve data associated with the incremental data snapshot from the cloud storage system.
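
The block lookup recited in claim 9 (check whether the block is local, use the global address to find the cloud file, download it, then use the offset to locate the block) can be sketched as below. This is an illustrative sketch only: `MetadataEntry`, `Controller`, `read_block`, and the `length` field are assumptions introduced here, the dictionary stands in for the cloud storage system, and the "not presently stored" check is modeled as a simple cache lookup rather than a flag in the metadata entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataEntry:
    global_address: str  # identifies the cloud file in the global address space
    offset: int          # byte offset of the data block inside the cloud file
    length: int          # block length in bytes (assumed field, not in the claim)


class Controller:
    """Hypothetical cloud controller with a local cache of cloud files."""

    def __init__(self, cloud_store):
        self.cloud_store = cloud_store  # stand-in for the remote cloud storage
        self.cache = {}                 # global address -> cached cloud file
        self.downloads = 0

    def read_block(self, entry):
        cloud_file = self.cache.get(entry.global_address)
        if cloud_file is None:
            # Block is not stored locally: download the whole cloud file
            # identified by the global address and cache it.
            cloud_file = self.cloud_store[entry.global_address]
            self.cache[entry.global_address] = cloud_file
            self.downloads += 1
        # The offset from the metadata entry locates the block in the file.
        return cloud_file[entry.offset:entry.offset + entry.length]


cloud_store = {"cf-7": b"aaaaBLOCKzzzz"}
controller = Controller(cloud_store)
entry = MetadataEntry(global_address="cf-7", offset=4, length=5)
first = controller.read_block(entry)
second = controller.read_block(entry)  # served from the local cache
```

Because the cloud storage system only sees opaque cloud files keyed by address, all knowledge of files and blocks stays in the controllers' metadata, consistent with the claim's statement that the storage system is unaware of the filesystem's structure.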

11. A non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for providing disaster recovery for a distributed filesystem, the method comprising:
- collectively managing the data of the distributed filesystem using two or more cloud controllers, wherein collectively managing the data comprises:
  
  collectively presenting a unified namespace for the distributed filesystem to the clients of the distributed filesystem via the two or more cloud controllers, wherein the clients can only access the distributed filesystem via the cloud controllers, wherein the file data for the distributed filesystem is stored in a remote cloud storage system using fixed-size cloud files, wherein each cloud controller caches a subset of the file data from the remote cloud storage system that is being actively accessed by that cloud controller's respective clients, wherein all new file data received by each cloud controller from its clients is written to the remote cloud storage system via the receiving cloud controller;
  
  maintaining at each cloud controller a copy of the complete metadata for all of the files stored in the distributed filesystem, wherein each cloud controller communicates any changes to the metadata for the distributed filesystem to the full set of cloud controllers for the distributed filesystem to ensure that the clients of the distributed filesystem share a consistent view of each file in the distributed filesystem;
  
  upon receiving in a cloud controller new data from a client, storing the new file data for the distributed filesystem in the remote cloud storage system, wherein the cloud file is sent from the cloud controller to the remote cloud storage system as part of an incremental data snapshot; and
  
  upon receiving confirmation that the cloud file has been successfully stored in the remote cloud storage system, sending from the cloud controller an incremental metadata snapshot that includes new metadata for the distributed filesystem that describes the new data, wherein the incremental metadata snapshot is received by the other cloud controllers of the distributed filesystem;
  
  allocating a backup cloud controller associated with the distributed filesystem, wherein the backup cloud controller maintains a copy of the complete metadata for all of the files stored in the distributed filesystem and receives all of the metadata changes for the distributed filesystem, including the incremental metadata snapshot sent by the cloud controller;
  
  detecting the failure of the cloud controller; and
  
  rerouting data requests from clients associated with the failed cloud controller to the backup cloud controller.
- Dependent claims (12, 13, 14, 15, 16, 17, 18)
  - 12. The non-transitory computer-readable storage medium of claim 11, wherein the backup cloud controller and all of the two or more cloud controllers receive all of the incremental metadata snapshots sent by the two or more cloud controllers;
    - wherein the backup cloud controller is distinct from the two or more cloud controllers;
      
      wherein storing the complete set of metadata for the distributed filesystem in each cloud controller facilitates accessing data from a cloud file in the cloud storage system; and
      
      wherein storing the complete set of metadata for the distributed filesystem in the backup cloud controller facilitates responding to client requests received from clients previously associated with the cloud controller.
  - 13. The non-transitory computer-readable storage medium of claim 12, wherein receiving the incremental metadata snapshot at the backup cloud controller further comprises:
    - using the received incremental metadata snapshot to identify the new data that was uploaded to the cloud storage system by the cloud controller; and
      
      downloading and caching the new data in the backup cloud controller.
  - 14. The non-transitory computer-readable storage medium of claim 13, wherein prior to the failure of the cloud controller, the method further comprises downloading and caching in the backup cloud controller the data most recently written to the cloud storage system by all of the two or more cloud controllers, thereby facilitating efficiently and seamlessly taking over the role of a failed cache controller.
  - 15. The non-transitory computer-readable storage medium of claim 13, wherein subsequent to the failure of the cloud controller the backup cloud controller is configured to manage its local cache to optimize the performance of the data requests received from its clients.
  - 16. The non-transitory computer-readable storage medium of claim 13, wherein the two or more cloud controllers are configured to track client data usage to determine a set of data that is most commonly accessed by the clients of the distributed filesystem;
    - and wherein prior to the failure of the cloud controller, the backup cloud controller is configured to download the determined set of data from the cloud storage provider and cache the determined set of data locally.
  - 17. The non-transitory computer-readable storage medium of claim 14, wherein the backup cloud controller only receives data requests from clients subsequent to the failure of the cloud controller.
  - 18. The non-transitory computer-readable storage medium of claim 17, wherein two or more backup cloud controllers are associated with the distributed filesystem;
    - and wherein choosing one of the two or more backup cloud controllers to take over for the failed cloud controller involves considering at least one of network characteristics and network proximity to the failed cloud controller.
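
Claims 8 and 18 leave open how "network characteristics and network proximity" are weighed when choosing among several backup cloud controllers. One possible reading, shown here purely as an assumption, ranks candidates by measured latency and breaks ties by hop count. `BackupCandidate`, `latency_ms`, `hops`, and `choose_backup` are hypothetical names and metrics that do not appear in the claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCandidate:
    name: str
    latency_ms: float  # assumed metric: round-trip time to the failed controller's site
    hops: int          # assumed metric: network distance to the failed controller


def choose_backup(candidates):
    """Pick the candidate with the lowest latency, breaking ties by hop count."""
    if not candidates:
        raise ValueError("no backup cloud controllers are available")
    return min(candidates, key=lambda c: (c.latency_ms, c.hops))


candidates = [
    BackupCandidate("backup-eu", latency_ms=80.0, hops=12),
    BackupCandidate("backup-us-1", latency_ms=20.0, hops=6),
    BackupCandidate("backup-us-2", latency_ms=20.0, hops=4),
]
chosen = choose_backup(candidates)
```

Any ordering that considers at least one of the two factors would fit the claim language; the tuple key is just a compact way to express a primary and a secondary criterion.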

19. A backup cloud controller that provides disaster recovery for a distributed filesystem, comprising:
- a processor;
  
  a storage mechanism that stores metadata for the distributed filesystem; and
  
  a receiving mechanism;
  
  wherein two or more cloud controllers collectively manage the data of the distributed filesystem, wherein collectively managing the data comprises:
  
  collectively presenting a unified namespace for the distributed filesystem to the clients of the distributed filesystem via the two or more cloud controllers, wherein the clients can only access the distributed filesystem via the cloud controllers, wherein the file data for the distributed filesystem is stored in a remote cloud storage system using fixed-size cloud files, wherein each cloud controller caches a subset of the file data from the remote cloud storage system that is being actively accessed by that cloud controller's respective clients, wherein all new file data received by each cloud controller from its clients is written to the remote cloud storage system via the receiving cloud controller;
  
  maintaining at each cloud controller a copy of the complete metadata for all of the files stored in the distributed filesystem, wherein each cloud controller communicates any changes to the metadata for the distributed filesystem to the full set of cloud controllers for the distributed filesystem to ensure that the clients of the distributed filesystem share a consistent view of each file in the distributed filesystem;
  
  upon receiving in a cloud controller new data from a client, storing the new file data for the distributed filesystem in the remote cloud storage system, wherein the cloud file is sent from the cloud controller to the remote cloud storage system as part of an incremental data snapshot;
  
  upon receiving confirmation that the cloud file has been successfully stored in the remote cloud storage system, sending from the cloud controller an incremental metadata snapshot that includes new metadata for the distributed filesystem that describes the new data, wherein the incremental metadata snapshot is received by the other cloud controllers of the distributed filesystem;
  
  wherein the storage mechanism is configured to maintain a copy of the complete metadata for all of the files stored in the distributed filesystem;
  
  wherein the receiving mechanism is configured to receive all of the metadata changes for the distributed filesystem, including the incremental metadata snapshot, and store the received metadata in the storage mechanism;
  
  wherein the backup cloud controller is configured to detect the failure of the cloud controller and reroute data requests from the clients associated with the failed cloud controller to the backup cloud controller; and
  
  wherein the receiving mechanism is further configured to, upon receive data requests from clients associated with the failed cloud controller.
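
The failure detection and rerouting steps in claims 1, 11, and 19 do not specify a detection mechanism. The sketch below assumes a heartbeat timeout, which is a common choice but is not stated in the claims. `FailoverRouter`, `assign`, `heartbeat`, `reroute`, and the `timeout_s` parameter are hypothetical, and timestamps are passed in explicitly so the example is deterministic.

```python
class FailoverRouter:
    """Hypothetical router that moves clients of a failed controller to a backup."""

    def __init__(self, backup, timeout_s=5.0):
        self.backup = backup
        self.timeout_s = timeout_s
        self.assignments = {}     # client -> controller currently serving it
        self.last_heartbeat = {}  # controller -> time of last heartbeat

    def assign(self, client, controller):
        self.assignments[client] = controller

    def heartbeat(self, controller, now):
        self.last_heartbeat[controller] = now

    def failed_controllers(self, now):
        # A controller counts as failed once its heartbeat is older than the timeout.
        return {
            controller
            for controller, seen in self.last_heartbeat.items()
            if controller != self.backup and now - seen > self.timeout_s
        }

    def reroute(self, now):
        """Reassign clients of failed controllers to the backup; return those moved."""
        failed = self.failed_controllers(now)
        moved = []
        for client, controller in list(self.assignments.items()):
            if controller in failed:
                self.assignments[client] = self.backup
                moved.append(client)
        return moved


router = FailoverRouter(backup="backup-1", timeout_s=5.0)
router.assign("client-x", "cc-east")
router.assign("client-y", "cc-west")
router.heartbeat("cc-east", now=0.0)
router.heartbeat("cc-west", now=0.0)
router.heartbeat("cc-west", now=8.0)  # cc-east stays silent
moved = router.reroute(now=10.0)
```

Since the backup cloud controller already holds the complete metadata, rerouting only has to change which controller a client talks to; no metadata transfer is needed at failover time.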

Current Assignee: Panzura, Inc.
Original Assignee: Panzura, Inc.
Inventors: Taylor, John Richard; Chou, Randy Yen-pang; Davis, Andrew P.
Primary Examiner(s): Bates, Kevin
Assistant Examiner(s): RAHMAN, SM AZIZUR
Application Number: US13/725,759
Publication Number: US 20130111262A1
Time in Patent Office: 599 Days
Field of Search: 709/219
US Class Current: 709/219

CPC Class Codes:
G06F 11/2005   using redundant communicati...
G06F 11/2089   Redundant storage control f...
G06F 11/2097   maintaining the standby con...
G06F 16/183   Provision of network file s...
G06F 16/1865   Transactional file systems
