
Performing deduplication in a distributed filesystem

  • US 9,679,040 B1
  • Filed: 02/15/2013
  • Issued: 06/13/2017
  • Est. Priority Date: 05/03/2010
  • Status: Active Grant
First Claim

1. A computer-implemented method for performing distributed deduplication in a distributed filesystem, the method comprising:

  • collectively managing the data of the distributed filesystem using two or more cloud controllers, wherein collectively managing the data comprises:

    storing the data for the distributed filesystem in one or more cloud storage systems, wherein the cloud controllers cache and ensure data consistency for data stored in the cloud storage systems; and

    maintaining in each cloud controller a metadata hierarchy that reflects the current state of the distributed filesystem, wherein changes to the metadata for the distributed filesystem are synchronized across the cloud controllers, wherein synchronizing metadata includes synchronizing deduplication information across the cloud controllers, wherein each cloud controller maintains a deduplication table that tracks deduplicated data for the distributed filesystem; and

    receiving at a cloud controller an incremental metadata snapshot from a remote cloud controller, wherein the incremental metadata snapshot comprises deduplication information for a file that was received by the remote cloud controller, wherein the cloud controller, the remote cloud controller and the cloud storage system are all distinct computing devices;

    adding the deduplication information from the incremental metadata snapshot to the deduplication table on the cloud controller;

    receiving at the cloud controller a client write request that comprises new file data;

    using the deduplication table on the cloud controller to determine that one or more data blocks in the new file data have previously been written to the distributed filesystem by the remote cloud controller;

    updating the metadata hierarchy and the deduplication table for the cloud controller to link the metadata for these duplicate new data blocks with the location of the deduplicated data blocks in the cloud storage system and cache of the cloud controller; and

    distributing a subsequent incremental metadata update from the cloud controller to the other cloud controllers for the distributed filesystem that notifies the other cloud controllers of the new file data and includes deduplication updates related to the new file data that enable the other cloud controllers to update the reference counts and entries in their own respective deduplication tables to reflect the addition of the new file data to the distributed filesystem.
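The steps recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation: the class and field names (`CloudController`, `DedupEntry`, `apply_update`, `write`), the fixed 4-byte block size, the use of SHA-256 as the block fingerprint, and the location strings are all assumptions made for illustration. Cloud storage, caching, and network transport are omitted; an "incremental metadata update" is modeled as a plain dictionary passed between controllers.

```python
import hashlib
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # assumption: tiny fixed-size blocks keep the example readable


@dataclass
class DedupEntry:
    location: str  # hypothetical cloud-storage location of the stored block
    refcount: int  # number of file references to this block


@dataclass
class CloudController:
    name: str
    dedup_table: dict = field(default_factory=dict)  # block hash -> DedupEntry
    metadata: dict = field(default_factory=dict)     # file path -> list of block hashes

    def apply_update(self, update):
        """Merge an incremental metadata update received from a remote controller."""
        self.metadata[update["path"]] = [digest for digest, _ in update["blocks"]]
        for digest, location in update["blocks"]:
            entry = self.dedup_table.get(digest)
            if entry is None:
                self.dedup_table[digest] = DedupEntry(location, 1)
            else:
                entry.refcount += 1

    def write(self, path, data):
        """Handle a client write; return (update for peers, locations to upload)."""
        blocks, uploaded = [], []
        for i in range(0, len(data), BLOCK_SIZE):
            digest = hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
            entry = self.dedup_table.get(digest)
            if entry is None:
                # New block: it would be uploaded to cloud storage here.
                entry = DedupEntry(f"{self.name}/{digest[:8]}", 1)
                self.dedup_table[digest] = entry
                uploaded.append(entry.location)
            else:
                # Duplicate block: link to the existing location, bump the count.
                entry.refcount += 1
            blocks.append((digest, entry.location))
        self.metadata[path] = [digest for digest, _ in blocks]
        return {"path": path, "blocks": blocks}, uploaded


remote = CloudController("remote")
local = CloudController("local")

# The remote controller receives a file and distributes its metadata update.
update_from_remote, _ = remote.write("/f1", b"aaaabbbb")
local.apply_update(update_from_remote)

# The local controller now detects that block "bbbb" was already written remotely.
update_from_local, uploaded_by_local = local.write("/f2", b"bbbbcccc")
remote.apply_update(update_from_local)
```

In this sketch only the block `cccc` needs to be uploaded by the local controller, and after the second update is applied both controllers hold identical deduplication tables, including a reference count of 2 for the shared block.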
