Scalable post-process deduplication

  • US 9,946,724 B1
  • Filed: 03/31/2014
  • Issued: 04/17/2018
  • Est. Priority Date: 03/31/2014
  • Status: Active Grant
First Claim

1. A system comprising:

  • a memory that has stored thereon computer executable components; and

    at least one processor that executes the following computer executable components stored in the memory:

    a phase rotation component that generates a set of datasets of a file system, wherein the set of datasets includes at least a first dataset and a second dataset, and wherein the phase rotation component sends the first dataset to the enumeration component for ingestion;

    an enumeration component that ingests a dataset by:

    reading a set of low level hashes associated with the dataset, wherein low level hashes in the set of low level hashes are associated with a logical block identifier of the file system;

    analyzing the set of low level hashes and determining a set of potential matching candidates;

    generating a set of high level hashes based on the set of potential matching candidates and associated logical block identifiers; and

    adding the set of high level hashes and associated logical block identifiers to a candidate table;

    a disk pool policy component that in response to the enumeration component generating a set of high level hashes, determines and associates a disk pool policy identifier with high level hashes in the set of high level hashes;

    a commonality component that determines a set of shareable blocks by comparing high level hashes in the set of high level hashes of the candidate table with other high level hashes of the candidate table and an index table, wherein the index table contains a set of high level hashes, associated disk pool policy identifiers, and associated shadow store logical block identifiers, and wherein the set of shareable blocks is based on common disk pool policy identifiers; and

    a sharing component that updates the file system based on the set of shareable blocks, wherein, in response to the sharing component updating the file system, the phase rotation component sends the second dataset to the enumeration component for ingestion.
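
The flow recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions made only for illustration: CRC32 stands in for the low level hash and SHA-256 for the high level hash, a block counts as a potential matching candidate when its low level hash repeats within the dataset or matches a block already in the shadow store, and the file system is modeled as plain dictionaries. All function and variable names (`enumerate_dataset`, `find_shareable`, `run`, `known_low`, `block_map`) are hypothetical.

```python
import hashlib
import zlib
from collections import Counter, defaultdict


def low_hash(block: bytes) -> int:
    # Assumed low level hash: cheap, collisions tolerated.
    return zlib.crc32(block)


def high_hash(block: bytes) -> str:
    # Assumed high level hash: strong, computed only for candidates.
    return hashlib.sha256(block).hexdigest()


def enumerate_dataset(dataset, blocks, known_low):
    """Enumeration: read low level hashes, pick candidates, build the candidate table."""
    low = {lbn: low_hash(blocks[lbn]) for lbn in dataset}
    counts = Counter(low.values())
    # Assumed candidate rule: repeated low hash, or low hash already shared.
    candidates = [lbn for lbn, h in low.items() if counts[h] > 1 or h in known_low]
    return {lbn: high_hash(blocks[lbn]) for lbn in candidates}


def find_shareable(candidate_table, policy_of, index_table):
    """Commonality: group by (high hash, disk pool policy id), compare with the index table."""
    groups = defaultdict(list)
    for lbn, hh in candidate_table.items():
        groups[(hh, policy_of[lbn])].append(lbn)
    return {
        key: sorted(lbns)
        for key, lbns in groups.items()
        if key in index_table or len(lbns) > 1
    }


def run(datasets, blocks, policy_of):
    """Phase rotation: each dataset is ingested only after the previous one is shared."""
    index_table = {}  # (high hash, policy id) -> shadow store logical block id
    known_low = set()  # low hashes of blocks already in the shadow store
    block_map = {}  # file logical block id -> shadow store logical block id
    for dataset in datasets:
        candidate_table = enumerate_dataset(dataset, blocks, known_low)
        shareable = find_shareable(candidate_table, policy_of, index_table)
        # Sharing: point every shareable block at one shadow store block.
        for key, lbns in shareable.items():
            if key not in index_table:
                index_table[key] = len(index_table)
                known_low.add(low_hash(blocks[lbns[0]]))
            for lbn in lbns:
                block_map[lbn] = index_table[key]
    return block_map, index_table
```

In this sketch, blocks with identical content but different disk pool policy identifiers are never shared with each other, which mirrors the claim's requirement that the set of shareable blocks be based on common disk pool policy identifiers.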
