Calculating deduplication digests for a synthetic backup by a deduplication storage system

  • US 9,575,983 B2
  • Filed: 04/21/2015
  • Issued: 02/21/2017
  • Est. Priority Date: 12/01/2010
  • Status: Active Grant
First Claim

1. A method for calculating deduplication digests for a synthetic backup by a deduplication storage system using a processor device, comprising:

    constructing the synthetic backup by processing a plurality of metadata instructions provided by a backup application;

    locating stored data segments referenced by the synthetic backup;

    calculating deduplication digests of the synthetic backup based on stored digests of the referenced stored data segments;

    partitioning a data segment of the synthetic backup into fixed sized sub-segments, wherein each of the fixed sub-segments references multiple stored fixed sized sub-segments;

    aggregating the calculated digests of the synthetic backup sub-segments;

    forming the deduplication digest for the synthetic backup from the deduplication digests of all data segments of the synthetic backup;

    calculating the deduplication digest for each of the fixed sized sub-segments of the synthetic backup based on retrieved deduplication digests of the stored fixed sized sub-segments referenced by a synthetic backup sub-segment;

    calculating a threshold digest value from the retrieved deduplication digests;

    calculating a sub-set of candidate digest values from a set of retrieved digest values by including a digest in the sub-set if a value of the digest is one of equal to and larger than the threshold and a storage location of the digest is within boundaries of the synthetic backup sub-segment;

    arranging digests in descending order of values of the digests and selecting a first m digests if a number of the digests that are denoted as m in a set of candidate digests is one of equal to and larger than a required number of digests for each of the fixed sized sub-segments; and

    calculating the digests, based on data of the synthetic backup sub-segment, if the number of digests in the set of candidate digests is lower than m.
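The final four steps of the claim (threshold, candidate sub-set, top-m selection, and fallback to the data) can be illustrated with a minimal Python sketch. It is not the patented implementation, and everything the claim leaves open is an assumption made here for illustration: the `StoredDigest` record, the byte-offset model of a digest's "storage location", the rule that the threshold is the m-th largest retrieved value, and the SHA-1 block hashing used as the fallback digest calculation.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredDigest:
    """Hypothetical record for a digest retrieved from a stored sub-segment."""
    value: int   # digest value
    offset: int  # assumed: location of the digest, in synthetic-backup coordinates


def threshold_from(digests, m):
    """Assumed threshold rule: the m-th largest retrieved digest value
    (or the smallest value if fewer than m digests were retrieved)."""
    values = sorted((d.value for d in digests), reverse=True)
    if not values:
        return 0
    return values[min(m, len(values)) - 1]


def digests_from_data(data, m, block=4):
    """Assumed fallback: hash fixed-size blocks of the sub-segment data
    and keep the m largest resulting values."""
    values = [
        int.from_bytes(hashlib.sha1(data[i:i + block]).digest()[:8], "big")
        for i in range(0, len(data), block)
    ]
    return sorted(values, reverse=True)[:m]


def sub_segment_digests(retrieved, start, end, m, data=b""):
    """Return m digests for the synthetic sub-segment spanning [start, end)."""
    threshold = threshold_from(retrieved, m)
    # Candidate sub-set: value >= threshold and location inside the sub-segment.
    candidates = [
        d.value for d in retrieved
        if d.value >= threshold and start <= d.offset < end
    ]
    if len(candidates) >= m:
        # Enough candidates: arrange in descending order, select the first m.
        return sorted(candidates, reverse=True)[:m]
    # Too few candidates: calculate the digests from the sub-segment data.
    return digests_from_data(data, m)


retrieved = [StoredDigest(90, 10), StoredDigest(70, 20), StoredDigest(50, 30)]
reused = sub_segment_digests(retrieved, start=0, end=100, m=2)

# A high-valued digest outside the sub-segment boundaries raises the threshold
# but is not itself a candidate, so the fallback path is taken.
with_outlier = retrieved + [StoredDigest(95, 200)]
fallback = sub_segment_digests(with_outlier, start=0, end=100, m=2, data=b"abcdefgh")
```

In the first call both 90 and 70 meet the threshold and lie inside the sub-segment, so the stored digests are reused without reading any data. In the second call only one candidate survives the boundary check, so the sketch recomputes the digests from the sub-segment's bytes, which is the costly path the claim tries to avoid.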
