Calculating deduplication digests for a synthetic backup by a deduplication storage system
First Claim
1. A method for calculating deduplication digests for a synthetic backup by a deduplication storage system using a processor device, comprising:
constructing the synthetic backup by processing a plurality of metadata instructions provided by a backup application;
locating stored data segments referenced by the synthetic backup;
calculating deduplication digests of the synthetic backup based on stored digests of the referenced stored data segments;
partitioning a data segment of the synthetic backup into fixed sized sub-segments, wherein each of the fixed sub-segments references multiple stored fixed sized sub-segments;
aggregating the calculated digests of the synthetic backup sub-segments;
forming the deduplication digest for the synthetic backup from the deduplication digests of all data segments of the synthetic backup;
calculating the deduplication digest for each of the fixed sized sub-segments of the synthetic backup based on retrieved deduplication digests of the stored fixed sized sub-segments referenced by a synthetic backup sub-segment;
calculating a threshold digest value from the retrieved deduplication digests;
calculating a sub-set of candidate digest values from a set of retrieved digest values by including a digest in the sub-set if a value of the digest is one of equal to and larger than the threshold and a storage location of the digest is within boundaries of the synthetic backup sub-segment;
arranging digests in descending order of values of the digests and selecting a first m digests if a number of the digests that are denoted as m in a set of candidate digests is one of equal to and larger than a required number of digests for each of the fixed sized sub-segments; and
calculating the digests, based on data of the synthetic backup sub-segment, if the number of digests in the set of candidate digests is lower than m.
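The last four steps of the claim (threshold, candidate sub-set, descending sort, fallback) can be illustrated with a minimal Python sketch. The claim does not say how the threshold is derived, so the rule used here is an assumption: each referenced stored sub-segment is taken to have kept its largest digest values, and the threshold is the largest of the per-sub-segment minimums, so that every digest at or above it is known to be recorded. The names `StoredDigest`, `select_subsegment_digests`, and `recompute` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class StoredDigest:
    value: int   # digest value retrieved from a stored sub-segment
    offset: int  # storage location, expressed as an offset in the synthetic backup


def select_subsegment_digests(
    retrieved: Sequence[Sequence[StoredDigest]],  # one group per referenced stored sub-segment
    start: int,                                   # sub-segment boundaries: [start, end)
    end: int,
    m: int,                                       # required number of digests per sub-segment
    recompute: Callable[[int, int], List[int]],   # hypothetical fallback that reads the data
) -> List[int]:
    groups = [g for g in retrieved if g]
    candidates: List[StoredDigest] = []
    if groups:
        # Assumed threshold rule (not specified in the claim): the largest of the
        # per-stored-sub-segment minimum digest values.
        threshold = max(min(d.value for d in g) for g in groups)
        # Keep digests at or above the threshold that lie inside the boundaries.
        candidates = [
            d for g in groups for d in g
            if d.value >= threshold and start <= d.offset < end
        ]
    if len(candidates) >= m:
        # Enough candidates: sort in descending order and take the first m.
        ordered = sorted(candidates, key=lambda d: d.value, reverse=True)
        return [d.value for d in ordered[:m]]
    # Too few candidates: calculate the digests from the sub-segment data itself.
    return recompute(start, end)
```

Under these assumptions, the stored digests are reused whenever at least m of them qualify, and the data of the synthetic backup sub-segment is read only in the fallback case.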
Abstract
Input backup data is deduplicated with data of a synthetic backup previously constructed by a deduplication storage system. A synthetic backup is constructed by processing metadata instructions provided by a backup application. Deduplication digests are calculated based on the data of the synthetic backup, and the deduplication digests are stored in a digests index. When new backup data is processed, deduplication digests of the new data are calculated and searched for in the digests index. A data segment of the synthetic backup is partitioned into fixed sized sub-segments. The calculated digests of the sub-segments are aggregated to produce the deduplication digest of the data segment, and the deduplication digest is formed for the synthetic backup.
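As a rough illustration of the digests index described in the abstract, the following Python sketch stores one digest per fixed sized sub-segment and looks up the digests of new backup data. It is a simplification under stated assumptions: SHA-256 from the standard library stands in for the unspecified digest function, digests are computed directly from data rather than derived from stored digests, and `DigestsIndex`, `sub_segment_digests`, and `SUB_SEGMENT_SIZE` are hypothetical names.

```python
import hashlib
from typing import Dict, List, Tuple

SUB_SEGMENT_SIZE = 8  # assumed, tiny value chosen only for illustration


def sub_segment_digests(data: bytes, size: int = SUB_SEGMENT_SIZE) -> List[str]:
    """Partition data into fixed sized sub-segments and digest each one."""
    return [
        hashlib.sha256(data[i:i + size]).hexdigest()
        for i in range(0, len(data), size)
    ]


class DigestsIndex:
    """Maps a digest to the (backup id, offset) where it was first seen."""

    def __init__(self) -> None:
        self._index: Dict[str, Tuple[str, int]] = {}

    def add(self, backup_id: str, data: bytes) -> None:
        for n, digest in enumerate(sub_segment_digests(data)):
            # Keep the first location recorded for a given digest.
            self._index.setdefault(digest, (backup_id, n * SUB_SEGMENT_SIZE))

    def search(self, data: bytes) -> List[Tuple[int, Tuple[str, int]]]:
        """Return (offset in new data, stored location) for every digest found."""
        return [
            (n * SUB_SEGMENT_SIZE, self._index[digest])
            for n, digest in enumerate(sub_segment_digests(data))
            if digest in self._index
        ]
```

For example, after `index.add("synthetic-1", b"AAAAAAAABBBBBBBBCCCCCCCC")`, searching new data whose second sub-segment is `b"BBBBBBBB"` reports a match against offset 8 of the stored synthetic backup.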
9 Claims
1. A method for calculating deduplication digests for a synthetic backup by a deduplication storage system using a processor device, comprising:
constructing the synthetic backup by processing a plurality of metadata instructions provided by a backup application;
locating stored data segments referenced by the synthetic backup;
calculating deduplication digests of the synthetic backup based on stored digests of the referenced stored data segments;
partitioning a data segment of the synthetic backup into fixed sized sub-segments, wherein each of the fixed sub-segments references multiple stored fixed sized sub-segments;
aggregating the calculated digests of the synthetic backup sub-segments;
forming the deduplication digest for the synthetic backup from the deduplication digests of all data segments of the synthetic backup;
calculating the deduplication digest for each of the fixed sized sub-segments of the synthetic backup based on retrieved deduplication digests of the stored fixed sized sub-segments referenced by a synthetic backup sub-segment;
calculating a threshold digest value from the retrieved deduplication digests;
calculating a sub-set of candidate digest values from a set of retrieved digest values by including a digest in the sub-set if a value of the digest is one of equal to and larger than the threshold and a storage location of the digest is within boundaries of the synthetic backup sub-segment;
arranging digests in descending order of values of the digests and selecting a first m digests if a number of the digests that are denoted as m in a set of candidate digests is one of equal to and larger than a required number of digests for each of the fixed sized sub-segments; and
calculating the digests, based on data of the synthetic backup sub-segment, if the number of digests in the set of candidate digests is lower than m.
(Dependent claims: 2, 3)
4. A system for calculating deduplication digests for a synthetic backup by a deduplication storage system, comprising:
the deduplication storage system; and
at least one processor device, operable in the deduplication computing storage environment, wherein the at least one processor device:
constructs the synthetic backup by processing a plurality of metadata instructions provided by a backup application,
locates stored data segments referenced by the synthetic backup,
calculates deduplication digests of the synthetic backup based on the stored digests of the referenced stored data segments,
partitions a data segment of the synthetic backup into fixed sized sub-segments, wherein each of the fixed sub-segments references multiple stored fixed sized sub-segments,
aggregates the calculated digests of the synthetic backup sub-segments,
forms the deduplication digest for the synthetic backup from the deduplication digests of all data segments of the synthetic backup,
calculates the deduplication digest for each of the fixed sized sub-segments of the synthetic backup based on retrieved deduplication digests of the stored fixed sized sub-segments referenced by a synthetic backup sub-segment,
calculates a threshold digest value from the retrieved deduplication digests,
calculates a sub-set of candidate digest values from a set of retrieved digest values by including a digest in the sub-set if a value of the digest is one of equal to and larger than the threshold and a storage location of the digest is within boundaries of the synthetic backup sub-segment,
arranges digests in descending order of values of the digests and selects a first m digests if a number of the digests that are denoted as m in a set of candidate digests is one of equal to and larger than a required number of digests for each of the fixed sized sub-segments, and
calculates the digests, based on data of the synthetic backup sub-segment, if the number of digests in the set of candidate digests is lower than m.
(Dependent claims: 5, 6)
7. A computer program product for calculating deduplication digests for a synthetic backup by a deduplication storage system using at least one processor device, the computer program product comprising a non-transitory computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising:
a first executable portion that constructs the synthetic backup by processing a plurality of metadata instructions provided by a backup application;
a second executable portion that locates stored data segments referenced by the synthetic backup;
a third executable portion that calculates deduplication digests of the synthetic backup based on the stored digests of the referenced stored data segments;
a fourth executable portion that partitions a data segment of the synthetic backup into fixed sized sub-segments, wherein each of the fixed sub-segments references multiple stored fixed sized sub-segments;
a fifth executable portion that aggregates the calculated digests of the synthetic backup sub-segments;
a sixth executable portion that forms the deduplication digest for the synthetic backup from the deduplication digests of all data segments of the synthetic backup;
a seventh executable portion that calculates the deduplication digest for each of the fixed sized sub-segments of the synthetic backup based on retrieved deduplication digests of the stored fixed sized sub-segments referenced by a synthetic backup sub-segment; and
an eighth executable portion that performs each of:
calculating a threshold digest value from the retrieved deduplication digests,
calculating a sub-set of candidate digest values from a set of retrieved digest values by including a digest in the sub-set if a value of the digest is one of equal to and larger than the threshold and a storage location of the digest is within boundaries of the synthetic backup sub-segment,
arranging digests in descending order of values of the digests and selecting a first m digests if a number of the digests that are denoted as m in a set of candidate digests is one of equal to and larger than a required number of digests for each of the fixed sized sub-segments, and
calculating the digests, based on data of the synthetic backup sub-segment, if the number of digests in the set of candidate digests is lower than m.
(Dependent claims: 8, 9)
Specification