Creation of synthetic backups within deduplication storage system by a backup application
Abstract
Input backup data is deduplicated with data of a synthetic backup previously constructed by a deduplication storage system. A synthetic backup is constructed by processing metadata instructions provided by a backup application. Deduplication digests are calculated from the data of the synthetic backup and stored in a digests index. When new backup data is processed, deduplication digests of the new data are calculated and looked up in the digests index. Matching digests of previously constructed synthetic backups are located in the digests index. Each located matching digest references stored data that is included in the synthetic backup and is similar to the input backup data. Data matches are then found between the input backup data and the data in the synthetic backup.
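The digest flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract speaks of locating stored data that is similar to the input, whereas this sketch simplifies to exact matching of fixed-size chunks. The class `DigestsIndex`, the helper `chunk_digests`, the 4-byte chunk size, the backup identifier `synthetic-1`, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib

CHUNK = 4  # hypothetical chunk size, kept tiny so the example is easy to follow


def chunk_digests(data: bytes, size: int = CHUNK):
    """Yield (offset, digest) for each fixed-size chunk of data."""
    for off in range(0, len(data), size):
        yield off, hashlib.sha256(data[off:off + size]).hexdigest()


class DigestsIndex:
    """Maps a chunk digest to the (backup_id, offset) where that chunk is stored."""

    def __init__(self):
        self._index = {}

    def add_backup(self, backup_id: str, data: bytes) -> None:
        # Index the digests of a (synthetic) backup; keep the first location seen.
        for off, digest in chunk_digests(data):
            self._index.setdefault(digest, (backup_id, off))

    def find_matches(self, data: bytes):
        """Return (input_offset, backup_id, stored_offset) for every matching chunk."""
        matches = []
        for off, digest in chunk_digests(data):
            hit = self._index.get(digest)
            if hit is not None:
                matches.append((off, *hit))
        return matches


index = DigestsIndex()
index.add_backup("synthetic-1", b"AAAABBBBCCCC")   # digests of the synthetic backup
matches = index.find_matches(b"XXXXBBBBCCCC")      # digests of new input backup data
print(matches)  # [(4, 'synthetic-1', 4), (8, 'synthetic-1', 8)]
```

Only the first chunk of the new input is unmatched, so under these assumptions only that chunk would need to be stored as new data.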
21 Claims
1. A method for deduplicating input backup data with data of a synthetic backup previously constructed by a deduplication storage system using a processor device, comprising:
constructing the synthetic backup by processing a plurality of metadata instructions provided by a backup application, the synthetic backup to be independent of referenced stored backups;
processing each of the plurality of metadata instructions by each of:
- partitioning each data segment input into each of a plurality of fixed-sized data sub-segments, each sub-segment referencing a plurality of stored sub-segments,
- for each of the plurality of data sub-segments, during the construction of the synthetic backup, calculating each of a plurality of input deduplication digests based on a retrieved plurality of stored deduplication digests by aggregating calculated deduplication digests of the plurality of data sub-segments to produce a respective one of the plurality of input deduplication digests for each data segment input,
- locating those of the plurality of data sub-segments in the deduplication storage system specified by the data segment in each of the plurality of metadata instructions, and
- creating metadata references to each of the plurality of data sub-segments and adding the metadata references to metadata of the synthetic backup being created;
wherein the metadata references include physical and logical data patterns;
transforming a set of the plurality of metadata instructions into a transformed set of the plurality of metadata instructions;
creating the synthetic backup by the deduplication system and the backup application by consolidating the plurality of metadata instructions that reference adjacent backup data segments into a single metadata instruction;
wherein the synthetic backup includes data from an existing full backup and subsequent incremental backups of the existing full backup dating until a specific point in time;
calculating deduplication digests based on the data of the synthetic backup; and
locating matching digests of previously constructed synthetic backups in a digests index, wherein each of the located matching digest references stored data included in the synthetic backup, and the stored data is similar to the input backup data.
Dependent claims: 2, 3, 4, 5, 6, 7.
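As a reading aid for claim 1, the following Python sketch shows one possible interpretation of three of its steps: consolidating metadata instructions that reference adjacent data, creating metadata references to stored sub-segments, and aggregating stored sub-segment digests into one digest per segment. Everything named here is hypothetical: the `Instruction` record, the functions `consolidate` and `build_synthetic`, the 4-byte sub-segment size, the backup names `full` and `incr-1`, and the use of SHA-256 over concatenated digests as the aggregation. The claim does not specify any of these, and the sketch assumes offsets and lengths are aligned to the sub-segment size.

```python
import hashlib
from dataclasses import dataclass

SUB = 4  # hypothetical fixed sub-segment size


@dataclass(frozen=True)
class Instruction:
    """Hypothetical metadata instruction: copy `length` bytes at `offset` of a stored backup."""
    backup_id: str
    offset: int
    length: int


def consolidate(instructions):
    """Merge consecutive instructions that reference adjacent ranges of the same backup."""
    merged = []
    for ins in instructions:
        last = merged[-1] if merged else None
        if (last is not None and last.backup_id == ins.backup_id
                and last.offset + last.length == ins.offset):
            merged[-1] = Instruction(last.backup_id, last.offset, last.length + ins.length)
        else:
            merged.append(ins)
    return merged


def build_synthetic(instructions, stored_digests):
    """Build references and per-segment digests without reading the stored data itself.

    stored_digests maps (backup_id, sub_segment_offset) to an already stored digest.
    """
    references, segment_digests = [], []
    for ins in consolidate(instructions):
        aggregate = hashlib.sha256()
        for off in range(ins.offset, ins.offset + ins.length, SUB):
            references.append((ins.backup_id, off))  # metadata reference only
            aggregate.update(stored_digests[(ins.backup_id, off)].encode())
        segment_digests.append(aggregate.hexdigest())
    return references, segment_digests


# Stored digests of a full backup and one incremental backup (illustrative data).
stored = {}
for bid, data in (("full", b"AAAABBBBCCCCDDDD"), ("incr-1", b"cccc")):
    for off in range(0, len(data), SUB):
        stored[(bid, off)] = hashlib.sha256(data[off:off + SUB]).hexdigest()

# The incremental replaces the third sub-segment of the full backup.
plan = [Instruction("full", 0, 4), Instruction("full", 4, 4),
        Instruction("incr-1", 0, 4), Instruction("full", 12, 4)]
references, segment_digests = build_synthetic(plan, stored)
print(consolidate(plan))
print(references)
```

In this sketch the first two instructions collapse into one, and the synthetic backup is described purely by references and digests derived from what is already stored, which is one way to read the claim's requirement that digests be calculated from retrieved stored digests during construction.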
8. A system for deduplicating input backup data with data of a synthetic backup previously constructed by a deduplication storage system, comprising:
the deduplication storage system;
at least one processor device, operable in the deduplication computing storage environment, wherein the at least one processor device:
constructs the synthetic backup by processing a plurality of metadata instructions provided by a backup application, the synthetic backup to be independent of referenced stored backups,
processes each of the plurality of metadata instructions by each of:
- partitioning each data segment input into each of a plurality of fixed-sized data sub-segments, each sub-segment referencing a plurality of stored sub-segments,
- for each of the plurality of data sub-segments, during the construction of the synthetic backup, calculating each of a plurality of input deduplication digests based on a retrieved plurality of stored deduplication digests by aggregating calculated deduplication digests of the plurality of data sub-segments to produce a respective one of the plurality of input deduplication digests for each data segment input,
- locating those of the plurality of data sub-segments in the deduplication storage system specified by the data segment in each of the plurality of metadata instructions, and
- creating metadata references to each of the plurality of data sub-segments and adding the metadata references to metadata of the synthetic backup being created;
wherein the metadata references include physical and logical data patterns,
transforms a set of the plurality of metadata instructions into a transformed set of the plurality of metadata instructions,
creates the synthetic backup by the deduplication system and the backup application by consolidating the plurality of metadata instructions that reference adjacent backup data segments into a single metadata instruction;
wherein the synthetic backup includes data from an existing full backup and subsequent incremental backups of the existing full backup dating until a specific point in time,
calculates deduplication digests based on the data of the synthetic backup, and
locates matching digests of previously constructed synthetic backups in a digests index, wherein each of the located matching digest references stored data included in the synthetic backup, and the stored data is similar to the input backup data.
Dependent claims: 9, 10, 11, 12, 13, 14.
15. A computer program product for deduplicating input backup data with data of a synthetic backup previously constructed by a deduplication storage system using at least one processor device, the computer program product comprising a non-transitory computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising:
an executable portion that constructs the synthetic backup by processing a plurality of metadata instructions provided by a backup application, the synthetic backup to be independent of referenced stored backups;
an executable portion that processes each of the plurality of metadata instructions by each of:
- partitioning each data segment input into each of a plurality of fixed-sized data sub-segments, each sub-segment referencing a plurality of stored sub-segments,
- for each of the plurality of data sub-segments, during the construction of the synthetic backup, calculating each of a plurality of input deduplication digests based on a retrieved plurality of stored deduplication digests by aggregating calculated deduplication digests of the plurality of data sub-segments to produce a respective one of the plurality of input deduplication digests for each data segment input,
- locating those of the plurality of data sub-segments in the deduplication storage system specified by the data segment in each of the plurality of metadata instructions, and
- creating metadata references to each of the plurality of data sub-segments and adding the metadata references to metadata of the synthetic backup being created;
wherein the metadata references include physical and logical data patterns;
an executable portion that transforms a set of the plurality of metadata instructions into a transformed set of the plurality of metadata instructions;
an executable portion that creates the synthetic backup by the deduplication system and the backup application by consolidating the plurality of metadata instructions that reference adjacent backup data segments into a single metadata instruction;
wherein the synthetic backup includes data from an existing full backup and subsequent incremental backups of the existing full backup dating until a specific point in time;
an executable portion that calculates deduplication digests based on the data of the synthetic backup; and
an executable portion that locates matching digests of previously constructed synthetic backups in a digests index, wherein each of the located matching digest references stored data included in the synthetic backup, and the stored data is similar to the input backup data.
Dependent claims: 16, 17, 18, 19, 20, 21.
Specification