Deduplicating input backup data with data of a synthetic backup previously constructed by a deduplication storage system

  • US 9,858,286 B2
  • Filed: 03/13/2013
  • Issued: 01/02/2018
  • Est. Priority Date: 12/01/2010
  • Status: Active Grant
First Claim

1. A method for deduplicating input backup data with data of a synthetic backup previously constructed by a deduplication storage system using a processor device, comprising:

  • constructing the synthetic backup by processing a plurality of metadata instructions provided by a backup application, the synthetic backup to be independent of referenced stored backups;

    processing each of the plurality of metadata instructions by each of:

      partitioning each data segment input into each of a plurality of fixed-sized data sub-segments, each sub-segment referencing a plurality of stored sub-segments,

      for each of the plurality of data sub-segments, during the construction of the synthetic backup, calculating each of a plurality of input deduplication digests based on a retrieved plurality of stored deduplication digests by aggregating calculated deduplication digests of the plurality of data sub-segments to produce a respective one of the plurality of input deduplication digests for each data segment input,

      locating those of the plurality of data sub-segments in the deduplication storage system specified by the data segment in each of the plurality of metadata instructions, and

      creating metadata references to each of the plurality of data sub-segments and adding the metadata references to metadata of the synthetic backup being created;

    wherein the metadata references include physical and logical data patterns;

    transforming a set of the plurality of metadata instructions into a transformed set of the plurality of metadata instructions;

    creating the synthetic backup by the deduplication system and the backup application by consolidating the plurality of metadata instructions that reference adjacent backup data segments into a single metadata instruction;

    wherein the synthetic backup includes data from an existing full backup and subsequent incremental backups of the existing full backup dating until a specific point in time;

    calculating deduplication digests based on the data of the synthetic backup;

    storing the deduplication digests in a digests index;

    calculating and searching, in the digests index, the deduplication digests of new data when new backup data is processed;

    locating matching digests of previously constructed synthetic backups in the digests index, wherein each of the located matching digests references stored data included in the synthetic backup, and the stored data is similar to the input backup data; and

    finding data matches in the input backup data and data in the synthetic backup.
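
The flow recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation, only one way the steps might fit together under stated assumptions: the `Copy` instruction type, the 4-byte sub-segment size `SUB`, the use of SHA-256, and all function names are hypothetical, and offsets and lengths are assumed to be multiples of `SUB`. The sketch consolidates instructions that reference adjacent data, builds the synthetic backup as metadata references while retrieving stored digests instead of recomputing them, indexes those digests, and then looks up the digests of new input data. The aggregation of sub-segment digests into per-segment digests is left out for brevity.

```python
import hashlib
from dataclasses import dataclass

SUB = 4  # hypothetical fixed sub-segment size in bytes (tiny, for illustration)


@dataclass(frozen=True)
class Copy:
    """Hypothetical metadata instruction: reuse `length` bytes of a stored backup."""
    source: str
    offset: int
    length: int


def digests_of(data: bytes) -> list[str]:
    """Digest of every fixed-size sub-segment of `data`."""
    return [hashlib.sha256(data[i:i + SUB]).hexdigest()
            for i in range(0, len(data), SUB)]


def consolidate(instructions: list[Copy]) -> list[Copy]:
    """Merge instructions that reference adjacent ranges of the same stored backup."""
    merged: list[Copy] = []
    for ins in instructions:
        prev = merged[-1] if merged else None
        if prev and prev.source == ins.source and prev.offset + prev.length == ins.offset:
            merged[-1] = Copy(prev.source, prev.offset, prev.length + ins.length)
        else:
            merged.append(ins)
    return merged


def build_synthetic(instructions, stored_digests):
    """Return (references, digests) describing the synthetic backup.

    Digests are retrieved from the stored backups rather than recomputed.
    Assumes offsets and lengths are multiples of SUB.
    """
    refs, digests = [], []
    for ins in consolidate(instructions):
        first = ins.offset // SUB
        for k in range(first, first + ins.length // SUB):
            refs.append((ins.source, k * SUB))           # metadata reference
            digests.append(stored_digests[ins.source][k])  # retrieved digest
    return refs, digests


def find_matches(input_data: bytes, index: dict) -> dict:
    """Map input offsets to synthetic-backup positions that hold an equal digest."""
    return {i * SUB: index[d]
            for i, d in enumerate(digests_of(input_data)) if d in index}


# Stored backups: one full backup and one incremental (illustrative data).
full = b"AAAABBBBCCCCDDDD"
incr = b"XXXX"
stored_digests = {"full": digests_of(full), "incr": digests_of(incr)}

# Instructions from the backup application; the first two are adjacent.
plan = [Copy("full", 0, 4), Copy("full", 4, 4), Copy("incr", 0, 4), Copy("full", 12, 4)]

refs, synth_digests = build_synthetic(plan, stored_digests)
index = {d: pos for pos, d in enumerate(synth_digests)}  # the digests index
matches = find_matches(b"AAAAXXXXZZZZ", index)
```

In this sketch, `matches` maps each input offset whose sub-segment digest appears in the index to the position of the corresponding sub-segment in the synthetic backup. A real system would then compare the underlying data to confirm and extend each match.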
