
Change tracking for multiphase deduplication

  • US 8,738,577 B1
  • Filed: 03/01/2013
  • Issued: 05/27/2014
  • Est. Priority Date: 03/01/2013
  • Status: Active Grant
First Claim

1. A non-transitory computer-readable medium storing a program that causes a processor to execute a method of multiphase deduplication, the method comprising:

  • a change tracking phase that includes performing the following steps for each allocated block in a source storage that is changed between the taking of a prior snapshot at a prior point in time and upon which a prior base or incremental backup is based and the taking of a subsequent snapshot at a subsequent point in time and upon which a subsequent incremental backup is based, without performing the following steps on any allocated block in the source storage that is not changed between the prior point in time and the subsequent point in time:

    temporarily storing a copy of the changed block in a volatile memory of the source system prior to writing the changed block to the source storage;

    performing a hash function only once on the copy of the changed block, while the copy is temporarily stored in a volatile memory of the source system, to calculate a hash value corresponding to the changed block;

    writing the changed block to the source storage; and

    tracking, in a change log, a location in the source storage of the changed block and the corresponding cryptographic hash value;

  • an analysis phase that is performed after completion of the change tracking phase and that includes performing the following steps for each unique hash value stored in the change log:

    comparing the hash value with hash values of blocks that are stored in a vault storage, without reading the corresponding unique changed block from the source storage, to determine if the corresponding unique changed block in the source storage is duplicated in the vault storage; and

    associating a location of the corresponding unique changed block in the source storage with a location of the corresponding duplicated block stored in the vault storage if the corresponding unique changed block is duplicated in the vault storage; and

  • a backup phase that includes performing, after completion of the analysis phase, the following steps for all unique nonduplicate runs of changed blocks stored in the source storage, where each unique nonduplicate run of changed blocks includes two or more nonduplicate changed blocks that are stored sequentially in the source storage:

    reading the runs from the source storage;

    storing the runs in the vault storage in the same sequence as stored in the source storage at the subsequent point in time; and

    associating a location of each run stored in the source storage with a corresponding location of the run stored in the vault storage.
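
The three phases recited in the claim can be illustrated with a small in-memory sketch. This is not the patented implementation; it is a minimal illustration under stated assumptions. The names `Source`, `Vault`, `analyze`, `runs`, and `backup`, the 4-byte block size, and the choice of SHA-256 are all hypothetical. The sketch also simplifies two points: it does not deduplicate changed blocks against one another, and it treats a single isolated block as a run of length one, whereas the claim recites runs of two or more blocks.

```python
import hashlib

BLOCK_SIZE = 4  # tiny hypothetical block size, for illustration only


def block_hash(data: bytes) -> str:
    # SHA-256 is an assumption; the claim only requires a hash function.
    return hashlib.sha256(data).hexdigest()


class Source:
    """Hypothetical source storage with a change log kept between snapshots."""

    def __init__(self, num_blocks: int):
        self.blocks = [bytes(BLOCK_SIZE)] * num_blocks
        self.change_log = {}  # block location -> hash value

    def write(self, location: int, data: bytes) -> None:
        copy = bytes(data)                   # copy held in memory before the write
        digest = block_hash(copy)            # hashed once, while still in memory
        self.blocks[location] = copy         # write the block to source storage
        self.change_log[location] = digest   # track location and hash value


class Vault:
    """Hypothetical vault storage indexed by hash value."""

    def __init__(self):
        self.blocks = []   # stored block data, in arrival order
        self.by_hash = {}  # hash value -> vault location

    def append(self, data: bytes) -> int:
        location = len(self.blocks)
        self.blocks.append(data)
        self.by_hash[block_hash(data)] = location
        return location


def analyze(change_log, vault):
    """Analysis phase: compare logged hashes with the vault, reading no source blocks."""
    mapping = {}   # source location -> vault location
    pending = []   # source locations of nonduplicate changed blocks
    for location, digest in sorted(change_log.items()):
        if digest in vault.by_hash:
            mapping[location] = vault.by_hash[digest]
        else:
            pending.append(location)
    return mapping, pending


def runs(locations):
    """Group sorted block locations into runs of consecutive locations."""
    grouped = []
    for location in locations:
        if grouped and location == grouped[-1][-1] + 1:
            grouped[-1].append(location)
        else:
            grouped.append([location])
    return grouped


def backup(source, vault, pending, mapping):
    """Backup phase: copy each run to the vault in source order and record locations."""
    for run in runs(pending):
        for location in run:
            mapping[location] = vault.append(source.blocks[location])
    return mapping


source = Source(8)
vault = Vault()
vault.append(b"AAAA")      # block already present in the vault at location 0

source.write(2, b"AAAA")   # duplicate of a vault block
source.write(4, b"BBBB")   # nonduplicate, start of a run
source.write(5, b"CCCC")   # nonduplicate, same run

mapping, pending = analyze(source.change_log, vault)
mapping = backup(source, vault, pending, mapping)
print(mapping)  # {2: 0, 4: 1, 5: 2}
```

In this sketch, block 2 is resolved during analysis from its logged hash alone, so only blocks 4 and 5 are read from the source, and they land in the vault in the same order they occupy in the source.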
