
De-duplication of files for continuous data protection with remote storage

  • US 9,223,793 B1
  • Filed: 06/03/2010
  • Issued: 12/29/2015
  • Est. Priority Date: 06/03/2009
  • Status: Active Grant
First Claim

1. A computer-implemented method of backing-up a version of a data file to a remote storage, the method comprising executing instructions on a computer to perform the operations of:

  • de-duplicating the version of the data file against a previous version master file stored on a local storage device, wherein the previous version master file comprises a single instance of each of one or more unique data blocks of a specific block size from a previous version of the data file, and wherein de-duplicating the version of the data file against the previous version master file comprises determining whether at least one block of data from the version of the data file matches at least one of the one or more unique data blocks of the specific size from the previous version of the data file by:

    maintaining a lightweight checksum for each of the one or more unique data blocks in the previous version master file, and

    matching a block of data read from the version of the data file to one of the one or more unique data blocks in the previous version master file by calculating a lightweight checksum value for the read block of data and comparing the calculated lightweight checksum value with the lightweight checksums maintained for the one or more unique data blocks in the previous version master file, wherein calculating the lightweight checksum for a subsequently read block of data from the version of the data file comprises subtracting one or more bytes from the lightweight checksum for a previously read block of data and adding one or more bytes from the subsequently read block of data to the lightweight checksum for the previously read block of data;

    creating a supplemental file corresponding to the version of the data file and comprising one or more chunks of data from the version of the data file not matching one of the one or more unique data blocks in the previous version master file;

    creating a version map file corresponding to the version of the data file and comprising one or more references to unique data blocks in the previous version master file and one or more references to chunks of data in the supplemental file, wherein each of the one or more references to unique data blocks in the previous version master file comprise an index to a unique data block and each of the one or more the references to chunks of data in the supplemental file comprise a length of a chunk of data; and

    storing the supplemental file and the version map file corresponding to the version of the data file to the remote storage, wherein the remote storage contains a master file corresponding to the data file and comprising each of the unique data blocks referenced in the version map file.
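
The operations recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes a plain byte-sum as the lightweight checksum, a 4-byte block size, an in-memory list standing in for the master file, and tuples standing in for version map entries. The names `build_master`, `deduplicate`, and `reconstruct` are hypothetical, and the byte-for-byte comparison that follows a checksum hit is an added safeguard against collisions that the claim does not recite.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 4  # hypothetical block size, kept tiny for illustration

MapEntry = Tuple[str, int]  # ("block", master index) or ("chunk", length)


def checksum(block: bytes) -> int:
    """Lightweight checksum (assumed here to be a plain sum of byte values)."""
    return sum(block)


def build_master(previous: bytes, size: int = BLOCK_SIZE) -> List[bytes]:
    """Keep a single instance of each unique full-size block of the previous version."""
    master: List[bytes] = []
    seen = set()
    for i in range(0, len(previous) - size + 1, size):
        block = previous[i:i + size]
        if block not in seen:
            seen.add(block)
            master.append(block)
    return master


def deduplicate(version: bytes, master: List[bytes],
                size: int = BLOCK_SIZE) -> Tuple[List[MapEntry], bytes]:
    """Return (version map, supplemental data) for a new version of the file."""
    # Maintain one lightweight checksum per unique master block.
    table: Dict[int, List[int]] = {}
    for idx, block in enumerate(master):
        table.setdefault(checksum(block), []).append(idx)

    version_map: List[MapEntry] = []
    supplemental = bytearray()
    pending = 0      # unmatched bytes not yet recorded in the map
    pos = 0
    rolling = None   # checksum of the window starting at pos, if known
    while pos + size <= len(version):
        window = version[pos:pos + size]
        if rolling is None:
            rolling = checksum(window)
        # Checksum lookup first, then a byte compare (assumption) to rule out collisions.
        match = next((i for i in table.get(rolling, ()) if master[i] == window), None)
        if match is not None:
            if pending:
                version_map.append(("chunk", pending))
                pending = 0
            version_map.append(("block", match))
            pos += size
            rolling = None
        else:
            supplemental.append(version[pos])
            pending += 1
            # Roll the window: subtract the outgoing byte, add the incoming byte.
            if pos + size < len(version):
                rolling = rolling - version[pos] + version[pos + size]
            pos += 1
    # Bytes left over at the end cannot fill a block, so they go to the supplemental data.
    tail = version[pos:]
    supplemental += tail
    pending += len(tail)
    if pending:
        version_map.append(("chunk", pending))
    return version_map, bytes(supplemental)


def reconstruct(version_map: List[MapEntry], supplemental: bytes,
                master: List[bytes]) -> bytes:
    """Rebuild a version from the master blocks, the map, and the supplemental data."""
    out = bytearray()
    offset = 0
    for kind, value in version_map:
        if kind == "block":
            out += master[value]
        else:
            out += supplemental[offset:offset + value]
            offset += value
    return bytes(out)


master = build_master(b"AAAABBBBCCCC")
version_map, supplemental = deduplicate(b"AAAAxBBBBCCCCyz", master)
restored = reconstruct(version_map, supplemental, master)
```

In this sketch only `version_map` and `supplemental` would need to travel to remote storage, since the remote side is assumed to hold the same master blocks already. The rolling update lets the window slide one byte at a time without re-summing the whole block, which is how the inserted byte `x` is skipped and the following `BBBB` block is still found.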
