
Method and system to improve deduplication of structured datasets using hybrid chunking and block header removal

  • US 9,183,218 B1
  • Filed: 06/29/2012
  • Issued: 11/10/2015
  • Est. Priority Date: 06/29/2012
  • Status: Active Grant
First Claim

1. A computer-implemented method, comprising:

  • receiving a request at a system to deduplicate a file having a plurality of data blocks, each data block having a header and a data portion, wherein the file is received from a client application of a client device over a network to be stored in the system;

    scanning to search a predetermined signature embedded within a header of each data block to identify a block boundary between the header and the data portion;

    anchoring the data blocks using first anchors to indicate block boundaries based on the scanning of the predetermined signature, including recognizing a plurality of markers within the data portions of the data blocks, wherein the markers were inserted into the data blocks by the client application prior to receiving the file over the network, removing the recognized markers from the file, and anchoring the data blocks using the first anchors at locations of the removed markers, wherein an anchor denotes a boundary between two data blocks;

    adding at least one second anchor within a data portion of at least one data block that has been anchored by two of the first anchors, if the data portion of at least one data block satisfies a predetermined condition, wherein the second anchor is located between two first anchors;

    separating data portions of the data blocks from the headers based on the first anchors;

    chunking the data portion of the data blocks based on the first anchors and the at least one second anchor, generating a plurality of data chunks; and

    deduplicating the data chunks of the data portions of the data blocks.
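
The steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation, and every concrete detail is an assumption made for illustration: the 4-byte signature `BLK!`, the fixed 8-byte header, the 16-byte size threshold standing in for the "predetermined condition", fixed-size splitting as the way second anchors are placed, and SHA-256 fingerprints for deduplication. The client-inserted marker step is omitted, and the sketch assumes the signature never occurs inside a data portion.

```python
import hashlib

# All constants below are hypothetical; the claim does not specify them.
SIGNATURE = b"BLK!"   # assumed signature embedded in every block header
HEADER_LEN = 8        # assumed header layout: 4-byte signature + 4-byte block id
MAX_CHUNK = 16        # assumed "predetermined condition": data portion longer than this


def find_first_anchors(data: bytes) -> list[int]:
    """Scan for the header signature; each hit is a first anchor (block start)."""
    anchors = []
    pos = data.find(SIGNATURE)
    while pos != -1:
        anchors.append(pos)
        # Resume the scan after this header.
        pos = data.find(SIGNATURE, pos + HEADER_LEN)
    return anchors


def split_blocks(data: bytes, anchors: list[int]) -> list[tuple[bytes, bytes]]:
    """Separate each anchored block into (header, data portion)."""
    bounds = anchors + [len(data)]
    return [
        (data[start:start + HEADER_LEN], data[start + HEADER_LEN:end])
        for start, end in zip(bounds, bounds[1:])
    ]


def chunk_payload(payload: bytes) -> list[bytes]:
    """Add second anchors inside a data portion only if it exceeds MAX_CHUNK."""
    if len(payload) <= MAX_CHUNK:
        return [payload]
    # Fixed-size splitting stands in for whatever second-anchor rule is used.
    return [payload[i:i + MAX_CHUNK] for i in range(0, len(payload), MAX_CHUNK)]


def deduplicate(data: bytes) -> tuple[dict[str, bytes], list[str]]:
    """Return (unique chunk store, ordered fingerprint recipe) for the data portions."""
    store: dict[str, bytes] = {}
    recipe: list[str] = []
    for _header, payload in split_blocks(data, find_first_anchors(data)):
        for chunk in chunk_payload(payload):
            fingerprint = hashlib.sha256(chunk).hexdigest()
            store.setdefault(fingerprint, chunk)  # store each unique chunk once
            recipe.append(fingerprint)
    return store, recipe
```

In this sketch, two blocks that carry identical data portions but different headers (for example, different block ids) produce the same chunk fingerprints once the headers are separated out, so their data is stored only once. Chunking the raw blocks with the headers left in place would not give that result.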

  • 9 Assignments