Producing alternative segmentations of data into blocks in a data deduplication system

  • US 9,922,042 B2
  • Filed: 07/15/2013
  • Issued: 03/20/2018
  • Est. Priority Date: 07/15/2013
  • Status: Active Grant
First Claim

1. A method for producing a plurality of segmentations of input data into blocks in a data deduplication system using a processor device in a computing environment, comprising:

    calculating digests for an input data chunk using a primary segmentation by using a single linear scan of rolling hash values for calculating both the primary segmentation and similarity search values for the input data chunk, the input data chunk being at least 16 Megabytes (MB) in size;

    obtaining and applying secondary segmentations for each one of a plurality of data mismatches based on reference data;

    storing the primary segmentation and corresponding primary digests for the input data chunk in a sequence corresponding to a placement order of calculated values of the calculated digests associated with the primary digests, the placement order of the calculated values of the calculated digests correlative to an order in which input digest values were calculated such that the primary digests are stored in a linear form independent of a deduplicated form by which data the primary digests describe is stored;

    obtaining the segmentations for each one of the data mismatches by considering input digests included in data matches preceding and following each one of the data mismatches; and

    avoiding storing the secondary segmentations and corresponding secondary digests for the input data chunk.
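
The first and third claim elements (a single linear scan of rolling hash values that yields both the primary segmentation and the similarity search values, with the primary digests kept in the order they were calculated) can be illustrated with a minimal Python sketch. It is not taken from the patent. The polynomial rolling hash, the window size, the boundary mask, the choice of the largest hash values as similarity values, the use of SHA-1 for digests, and the names `scan_chunk`, `WINDOW`, `BOUNDARY_MASK` and `NUM_SIM` are all assumptions chosen for illustration, and the sizes are toy values far below the 16 MB chunk recited in the claim. Secondary segmentations are not modeled.

```python
import hashlib
import heapq

# All constants below are illustrative assumptions, not values from the patent.
WINDOW = 16                    # bytes covered by the rolling hash
BASE = 257                     # polynomial base
MOD = (1 << 61) - 1            # modulus for the rolling hash
BOUNDARY_MASK = (1 << 6) - 1   # boundary when the low 6 bits of the hash are zero
NUM_SIM = 4                    # number of similarity search values kept per chunk


def scan_chunk(chunk: bytes):
    """Scan `chunk` once and return (boundaries, digests, similarity_values).

    boundaries: end offsets of the blocks of the primary segmentation
    digests: one SHA-1 hex digest per block, in the order calculated
    similarity_values: the NUM_SIM largest rolling hash values seen
    """
    top = pow(BASE, WINDOW - 1, MOD)   # weight of the byte leaving the window
    h = 0
    start = 0
    boundaries, digests = [], []
    sim_heap = []                      # min-heap holding the largest values so far

    for i, b in enumerate(chunk):
        if i >= WINDOW:
            # Drop the contribution of the byte that slides out of the window.
            h = (h - chunk[i - WINDOW] * top) % MOD
        h = (h * BASE + b) % MOD
        if i + 1 < WINDOW:
            continue                   # window not yet full

        # The same hash value feeds the similarity search values ...
        heapq.heappush(sim_heap, h)
        if len(sim_heap) > NUM_SIM:
            heapq.heappop(sim_heap)

        # ... and the primary segmentation.
        if (h & BOUNDARY_MASK) == 0:
            end = i + 1
            boundaries.append(end)
            digests.append(hashlib.sha1(chunk[start:end]).hexdigest())
            start = end

    if start < len(chunk):             # trailing block without a hash boundary
        boundaries.append(len(chunk))
        digests.append(hashlib.sha1(chunk[start:]).hexdigest())

    return boundaries, digests, sorted(sim_heap, reverse=True)
```

Because `digests` is appended to in scan order, the list is already in the linear form the claim describes, independent of how the underlying blocks might later be stored in deduplicated form.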
