
Adaptive similarity search resolution in a data deduplication system

  • US 10,073,853 B2
  • Filed: 07/17/2013
  • Issued: 09/11/2018
  • Est. Priority Date: 07/17/2013
  • Status: Active Grant
First Claim

1. A method for adaptive similarity search resolution in a data deduplication system using a processor device in a computing environment, comprising:

    partitioning input data into input data chunks, the input data chunks each being at least 4 Megabytes (MB) in size;

    calculating input similarity elements for an input chunk;

    using the input similarity elements to find similar data in a repository of data using a similarity search structure;

    calculating a resolution level for storing the input similarity elements, the resolution level comprising a number of the input similarity elements in relation to a size of the input chunk;

    storing the input similarity elements in the calculated resolution level in the similarity search structure;

    deduplicating the input chunk with the found similar data in the repository of data using the input similarity units in the calculated resolution level;

    calculating the resolution level for storing the input similarity elements based on calculated sets of similarity element matches and on a calculated deduplication ratio, the deduplication ratio defined as a total size of the input data covered by matches with repository data out of the total size of the input data; and

    decreasing the resolution level of the stored input similarity elements if an aggregated deduplication ratio is not lower than a predefined threshold and an average size of the calculated sets of similarity element matches is not lower than two and a current resolution level is higher than a lowest resolution level.
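
The last two elements of the claim can be illustrated with a minimal sketch. It is not the patented implementation: the function names, the integer encoding of resolution levels (a larger number meaning a higher resolution), and the handling of an empty list of match sets are assumptions made for illustration only. The deduplication ratio follows the definition in the claim (input bytes covered by matches with repository data, out of total input bytes), and the decrease rule reads "not lower than" as greater than or equal to.

```python
from typing import Sequence


def deduplication_ratio(matched_bytes: int, total_bytes: int) -> float:
    """Input bytes covered by repository matches, out of total input bytes."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return matched_bytes / total_bytes


def should_decrease_resolution(
    aggregated_ratio: float,
    match_set_sizes: Sequence[int],
    current_level: int,
    lowest_level: int,
    ratio_threshold: float,
) -> bool:
    """Return True when all three conditions of the claim hold.

    Assumptions (not stated in the claim): levels are integers where a
    larger value is a higher resolution, and an empty list of match sets
    is treated as failing the average-size condition.
    """
    if not match_set_sizes:
        return False
    average_set_size = sum(match_set_sizes) / len(match_set_sizes)
    return (
        aggregated_ratio >= ratio_threshold  # ratio not lower than threshold
        and average_set_size >= 2            # average set size not lower than two
        and current_level > lowest_level     # still above the lowest level
    )


if __name__ == "__main__":
    # Hypothetical numbers: 3 MB of a 4 MB chunk matched repository data.
    ratio = deduplication_ratio(3 * 1024 * 1024, 4 * 1024 * 1024)
    print(ratio)
    print(should_decrease_resolution(ratio, [2, 3], current_level=2,
                                     lowest_level=0, ratio_threshold=0.7))
```

In this sketch a caller would lower the resolution by one level whenever the function returns True; how far and how often the level is lowered is not specified by claim 1.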
