
Compatibility and inclusion of similarity element resolutions

  • US 10,133,502 B2
  • Filed: 07/15/2013
  • Issued: 11/20/2018
  • Est. Priority Date: 07/15/2013
  • Status: Active Grant
First Claim

1. A method for adaptive similarity search using compatibility and inclusion of similarity element resolutions in a data deduplication system using a processor device in a computing environment, comprising:

  • configuring a plurality of resolution levels for a similarity search;

  • calculating input similarity elements in a certain one of the plurality of resolution levels for a chunk of input data, wherein each resolution level is determined by identifying a number of the input similarity elements calculated in relation to a size of the chunk of input data;

  • using the input similarity elements of the certain one of the plurality of resolution levels to find similar data in a repository of data where similarity elements of the stored similar repository data are of the plurality of resolution levels, for locating and reducing duplicate data within the data deduplication system;

  • defining the plurality of resolution levels to be between a highest resolution level and a lowest resolution level;

  • configuring the similarity elements of each one of the plurality of resolution levels which are lower than the certain one of the plurality of resolution levels to be a subset of the similarity elements of each one of the plurality of resolution levels which are higher than the certain one of the plurality of resolution levels;

  • calculating an aggregated deduplication ratio as a total size of input chunks of data received over a certain time period covered by matches with repository data out of the total size of the input chunks relative to a total size of output data subsequent to deduplicating all of the input chunks of data received over the certain time period; and

  • using the aggregated deduplication ratio to determine the resolution level of the input similarity elements by decreasing the similarity elements resolution level if the aggregated deduplication ratio is not lower than a predefined threshold and an average size of calculated sets of similarity element matches is not lower than two and a current resolution level is higher than the lowest resolution level.
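
The claimed inclusion property (the similarity elements of a lower resolution level form a subset of those of a higher level) can be illustrated with a minimal Python sketch. The claim does not say how elements are selected, so this sketch assumes a min-hash style selection: CRC-32 hashes of every 32-byte window are sorted, and a level keeps the k smallest, where k is the chunk size divided by a per-level byte count. The names `WINDOW`, `BYTES_PER_ELEMENT`, `similarity_elements`, `add_to_index` and `find_similar`, and all parameter values, are hypothetical and are not taken from the patent.

```python
import random
import zlib

# Hypothetical parameters, not taken from the patent.
WINDOW = 32  # width in bytes of each hashed sliding window
# Resolution level -> bytes of chunk data per similarity element.
# A higher level yields more similarity elements for the same chunk size.
BYTES_PER_ELEMENT = {1: 16384, 2: 4096, 3: 1024, 4: 256}


def window_hashes(chunk: bytes) -> list[int]:
    """Distinct CRC-32 values of every WINDOW-byte window, sorted ascending."""
    return sorted({zlib.crc32(chunk[i:i + WINDOW])
                   for i in range(len(chunk) - WINDOW + 1)})


def similarity_elements(chunk: bytes, level: int) -> set[int]:
    """Keep the k smallest window hashes; k grows with chunk size and level.

    Every level takes a prefix of the same sorted list, so a lower level's
    elements are always a subset of a higher level's elements.
    """
    k = max(1, len(chunk) // BYTES_PER_ELEMENT[level])
    return set(window_hashes(chunk)[:k])


def add_to_index(index: dict[int, set[str]], chunk_id: str,
                 elements: set[int]) -> None:
    """Record which stored chunks contain each similarity element."""
    for e in elements:
        index.setdefault(e, set()).add(chunk_id)


def find_similar(index: dict[int, set[str]],
                 elements: set[int]) -> dict[str, int]:
    """Count matching similarity elements per stored chunk."""
    counts: dict[str, int] = {}
    for e in elements:
        for chunk_id in index.get(e, ()):
            counts[chunk_id] = counts.get(chunk_id, 0) + 1
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    data = rng.randbytes(64 * 1024)
    low = similarity_elements(data, 2)    # 16 elements
    high = similarity_elements(data, 4)   # 256 elements
    assert low <= high                    # inclusion property

    index: dict[int, set[str]] = {}
    add_to_index(index, "stored-chunk", high)  # repository indexed at level 4
    print(find_similar(index, low))            # input queried at level 2
    # prints {'stored-chunk': 16}
```

Because every level draws from the same sorted hash list, identical data always matches across levels: whichever side was computed at the lower level, each of its elements appears in the other side's set. This is one way to obtain the cross-level compatibility the claim relies on, not necessarily the patented implementation.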
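
The adaptive rule in the last two claim elements can be sketched in Python as follows. The sketch assumes that the aggregated deduplication ratio is the total input bytes for the period divided by the bytes remaining after deduplication (input bytes minus bytes covered by repository matches). This is one reading of the claim's wording and ignores metadata overhead. It also assumes that a decrease lowers the level by exactly one step. The function names and the threshold value are hypothetical. Because the claim only recites the decrease condition, the sketch leaves the level unchanged in every other case.

```python
import math


def aggregated_dedup_ratio(input_sizes: list[int],
                           matched_sizes: list[int]) -> float:
    """Total input bytes over the period divided by bytes left after dedup.

    Assumes bytes covered by repository matches are eliminated entirely
    and ignores metadata overhead.
    """
    total_in = sum(input_sizes)
    total_out = total_in - sum(matched_sizes)
    if total_in == 0:
        return 1.0          # nothing received yet: treat as no reduction
    if total_out <= 0:
        return math.inf     # every input byte was matched
    return total_in / total_out


def next_resolution_level(current: int, lowest: int, ratio: float,
                          threshold: float, match_set_sizes: list[int]) -> int:
    """Apply the claimed decrease condition; otherwise keep the current level."""
    avg_matches = (sum(match_set_sizes) / len(match_set_sizes)
                   if match_set_sizes else 0.0)
    # "not lower than" is read as >= for both the ratio and the average size.
    if ratio >= threshold and avg_matches >= 2 and current > lowest:
        return current - 1
    return current


if __name__ == "__main__":
    ratio = aggregated_dedup_ratio([4096] * 4, [4096, 2048, 0, 4096])
    # 16384 input bytes, 6144 output bytes, so the ratio is about 2.67
    print(next_resolution_level(current=4, lowest=1, ratio=ratio,
                                threshold=2.0, match_set_sizes=[5, 3, 2]))
    # prints 3
```

Lowering the resolution when deduplication is already effective and match sets are large reduces the number of similarity elements computed and indexed per chunk. The level never drops below the lowest configured level.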

  • 1 Assignment