Compatibility and inclusion of similarity element resolutions
Abstract
For adaptive similarity search resolution in a data deduplication system using a processor device in a computing environment, multiple resolution levels are configured for a similarity search. Input similarity elements are calculated in one resolution level for a chunk of input data. The input similarity elements of the one resolution level are used to find similar data in a repository of data where similarity elements of the stored similar repository data are of the multiple resolution levels.
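The compatibility between resolution levels described here can be illustrated with a minimal sketch. It assumes, purely for illustration, that similarity elements are block hashes and that a level keeps the hashes whose low bits are zero; the source does not say how elements are chosen. `BLOCK_SIZE`, `MAX_LEVEL`, `block_hashes`, `elements_at_level`, and `find_similar` are all hypothetical names.

```python
import hashlib

BLOCK_SIZE = 64   # hypothetical block size in bytes
MAX_LEVEL = 3     # hypothetical: levels run from 0 (lowest) to 3 (highest)

def block_hashes(chunk: bytes) -> list[int]:
    """Hash fixed-size blocks of the chunk (a stand-in for a rolling hash)."""
    return [
        int.from_bytes(hashlib.sha256(chunk[i:i + BLOCK_SIZE]).digest()[:8], "big")
        for i in range(0, len(chunk), BLOCK_SIZE)
    ]

def elements_at_level(chunk: bytes, level: int) -> set[int]:
    """Keep hashes whose low (MAX_LEVEL - level) bits are zero.

    A lower level demands more zero bits, so its elements are always a
    subset of the elements selected at any higher level.
    """
    mask = (1 << (MAX_LEVEL - level)) - 1
    return {h for h in block_hashes(chunk) if (h & mask) == 0}

def find_similar(index: dict[int, str], chunk: bytes, level: int) -> dict[str, int]:
    """Count, per repository item, how many input elements hit the index."""
    votes: dict[str, int] = {}
    for element in elements_at_level(chunk, level):
        owner = index.get(element)
        if owner is not None:
            votes[owner] = votes.get(owner, 0) + 1
    return votes
```

Because the element sets are nested under this assumption, an index built at one level can still be queried with input elements calculated at another: every element a lower level selects is also present at the higher level.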
18 Claims
1. A method for adaptive similarity search using compatibility and inclusion of similarity element resolutions in a data deduplication system using a processor device in a computing environment, comprising:
configuring a plurality of resolution levels for a similarity search;
calculating input similarity elements in a certain one of the plurality of resolution levels for a chunk of input data, wherein each resolution level is determined by identifying a number of the input similarity elements calculated in relation to a size of the chunk of input data;
using the input similarity elements of the certain one of the plurality of resolution levels to find similar data in a repository of data where similarity elements of the stored similar repository data are of the plurality of resolution levels, for locating and reducing duplicate data within the data deduplication system;
defining the plurality of resolution levels to be between a highest resolution level and a lowest resolution level;
configuring the similarity elements of each one of the plurality of resolution levels which are lower than the certain one of the plurality of resolution levels to be a subset of the similarity elements of each one of the plurality of resolution levels which are higher than the certain one of the plurality of resolution levels;
calculating an aggregated deduplication ratio as a total size of input chunks of data received over a certain time period covered by matches with repository data out of the total size of the input chunks relative to a total size of output data subsequent to deduplicating all of the input chunks of data received over the certain time period; and
using the aggregated deduplication ratio to determine the resolution level of the input similarity elements by decreasing the similarity elements resolution level if the aggregated deduplication ratio is not lower than a predefined threshold and an average size of calculated sets of similarity element matches is not lower than two and a current resolution level is higher than the lowest resolution level.
Dependent claims: 2, 3, 4, 5, 6.
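The final step of claim 1 reduces to a three-part test, sketched below. The sketch assumes that levels are integers where a larger value means a higher resolution and that a decrease moves down by exactly one level; neither is stated in the claim. `WindowStats` and `next_resolution_level` are hypothetical names, and the sketch leaves the level unchanged when the test fails because the claim does not say what happens otherwise.

```python
from dataclasses import dataclass

@dataclass
class WindowStats:
    dedup_ratio: float         # aggregated deduplication ratio for the time period
    avg_match_set_size: float  # average size of the sets of similarity element matches

def next_resolution_level(current: int, lowest: int,
                          stats: WindowStats, threshold: float) -> int:
    """Return the level to use next, applying the three conditions of the claim."""
    should_decrease = (
        stats.dedup_ratio >= threshold        # "not lower than a predefined threshold"
        and stats.avg_match_set_size >= 2     # "not lower than two"
        and current > lowest                  # "higher than the lowest resolution level"
    )
    # Assumption: a decrease is a single step down.
    return current - 1 if should_decrease else current
```

For example, `next_resolution_level(3, 0, WindowStats(0.9, 2.0), threshold=0.8)` steps down from level 3 to level 2, while the same call with a ratio of 0.7 keeps level 3.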
7. A system for adaptive similarity search using compatibility and inclusion of similarity element resolutions in a data deduplication system of a computing environment, the system comprising:
the data deduplication system;
a repository operating in the data deduplication system;
a memory in the data deduplication system;
a similarity search structure in association with the memory in the data deduplication system; and
at least one processor device operable in the computing storage environment for controlling the data deduplication system, wherein the at least one processor device performs:
configuring a plurality of resolution levels for a similarity search,
calculating input similarity elements in a certain one of the plurality of resolution levels for a chunk of input data, wherein each resolution level is determined by identifying a number of the input similarity elements calculated in relation to a size of the chunk of input data,
using the input similarity elements of the certain one of the plurality of resolution levels to find similar data in the repository of data where similarity elements of the stored similar repository data are of the plurality of resolution levels, for locating and reducing duplicate data within the data deduplication system,
defining the plurality of resolution levels to be between a highest resolution level and a lowest resolution level,
configuring the similarity elements of each one of the plurality of resolution levels which are lower than the certain one of the plurality of resolution levels to be a subset of the similarity elements of each one of the plurality of resolution levels which are higher than the certain one of the plurality of resolution levels,
calculating an aggregated deduplication ratio as a total size of input chunks of data received over a certain time period covered by matches with repository data out of the total size of the input chunks relative to a total size of output data subsequent to deduplicating all of the input chunks of data received over the certain time period; and
using the aggregated deduplication ratio to determine the resolution level of the input similarity elements by decreasing the similarity elements resolution level if the aggregated deduplication ratio is not lower than a predefined threshold and an average size of calculated sets of similarity element matches is not lower than two and a current resolution level is higher than the lowest resolution level.
Dependent claims: 8, 9, 10, 11, 12.
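The wording of the aggregated deduplication ratio admits more than one reading, so the sketch below computes two candidate quantities over a time window rather than asserting which one is meant: the fraction of input bytes covered by matches, and total input size relative to total output size. `ChunkResult`, `aggregate`, and the timestamp field are hypothetical, as is the choice to return `(0.0, 1.0)` for an empty window.

```python
from dataclasses import dataclass

@dataclass
class ChunkResult:
    received_at: float  # hypothetical timestamp, in seconds
    input_size: int     # bytes in the input chunk
    matched_size: int   # bytes covered by matches with repository data
    output_size: int    # bytes remaining after deduplicating the chunk

def aggregate(results: list[ChunkResult], start: float, end: float) -> tuple[float, float]:
    """Return (coverage, reduction) for chunks received in [start, end)."""
    window = [r for r in results if start <= r.received_at < end]
    total_in = sum(r.input_size for r in window)
    if total_in == 0:
        return 0.0, 1.0  # assumption: an empty window means no deduplication
    total_matched = sum(r.matched_size for r in window)
    total_out = sum(r.output_size for r in window)
    coverage = total_matched / total_in  # reading 1: matched bytes out of input bytes
    # Reading 2: input bytes relative to output bytes.
    reduction = total_in / total_out if total_out else float("inf")
    return coverage, reduction
```

Either value could be compared against the predefined threshold; which one applies depends on how the specification defines the ratio.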
13. A computer program product for adaptive similarity search using compatibility and inclusion of similarity element resolutions in a data deduplication system using a processor device in a computing environment, the computer program product comprising a non-transitory computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising:
an executable portion that configures a plurality of resolution levels for a similarity search;
an executable portion that calculates input similarity elements in a certain one of the plurality of resolution levels for a chunk of input data, wherein each resolution level is determined by identifying a number of the input similarity elements calculated in relation to a size of the chunk of input data;
an executable portion that uses the input similarity elements of the certain one of the plurality of resolution levels to find similar data in a repository of data where similarity elements of the stored similar repository data are of the plurality of resolution levels, for locating and reducing duplicate data within the data deduplication system;
an executable portion that defines the plurality of resolution levels to be between a highest resolution level and a lowest resolution level;
an executable portion that configures the similarity elements of each one of the plurality of resolution levels which are lower than the certain one of the plurality of resolution levels to be a subset of the similarity elements of each one of the plurality of resolution levels which are higher than the certain one of the plurality of resolution levels;
an executable portion that calculates an aggregated deduplication ratio as a total size of input chunks of data received over a certain time period covered by matches with repository data out of the total size of the input chunks relative to a total size of output data subsequent to deduplicating all of the input chunks of data received over the certain time period; and
an executable portion that uses the aggregated deduplication ratio to determine the resolution level of the input similarity elements by decreasing the similarity elements resolution level if the aggregated deduplication ratio is not lower than a predefined threshold and an average size of calculated sets of similarity element matches is not lower than two and a current resolution level is higher than the lowest resolution level.
Dependent claims: 14, 15, 16, 17, 18.
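Each resolution level is characterized by the number of similarity elements calculated relative to the chunk size. One way to picture that configuration step is as a table of element densities, sketched below. The doubling between adjacent levels, the names `ResolutionLevel`, `configure_levels`, and `expected_elements`, and the byte figures are illustrative assumptions, not values from the source.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ResolutionLevel:
    index: int              # 0 is the lowest resolution level
    bytes_per_element: int  # on average, one similarity element per this many bytes

def configure_levels(lowest_bytes_per_element: int, count: int) -> list[ResolutionLevel]:
    """Build `count` levels; each higher level doubles the element density."""
    if count < 1 or (lowest_bytes_per_element >> (count - 1)) < 1:
        raise ValueError("too many levels for the given lowest density")
    return [ResolutionLevel(i, lowest_bytes_per_element >> i) for i in range(count)]

def expected_elements(level: ResolutionLevel, chunk_size: int) -> int:
    """Approximate number of similarity elements for a chunk at this level."""
    return chunk_size // level.bytes_per_element
```

With four levels starting at one element per 32 KiB, a 16 MiB chunk would yield about 512 elements at the lowest level and about 4096 at the highest under these assumptions.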
Specification