Reducing digest storage consumption in a data deduplication system
Abstract
For reducing digests storage consumption in a data deduplication system using a processor device in a computing environment, digest values are calculated for input data. The digest values are used to locate matches with data stored in a repository. The digest values are stored in the repository. The digest values of the data stored in the repository that is determined to be redundant with the input data are removed.
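The bookkeeping described in the abstract can be illustrated with a minimal sketch. Everything named here (`DigestRepository`, `ingest`, the `(offset, length, digest)` entry layout, the segment names) is a hypothetical illustration rather than the patent's implementation. The sketch assumes that digests are kept per data segment in order of occurrence, separately from the data itself, and that a stored digest is treated as redundant when the same digest value appears in the incoming data.

```python
from typing import Dict, List, Tuple

# Hypothetical entry layout: (offset, length, digest) of one digest block.
Entry = Tuple[int, int, str]


class DigestRepository:
    """Toy digest store, kept separately from the data the digests describe."""

    def __init__(self) -> None:
        # segment name -> digests in their order of occurrence in the data
        self.segments: Dict[str, List[Entry]] = {}

    def ingest(self, name: str, entries: List[Entry]) -> int:
        """Store digests of new input; drop older digests it makes redundant.

        Returns the number of previously stored digests that were removed.
        """
        incoming = {digest for _, _, digest in entries}
        removed = 0
        for segment in list(self.segments):
            old = self.segments[segment]
            # Keep only digests that do not match the incoming data.
            kept = [e for e in old if e[2] not in incoming]
            removed += len(old) - len(kept)
            self.segments[segment] = kept
        # Store the new digests linearly, in their order of occurrence.
        self.segments[name] = list(entries)
        return removed

    def total_digests(self) -> int:
        return sum(len(v) for v in self.segments.values())


repo = DigestRepository()
repo.ingest("v1", [(0, 4, "a"), (4, 4, "b"), (8, 4, "c")])
removed = repo.ingest("v2", [(0, 4, "a"), (4, 4, "x"), (8, 4, "c")])
```

Scanning every stored segment on each ingest is done only for brevity. A real system would presumably narrow the candidates first, for example with the similarity search that the claims mention.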
18 Claims
1. A method for reducing digests storage consumption in a data deduplication system using a processor device in a computing environment, comprising:
calculating digest values for input data by using a single linear scan of rolling hash values for producing both similarity search values and boundaries of digest blocks;
using the digest values to locate matches with data stored in a repository;
storing the digest values in the repository;
removing the digest values of the data stored in the repository that is determined to be redundant with the input data;
storing the digest values in the repository linearly in a sequence of occurrence of the digest values in the data; and
storing the digest values in the repository in a form that is independent of the form by which the data that the digest values describe is stored.
Dependent claims: 2, 3, 4, 5, 6
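The first step of claim 1, a single linear scan of rolling hash values that yields both similarity search values and digest block boundaries, might look roughly like the sketch below. All concrete choices are assumptions made for illustration and are not taken from the patent: the polynomial rolling hash, the constants `WINDOW`, `BASE`, `MOD` and `BOUNDARY_MASK`, the rule that a boundary falls where the low bits of the hash are zero, the use of the largest hash values as similarity values, and SHA-1 as the block digest.

```python
import hashlib
import heapq

WINDOW = 16            # rolling-hash window in bytes (assumed)
BASE = 257             # polynomial base (assumed)
MOD = (1 << 61) - 1    # hash modulus (assumed)
BOUNDARY_MASK = 0x3F   # boundary when the low 6 bits are zero (assumed)


def scan(data: bytes, num_similarity: int = 4):
    """One pass over data; returns (similarity_values, digests).

    digests is a list of (offset, length, sha1_hex) in order of occurrence.
    """
    top = pow(BASE, WINDOW - 1, MOD)   # weight of the byte leaving the window
    h = 0
    heap = []          # min-heap holding the largest rolling hashes seen so far
    digests = []
    block_start = 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD   # drop the oldest byte
        h = (h * BASE + byte) % MOD                  # take in the newest byte
        if i + 1 < WINDOW:
            continue                                 # window not yet full
        # Similarity search values: keep the k largest rolling hashes.
        if len(heap) < num_similarity:
            heapq.heappush(heap, h)
        elif h > heap[0]:
            heapq.heapreplace(heap, h)
        # Digest block boundary: decided from the same rolling hash value.
        if (h & BOUNDARY_MASK) == 0:
            end = i + 1
            block = data[block_start:end]
            digests.append((block_start, len(block), hashlib.sha1(block).hexdigest()))
            block_start = end
    if block_start < len(data):                      # trailing partial block
        block = data[block_start:]
        digests.append((block_start, len(block), hashlib.sha1(block).hexdigest()))
    return sorted(heap, reverse=True), digests
```

Because each boundary depends only on the bytes inside the window, inserting bytes at the front of the input shifts the later blocks without changing their digests, which is what lets the digests locate matches in previously stored data.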
7. A system for reducing digests storage consumption in a data deduplication system of a computing environment, the system comprising:
the data deduplication system;
a repository operating in the data deduplication system;
at least one processor device operable in the computing storage environment for controlling the data deduplication system, wherein the at least one processor device:
calculates digest values for input data by using a single linear scan of rolling hash values for producing both similarity search values and boundaries of digest blocks,
uses the digest values to locate matches with data stored in a repository,
stores the digest values in the repository,
removes the digest values of the data stored in the repository that is determined to be redundant with the input data,
stores the digest values in the repository linearly in a sequence of occurrence of the digest values in the data, and
stores the digest values in the repository in a form that is independent of the form by which the data that the digest values describe is stored.
Dependent claims: 8, 9, 10, 11, 12
13. A computer program product for reducing digests storage consumption in a data deduplication system using a processor device in a computing environment, the computer program product comprising a computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising:
a first executable portion that calculates digest values for input data by using a single linear scan of rolling hash values for producing both similarity search values and boundaries of digest blocks;
a second executable portion that uses the digest values to locate matches with data stored in a repository;
a third executable portion that stores the digest values in the repository;
a fourth executable portion that removes the digest values of the data stored in the repository that is determined to be redundant with the input data;
a fifth executable portion that stores the digest values in the repository linearly in a sequence of occurrence of the digest values in the data; and
a sixth executable portion that stores the digest values in the repository in a form that is independent of the form by which the data that the digest values describe is stored.
Dependent claims: 14, 15, 16, 17, 18
Specification