OPTIMIZING HASH TABLE STRUCTURE FOR DIGEST MATCHING IN A DATA DEDUPLICATION SYSTEM
First Claim
1. A method for optimizing a hash table structure for digest matching in a data deduplication system using a processor device in a computing environment, comprising:
determining a repository data interval as similar to an input data interval;
loading a plurality of repository digests corresponding to the similar repository data interval into a sequential representation and into a search structure; and
incorporating into entries of the search structure a compact index pointing to a position in the sequential representation of a plurality of digests.
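As an illustration of the arrangement recited above, the following is a minimal Python sketch, not the patented implementation. It keeps the repository digests in a list (the sequential representation) and builds an open-addressing hash table (the search structure) whose slots store only a small integer position into that list. The class name `DigestTable`, the 16-bit index width, the `EMPTY` sentinel, linear probing, and the use of SHA-1 in the usage lines are all assumptions made for the example; the claim does not specify any of them.

```python
import hashlib
from array import array

EMPTY = 0xFFFF  # assumed sentinel value marking an unused slot


class DigestTable:
    """Sequential digest list plus a hash table of compact 16-bit positions."""

    def __init__(self, digests):
        if len(digests) >= EMPTY:
            raise ValueError("too many digests for a 16-bit index")
        # Sequential representation: digests in repository order.
        self.sequential = list(digests)
        # At least twice as many slots as digests, so an empty slot always exists.
        self.size = max(8, 2 * len(self.sequential))
        # Search structure: each slot holds a position, not a digest copy.
        self.slots = array("H", [EMPTY]) * self.size
        for position, digest in enumerate(self.sequential):
            self._insert(digest, position)

    def _start(self, digest):
        # Use the leading digest bytes as the hash value.
        return int.from_bytes(digest[:8], "big") % self.size

    def _insert(self, digest, position):
        slot = self._start(digest)
        while self.slots[slot] != EMPTY:  # linear probing
            slot = (slot + 1) % self.size
        self.slots[slot] = position

    def find(self, digest):
        """Return every position in the sequential list that holds this digest."""
        found = []
        slot = self._start(digest)
        while self.slots[slot] != EMPTY:
            position = self.slots[slot]
            # The digest itself is read back from the sequential list.
            if self.sequential[position] == digest:
                found.append(position)
            slot = (slot + 1) % self.size
        return found


chunks = [b"a", b"b", b"c", b"b"]
digests = [hashlib.sha1(c).digest() for c in chunks]
table = DigestTable(digests)
positions = sorted(table.find(digests[1]))
```

In this sketch a table entry stores only a position, and the digest bytes are read back from the sequential list when a candidate slot is checked.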
Abstract
Repository data intervals are determined as similar to an input data interval. Repository digests corresponding to the similar repository data interval are loaded into a sequential representation and into a search structure. Matches between input digests and the repository digests are found using the search structure. Each of the found matches between input digests and repository digests is extended using the sequential representation. Data matches between the input data and the repository data are determined using the extended digest matches. A compact index pointing to a position in the sequential representation of digests is incorporated into entries of the search structure.
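The match-and-extend flow described in the abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented method: a plain Python dict stands in for the search structure, the function name `find_extended_matches` is hypothetical, matches are extended forward only, and the longest extension is chosen when a digest occurs at several repository positions.

```python
def find_extended_matches(input_digests, repo_digests):
    """Return (input_start, repo_start, length) runs of matching digests."""
    # Search structure (stand-in): digest -> positions in the sequential list.
    search = {}
    for position, digest in enumerate(repo_digests):
        search.setdefault(digest, []).append(position)

    matches = []
    i = 0
    while i < len(input_digests):
        best = None
        for r in search.get(input_digests[i], []):
            # Extend the match by walking the sequential list in step
            # with the input digests.
            length = 1
            while (i + length < len(input_digests)
                   and r + length < len(repo_digests)
                   and input_digests[i + length] == repo_digests[r + length]):
                length += 1
            if best is None or length > best[1]:
                best = (r, length)
        if best is None:
            i += 1  # no repository digest matches this input digest
        else:
            matches.append((i, best[0], best[1]))
            i += best[1]  # skip past the extended match
    return matches


repo = [b"A", b"B", b"C", b"D", b"B", b"C"]
inp = [b"X", b"B", b"C", b"D", b"Y", b"C"]
result = find_extended_matches(inp, repo)
```

Only the first digest of each run is looked up in the search structure; the rest of the run is confirmed by sequential comparison.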
24 Claims
1. A method for optimizing a hash table structure for digest matching in a data deduplication system using a processor device in a computing environment, comprising:
determining a repository data interval as similar to an input data interval;
loading a plurality of repository digests corresponding to the similar repository data interval into a sequential representation and into a search structure; and
incorporating into entries of the search structure a compact index pointing to a position in the sequential representation of a plurality of digests.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
-
9. A system for optimizing a hash table structure for digest matching in a data deduplication system of a computing environment, the system comprising:
the data deduplication system;
the dual data structures in the data deduplication system, wherein the dual data structures include a search structure and a sequential buffer;
a hash table included in the data deduplication system;
a repository operating in the data deduplication system; and
at least one processor device operable in the computing storage environment for controlling the data deduplication system, wherein the at least one processor device:
determines a repository data interval as similar to an input data interval,
loads a plurality of repository digests corresponding to the similar repository data interval into a sequential representation and into the search structure, and
incorporates into entries of the search structure a compact index pointing to a position in the sequential representation of a plurality of digests.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
-
17. A computer program product for optimizing a hash table structure for digest matching in a data deduplication system using a processor device in a computing environment, the computer program product comprising a computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising:
a first executable portion that determines a repository data interval as similar to an input data interval;
a second executable portion that loads a plurality of repository digests corresponding to the similar repository data interval into a sequential representation and into a search structure; and
a third executable portion that incorporates into entries of the search structure a compact index pointing to a position in the sequential representation of a plurality of digests.
Dependent claims: 18, 19, 20, 21, 22, 23, 24
Specification