Fast and Low-RAM-Footprint Indexing for Data Deduplication
First Claim
1. In a computing environment, a system comprising:
a log-structured hash index maintained in a secondary storage device, in which entries of the log-structured hash index comprise hash values of data chunks and associated metadata with each hash value, the hash index updated by appending new entries to the log-structured hash index; and
a hash index service configured to access the log-structured hash index to perform a lookup based on a hash value computed for a chunk and return the metadata associated with that hash value if found, or to return a not-found result if not found.
Abstract
The subject disclosure is directed towards a data deduplication technology in which a hash index service's index maintains a hash index in a secondary storage device such as a hard drive, along with a compact index table and look-ahead cache in RAM that operate to reduce the I/O to access the secondary storage device during deduplication operations. Also described is a session cache for maintaining data during a deduplication session, and encoding of a read-only compact index table for efficiency.
20 Claims
1. In a computing environment, a system comprising:
a log-structured hash index maintained in a secondary storage device, in which entries of the log-structured hash index comprise hash values of data chunks and associated metadata with each hash value, the hash index updated by appending new entries to the log-structured hash index; and
a hash index service configured to access the log-structured hash index to perform a lookup based on a hash value computed for a chunk and return the metadata associated with that hash value if found, or to return a not-found result if not found.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
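The append-and-lookup behavior recited in claim 1 can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the class name `LogStructuredHashIndex`, the use of SHA-256 as the chunk hash, the fixed 48-byte record layout, the choice of (offset, length) as metadata, and the `io.BytesIO` buffer standing in for a file on the secondary storage device. The lookup is a deliberately naive full scan of the log.

```python
import hashlib
import io
import struct

# Hypothetical record layout: 32-byte hash value, then chunk offset and
# chunk length as the associated metadata (both unsigned 64-bit).
RECORD = struct.Struct("<32sQQ")


def chunk_hash(chunk: bytes) -> bytes:
    """Hash a data chunk (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(chunk).digest()


class LogStructuredHashIndex:
    def __init__(self, storage=None):
        # io.BytesIO stands in for a file on the secondary storage device.
        self._log = storage if storage is not None else io.BytesIO()

    def append(self, hash_value: bytes, offset: int, length: int) -> int:
        """Append a new entry at the tail of the log; return its byte position."""
        position = self._log.seek(0, io.SEEK_END)
        self._log.write(RECORD.pack(hash_value, offset, length))
        return position

    def lookup(self, hash_value: bytes):
        """Scan the log; return (offset, length) if found, else None."""
        self._log.seek(0)
        while True:
            raw = self._log.read(RECORD.size)
            if len(raw) < RECORD.size:
                return None  # reached the end of the log: not found
            stored, offset, length = RECORD.unpack(raw)
            if stored == hash_value:
                return offset, length


index = LogStructuredHashIndex()
index.append(chunk_hash(b"hello"), 0, 5)
print(index.lookup(chunk_hash(b"hello")))    # metadata for a known chunk
print(index.lookup(chunk_hash(b"missing")))  # not-found result
```

A full scan costs one pass over the log per lookup, which is why the abstract pairs the on-disk index with RAM-resident structures that reduce I/O to the secondary storage device.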
12. In a computing environment, a method performed at least in part on at least one processor, comprising:
maintaining a hash index in a secondary storage device, in which entries of the hash index include hash values, each hash value being computed from a deduplicated data chunk, and being associated with metadata by which the deduplicated data chunk is locatable;
maintaining a compact index table in a primary storage device that includes compact signatures representative of the hash values in the hash index, and for each compact signature, a pointer to a location of the corresponding hash value in the hash index;
mapping each hash value to up to two or more entries in the compact index table, in which each entry contains a unique signature; and
accessing the compact index table to lookup a compact signature corresponding to a requested hash value provided in a request, and returning a not-found result in response to the request if none of the compact signatures are found in the compact index table, or following one or more pointers to determine whether an entry in the hash index contains the requested hash value if the compact signature is found in the compact index table.
Dependent claims: 13, 14, 15
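The lookup path of claim 12 (check compact signatures in RAM first, and touch the on-disk hash index only when a signature matches) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the names `CompactIndexTable`, `NUM_SLOTS`, and `NUM_CHOICES`, the 16-bit signature width, the SHA-256-based derivation of candidate slots, and the Python list standing in for the hash index on secondary storage are all hypothetical. A full table raises an error here; a real design would need a relocation or resize policy.

```python
import hashlib

NUM_SLOTS = 1024   # hypothetical table size
NUM_CHOICES = 2    # each hash value maps to this many candidate slots


class CompactIndexTable:
    def __init__(self):
        # Stand-in for the hash index on secondary storage:
        # a list of (hash_value, metadata); a pointer is a list position.
        self.log = []
        # RAM table: each slot holds (compact_signature, pointer) or None.
        self.slots = [None] * NUM_SLOTS
        self.disk_reads = 0  # counts accesses to the stand-in hash index

    def _candidates(self, hash_value: bytes):
        """Yield (slot, signature) for each candidate position of a hash value."""
        for i in range(NUM_CHOICES):
            digest = hashlib.sha256(bytes([i]) + hash_value).digest()
            slot = int.from_bytes(digest[:4], "little") % NUM_SLOTS
            signature = int.from_bytes(digest[4:6], "little")  # 16 bits
            yield slot, signature

    def insert(self, hash_value: bytes, metadata) -> None:
        self.log.append((hash_value, metadata))
        pointer = len(self.log) - 1
        for slot, signature in self._candidates(hash_value):
            if self.slots[slot] is None:
                self.slots[slot] = (signature, pointer)
                return
        raise RuntimeError("all candidate slots occupied (no relocation in this sketch)")

    def lookup(self, hash_value: bytes):
        """Return metadata if the hash value is indexed, else None."""
        for slot, signature in self._candidates(hash_value):
            entry = self.slots[slot]
            if entry is not None and entry[0] == signature:
                # Signature matched: follow the pointer and verify the full hash,
                # because a compact signature can match a different hash value.
                self.disk_reads += 1
                stored_hash, metadata = self.log[entry[1]]
                if stored_hash == hash_value:
                    return metadata
        return None  # no matching signature: not found without a confirmed hit


table = CompactIndexTable()
key = hashlib.sha256(b"chunk-A").digest()
table.insert(key, {"offset": 0, "length": 4096})
print(table.lookup(key))
print(table.lookup(hashlib.sha256(b"unseen").digest()))
```

Because only a short signature and a pointer are kept per entry, the RAM cost per indexed chunk stays small, while most lookups for absent hash values are rejected without reading the on-disk index.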
16. One or more computer-readable media having computer-executable instructions, which when executed perform steps, comprising,
maintaining a hash index in a secondary storage device, in which entries of the hash index include hash values, each hash value being computed from a deduplicated data chunk, and being associated with metadata by which the deduplicated data chunk is locatable;
maintaining a look-ahead cache in a primary storage device that includes hash values and metadata entries cached from the index table; and
accessing the look-ahead cache to lookup a requested hash value provided in a request, and returning metadata in response to the request if the requested hash value is found in the look-ahead cache.
Dependent claims: 17, 18, 19, 20
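One way to picture the look-ahead cache of claim 16 is the Python sketch below. The claim itself only requires a RAM cache of hash values and metadata that is consulted on lookup; the prefetch policy shown here (on a miss, load the matching entry together with the next few entries of the hash index) is an assumption of this sketch, as are the names `LookAheadCache`, `capacity`, and `look_ahead`, the least-recently-used eviction, and the in-memory list and dictionary standing in for the hash index on secondary storage and the means of locating entries in it.

```python
from collections import OrderedDict


class LookAheadCache:
    def __init__(self, log, capacity=4, look_ahead=2):
        # log: list of (hash_value, metadata), a stand-in for the hash index
        # on secondary storage, in the order entries were appended.
        self.log = log
        # Stand-in for whatever structure locates an entry in the hash index.
        self.position = {h: i for i, (h, _) in enumerate(log)}
        self.cache = OrderedDict()  # RAM cache: hash_value -> metadata
        self.capacity = capacity
        self.look_ahead = look_ahead
        self.hits = 0
        self.disk_reads = 0

    def lookup(self, hash_value):
        """Return metadata if known, else None."""
        if hash_value in self.cache:
            self.cache.move_to_end(hash_value)  # mark as recently used
            self.hits += 1
            return self.cache[hash_value]
        pos = self.position.get(hash_value)
        if pos is None:
            return None
        # Cache miss: one read brings in the entry and its successors,
        # on the assumption that neighboring entries will be requested next.
        self.disk_reads += 1
        for stored, metadata in self.log[pos:pos + 1 + self.look_ahead]:
            self.cache[stored] = metadata
            self.cache.move_to_end(stored)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
        return self.log[pos][1]


log = [(name.encode(), n) for n, name in enumerate("abcde")]
cache = LookAheadCache(log)
for name in (b"a", b"b", b"c", b"d"):
    cache.lookup(name)
print(cache.hits, cache.disk_reads)
```

With these parameters, looking up `a` also caches `b` and `c`, so the following two requests are served from RAM and only `d` triggers a second read of the stand-in index.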
Specification