METHOD FOR REMOVING DUPLICATE DATA FROM A STORAGE ARRAY
Abstract
A system and method for efficiently removing duplicate data blocks at a fine granularity from a storage array. A data storage subsystem supports multiple deduplication tables. Table entries in one deduplication table have the highest associated probability of being deduplicated. Table entries may move from one deduplication table to another as the probabilities change. Additionally, a table entry may be evicted from all deduplication tables if a corresponding estimated probability falls below a given threshold. The probabilities are based on attributes associated with a data component and attributes associated with a virtual address corresponding to a received storage access request. A strategy for searches of the multiple deduplication tables may also be determined by the attributes associated with a given storage access request.
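
As a rough illustration of how entries might move between tables as their probabilities change, the following minimal Python sketch keeps two in-memory tables and re-places an entry each time its estimated probability is updated. The class name, the two threshold values, and the use of plain dictionaries are assumptions made for illustration only; the abstract does not specify how the probability is computed, so it is simply passed in.

```python
from dataclasses import dataclass, field


@dataclass
class TieredDedupTables:
    """Hypothetical two-table layout; names and thresholds are illustrative."""

    promote_threshold: float = 0.5   # assumed value: above this, keep in the first table
    evict_threshold: float = 0.05    # assumed value: below this, drop from all tables
    first: dict = field(default_factory=dict)   # hash -> location, most likely to dedup
    second: dict = field(default_factory=dict)  # hash -> location, less likely to dedup

    def place(self, digest: str, location: int, probability: float) -> str:
        """Place or move one entry according to its estimated probability."""
        # Remove any existing copy so an entry lives in at most one table.
        self.first.pop(digest, None)
        self.second.pop(digest, None)

        if probability < self.evict_threshold:
            return "evicted"          # entry is kept in no table
        if probability > self.promote_threshold:
            self.first[digest] = location
            return "first"
        self.second[digest] = location
        return "second"
```

Calling `place` again with a new probability for the same hash moves the entry between tables or evicts it, which mirrors the migration and eviction behavior the abstract describes.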
21 Claims
1. A computer system comprising:
a data storage medium;
a first deduplication table comprising a first plurality of entries and a second deduplication table comprising a second plurality of entries, wherein each entry of the first and the second plurality of entries includes a hash corresponding to a data component; and
a data storage controller configured to:
store at least one entry in the first deduplication table rather than the second deduplication table based at least in part on a prediction that the at least one entry has a likelihood of being deduplicated that exceeds a given threshold;
search the first deduplication table based on a first hash corresponding to a storage access request prior to any search of the second deduplication table with the first hash;
initiate additional deduplication processing steps, in response to detecting a hit in the first deduplication table during the search; and
forego said additional deduplication processing steps, in response to detecting a miss in the first deduplication table during the search.
Dependent claims: 2, 3, 4, 5, 6, 7
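
The search order recited in claim 1 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: SHA-256 stands in for the unspecified hash, the tables are plain dictionaries mapping a hash to the stored bytes, and the "additional deduplication processing steps" are assumed here to be a full byte comparison. The function names are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Hash a data component (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def process_write(data: bytes, first_table: dict, second_table: dict) -> str:
    """Handle one write request using the first-table-first search order."""
    digest = fingerprint(data)

    # The first table is searched before any search of the second table.
    if digest in first_table:
        # Hit: initiate additional deduplication processing.
        # Assumption: that processing is a byte-for-byte comparison.
        if first_table[digest] == data:
            return "deduplicated"
        return "stored"  # hash matched but contents differ

    # Miss: forgo the additional processing. second_table is deliberately
    # not consulted on this path and is only passed in to make that explicit.
    return "stored"
```

In this sketch a block whose hash is present only in the second table is still written as new data, because a miss in the first table ends the inline deduplication attempt.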
8. A method comprising:
maintaining a first deduplication table comprising a first plurality of entries and a second deduplication table comprising a second plurality of entries in a computer system, wherein each entry of the first and the second plurality of entries includes a hash corresponding to a data component;
storing at least one entry in the first deduplication table rather than the second deduplication table based at least in part on a prediction that the at least one entry has a likelihood of being deduplicated that exceeds a given threshold;
searching the first deduplication table based on a first hash corresponding to a storage access request prior to any search of the second deduplication table with the first hash;
initiating additional deduplication processing steps, in response to detecting a hit in the first deduplication table during the search; and
foregoing said additional deduplication processing steps, in response to detecting a miss in the first deduplication table during the search.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A non-transitory computer readable storage medium comprising program instructions, wherein said program instructions are executable to:
maintain a first deduplication table comprising a first plurality of entries and a second deduplication table comprising a second plurality of entries in a computer system, wherein each entry of the first and the second plurality of entries includes a hash corresponding to a data component;
store at least one entry in the first deduplication table rather than the second deduplication table based at least in part on a prediction that the at least one entry has a likelihood of being deduplicated that exceeds a given threshold;
search the first deduplication table based on a first hash corresponding to a storage access request prior to any search of the second deduplication table with the first hash;
initiate additional deduplication processing steps, in response to detecting a hit in the first deduplication table during the search; and
forego said additional deduplication processing steps, in response to detecting a miss in the first deduplication table during the search.
Dependent claims: 16, 17, 18, 19, 20, 21
Specification