Method for removing duplicate data from a storage array
Abstract
A system and method for efficiently removing duplicate data blocks at a fine granularity from a storage array. A data storage subsystem supports multiple deduplication tables. Table entries in one deduplication table have the highest associated probability of being deduplicated. Table entries may move from one deduplication table to another as the probabilities change. Additionally, a table entry may be evicted from all deduplication tables if its estimated probability falls below a given threshold. The probabilities are based on attributes associated with a data component and attributes associated with a virtual address corresponding to a received storage access request. A strategy for searching the multiple deduplication tables may also be determined by the attributes associated with a given storage access request.
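The table-placement policy in the abstract (entries move between tables as their probabilities change and are evicted below a threshold) can be illustrated with a minimal sketch. It assumes exactly two tables held as dictionaries and probability estimates supplied by the caller; the class name `TieredDedupTables` and both threshold values are hypothetical, and the sketch does not model how probabilities are derived from data-component or virtual-address attributes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TieredDedupTables:
    """Hypothetical two-table layout: table 0 holds the entries judged most
    likely to be deduplicated; table 1 holds the remaining tracked entries."""

    promote_threshold: float = 0.5  # assumed: probability >= this -> table 0
    evict_threshold: float = 0.05   # assumed: probability < this -> evicted
    # Each table maps fingerprint -> physical location of the stored data.
    tables: List[Dict[str, int]] = field(default_factory=lambda: [{}, {}])

    def update(self, fingerprint: str, location: int, probability: float) -> None:
        """Re-place an entry according to its latest probability estimate."""
        # Drop the entry from whichever table currently holds it.
        for table in self.tables:
            table.pop(fingerprint, None)
        if probability < self.evict_threshold:
            return  # evicted from all deduplication tables
        target = 0 if probability >= self.promote_threshold else 1
        self.tables[target][fingerprint] = location

    def table_of(self, fingerprint: str) -> Optional[int]:
        """Return the index of the table holding the fingerprint, or None."""
        for index, table in enumerate(self.tables):
            if fingerprint in table:
                return index
        return None

    def lookup(self, fingerprint: str) -> Optional[int]:
        """Search table 0 first, then table 1; return the stored location."""
        for table in self.tables:
            if fingerprint in table:
                return table[fingerprint]
        return None
```

Keeping the tables disjoint mirrors the abstract's description of entries moving from one table to another; a fixed search order (most probable table first) stands in for the attribute-driven search strategy, which the sketch does not attempt to reproduce.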
19 Claims
1. A computer system comprising:
a non-transitory data storage medium;
a first fingerprint table comprising a first plurality of entries and a second fingerprint table comprising a second plurality of entries, wherein each entry of the first and the second plurality of entries is configured to store a fingerprint corresponding to a data component already stored in the system, wherein the first fingerprint table has fewer entries than the second fingerprint table and wherein the second fingerprint table comprises a fingerprint for at least one deduplicated data component not included in the first fingerprint table; and
a data storage controller comprising hardware;
wherein in response to receiving a write request, the data storage controller is configured to:
search the first fingerprint table during inline deduplication prior to the second fingerprint table based on a first fingerprint corresponding to the write request;
in response to detecting a hit on a matching entry in the first fingerprint table during said search:
write a reference to the data corresponding to the matching entry in the first table; and
in response to detecting a miss in the first fingerprint table during said search:
postpone further deduplication to offline deduplication; and
write data corresponding to the write request in the data storage medium.
Dependent claims: 2, 3, 4, 5, 6, 7.
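A minimal Python sketch of the inline write path recited in claim 1 follows. The class `InlineDedupController`, the SHA-256 fingerprint, the list standing in for the storage medium, and the queue of deferred writes are all illustrative assumptions; the claim names no hash function and does not say how postponed work is tracked. The second (larger) fingerprint table is deliberately not consulted here, since a miss in the first table defers further deduplication.

```python
import hashlib
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class InlineDedupController:
    """Hypothetical controller following the inline path of claim 1."""

    def __init__(self, small_table: Optional[Dict[str, int]] = None):
        # First (smaller) fingerprint table: fingerprint -> physical address.
        self.small_table: Dict[str, int] = small_table if small_table is not None else {}
        self.medium: List[bytes] = []       # stand-in for the storage medium
        self.mapping: Dict[int, int] = {}   # virtual address -> physical address
        # Writes that missed inline, kept for a later offline pass:
        # (virtual address, fingerprint, physical address).
        self.offline_queue: Deque[Tuple[int, str, int]] = deque()

    @staticmethod
    def fingerprint(data: bytes) -> str:
        # Assumed hash; the claim does not name a fingerprint function.
        return hashlib.sha256(data).hexdigest()

    def write(self, virtual_addr: int, data: bytes) -> bool:
        """Handle one write request; return True if deduplicated inline."""
        fp = self.fingerprint(data)
        existing = self.small_table.get(fp)
        if existing is not None:
            # Hit: record a reference to the already-stored copy.
            self.mapping[virtual_addr] = existing
            return True
        # Miss: skip the larger table, store the data, and defer the rest.
        self.medium.append(data)
        phys = len(self.medium) - 1
        self.mapping[virtual_addr] = phys
        self.offline_queue.append((virtual_addr, fp, phys))
        return False
```

In use, a first write of some content misses, is stored, and is queued; if that fingerprint is later present in the first table, a repeat write of the same content records only a reference and stores no new bytes.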
8. A method comprising:
maintaining a first fingerprint table comprising a first plurality of entries and a second fingerprint table comprising a second plurality of entries in a computer system, wherein each entry of the first and the second plurality of entries is configured to store a fingerprint corresponding to a data component already stored in the system, wherein the first fingerprint table has fewer entries than the second fingerprint table and wherein the second fingerprint table comprises a fingerprint for at least one deduplicated data component not included in the first fingerprint table;
in response to receiving a storage access request:
searching the first fingerprint table during inline deduplication prior to the second fingerprint table based on a first fingerprint corresponding to the write request;
in response to detecting a hit on a matching entry in the first fingerprint table during said search:
writing a reference to the data corresponding to the matching entry in the first table; and
in response to detecting a miss in the first fingerprint table during said search:
postponing further deduplication to offline deduplication; and
writing data corresponding to the write request in the data storage medium.
Dependent claims: 9, 10, 11, 12, 13.
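Claim 8 says only that further deduplication is postponed to offline deduplication on a miss. One plausible shape for that offline stage is sketched here, assuming the deferred writes are recorded as (virtual address, fingerprint, physical address) tuples and that the second, larger fingerprint table is consulted at this point; the function name `offline_dedup_pass` and the hand-off of freed addresses to garbage collection are hypothetical.

```python
from typing import Dict, List, Tuple

# (virtual address, fingerprint, physical address) of a write that missed inline.
PendingWrite = Tuple[int, str, int]


def offline_dedup_pass(
    pending: List[PendingWrite],
    large_table: Dict[str, int],
    mapping: Dict[int, int],
) -> List[int]:
    """Hypothetical offline pass over writes that missed the small table.

    Returns the physical addresses whose copies became redundant and can be
    handed to garbage collection.
    """
    reclaimable: List[int] = []
    for virtual_addr, fp, phys in pending:
        if mapping.get(virtual_addr) != phys:
            # Overwritten since the inline write; nothing to deduplicate.
            continue
        existing = large_table.get(fp)
        if existing is not None and existing != phys:
            # Duplicate found in the larger table: point the virtual address
            # at the existing copy and release the copy written inline.
            mapping[virtual_addr] = existing
            reclaimable.append(phys)
        else:
            # First sighting: remember this copy for future lookups.
            large_table[fp] = phys
    return reclaimable
```

Checking that the virtual address still maps to the inline-written copy avoids remapping data that a later write has already superseded.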
14. A non-transitory computer readable storage medium comprising program instructions, wherein said program instructions are executable to:
maintain a first fingerprint table comprising a first plurality of entries and a second fingerprint table comprising a second plurality of entries in a computer system, wherein each entry of the first and the second plurality of entries is configured to store a fingerprint corresponding to a data component already stored in the system, wherein the first fingerprint table has fewer entries than the second fingerprint table and wherein the second fingerprint table comprises a fingerprint for at least one deduplicated data component not included in the first fingerprint table;
in response to receiving a storage access request:
search the first fingerprint table prior to the second fingerprint table based on a first fingerprint corresponding to the write request;
in response to detecting a hit on a matching entry in the first fingerprint table during said search:
write a reference to the data corresponding to the matching entry in the first table; and
in response to detecting a miss in the first fingerprint table during said search:
postpone further deduplication to offline deduplication; and
write data corresponding to the write request in the data storage medium.
Dependent claims: 15, 16, 17, 18, 19.
Specification