Global in-line extent-based deduplication
First Claim
1. A method comprising:
receiving a first write request directed towards a first logical unit (LUN), the first write request having write data and having metadata that includes a first logical block address (LBA), the first write request processed at a node of a storage system, the storage system attached to a storage array of solid state drives (SSDs);
applying a hash function to the write data to generate a first hash value;
selecting an extent store from a plurality of extent stores based on the first hash value;
storing the write data in a segment of the selected extent store, wherein the selected extent store spans a set of SSDs of the storage array, wherein a key is formed from the first hash value;
storing the key in a first volume metadata structure associated with the first LUN;
receiving a second write request directed towards a second LUN, the second write request having the write data and having metadata that includes a second LBA;
applying the hash function to the write data to generate a second hash value, wherein the first hash value is equal to the second hash value; and
storing the key associated with the write data in a second volume metadata structure associated with the second LUN, without writing the write data again to the storage array, to de-duplicate storage of the write data.
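The write path recited in claim 1 can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the class names `ExtentStore` and `Node`, the use of SHA-256 as the hash function, the modulo rule for selecting an extent store, and the use of the full hex digest as the key are all assumptions made for illustration. Like the claim, the sketch treats equal hash values as identifying the same write data.

```python
import hashlib


class ExtentStore:
    """Hypothetical stand-in for one extent store spanning a set of SSDs."""

    def __init__(self):
        self.extents = {}          # key -> write data
        self.physical_writes = 0   # how many times data was actually written

    def store(self, key, data):
        # Only write the data if no extent with this key exists yet.
        if key not in self.extents:
            self.extents[key] = data
            self.physical_writes += 1


class Node:
    """Hypothetical storage node holding several extent stores."""

    def __init__(self, num_extent_stores=4):
        self.extent_stores = [ExtentStore() for _ in range(num_extent_stores)]
        self.volume_metadata = {}  # LUN id -> {LBA: key}

    def write(self, lun, lba, data):
        # Apply the hash function to the write data (SHA-256 is an assumption).
        hash_value = hashlib.sha256(data).digest()
        # Select an extent store based on the hash value (modulo is an assumption).
        index = int.from_bytes(hash_value[:4], "big") % len(self.extent_stores)
        # Form a key from the hash value (here simply its hex digest).
        key = hash_value.hex()
        self.extent_stores[index].store(key, data)
        # Record the key in the volume metadata structure of the target LUN.
        self.volume_metadata.setdefault(lun, {})[lba] = key
        return key


node = Node()
payload = b"\x00" * 4096
key_a = node.write("lun-1", 100, payload)   # first write stores the data
key_b = node.write("lun-2", 7, payload)     # second write stores only the key
total_writes = sum(s.physical_writes for s in node.extent_stores)
```

Because the extent store is chosen from the hash value alone, identical data written to different LUNs always lands in the same store, which is what lets the second write be resolved without writing the data again.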
Abstract
In one embodiment, a layered file system of a storage input/output (I/O) stack executes on one or more nodes of a cluster. The layered file system includes a flash-optimized, log-structured layer configured to provide sequential storage of data and metadata (i.e., a log-structured layout) on solid state drives (SSDs) of storage arrays in the cluster to reduce write amplification, while leveraging a data de-duplication feature of the storage I/O stack. An extent store layer of the file system performs and maintains mappings of the extent keys to SSD storage locations, while a volume layer of the file system performs and maintains mappings of the LUN offset ranges to the extent keys. Separation of the mapping functions between the volume and extent store layers enables different volumes with different offset ranges to reference a same extent key (and thus a same extent).
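The separation of mapping functions described in the abstract can be pictured as two independent lookup tables. The following is only a sketch: the names `extent_map`, `volume_maps`, and `locate`, the key string, and the `(ssd, segment, byte offset)` location tuple are hypothetical and not taken from the patent.

```python
# Extent store layer: extent key -> SSD storage location.
# The (ssd, segment, byte offset) tuple is a hypothetical location format.
extent_map = {"k1": ("ssd0", 3, 8192)}

# Volume layer: per-LUN mapping of (start offset, length) ranges -> extent key.
volume_maps = {
    "lunA": {(0, 4096): "k1"},
    "lunB": {(65536, 4096): "k1"},
}


def locate(lun, offset):
    """Resolve a LUN offset to an SSD location through both layers."""
    for (start, length), key in volume_maps[lun].items():
        if start <= offset < start + length:
            return extent_map[key]   # second hop: key -> storage location
    return None                      # offset not mapped in this volume
```

Here `lunA` and `lunB` map different offset ranges to the same extent key, so both resolve to one stored extent, and relocating that extent would only require updating `extent_map`.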
67 Citations
19 Claims
1. A method comprising:
receiving a first write request directed towards a first logical unit (LUN), the first write request having write data and having metadata that includes a first logical block address (LBA), the first write request processed at a node of a storage system, the storage system attached to a storage array of solid state drives (SSDs);
applying a hash function to the write data to generate a first hash value;
selecting an extent store from a plurality of extent stores based on the first hash value;
storing the write data in a segment of the selected extent store, wherein the selected extent store spans a set of SSDs of the storage array, wherein a key is formed from the first hash value;
storing the key in a first volume metadata structure associated with the first LUN;
receiving a second write request directed towards a second LUN, the second write request having the write data and having metadata that includes a second LBA;
applying the hash function to the write data to generate a second hash value, wherein the first hash value is equal to the second hash value; and
storing the key associated with the write data in a second volume metadata structure associated with the second LUN, without writing the write data again to the storage array, to de-duplicate storage of the write data.
10. A method comprising:
receiving a first write request directed towards a first logical unit (LUN), the first write request having write data and having metadata that includes a first logical block address (LBA), the first write request processed at a node of a storage system, the storage system attached to a storage array of solid state drives (SSDs);
applying a hash function to the write data to generate a first hash value;
selecting an extent store from a plurality of extent stores based on the first hash value;
storing the write data in the selected extent store, wherein the selected extent store spans a set of SSDs of the storage array, wherein a key is formed from the first hash value;
translating the first LBA to a first offset;
selecting a first volume metadata structure associated with the first LUN;
storing the key in the first volume metadata structure such that the key is associated with the first offset;
receiving a second write request directed towards a second LUN, the second write request having the write data and having metadata that includes a second LBA;
applying the hash function to the write data to generate a second hash value, wherein the second hash value is identical to the first hash value;
translating the second LBA to a second offset;
selecting a second volume metadata structure associated with the second LUN; and
storing the key associated with the write data in the second volume metadata structure such that the key is associated with the second offset, thereby de-duplicating storage of the write data based on the first hash value being equal to the second hash value.
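Claim 10 adds a translation from LBA to offset before the key is recorded. The claim does not say how the translation is performed; the sketch below assumes a fixed 512-byte block size and a plain multiplication, and the helper names `lba_to_offset` and `record_key` are hypothetical.

```python
BLOCK_SIZE = 512  # assumption: bytes per logical block, not stated in the claim


def lba_to_offset(lba, block_size=BLOCK_SIZE):
    """Translate a logical block address to a byte offset within the LUN."""
    return lba * block_size


def record_key(volume_metadata, lba, key):
    """Associate the key with the offset derived from the LBA."""
    offset = lba_to_offset(lba)
    volume_metadata[offset] = key
    return offset


first_volume = {}    # volume metadata structure for the first LUN
second_volume = {}   # volume metadata structure for the second LUN
first_offset = record_key(first_volume, 8, "abc123")
second_offset = record_key(second_volume, 2048, "abc123")
```

The two volumes end up holding the same key at different offsets, which is the association the claim relies on for de-duplication.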
11. A system comprising:
a storage system having a memory connected to a processor via a bus;
a storage array coupled to the storage system and having one or more solid state drives (SSDs);
a storage I/O stack executing on the processor of the storage system, the storage I/O stack when executed operable to:
receive a first write request directed towards a first logical unit (LUN), the first write request having write data and having metadata that includes a logical block address (LBA);
apply a hash function to the write data to generate a first hash value;
select an extent store from a plurality of extent stores based on the first hash value;
store the write data in a segment of the selected extent store, wherein a key is formed from the first hash value, wherein the segment spans a set of SSDs of the storage array;
store the key in a first volume metadata structure associated with the first LUN;
receive a second write request directed towards a second LUN, the second write request having the write data and having metadata that includes a second LBA;
apply the hash function to the write data to generate a second hash value, wherein the first hash value is equal to the second hash value; and
store the key associated with the write data in a second volume metadata structure associated with the second LUN, without writing the write data again to the storage array, to de-duplicate storage of the write data.
Specification