GLOBAL IN-LINE EXTENT-BASED DEDUPLICATION
First Claim
1. A method comprising:
receiving a first write request directed towards a first logical unit (LUN), the first write request having write data and having metadata that includes a first logical block address (LBA), the first write request processed at a node of a storage system, the storage system attached to a storage array of solid state drives (SSDs);
applying a hash function to the write data to generate a first hash value;
selecting an extent store from a plurality of extent stores based on the first hash value;
storing the write data in a segment of the selected extent store, wherein the selected extent store spans a set of SSDs of the storage array, wherein a key is formed from the first hash value, wherein the key is stored in an extent store metadata structure;
receiving a second write request directed towards a second LUN, the second write request having the write data and having metadata that includes a second LBA;
applying the hash function to the write data to generate a second hash value, wherein the first hash value is equal to the second hash value; and
in response to the second write request, and based on the first hash value being equal to the second hash value, returning an acknowledgment that the write data is stored without writing the write data again to the storage array.
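The write path recited in claim 1 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the patented implementation: the claim does not name a hash function (SHA-256 is assumed here), it does not say how the hash value selects an extent store (a modulo over the leading hash bytes is assumed), and the names `ExtentStore`, `Node`, and `write` are hypothetical. The sketch also ignores hash collisions, which a production system would have to handle.

```python
import hashlib


class ExtentStore:
    """One of several extent stores (hypothetical in-memory stand-in)."""

    def __init__(self):
        self.metadata = {}         # extent store metadata structure: key -> slot
        self.segment = []          # stand-in for a segment spanning a set of SSDs
        self.physical_writes = 0   # counts writes that reach the "storage array"

    def put(self, key, data):
        """Store data under key; return False if the key is already present."""
        if key in self.metadata:
            return False           # duplicate: nothing is written again
        self.segment.append(data)
        self.metadata[key] = len(self.segment) - 1
        self.physical_writes += 1
        return True


class Node:
    """A storage-system node that processes write requests."""

    def __init__(self, num_extent_stores=4):
        self.extent_stores = [ExtentStore() for _ in range(num_extent_stores)]

    def select_extent_store(self, hash_value):
        # Assumed selection rule: leading 8 bytes of the hash, modulo store count.
        index = int.from_bytes(hash_value[:8], "big") % len(self.extent_stores)
        return self.extent_stores[index]

    def write(self, lun, lba, data):
        hash_value = hashlib.sha256(data).digest()   # assumed hash function
        store = self.select_extent_store(hash_value)
        key = hash_value                             # key formed from the hash value
        newly_written = store.put(key, data)
        # The request is acknowledged either way; only the first copy is stored.
        return {"ack": True, "lun": lun, "lba": lba,
                "deduplicated": not newly_written}
```

Because both requests hash to the same value, they select the same extent store, so the duplicate is detected by a lookup in that store's metadata structure alone. This is what makes the deduplication global across LUNs in the sketch.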
Abstract
In one embodiment, a layered file system of a storage input/output (I/O) stack executes on one or more nodes of a cluster. The layered file system includes a flash-optimized, log-structured layer configured to provide sequential storage of data and metadata (i.e., a log-structured layout) on solid state drives (SSDs) of storage arrays in the cluster to reduce write amplification, while leveraging a data de-duplication feature of the storage I/O stack. An extent store layer of the file system performs and maintains mappings of the extent keys to SSD storage locations, while a volume layer of the file system performs and maintains mappings of the LUN offset ranges to the extent keys. Separation of the mapping functions between the volume and extent store layers enables different volumes with different offset ranges to reference a same extent key (and thus a same extent).
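The separation of mappings described in the abstract can be sketched in a few lines of Python. This is a simplified illustration, not the file system's actual data structures: the class names `ExtentStoreLayer` and `VolumeLayer` are hypothetical, a `bytearray` stands in for the log-structured layout on SSDs, a single starting offset stands in for a LUN offset range, and extent keys are treated as opaque values supplied by the caller.

```python
class ExtentStoreLayer:
    """Maps extent keys to storage locations in an append-only log."""

    def __init__(self):
        self.log = bytearray()   # stand-in for sequential storage on SSDs
        self.locations = {}      # extent key -> (start, length) in the log

    def put(self, key, data):
        # Only the first copy of an extent is appended; repeats are no-ops.
        if key not in self.locations:
            self.locations[key] = (len(self.log), len(data))
            self.log += data     # sequential append, never an in-place update
        return key

    def get(self, key):
        start, length = self.locations[key]
        return bytes(self.log[start:start + length])


class VolumeLayer:
    """Maps LUN offsets to extent keys; knows nothing about SSD locations."""

    def __init__(self, extent_store):
        self.extent_store = extent_store
        self.volumes = {}        # LUN id -> {offset: extent key}

    def map(self, lun, offset, key):
        self.volumes.setdefault(lun, {})[offset] = key

    def read(self, lun, offset):
        key = self.volumes[lun][offset]     # volume layer: offset -> key
        return self.extent_store.get(key)   # extent store layer: key -> location
```

A read resolves in two steps, offset to key and then key to location, so two volumes with different offsets can point at the same key without either one recording where the extent physically lives.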
20 Claims
1. A method comprising:
receiving a first write request directed towards a first logical unit (LUN), the first write request having write data and having metadata that includes a first logical block address (LBA), the first write request processed at a node of a storage system, the storage system attached to a storage array of solid state drives (SSDs);
applying a hash function to the write data to generate a first hash value;
selecting an extent store from a plurality of extent stores based on the first hash value;
storing the write data in a segment of the selected extent store, wherein the selected extent store spans a set of SSDs of the storage array, wherein a key is formed from the first hash value, wherein the key is stored in an extent store metadata structure;
receiving a second write request directed towards a second LUN, the second write request having the write data and having metadata that includes a second LBA;
applying the hash function to the write data to generate a second hash value, wherein the first hash value is equal to the second hash value; and
in response to the second write request, and based on the first hash value being equal to the second hash value, returning an acknowledgment that the write data is stored without writing the write data again to the storage array.
Dependent claims: 2, 3, 4, 5, 7, 8, 9, 10
6. (canceled)
11. A method comprising:
receiving a first write request directed towards a first logical unit (LUN), the first write request having write data and having metadata that includes a first logical block address (LBA), the first write request processed at a node of a storage system, the storage system attached to a storage array of solid state drives (SSDs);
applying a hash function to the write data to generate a first hash value;
selecting an extent store from a plurality of extent stores based on the first hash value;
storing the write data in the selected extent store, wherein the selected extent store spans a set of SSDs of the storage array, wherein a key is formed from the first hash value;
translating the first LBA to a first offset;
selecting a first volume metadata structure associated with the first LUN;
storing the key in the first volume metadata structure such that the key is associated with the first offset;
receiving a second write request directed towards a second LUN, the second write request having the write data and having metadata that includes a second LBA;
applying the hash function to the write data to generate a second hash value, wherein the second hash value is identical to the first hash value;
translating the second LBA to a second offset;
selecting a second volume metadata structure associated with the second LUN; and
storing the key associated with the write data in the second volume metadata structure such that the key is associated with the second offset, thereby de-duplicating storage of the write data based on the first hash value being equal to the second hash value.
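The volume-side steps that claim 11 adds (translating the LBA to an offset, selecting the volume metadata structure for the LUN, and storing the key at that offset) can be sketched as follows. The details are assumptions made for illustration: the claim does not say how an LBA is translated, so a fixed 512-byte block size is assumed; SHA-256 stands in for the unnamed hash function; and `DedupVolumes`, `lba_to_offset`, and `write` are hypothetical names. Extent store selection is omitted to keep the focus on the volume metadata.

```python
import hashlib

BLOCK_SIZE = 512  # assumed bytes per logical block; not stated in the claim


def lba_to_offset(lba, block_size=BLOCK_SIZE):
    """Translate a logical block address to a byte offset within the LUN."""
    return lba * block_size


class DedupVolumes:
    """Per-LUN volume metadata that references shared extents by key."""

    def __init__(self):
        self.volume_metadata = {}  # LUN id -> {offset: key}
        self.extents = {}          # key -> write data (stand-in for an extent store)

    def write(self, lun, lba, data):
        key = hashlib.sha256(data).hexdigest()      # key formed from the hash value
        self.extents.setdefault(key, data)          # stored once; repeats are skipped
        offset = lba_to_offset(lba)                 # translate the LBA to an offset
        volume = self.volume_metadata.setdefault(lun, {})  # select the LUN's structure
        volume[offset] = key                        # associate the key with the offset
        return key
```

After the same data is written to two LUNs, each LUN's metadata structure holds the same key at its own offset, while the extent itself exists only once.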
12. A system comprising:
a storage system having a memory connected to a processor via bus;
a storage array coupled to the storage system and having one or more solid state drives (SSDs);
a storage I/O stack executing on the processor of the storage system, the storage I/O stack when executed operable to:
receive a first write request directed towards a logical unit (LUN), the first write request having write data and having metadata that includes a logical block address (LBA);
apply a hash function to the write data to generate a first hash value;
select an extent store from a plurality of extent stores based on the first hash value;
store the write data in a segment of the selected extent store, wherein a key is formed from the first hash value, wherein the key is stored in an extent store metadata structure, wherein the segment spans a set of SSDs of the storage array;
receive a second write request directed towards a second LUN, the second write request having the write data and having metadata that includes a second LBA;
apply the hash function to the write data to generate a second hash value, wherein the first hash value is equal to the second hash value; and
in response to the second write request, and based on the first hash value being equal to the second hash value, return an acknowledgment that the write data is stored without writing the write data again to the storage array.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20
Specification