Content addressable storage array element
Abstract
A content addressable storage array element (CASAE) of a storage system is configured to eliminate duplicate data stored on its storage resources. The CASAE independently determines whether data associated with a write operation has already been written to a location on its storage resources. To that end, the CASAE performs a content addressable storage computation on each data block written to those resources in order to prevent storage of two or more blocks with the same data. If data of a block has been previously stored on the resources, the CASAE cooperates with a file system executing on the system to provide a reference (block pointer) to the same data block rather than duplicate the stored data. Otherwise, the CASAE stores the data block at a new location on the resources and provides a block pointer to that location.
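The write path described in the abstract can be sketched in a few lines. This is a minimal, hypothetical in-memory model, not the patented implementation. SHA-256 stands in for the unspecified content addressable storage computation, a Python list stands in for the storage resources, and the list index plays the role of the block pointer. Like the abstract, the sketch trusts a key match. The claims below also require a full content comparison before a stored block is reused.

```python
import hashlib


class ContentAddressableStore:
    """Hypothetical in-memory model of a CASAE: maps a content key to a block location."""

    def __init__(self) -> None:
        self.blocks: list[bytes] = []            # simulated storage resources, indexed by location
        self.key_to_location: dict[str, int] = {}

    def write(self, data: bytes) -> int:
        """Return a block pointer (location) for `data`, storing it only if it is new."""
        key = hashlib.sha256(data).hexdigest()   # assumed content addressable storage computation
        location = self.key_to_location.get(key)
        if location is not None:
            return location                      # data already stored: hand back the existing pointer
        self.blocks.append(data)                 # new data: store it at a fresh location
        location = len(self.blocks) - 1
        self.key_to_location[key] = location
        return location


store = ContentAddressableStore()
p1 = store.write(b"hello")
p2 = store.write(b"world")
p3 = store.write(b"hello")
print(p1, p2, p3, len(store.blocks))  # 0 1 0 2
```

The second write of `b"hello"` returns the pointer from the first write, and only two blocks are stored.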
23 Claims
1. A method for managing storage resources of a storage system, the method comprising:
performing, on a remote storage array at the logical unit level, a content addressable storage computation to compute a key from content of a first data block in response to receiving a client request to write the first data block to the storage system;
comparing, on the remote storage array, the computed key with keys of entries in a mapping table to determine if there is a match;
in response to determining there is a match, comparing, on the remote storage array, the content of the first data block with content of a second data block previously stored on the resources of the remote storage array; and
in response to determining that the comparison of the data block contents results in a match, incrementing a reference count on the previously stored data block, cooperating with a file system executing on the storage system to provide the storage system with a physical block number of the second data block to the storage system rather than storing duplicate data block contents on the storage resources of the remote storage array; and
wherein the remote storage array operates in parallel with one or more additional remote storage arrays to allow aggregation of resources among the remote storage arrays, the parallel operation performing content addressable storage computations associated with the write operations on each of the one or more remote storage arrays.
4. A computer system configured for managing storage resources of a storage system, comprising:
a remote content addressable storage array element configured to compute, on a remote storage array at the logical unit level by a processor, a key based on content of a first data block in response to receiving a client request to write the first data block to the storage device, determine whether the key has been generated for a second data block previously stored on the resources, if so compare, on the remote storage array, the data block contents of the remote storage array, if there is a match increment a reference count on the previously stored data block and cooperate with a file system executing on the storage system to provide the storage system with a physical block number of the second data block to the storage system rather than writing the first data block to the storage device; and
wherein the remote storage array operates in parallel with one or more additional remote storage arrays to allow aggregation of resources among the remote storage arrays, the parallel operation performing content addressable storage computations associated with the write operations on each of the one or more remote storage arrays.
12. An apparatus configured to manage storage resources of a storage system, the apparatus comprising:
performing, by a processor, at the logical unit level, a content addressable storage computation to compute a key from content of a first data block in response to receiving a client request to write the first data block to the storage system;
means for comparing the computed key with keys of entries in a mapping table to determine if there is a match;
in response to determining that there is a match, means for comparing the content of the first data block with content of a second data block previously stored on the resources of a remote storage array;
in response to determining the comparison of the data block contents results in a match, means for incrementing a reference count on the previously stored data block, and means for cooperating with a file system executing on the storage system to provide the storage system with a physical block number of the second data block to the storage system rather than storing duplicate data block contents on the storage resources of the remote storage array; and
wherein the remote storage array operates in parallel with one or more additional remote storage arrays to allow aggregation of resources among the remote storage arrays, the parallel operation performing content addressable storage computations associated with the write operations on each of the one or more remote storage arrays.
15. A computer readable medium containing executable program instructions executed by a processor, comprising:
program instructions that perform, on a remote storage array at the logical unit level, a content addressable storage computation to compute a key from content of a first data block in response to receiving a client request to write the first data block to the storage system;
program instructions that compare, on the remote storage array, the computed key with keys of entries in a mapping table to determine if there is a match;
program instructions that compare, on a remote storage array, the content of the first data block with content of a second data block previously stored on the resources in response to determining that there is a match;
program instructions that, in response to determining that the comparison of the data block contents results in a match, increment a reference count on the previously stored data block and cooperate with a file system executing on the storage system to provide the storage system with a physical block number of the second data block to the storage system rather than storing duplicate data block contents on the storage resources of the remote storage array; and
wherein the remote storage array operates in parallel with one or more additional remote storage arrays to allow aggregation of resources among the remote storage arrays, the parallel operation performing content addressable storage computations associated with the write operations on each of the one or more remote storage arrays.
18. A method for managing storage resources of a storage system, the method comprising:
receiving from a storage system a write request at a content addressable storage array element (CASAE), the CASAE coupled to a plurality of disks of a remote storage array, the remote storage array configured to store user data of a data container served by the storage system;
performing, on the remote storage array at the logical unit level, a content addressable storage computation, the computation resulting in a key computed from content of a first data block;
comparing, on the remote storage array, the computed key with a plurality of previously generated keys to determine if there is a match, the previously generated keys associated with previously stored data blocks;
in response to determining that there is a match, comparing the content of the first data block with content of a second data block previously stored on the remote storage array and in response to determining that the comparison of the data block contents results in a match, incrementing a reference count on the previously stored data block, and cooperating with a file system executing on the storage system to provide the storage system with a physical block number of the second data block to the storage system rather than storing duplicate data block contents on the remote storage array; and
wherein the remote storage array operates in parallel with one or more additional remote storage arrays to allow aggregation of resources among the remote storage arrays, the parallel operation performing content addressable storage computations associated with the write operations on each of the one or more remote storage arrays.
21. A method for managing a storage system, comprising:
receiving a write request to write a first data block to a remote storage array;
computing, on the remote storage array at the logical unit level, a hash key of the first data block;
comparing, on the remote storage array, the hash key of the first data block with previously computed hash keys of stored data blocks, the stored data blocks stored in the remote storage array;
in the event that the hash key of the first data block does not match any of the previously computed hash keys, storing the first data block to the remote storage array;
in the event that the hash key of the first data block does match a previously computed hash key; comparing, on the remote storage array, the first data block with one or more stored data blocks associated with the previously computed hash key;
in the event that the first data block matches one of the one or more data blocks associated with the previously computed hash key, cooperating with a file system executing on the storage system to provide the storage system with a physical block number of a stored data block associated with the previously computed hash key to the storage system;
updating a pointer to a location of the stored data block;
in the event that the first data block does not match the one or more stored data blocks associated with the previously computed hash key, storing the first data block to the storage array; and
wherein the remote storage array operates in parallel with one or more additional remote storage arrays to allow aggregation of resources among the remote storage arrays, the parallel operation performing content addressable storage computations associated with the write operations on each of the one or more remote storage arrays.
22. A system for managing a storage system, comprising:
a write request to write a first data block to a remote storage array;
a processor on a content addressable storage array element, to compute, at the logical unit level, a hash key of the first data block;
the processor to compare the hash key of the first data block with previously computed hash keys of stored data blocks, the stored data blocks stored in the remote storage array;
in the event that the hash key of the first data block does not match any of the previously computed hash keys, the processor to store the first data block to the remote storage array;
in the event that the hash key of the first data block does match a previously computed hash key; the processor to compare, on the remote storage array, the first data block with one or more stored data blocks associated with the previously computed hash key;
in the event that the first data block matches one of the one or more data blocks associated with the previously computed hash key, the processor to cooperate with a file system executing on the storage system to provide the storage system with a physical block number of a stored data block associated with the previously computed hash key to the storage system;
the processor to update a pointer to a location of the stored data block;
in the event that the first data block does not match the one or more stored data blocks associated with the previously computed hash key, the processor to store the first data block to the remote storage array; and
wherein the remote storage array operates in parallel with one or more additional remote storage arrays to allow aggregation of resources among the remote storage arrays, the parallel operation performing content addressable storage computations associated with the write operations on each of the one or more remote storage arrays.
23. A computer readable medium containing executable program instructions executed by a processor, comprising:
program instructions that receive a write request to write a first data block to a remote storage array;
program instructions that compute, on the remote storage array at the logical unit level, a hash key of the first data block;
program instructions that compare, on the remote storage array, the hash key of the first data block with previously computed hash keys of stored data blocks, the stored data blocks stored in the storage array;
program instructions that, in the event that the hash key of the first data block does not match any of the previously computed hash keys, store the first data block to the remote storage array;
program instructions that, in the event that the hash key of the first data block does match a previously computed hash key; compare, on the remote storage array, the first data block with one or more stored data blocks associated with the previously computed hash key;
in the event that the first data block matches one of the one or more data blocks associated with the previously computed hash key, cooperate with a file system executing on the storage system to provide the storage system with a physical block number of a stored data block associated with the previously computed hash key to the storage system;
update a pointer to a location of the stored data block;
in the event that the first data block does not match the one or more stored data blocks associated with the previously computed hash key, store the first data block to the remote storage array; and
wherein the remote storage array operates in parallel with one or more additional remote storage arrays to allow aggregation of resources among the remote storage arrays, the parallel operation performing content addressable storage computations associated with the write operations on each of the one or more remote storage arrays.
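The claimed write path, particularly claims 1 and 21, refines the abstract in two ways. A key match only nominates candidate blocks, whose contents are then compared in full. A confirmed duplicate has its reference count incremented, and its physical block number is returned to the file system. The sketch below models this path, along with several arrays running their computations in parallel. It rests on several assumptions. `CasArrayElement`, `parallel_write`, and the `key_fn` parameter are hypothetical names. SHA-256 is an assumed default key function. Each array keeps its own mapping table. Writes are routed to arrays by logical block address modulo the number of arrays, a rule the claims do not specify.

```python
import hashlib
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def sha256_key(data: bytes) -> bytes:
    # Assumed key function; the claims only require a key computed from block content.
    return hashlib.sha256(data).digest()


class CasArrayElement:
    """Hypothetical model of one remote storage array's content addressable write path."""

    def __init__(self, key_fn: Callable[[bytes], bytes] = sha256_key) -> None:
        self.key_fn = key_fn
        self.blocks: list[bytes] = []      # stored blocks; list index = physical block number (PBN)
        self.refcounts: list[int] = []     # reference count per PBN
        self.mapping_table: dict[bytes, list[int]] = {}  # key -> PBNs of blocks with that key
        self.lock = threading.Lock()       # serializes table updates within this array

    def write(self, data: bytes) -> int:
        """Return the PBN the file system should reference for `data`."""
        key = self.key_fn(data)  # CAS computation, performed outside the lock
        with self.lock:
            candidates = self.mapping_table.setdefault(key, [])
            for pbn in candidates:
                # Key match: compare full contents before treating the block as a duplicate.
                if self.blocks[pbn] == data:
                    self.refcounts[pbn] += 1
                    return pbn  # reuse the existing block instead of storing a copy
            # No key match, or a key collision with different contents: store a new block.
            self.blocks.append(data)
            self.refcounts.append(1)
            pbn = len(self.blocks) - 1
            candidates.append(pbn)
            return pbn


def parallel_write(
    arrays: list[CasArrayElement], writes: list[tuple[int, bytes]]
) -> dict[int, tuple[int, int]]:
    """Run (lba, data) writes concurrently; returns lba -> (array index, PBN on that array)."""
    with ThreadPoolExecutor(max_workers=len(arrays)) as pool:
        pending = []
        for lba, data in writes:
            idx = lba % len(arrays)  # assumed routing rule, not taken from the claims
            pending.append((lba, idx, pool.submit(arrays[idx].write, data)))
        return {lba: (idx, fut.result()) for lba, idx, fut in pending}


# Collision-prone key (first byte only) to exercise the content comparison.
weak = CasArrayElement(key_fn=lambda d: d[:1])
a = weak.write(b"apple")
b = weak.write(b"avocado")  # same key b"a", different contents -> new block
c = weak.write(b"apple")    # key and contents match -> refcount incremented
print(a, b, c, weak.refcounts)  # 0 1 0 [2, 1]

arrays = [CasArrayElement() for _ in range(2)]
result = parallel_write(arrays, [(0, b"x"), (1, b"x"), (2, b"x"), (3, b"y")])
print(arrays[0].refcounts, len(arrays[1].blocks))  # [2] 2
```

The deliberately weak key forces a collision, because `b"apple"` and `b"avocado"` share the key `b"a"`. The content comparison then gives the second write a new block rather than a wrong pointer. Each array compares only against its own blocks. As a result, identical content routed to two arrays is stored once on each array.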
Specification