System and method for application aware de-duplication of data blocks on a virtualized storage array
First Claim
1. A method for application aware de-duplication (de-dup) of data blocks on one or more virtualized storage arrays in a networked storage system, comprising:
- enabling a de-dup agent on each of a host device, a data path module (DPM), and one or more virtualized storage arrays, wherein the DPM is connected between the host device and the one or more virtualized storage arrays such that operations between the host device and the one or more virtualized storage arrays are communicated via the DPM;
- creating a master list of metadata associated with indexed data and storing the masterlist in the one or more virtualized storage arrays;
- creating one or more sublists of metadata from the masterlist and storing the one or more sublists in one or more of the host device and the DPM;
- upon receiving a write request from an application residing in the host device, determining whether a data block being written has an entry in the sublist stored in the host device;
- if the data block has an entry in a sublist stored in the host device, replacing the data block with a pointer indicating where the data block is residing in the one or more virtualized storage arrays;
- if it is determined that the data block being written has no entry in the sublist stored in the host device, determining whether the data block being written is in the masterlist stored in the one or more virtualized storage arrays;
- if it is determined that the data block being written is in the masterlist stored in the one or more virtualized storage arrays, replacing the data block with a pointer indicating where the data block is residing in the one or more virtualized storage arrays; and
- if it is determined that the data block being written is not in the masterlist stored in the one or more virtualized storage arrays, writing the data block in one of the one or more virtualized storage arrays.
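The write path in claim 1 checks the cheapest location first and falls back outward: host sublist, then the array's masterlist, then an actual write. A minimal Python sketch of that decision order follows, assuming a single array, SHA-256 content fingerprints as the metadata key, and an in-memory dict for each list; `fingerprint`, `Pointer`, `StorageArray`, and `handle_write` are hypothetical names, not part of the patent.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(block: bytes) -> str:
    """Metadata key for a block (assumption: SHA-256 of its content)."""
    return hashlib.sha256(block).hexdigest()


@dataclass(frozen=True)
class Pointer:
    """Hypothetical pointer: which array holds the block, and at what address."""
    array_id: int
    lba: int


@dataclass
class StorageArray:
    array_id: int
    blocks: list = field(default_factory=list)      # stored block contents, index = LBA
    masterlist: dict = field(default_factory=dict)  # fingerprint -> Pointer

    def write_block(self, block: bytes) -> Pointer:
        """Store a new block and record it in the masterlist."""
        ptr = Pointer(self.array_id, len(self.blocks))
        self.blocks.append(block)
        self.masterlist[fingerprint(block)] = ptr
        return ptr


def handle_write(block: bytes, host_sublist: dict, array: StorageArray):
    """Return (pointer, where_resolved), checking in the order claim 1 lists."""
    key = fingerprint(block)
    # Step 1: sublist held on the host.
    if key in host_sublist:
        return host_sublist[key], "host-sublist"
    # Step 2: masterlist held on the array.
    if key in array.masterlist:
        return array.masterlist[key], "masterlist"
    # Step 3: no match anywhere, so write the block itself.
    return array.write_block(block), "written"


# Usage
array = StorageArray(array_id=0)
sublist = {}
print(handle_write(b"A" * 8, sublist, array))  # new content is written
print(handle_write(b"A" * 8, sublist, array))  # caught by the masterlist
sublist[fingerprint(b"A" * 8)] = Pointer(0, 0)
print(handle_write(b"A" * 8, sublist, array))  # caught by the host sublist
```

Because the sketch records each new block in the masterlist as it is written, a second write of the same content is caught at the array even before any sublist has been pushed to the host.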
Abstract
A system and method for application aware de-duplication (de-dup) of data blocks in a virtualized storage array is disclosed. In one embodiment, in a method of application aware de-dup of data blocks on virtualized storage arrays in a storage area network, a de-dup agent is enabled on each of one or more components of the storage area network. A master list of metadata associated with indexed data is then created and stored in the virtualized storage arrays. One or more sublists of metadata are then created from the masterlist and stored. Upon receiving a write request from an application residing in a host device, it is determined whether a data block being written has an entry in a sublist stored in the host device; if so, the data block is replaced with a pointer indicating where the data block resides in the virtualized storage arrays.
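Creating the masterlist from indexed data can be pictured as one pass over the blocks already stored on an array, after which sublists are simply copies of selected entries. The sketch below assumes each block is keyed by its SHA-256 fingerprint and that an entry records the address of the first copy plus an occurrence count; `Entry`, `build_masterlist`, and `make_sublist` are illustrative names only.

```python
import hashlib
from collections import namedtuple

# Hypothetical masterlist entry: LBA of the first copy and how often the
# same content occurs on the array.
Entry = namedtuple("Entry", ["lba", "occurrences"])


def build_masterlist(blocks):
    """Index existing blocks (position in the list = LBA) into a masterlist.

    Assumption: metadata is keyed by the SHA-256 fingerprint of each block.
    """
    masterlist = {}
    for lba, block in enumerate(blocks):
        key = hashlib.sha256(block).hexdigest()
        if key in masterlist:
            first = masterlist[key]
            masterlist[key] = Entry(first.lba, first.occurrences + 1)
        else:
            masterlist[key] = Entry(lba, 1)
    return masterlist


def make_sublist(masterlist, keys):
    """Copy the selected masterlist entries, e.g. for a host or the DPM."""
    return {k: masterlist[k] for k in keys if k in masterlist}


# Usage: four blocks, three of which share content.
ml = build_masterlist([b"x", b"y", b"x", b"x"])
x_key = hashlib.sha256(b"x").hexdigest()
host_sublist = make_sublist(ml, [x_key])
print(ml[x_key])  # Entry(lba=0, occurrences=3)
```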
11 Claims
1. A method for application aware de-duplication (de-dup) of data blocks on one or more virtualized storage arrays in a networked storage system, comprising:
enabling a de-dup agent on each of a host device, a data path module (DPM), and one or more virtualized storage arrays, wherein the DPM is connected between the host device and the one or more virtualized storage arrays such that operations between the host device and the one or more virtualized storage arrays are communicated via the DPM; creating a master list of metadata associated with indexed data and storing the masterlist in the one or more virtualized storage arrays; creating one or more sublists of metadata from the masterlist and storing the one or more sublists in one or more of the host device and the DPM; upon receiving a write request from an application residing in the host device, determining whether a data block being written has an entry in the sublist stored in the host device; if the data block has an entry in a sublist stored in the host device, replacing the data block with a pointer indicating where the data block is residing in the one or more virtualized storage arrays; if it is determined that the data block being written has no entry in the sublist stored in the host device, determining whether the data block being written is in the masterlist stored in the one or more virtualized storage arrays; if it is determined that the data block being written is in the masterlist stored in the one or more virtualized storage arrays, replacing the data block with a pointer indicating where the data block is residing in the one or more virtualized storage arrays; and if it is determined that the data block being written is not in the masterlist stored in the one or more virtualized storage arrays, writing the data block in one of the one or more virtualized storage arrays.
5. A networked storage system, comprising:
a host device; a data path module (DPM) connected to the host device; and one or more virtualized storage arrays connected to the DPM, wherein the DPM is configured such that operations between the host device and the one or more virtualized storage arrays are communicated via the DPM, wherein each of the host device, the DPM and the one or more virtualized storage arrays includes an associated de-dup agent to enable application aware de-dup of data blocks on the one or more virtualized storage arrays, wherein the one or more virtualized storage arrays include a masterlist of metadata associated with indexed data stored therein, wherein the host device includes a sublist of metadata from the masterlist stored therein, wherein the de-dup agent associated with the host device is operable to: determine whether a data block being written has an entry in the sublist stored in the host device; and if the data block has an entry in the sublist stored in the host device, replace the data block with a pointer indicating where the data block is residing in the one or more virtualized storage arrays, and wherein the de-dup agent associated with the one or more virtualized storage arrays is operable to: if it is determined that the data block being written has no entry in the sublist stored in the host device, determine whether the data block being written is in the masterlist stored in the one or more virtualized storage arrays; if it is determined that the data block being written is in the masterlist stored in the one or more virtualized storage arrays, replace the data block with a pointer indicating where the data block is residing in the one or more virtualized storage arrays; and if it is determined that the data block being written is not in the masterlist stored in the one or more virtualized storage arrays, write the data block in one of the one or more virtualized storage arrays.
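Claim 5 places a de-dup agent on each of the three components, with the DPM sitting on the path between host and array. The following sketch models that arrangement with hypothetical `HostAgent`, `DPMAgent`, and `ArrayAgent` classes, again assuming SHA-256 fingerprints as keys. The DPM agent here simply relays requests and counts them; that behavior is an assumption, since the claim does not say what the DPM agent does on this path.

```python
import hashlib


def fp(block):
    """Assumed metadata key: SHA-256 of the block content."""
    return hashlib.sha256(block).hexdigest()


class ArrayAgent:
    """De-dup agent on the storage array; owns the masterlist."""

    def __init__(self):
        self.blocks = []       # stored block contents, index = LBA
        self.masterlist = {}   # fingerprint -> LBA

    def handle(self, block):
        key = fp(block)
        if key in self.masterlist:              # duplicate: return a pointer
            return ("pointer", self.masterlist[key])
        self.blocks.append(block)               # new data: write it
        self.masterlist[key] = len(self.blocks) - 1
        return ("written", self.masterlist[key])


class DPMAgent:
    """Agent on the data path module; host-to-array traffic passes through it."""

    def __init__(self, array):
        self.array = array
        self.forwarded = 0     # requests relayed to the array

    def handle(self, block):
        self.forwarded += 1
        return self.array.handle(block)


class HostAgent:
    """Agent on the host; consults its local sublist before going to the DPM."""

    def __init__(self, dpm):
        self.dpm = dpm
        self.sublist = {}      # fingerprint -> LBA, copied from the masterlist

    def refresh_sublist(self, masterlist, keys):
        self.sublist = {k: masterlist[k] for k in keys if k in masterlist}

    def write(self, block):
        key = fp(block)
        if key in self.sublist:                 # local hit: replace with pointer
            return ("pointer", self.sublist[key])
        return self.dpm.handle(block)           # miss: send down the data path


# Usage
array = ArrayAgent()
dpm = DPMAgent(array)
host = HostAgent(dpm)
host.write(b"a")                                # stored on the array
host.write(b"a")                                # found in the masterlist
host.refresh_sublist(array.masterlist, [fp(b"a")])
host.write(b"a")                                # resolved on the host
print(dpm.forwarded, len(array.blocks))         # 2 1
```

A host sublist hit resolves the write locally, so the DPM counter does not increase; a miss travels through the DPM to the array, which either finds the block in its masterlist or stores it.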
8. A networked storage system, comprising:
a host device; and one or more virtualized storage arrays connected to the host device, wherein each of the host device and the one or more virtualized storage arrays includes an associated de-dup agent to enable application aware de-dup of data blocks on the one or more virtualized storage arrays, wherein the de-dup agent associated with the one or more virtualized storage arrays creates a masterlist of metadata associated with indexed data and stores the masterlist in the one or more virtualized storage arrays, wherein the master list of metadata includes an ordered weightage list decided based on the number of occurrences of the data blocks in the one or more virtualized storage arrays, wherein the de-dup agent associated with the host device: obtains sublists of metadata from the masterlist and stores the sublists in the host device; determines whether a data block being written has an entry in the sublists stored in the host device; and if the data block has an entry in the sublists stored in the host device, replaces the data block with a pointer indicating where the data block is residing in the one or more virtualized storage arrays, and wherein the de-dup agent associated with the one or more virtualized storage arrays: if it is determined that the data block being written has no entry in the sublists stored in the host device, determines whether the data block being written is in the masterlist stored in the one or more virtualized storage arrays; if it is determined that the data block being written is in the masterlist stored in the one or more virtualized storage arrays, replaces the data block with a pointer indicating where the data block is residing in the one or more virtualized storage arrays; and if it is determined that the data block being written is not in the masterlist stored in the one or more virtualized storage arrays, writes the data block in one of the one or more virtualized storage arrays.
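The ordered weightage list in claim 8 ranks masterlist entries by how often each block occurs. A minimal sketch, assuming SHA-256 fingerprints and a hypothetical policy in which the host keeps the `k` most frequent entries as its sublist, can be built on `collections.Counter`; `weightage_list` and `host_sublist` are illustrative names.

```python
import hashlib
from collections import Counter


def weightage_list(blocks):
    """Return (fingerprint, occurrences) pairs, most frequent first."""
    counts = Counter(hashlib.sha256(b).hexdigest() for b in blocks)
    # most_common() sorts by count, descending; equal counts keep the
    # order in which they were first encountered.
    return counts.most_common()


def host_sublist(weighted, k):
    """Hypothetical policy: the host keeps the k heaviest fingerprints."""
    return [key for key, _ in weighted[:k]]


# Usage
blocks = [b"a", b"b", b"a", b"c", b"a", b"b"]
weighted = weightage_list(blocks)
print([n for _, n in weighted])        # [3, 2, 1]
print(len(host_sublist(weighted, 2)))  # 2
```

Giving the host the heaviest entries means the cheapest lookup covers the blocks most likely to recur; the value of `k` is a tuning assumption, not something the claim specifies.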
9. A non-transitory computer-readable storage medium for application aware de-dup of data blocks on virtualized storage arrays in a networked storage system, having instructions that, when executed by a computing device, cause the computing device to:
enable a de-dup agent on each of a host device, a data path module (DPM), and virtualized storage arrays, wherein the DPM is connected between the host device and the virtualized storage arrays such that operations between the host device and the virtualized storage arrays are communicated via the DPM; create a master list of metadata associated with indexed data and store the masterlist in the virtualized storage arrays; create one or more sublists of metadata from the masterlist and store the one or more sublists in one of the host device and the DPM; upon receiving a write request from an application residing in the host device, determine whether a data block being written has an entry in a sublist stored in the host device, and if so, replace the data block with a pointer indicating where the data block is residing in the virtualized storage arrays; if it is determined that the data block being written has no entry in the sublist stored in the host device, determine whether the data block being written is in the masterlist stored in the one or more virtualized storage arrays; if it is determined that the data block being written is in the masterlist stored in the one or more virtualized storage arrays, replace the data block with a pointer indicating where the data block is residing in the one or more virtualized storage arrays; and if it is determined that the data block being written is not in the masterlist stored in the one or more virtualized storage arrays, write the data block in one of the one or more virtualized storage arrays.
10. The non-transitory computer-readable storage medium of claim 9, wherein the master list of metadata includes an ordered weightage list decided based on the number of occurrences of the data blocks in each of the virtualized storage arrays.
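Claim 10 bases the weightage on occurrences in each of the virtualized storage arrays. One way to read that, sketched below purely as an assumption, is to count occurrences per array and then order fingerprints by their total across all arrays; `per_array_counts` and `combined_weightage` are hypothetical helpers, and SHA-256 fingerprints are again assumed as keys.

```python
import hashlib
from collections import Counter


def per_array_counts(arrays):
    """Map array_id -> Counter of fingerprint occurrences on that array."""
    return {
        array_id: Counter(hashlib.sha256(b).hexdigest() for b in blocks)
        for array_id, blocks in arrays.items()
    }


def combined_weightage(per_array):
    """Assumed rule: order fingerprints by occurrences summed over all arrays."""
    total = Counter()
    for counts in per_array.values():
        total.update(counts)   # Counter.update adds counts rather than replacing
    return total.most_common()


# Usage: block "b" is heaviest overall even though "a" leads on array 0.
arrays = {0: [b"a", b"b", b"a"], 1: [b"b", b"b", b"c"]}
per_array = per_array_counts(arrays)
print([n for _, n in combined_weightage(per_array)])  # [3, 2, 1]
```

Keeping the per-array counters alongside the combined list would also let each array's agent rank its own blocks if a per-array ordering were preferred.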
11. The non-transitory computer-readable storage medium of claim 9, wherein the virtualized storage arrays comprise thin provisioned virtual volumes.
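Claim 11 pairs de-dup with thin provisioned virtual volumes, where physical space is consumed only when data is actually written. The sketch below, a hypothetical `ThinVolume` class rather than anything described in the claims, shows how a de-dup hit maps a virtual address to an existing physical block so that a duplicate write allocates no new space; SHA-256 fingerprints are assumed as the lookup key.

```python
import hashlib


class ThinVolume:
    """Hypothetical thin-provisioned volume with content-based de-dup.

    Physical blocks are allocated only when new content is written; a write
    whose content already exists just maps the virtual LBA to that block.
    """

    def __init__(self, virtual_size):
        self.virtual_size = virtual_size  # advertised capacity, in blocks
        self.map = {}                     # virtual LBA -> physical index
        self.physical = []                # physically allocated blocks
        self.index = {}                   # fingerprint -> physical index

    def write(self, vlba, block):
        if not 0 <= vlba < self.virtual_size:
            raise IndexError("virtual LBA out of range")
        key = hashlib.sha256(block).hexdigest()
        if key not in self.index:         # new content: allocate on demand
            self.physical.append(block)
            self.index[key] = len(self.physical) - 1
        self.map[vlba] = self.index[key]  # duplicate or not, map the address

    def read(self, vlba):
        if vlba not in self.map:
            return None                   # never written: nothing allocated
        return self.physical[self.map[vlba]]


# Usage: 1000 advertised blocks, three writes, two distinct contents.
vol = ThinVolume(1000)
vol.write(0, b"x")
vol.write(500, b"x")
vol.write(999, b"y")
print(len(vol.physical))  # 2
```

The sketch does not reclaim physical blocks that lose their last reference after an overwrite; a real volume would need reference counting or garbage collection for that.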
Specification