De-duplication in a virtualized storage environment
Abstract
A data de-duplication application de-duplicates redundant data in the pooled storage capacity of a virtualized storage environment. The virtualized storage environment includes a plurality of storage devices and a virtualization or abstraction layer that aggregates all or a portion of the storage capacity of each storage device into a single pool of storage capacity, all or portions of which can be allocated to one or more host systems. For each host system, the virtualization layer presents a representation of at least a portion of the pooled storage capacity wherein the corresponding host system can read and write data. The data de-duplication application identifies redundant data in the pooled storage capacity and replaces it with one or more pointers pointing to a single instance of the data. The de-duplication application can operate on fixed or variable size blocks of data and can de-duplicate data either post-process or in-line.
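The replacement of redundant data with pointers to a single instance can be illustrated with a minimal Python sketch. It assumes fixed-size blocks and SHA-256 fingerprints, and it treats equal fingerprints as equal data (collisions are assumed negligible). The block size and the names `split_fixed`, `deduplicate`, and `rebuild` are hypothetical and do not come from the patent.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical fixed block size in bytes, tiny for illustration


def split_fixed(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Split data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def deduplicate(blocks: list) -> tuple:
    """Return (unique_blocks, pointers).

    pointers[i] is the index in unique_blocks of the single stored
    instance that logical block i refers to.
    """
    unique = []
    index_by_digest = {}  # fingerprint -> index into unique
    pointers = []
    for block in blocks:
        # Assumption: equal SHA-256 digests mean equal block contents.
        digest = hashlib.sha256(block).digest()
        if digest not in index_by_digest:
            index_by_digest[digest] = len(unique)
            unique.append(block)
        pointers.append(index_by_digest[digest])
    return unique, pointers


def rebuild(unique: list, pointers: list) -> bytes:
    """Follow the pointers to reconstruct the original byte stream."""
    return b"".join(unique[p] for p in pointers)


data = b"ABCDABCDXYZ!ABCD"
unique, pointers = deduplicate(split_fixed(data))
```

In this example four logical blocks are stored as two unique blocks plus four pointers, and `rebuild(unique, pointers)` returns the original bytes. A variable-size variant would change only how `split_fixed` chooses block boundaries.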
19 Claims
1. A method for de-duplicating redundant data in a virtualized storage environment, the method comprising:
pooling storage capacity from a plurality of storage devices into a single storage pool by applying an abstraction layer to the plurality of storage devices, the abstraction layer presenting a representation of a corresponding portion of the pooled storage capacity to each of one or more host systems;
operating the one or more host systems in a computer architecture that includes the plurality of storage devices, each host system configured to write data to and read data from a corresponding portion of the pooled storage capacity; and
operating a data de-duplication application on a host system included in the one or more host systems in the computer architecture to globally de-duplicate data in the pooled storage capacity, wherein the abstraction layer presents a representation of at least a portion of the pooled storage capacity to the data de-duplication application and wherein the data de-duplication application performs de-duplication in the portion of the pooled storage capacity.
Dependent claims: 2, 3, 4, 5, 6, 7
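The pooling step of claim 1 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: devices are modeled as in-memory lists of block slots, pool addresses are assigned in device order, and each consumer's representation is a contiguous range of pool addresses. The names `Device`, `AbstractionLayer`, and `present` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """A hypothetical storage device modeled as a list of block slots."""
    name: str
    capacity: int  # number of block slots
    blocks: list = field(init=False)

    def __post_init__(self):
        self.blocks = [None] * self.capacity


class AbstractionLayer:
    """Pools device capacity and presents views of it to hosts."""

    def __init__(self, devices):
        # Pool address p corresponds to self._slots[p] = (device, offset).
        self._slots = [(dev, off) for dev in devices
                       for off in range(dev.capacity)]
        self._views = {}  # consumer name -> range of pool addresses

    @property
    def pool_size(self):
        return len(self._slots)

    def present(self, consumer, start, length):
        """Present pool addresses [start, start + length) to a consumer."""
        if start < 0 or length < 0 or start + length > self.pool_size:
            raise ValueError("view exceeds pooled capacity")
        self._views[consumer] = range(start, start + length)

    def _resolve(self, consumer, addr):
        view = self._views[consumer]
        if not 0 <= addr < len(view):
            raise IndexError("address outside the presented portion")
        return self._slots[view[addr]]

    def write(self, consumer, addr, data):
        device, offset = self._resolve(consumer, addr)
        device.blocks[offset] = data

    def read(self, consumer, addr):
        device, offset = self._resolve(consumer, addr)
        return device.blocks[offset]


disk_a, disk_b = Device("a", 2), Device("b", 3)
layer = AbstractionLayer([disk_a, disk_b])
layer.present("host1", 0, 3)                      # host1 sees pool 0..2
layer.present("host2", 3, 2)                      # host2 sees pool 3..4
layer.present("dedup_app", 0, layer.pool_size)    # global view for de-duplication

layer.write("host1", 2, b"x")   # pool address 2 is disk_b, offset 0
layer.write("host2", 0, b"y")   # pool address 3 is disk_b, offset 1
```

Because the de-duplication application's view spans the whole pool in this sketch, it can read blocks written by either host, which is what makes de-duplication across hosts possible.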
8. A method for providing storage virtualization and data de-duplication in a computer environment including a plurality of storage devices, the method comprising:
applying a virtualization layer to a plurality of storage devices to aggregate their storage capacity such that during operation, when a write request comprising a virtual memory address and write data is received, the virtualization layer maps the virtual memory address to a physical memory address within the aggregated storage capacity, wherein the aggregated storage capacity includes previously stored data including a first data block, wherein the aggregated storage capacity is shared by a plurality of hosts;
determining that a second block of data included in the write data is identical to the first block of data using a de-duplication application operating on a host system in the computer environment, wherein the virtualization layer presents a representation of the aggregated storage capacity to the de-duplication application operating on the host system; and
storing a pointer in the aggregated storage capacity at the physical memory address instead of storing the second block of data, the pointer pointing to the first block of data.
Dependent claims: 9, 10, 11, 12, 13, 14
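The write path recited in claim 8 resembles in-line de-duplication, which can be sketched as below. The sketch makes several simplifying assumptions that are not in the claim: each virtual address is written only once (no overwrites or reference counting), physical addresses are handed out sequentially, and candidate duplicates are found by SHA-256 fingerprint and then confirmed by byte comparison. `Pointer` and `InlineDedupPool` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Pointer:
    """Marker stored at a physical address in place of a duplicate block."""
    target: int  # physical address of the single stored instance


class InlineDedupPool:
    def __init__(self, n_blocks):
        self.physical = [None] * n_blocks  # aggregated storage capacity
        self.v2p = {}                      # virtual -> physical address
        self.by_digest = {}                # fingerprint -> physical address
        self._next_free = 0

    def _map(self, virtual_addr):
        """Map a virtual address to a newly allocated physical address."""
        if virtual_addr in self.v2p:
            raise ValueError("overwrites are not modeled in this sketch")
        if self._next_free >= len(self.physical):
            raise MemoryError("aggregated capacity exhausted")
        self.v2p[virtual_addr] = self._next_free
        self._next_free += 1
        return self.v2p[virtual_addr]

    def write(self, virtual_addr, block):
        phys = self._map(virtual_addr)
        digest = hashlib.sha256(block).digest()
        first = self.by_digest.get(digest)
        if first is not None and self.physical[first] == block:
            # Identical block already stored: keep only a pointer to it.
            self.physical[phys] = Pointer(first)
        else:
            self.physical[phys] = block
            self.by_digest.setdefault(digest, phys)

    def read(self, virtual_addr):
        stored = self.physical[self.v2p[virtual_addr]]
        if isinstance(stored, Pointer):
            return self.physical[stored.target]
        return stored


pool = InlineDedupPool(4)
pool.write(0x10, b"first")   # previously stored data (first data block)
pool.write(0x20, b"other")
pool.write(0x30, b"first")   # duplicate: a pointer is stored instead
```

After the third write, `pool.physical[2]` holds `Pointer(target=0)` rather than a second copy of the data, and `pool.read(0x30)` still returns the original bytes.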
15. A method of pooling storage capacity from a plurality of storage devices and de-duplicating data within the pooled storage capacity, the method comprising:
applying a virtualization layer to a plurality of storage devices to pool storage capacity from the plurality of storage devices, wherein original data is stored in the pooled storage capacity and wherein the pooled storage capacity is shared;
receiving a write request from a first host system, the write request including at least a virtual storage address and write data;
mapping the virtual storage address to a physical storage address within a portion of the pooled storage capacity allocated to the host system;
comparing the write data to the original data to identify a first data block within the original data that is identical to a second data block within the write data with a data de-duplication application operating on a second host system, wherein the virtualization layer presents a representation of the pooled storage capacity to the second host system; and
replacing one of the first data block and the second data block with a pointer that points to a remaining one of the first data block and the second data block in the pooled storage capacity.
Dependent claims: 16, 17, 18, 19
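The comparing and replacing steps of claim 15 can be sketched as a pass that runs after the write data has been stored. The sketch assumes the pool is a flat Python list indexed by physical address, that comparison is a direct byte comparison of every written block against every original block, and that the caller chooses which of the two identical blocks remains. `Pointer`, `dedupe_pair`, `post_process`, and `read_block` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pointer:
    """Marker that replaces a duplicate block in the pool."""
    target: int  # physical address of the remaining block


def dedupe_pair(pool, first, second, keep_first=True):
    """Replace one of two identical blocks with a pointer to the other.

    Returns True if a replacement was made.
    """
    a, b = pool[first], pool[second]
    if a is None or isinstance(a, Pointer) or isinstance(b, Pointer) or a != b:
        return False
    if keep_first:
        pool[second] = Pointer(first)
    else:
        pool[first] = Pointer(second)
    return True


def post_process(pool, original_addrs, written_addrs):
    """Compare each written block with the original data; return the
    number of blocks replaced by pointers."""
    replaced = 0
    for w in written_addrs:
        for o in original_addrs:
            if dedupe_pair(pool, o, w):
                replaced += 1
                break
    return replaced


def read_block(pool, addr):
    """Read a block, following a pointer if one is stored at addr."""
    stored = pool[addr]
    return pool[stored.target] if isinstance(stored, Pointer) else stored


# Original data occupies physical addresses 0 and 1.
pool = [b"alpha", b"beta", None, None]
# The first host's write maps virtual addresses 0 and 1 to its portion (2, 3).
v2p = {0: 2, 1: 3}
pool[v2p[0]] = b"beta"
pool[v2p[1]] = b"gamma"
# The de-duplication application on a second host then scans the pool.
count = post_process(pool, original_addrs=[0, 1], written_addrs=[2, 3])
```

Here one written block duplicates original data, so `count` is 1 and physical address 2 ends up holding a pointer to address 1. The pairwise scan is quadratic; a fingerprint index would be the usual way to avoid that cost.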
Specification