Logical block replication with deduplication
Abstract
When logical block mirroring is used together with deduplication at the source, bandwidth consumption between a data replication source and destination, and storage consumption at the destination, are reduced by eliminating repeated transmission of data blocks from source to destination. A reference is created for each data block at the source, the reference being unique within a storage aggregate of the source. During a mirror update, the source initially sends only the references of modified data blocks to the destination. The destination compares those references against a data structure to determine whether it already has any of those data blocks stored. If the destination already has a data block stored, it does not request or receive that data block again from the source. Only if the destination determines that it has not yet received the referenced data block does it request and receive that data block from the source.
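To make the bandwidth saving described in the abstract concrete, the Python sketch below compares the block data a naive logical mirror would ship against a reference-based mirror in which the destination fetches only references it has not seen before. It is an illustrative model, not the patented implementation: the volume layout (a map from logical block numbers to source references, plus a map from references to bytes) and the names `naive_logical_mirror` and `reference_mirror` are hypothetical, and only block payload bytes are counted, not the references themselves.

```python
from typing import Dict, Tuple

# Hypothetical model of a deduplicated source volume:
#   lbn_to_ref: logical block number -> aggregate-unique source reference
#   ref_to_data: source reference -> block bytes
SourceVolume = Tuple[Dict[int, int], Dict[int, bytes]]


def naive_logical_mirror(volume: SourceVolume) -> int:
    """Bytes of block data sent if every logical block ships its own data."""
    lbn_to_ref, ref_to_data = volume
    return sum(len(ref_to_data[ref]) for ref in lbn_to_ref.values())


def reference_mirror(volume: SourceVolume, dest_known: Dict[int, int]) -> int:
    """Bytes of block data sent when the destination requests only unseen references.

    dest_known maps source references to destination block ids and is updated
    in place, standing in for the destination's lookup data structure.
    """
    lbn_to_ref, ref_to_data = volume
    sent = 0
    for ref in lbn_to_ref.values():
        if ref not in dest_known:              # destination lacks this block
            sent += len(ref_to_data[ref])      # so it is requested and received once
            dest_known[ref] = len(dest_known)  # record a new destination block id
    return sent
```

With four logical blocks that deduplicate to two unique 4 KiB blocks, the naive mirror sends 16384 bytes of block data and the reference-based mirror sends 8192. A second update over the same data sends none, because every reference is already mapped.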
28 Claims
1. A method comprising:
storing at a source storage system a set of data that includes a plurality of data blocks, each of the data blocks being identified by a logical block pointer;
deduplicating the data blocks in the set of data at the source storage system so that at least one of the data blocks is represented by at least two different logical block pointers; and
replicating the data blocks in the set of data from the source storage system to a destination storage system at a logical block level corresponding to the logical block pointers, including sending the at least one data block represented by the at least two different logical block pointers from the source storage system to the destination storage system without transmitting any data block in the data set more than once from the source storage system for delivery to the destination storage system.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.
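Claim 1 does not say how the source deduplicates or how it avoids resending a shared block. One way to picture both is the toy Python store below. It assumes content fingerprinting with SHA-256 (a choice the claim does not make) and uses the hypothetical names `DedupSource`, `write`, and `replicate_once`. Several logical block pointers can end up sharing one physical block, and replication emits a physical block's data only the first time that block is referenced.

```python
import hashlib
from typing import Dict, Iterator, Optional, Tuple


class DedupSource:
    """Toy source store in which logical block pointers share physical blocks.

    Assumption: duplicates are detected by SHA-256 fingerprint; the claim does
    not specify a deduplication technique.
    """

    def __init__(self) -> None:
        self.logical: Dict[int, int] = {}     # logical block pointer -> physical block number
        self.physical: Dict[int, bytes] = {}  # physical block number -> block data
        self._by_hash: Dict[bytes, int] = {}  # fingerprint -> physical block number

    def write(self, lbp: int, data: bytes) -> int:
        """Store data under pointer lbp, reusing an identical block if one exists."""
        digest = hashlib.sha256(data).digest()
        pbn = self._by_hash.get(digest)
        if pbn is None:                       # first copy: allocate a new physical block
            pbn = len(self.physical)
            self.physical[pbn] = data
            self._by_hash[digest] = pbn
        self.logical[lbp] = pbn               # a duplicate only adds another pointer
        return pbn

    def replicate_once(self) -> Iterator[Tuple[int, int, Optional[bytes]]]:
        """Yield (logical pointer, physical block number, data or None) per pointer.

        Data is yielded the first time a physical block is seen; later pointers
        to the same block carry None, so no block's data is sent twice.
        """
        sent = set()
        for lbp in sorted(self.logical):
            pbn = self.logical[lbp]
            if pbn in sent:
                yield lbp, pbn, None
            else:
                sent.add(pbn)
                yield lbp, pbn, self.physical[pbn]
```

Writing the same 4 KiB payload under pointers 0 and 2 allocates a single physical block, so `replicate_once` yields data for pointers 0 and 1 and a bare pointer for 2. A production system would typically verify bytes on a fingerprint match and reclaim blocks that no pointer references any more; both are omitted here.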
13. A method comprising:
(a) storing at a source storage system a set of data that includes a plurality of data blocks, each of the data blocks being identified by a physical block pointer and a logical block pointer;
(b) deduplicating the data blocks in the set of data at the source storage system at a physical block level, so that at least one of the data blocks is represented by at least two different logical block pointers; and
(c) replicating the data blocks in the set of data from the source storage system to a destination storage system at a logical block level corresponding to the plurality of logical block pointers, wherein said replicating includes
(c)(1) at the source storage system,
(c)(1)(A) identifying one or more of the data blocks in the data set that have been modified at the source storage system;
(c)(1)(B) for each data block identified as having been modified, determining a first reference to the data block as stored in the source storage system, the first reference being unique to said data block within a storage aggregate of the source storage system; and
(c)(1)(C) sending each said first reference and a corresponding logical block pointer from the source storage system to the destination storage system;
(c)(2) at the destination storage system,
(c)(2)(A) for each said first reference, determining whether the destination storage system already has stored the corresponding data block, by using the first reference to perform a lookup in a data structure at the destination storage system to determine whether the data structure has an entry that includes said first reference, the data structure containing a mapping of source storage system block references to destination storage system block references;
(c)(2)(B) in response to determining that the data structure has an entry that includes the first reference,
(c)(2)(B)(i) identifying in the entry a second reference to a data block stored in the destination storage system, and
(c)(2)(B)(ii) associating the second reference with the logical block pointer of said corresponding data block; and
(c)(2)(C) in response to determining that the data structure does not have an entry that includes the first reference,
(c)(2)(C)(i) sending a request, from the destination storage system to the source storage system, for the source storage system to send the corresponding data block to the destination storage system,
(c)(2)(C)(ii) receiving the corresponding data block at the destination storage system from the source storage system,
(c)(2)(C)(iii) storing the corresponding data block at the destination storage system, and
(c)(2)(C)(iv) updating the data structure at the destination storage system to have an entry that includes a mapping between the first reference and a second reference to the data block as stored in the destination storage system.
Dependent claims: 14, 15, 16.
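The destination-side steps (c)(2)(A) through (c)(2)(C) of claim 13 amount to a lookup table keyed by source references. The Python sketch below models that table as a dictionary. The class `MirrorDestination`, its `fetch` callback (standing in for the request and receipt in steps (c)(2)(C)(i) and (ii)), and the use of sequential integers as destination block references are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple


class MirrorDestination:
    """Hypothetical model of the destination steps (c)(2)(A)-(C) in claim 13."""

    def __init__(self, fetch: Callable[[int], bytes]) -> None:
        self.fetch = fetch                    # requests a block from the source by its first reference
        self.src_to_dst: Dict[int, int] = {}  # first (source) reference -> second (destination) reference
        self.blocks: Dict[int, bytes] = {}    # second reference -> stored block data
        self.lbp_map: Dict[int, int] = {}     # logical block pointer -> second reference

    def apply(self, updates: List[Tuple[int, int]]) -> int:
        """Apply (first_reference, logical_block_pointer) pairs; return blocks fetched."""
        fetched = 0
        for first_ref, lbp in updates:
            second_ref = self.src_to_dst.get(first_ref)  # (A) look up the first reference
            if second_ref is None:
                # (C) no entry: request and receive the block, store it, record the mapping
                data = self.fetch(first_ref)
                second_ref = len(self.blocks)
                self.blocks[second_ref] = data
                self.src_to_dst[first_ref] = second_ref
                fetched += 1
            # associate the second reference with the logical pointer, as in (B)(ii)
            self.lbp_map[lbp] = second_ref
        return fetched
```

Applying the pairs `(10, 0), (11, 1), (10, 2)` fetches only references 10 and 11. Logical block pointers 0 and 2 end up associated with the same destination block, so the deduplication achieved at the source is preserved on the mirror.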
17. A data mirroring system comprising:
a source storage system including
a source non-volatile storage facility to store a set of data that includes a plurality of data blocks, each of the data blocks being identified in the source storage system by a physical block pointer and a logical block pointer,
a source deduplication engine to deduplicate the data blocks in the set of data at the source storage system so that at least one of the data blocks is represented by at least two different logical block pointers, and
a source replication engine to identify a data block that is to be replicated at a logical level and to determine a reference to the data block in the source storage system, the reference being unique to the data block within a storage aggregate of the source storage system; and
a destination storage system including
a destination non-volatile storage facility to store a mirror of the set of data, and
a destination replication engine to receive the reference from the source storage system, and to use the reference to determine whether the destination storage system already has the data block stored, wherein the destination replication engine requests the data block from the source storage system if the destination storage system does not already have the data block stored but does not request the data block if the destination storage system already has the data block stored.
Dependent claims: 18, 19, 20, 21, 22.
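The source replication engine of claim 17 must identify the blocks to replicate and determine an aggregate-unique reference for each. A minimal Python sketch follows, assuming (beyond what the claim states) that the source compares two point-in-time maps from logical block pointers to physical references and treats any new or re-pointed logical block as modified. The function name `changed_block_references` is hypothetical, and deleted pointers are not handled.

```python
from typing import Dict, List, Tuple


def changed_block_references(
    base: Dict[int, int], current: Dict[int, int]
) -> List[Tuple[int, int]]:
    """Return (reference, logical_block_pointer) pairs for blocks to replicate.

    base and current map logical block pointers to aggregate-unique physical
    references in two hypothetical snapshots. A pointer counts as modified if
    it is new or now points at a different physical block.
    """
    return [
        (ref, lbp)
        for lbp, ref in sorted(current.items())
        if base.get(lbp) != ref
    ]
```

For example, if pointer 1 is rewritten to reference 105 and a new pointer 3 is deduplicated onto existing block 100, the result is `[(105, 1), (100, 3)]`. If the destination already holds block 100 from an earlier update, only block 105 would need to be requested.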
23. A destination storage system comprising:
a processor;
a network interface through which the destination storage system can communicate with a source storage system;
a storage facility to mirror a set of data on the source storage system; and
a memory storing instructions which, when executed by the processor, cause the destination storage system to perform a process of replicating data at a logical level from the source storage system to the destination storage system, the process including
receiving from the source storage system a reference to a data block to be replicated from the source storage system to the destination storage system, the reference being unique to the data block within a storage facility of the source storage system; and
using the reference to determine whether the destination storage system already has the data block stored by performing a lookup in a data structure at the destination storage system, the data structure containing a mapping of source storage system block references to destination storage system block references, wherein the data block is not sent from the source storage system to the destination storage system if the destination storage system already has the data block stored.
Dependent claims: 24, 25, 26, 27, 28.
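Claim 23 requires only that the destination look up each received reference so that blocks it already has are not sent. If references arrive in batches, a destination might also collapse repeats within a batch before issuing any requests. The Python function below is a hypothetical sketch of that batching, which the claim does not describe; `src_to_dst` plays the role of the mapping from source block references to destination block references.

```python
from typing import Dict, Iterable, List, Set


def references_to_request(
    incoming: Iterable[int], src_to_dst: Dict[int, int]
) -> List[int]:
    """Return each unknown source reference once, in first-seen order.

    References already present in src_to_dst, and repeats within the batch,
    are skipped so that no block is requested twice.
    """
    wanted: List[int] = []
    seen: Set[int] = set()
    for ref in incoming:
        if ref in src_to_dst or ref in seen:
            continue
        seen.add(ref)
        wanted.append(ref)
    return wanted
```

For an incoming batch `[5, 6, 5, 7, 6]` with reference 6 already mapped, the destination would request references 5 and 7, each exactly once.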
Specification