Using logical block addresses with generation numbers as data fingerprints for network deduplication
Abstract
The technique introduced here involves using a block address and a corresponding generation number as a “fingerprint” to uniquely identify a sequence of data within a given storage domain. Each block address has an associated generation number which indicates the number of times that data at that block address has been modified. This technique can be employed, for example, to determine whether a given storage server already has the data, and to avoid sending the data to that storage server over a network if it already has the data. It can also be employed to maintain cache coherency among multiple storage nodes.
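As a minimal sketch of the idea in the abstract, the following Python models a storage domain that keeps a generation number for each block address and treats the (block address, generation number) pair as a fingerprint. The names `Fingerprint`, `StorageDomain`, `write`, and `fingerprint` are hypothetical and do not come from the patent; starting the count at 1 on the first write is also an assumption.

```python
from typing import Dict, NamedTuple, Tuple


class Fingerprint(NamedTuple):
    """(block address, generation number) pair; an illustrative type."""
    address: int
    generation: int


class StorageDomain:
    """Toy block store that counts how many times each address is modified."""

    def __init__(self) -> None:
        # address -> (generation number, data currently stored there)
        self._blocks: Dict[int, Tuple[int, bytes]] = {}

    def write(self, address: int, data: bytes) -> Fingerprint:
        # Every modification of an address bumps its generation number,
        # so an (address, generation) pair names one specific content.
        generation = self._blocks.get(address, (0, b""))[0] + 1
        self._blocks[address] = (generation, data)
        return Fingerprint(address, generation)

    def fingerprint(self, address: int) -> Fingerprint:
        generation, _ = self._blocks[address]
        return Fingerprint(address, generation)


domain = StorageDomain()
fp_first = domain.write(100, b"alpha")   # first write to address 100
fp_second = domain.write(100, b"beta")   # overwrite changes the fingerprint
```

Because the generation number changes on every overwrite, two fingerprints are equal only when they refer to the same address at the same point in its modification history, which is what lets the pair stand in for the data within one storage domain.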
20 Claims
1. A method comprising:
determining that a first data block stored by a source device has been modified, that the first data block has the same content as a second data block on the source device, and that the first data block should be updated by a destination device;
determining a block address of the second data block and a generation number of the block address, the generation number indicative of a number of times which data at the block address has been modified; and
using the block address and the generation number to determine whether the destination device already has a data block that matches the block address and the generation number and to avoid sending the first data block to the destination device after determining that the destination device already has a data block that matches the block address and the generation number.
Dependent claims: 2, 3, 4, 5, 6, 7, 18
8. A method comprising:
determining in a source device that a first data block stored by a source device has been modified, that the first data block has the same content as a second data block on the source device, and that the first data block should be updated by a destination device, wherein the source device and destination device are coupled to communicate over a network;
determining in the source device a block address of the second data block and a generation number of the block address, the generation number indicative of a number of times which data at the block address has been modified;
sending the block address and the generation number of the second data block from the source device to the destination device; and
using the block address and the generation number to determine in the destination device whether the destination device already has a data block that matches the block address and the generation number and to avoid sending the data block to the destination device when it is determined that the destination device already has a data block that matches the block address and the generation number.
Dependent claims: 9, 10, 11, 19
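The exchange recited in claim 8 can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the `Source` and `Destination` classes, the in-memory "network", the linear scan used to find a block with the same content, and the destination's index keyed by the source's (block address, generation number) pairs are all hypothetical.

```python
from typing import Dict, Optional, Tuple

Fingerprint = Tuple[int, int]  # (block address, generation number)


class Destination:
    def __init__(self) -> None:
        # Assumed index: source fingerprint -> data received under it.
        self.by_fingerprint: Dict[Fingerprint, bytes] = {}
        self.blocks: Dict[int, bytes] = {}
        self.bytes_received = 0

    def has(self, fp: Fingerprint) -> bool:
        return fp in self.by_fingerprint

    def copy_from(self, target_address: int, fp: Fingerprint) -> None:
        # Satisfy the update locally from data the destination already holds.
        self.blocks[target_address] = self.by_fingerprint[fp]

    def receive(self, target_address: int, fp: Fingerprint, data: bytes) -> None:
        self.bytes_received += len(data)
        self.by_fingerprint[fp] = data
        self.blocks[target_address] = data


class Source:
    def __init__(self) -> None:
        # address -> (generation number, data)
        self.blocks: Dict[int, Tuple[int, bytes]] = {}

    def write(self, address: int, data: bytes) -> None:
        generation = self.blocks.get(address, (0, b""))[0] + 1
        self.blocks[address] = (generation, data)

    def find_duplicate(self, address: int) -> Optional[int]:
        # Assumption: a simple scan stands in for real deduplication metadata.
        _, data = self.blocks[address]
        for other, (_, other_data) in self.blocks.items():
            if other != address and other_data == data:
                return other
        return None

    def replicate(self, address: int, dest: Destination) -> bool:
        """Update dest for a modified block; return True if data was sent."""
        generation, data = self.blocks[address]
        duplicate = self.find_duplicate(address)
        if duplicate is not None:
            fp = (duplicate, self.blocks[duplicate][0])
            if dest.has(fp):  # only the fingerprint crosses the network
                dest.copy_from(address, fp)
                return False
        dest.receive(address, (address, generation), data)
        return True


src, dst = Source(), Destination()
src.write(7, b"payload")
sent_first = src.replicate(7, dst)    # no match at destination: data is sent
src.write(9, b"payload")              # same content as the block at address 7
sent_second = src.replicate(9, dst)   # destination matches (7, 1): data not sent
```

In this sketch the second update transfers only the small (address, generation) pair, and the destination fills in the block from its own copy.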
12. A storage server comprising:
a network interface through which to communicate over a network;
a storage interface through which to communicate with a non-volatile mass storage;
a processor coupled to the network interface and the storage interface;
a memory storing code which configures the processor to cause the storage server to perform a communication process that includes
determining that a first data block stored by a source device has been modified, that the first data block has the same content as a second data block on the source device, and that the first data block should be updated by a destination device;
determining a block address of the second data block and a generation number of the block address, the generation number indicative of a number of times which data at the block address has been modified; and
using the block address and the generation number to uniquely identify a data block that matches the block address and the generation number, including using the block address and the generation number to avoid communication of the data block over the network in response to a determination that a destination device already has a data block that matches the block address and the generation number.
Dependent claims: 13, 14, 15, 16, 20
17. An apparatus comprising:
means for determining that a first data block stored by a source device has been modified, that the first data block has the same content as a second data block on the source device, and that the first data block should be updated by a destination device;
means for determining a block address of the second data block and a generation number of the block address, the generation number indicative of a number of times which data at the block address has been modified; and
means for using the block address and the generation number to determine whether the destination device already has a data block that matches the block address and the generation number and to avoid sending the data block to the destination device when it is determined that the destination device already has a data block that matches the block address and the generation number.
Specification