Comparing mass storage devices through digests that are representative of stored data in order to minimize data transfer
Abstract
A system and method for comparing mass storage devices. Generally, a mass storage device is subdivided into data blocks representing physical storage locations of some particular size. For each data block, or group of data blocks, a digest is calculated, wherein the digest is an alternate representation of the data stored within the block or group. A digest is highly dependent on the data it represents, such that different data is extremely likely to result in different digests. This high dependence allows digests to be compared in place of the data stored at each data block. Furthermore, because each digest requires fewer bits than the data it represents, the amount of data that must be transferred between mass storage devices during comparison is minimized.
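The idea in the abstract can be illustrated with a minimal sketch. The abstract names neither a digest function nor a block size, so SHA-256 and 4096-byte blocks are assumptions made here for illustration only, and the helper names (`block_digest`, `split_blocks`, `differing_blocks`) are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed; the abstract only says "some particular size"


def block_digest(block: bytes) -> bytes:
    """Return a digest much smaller than the block (32 bytes for SHA-256, an assumed choice)."""
    return hashlib.sha256(block).digest()


def split_blocks(device: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Divide a device image into fixed-size blocks by position."""
    return [device[i:i + block_size] for i in range(0, len(device), block_size)]


def differing_blocks(primary: bytes, backup: bytes) -> list[int]:
    """Return indices of corresponding blocks whose digests differ."""
    primary_digests = [block_digest(b) for b in split_blocks(primary)]
    backup_digests = [block_digest(b) for b in split_blocks(backup)]
    return [
        i
        for i, (p, q) in enumerate(zip(primary_digests, backup_digests))
        if p != q
    ]


# Two three-block "devices" that differ by one byte inside block 1.
primary = bytes(BLOCK_SIZE) * 3
backup = bytearray(primary)
backup[BLOCK_SIZE + 10] = 0xFF
print(differing_blocks(primary, bytes(backup)))  # prints [1]
```

Under these assumptions each 4096-byte block is represented by a 32-byte digest, so comparing the devices would move roughly 1/128 of the data that a block-by-block transfer would.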
208 Citations
28 Claims
1. In a computing environment including a primary system having a primary mass storage device that stores a plurality of primary data blocks and a backup system having a backup mass storage device that stores a plurality of backup data blocks, a method for comparing the primary mass storage device to the backup mass storage device, comprising the steps of:
- calculating, by the backup system, a first digest based on a selected backup data block, the selected backup data block corresponding to a physical location within the backup mass storage device, wherein the first digest is smaller than the selected backup data block;
- calculating, by the primary system, a second digest based on a selected primary data block, the selected primary data block corresponding to the selected backup data block and also corresponding to a physical location within the primary mass storage device, wherein the second digest is smaller than the selected primary data block, and
- comparing the second digest with the first digest to determine whether the second digest and the first digest indicate that the selected backup data block and the selected primary data block contain the same data.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 12
11. In a computing environment including a primary system having a primary mass storage device that stores a plurality of primary data blocks and a backup system having a backup mass storage device that stores a plurality of backup data blocks, a method for comparing the primary mass storage device to the backup mass storage device, comprising repeating, until all primary data blocks have been compared to corresponding backup data blocks, the steps of:
- calculating, by the backup system, a first digest based on a selected backup data block, the selected backup data block corresponding to a physical location within the backup mass storage device, wherein the first digest is smaller than the selected backup data block;
- transmitting the first digest from the backup system to the primary system;
- calculating, by the primary system, a second digest based on a selected primary data block, the selected primary data block corresponding to the selected backup data block and also corresponding to a physical location within the primary mass storage device, wherein the second digest is smaller than the selected primary data block; and
- comparing the second digest with the first digest to determine whether the second digest and the first digest indicate that the selected backup data block and the selected primary data block contain the same data.
Dependent claims: 13, 14, 15, 16, 17
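The repeated steps of claim 11 can be sketched as a loop in which the backup side produces one digest per block and the primary side recomputes and compares. This is only an illustration: a Python generator stands in for the transmission step, SHA-256 is an assumed digest function, and the names `backup_digests` and `compare_on_primary` are hypothetical.

```python
import hashlib
from typing import Iterable, Iterator

BLOCK_SIZE = 4096  # assumed block size


def backup_digests(backup_blocks: list[bytes]) -> Iterator[tuple[int, bytes]]:
    """Backup side: yield (block index, first digest); yielding stands in for transmission."""
    for index, block in enumerate(backup_blocks):
        yield index, hashlib.sha256(block).digest()


def compare_on_primary(
    primary_blocks: list[bytes],
    received: Iterable[tuple[int, bytes]],
) -> tuple[list[int], int]:
    """Primary side: recompute each digest locally and compare with the received one.

    Returns the indices of mismatched blocks and the number of digest bytes received.
    """
    mismatched = []
    bytes_received = 0
    for index, first_digest in received:
        bytes_received += len(first_digest)
        second_digest = hashlib.sha256(primary_blocks[index]).digest()
        if second_digest != first_digest:
            mismatched.append(index)
    return mismatched, bytes_received


primary_blocks = [b"a" * BLOCK_SIZE, b"b" * BLOCK_SIZE, b"c" * BLOCK_SIZE]
backup_blocks = [b"a" * BLOCK_SIZE, b"X" * BLOCK_SIZE, b"c" * BLOCK_SIZE]
print(compare_on_primary(primary_blocks, backup_digests(backup_blocks)))  # prints ([1], 96)
```

In this toy run only 96 bytes of digests cross from the backup side to the primary side, against 12288 bytes of block data.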
18. In a computing environment including a mass storage device having a plurality of data blocks, a method for comparing data blocks, comprising the steps of:
- selecting a first data block;
- calculating a first digest, the first digest being generated from data contained within the first data block and being smaller than the first data block, the first data block corresponding to a physical location within the mass storage device;
- selecting a second data block;
- calculating a second digest, the second digest being generated from data contained within the second data block and being smaller than the second data block, the second data block corresponding to a physical location within the mass storage device, the first and second digests being calculated in a process wherein a change in the data blocks would have a high probability of changing the value of the corresponding digest; and
- comparing the first digest with the second digest to determine whether the first digest and the second digest indicate that the first data block and the second data block contain the same data.
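Claim 18 compares two blocks that both reside on a single mass storage device. A minimal sketch, assuming a device image held in memory, 512-byte blocks, and SHA-256 as the digest (none of which the claim specifies); `read_block` and `blocks_match` are hypothetical helper names.

```python
import hashlib


def read_block(device: bytes, index: int, block_size: int = 512) -> bytes:
    """Select the block at a given position (physical location) on the device image."""
    start = index * block_size
    return device[start:start + block_size]


def blocks_match(device: bytes, first_index: int, second_index: int,
                 block_size: int = 512) -> bool:
    """Compare two blocks of the same device by their 32-byte digests."""
    first_digest = hashlib.sha256(read_block(device, first_index, block_size)).digest()
    second_digest = hashlib.sha256(read_block(device, second_index, block_size)).digest()
    return first_digest == second_digest


device = b"A" * 512 + b"B" * 512 + b"A" * 512
print(blocks_match(device, 0, 2))  # prints True
print(blocks_match(device, 0, 1))  # prints False
```

Equal digests indicate, with high probability rather than certainty, that the two blocks hold the same data; a byte-for-byte check would be needed to rule out a collision.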
19. A system for comparing mass storage, comprising:
- primary mass storage means attached to a primary system for storing a plurality of primary data blocks;
- backup mass storage means attached to a backup system for storing a plurality of backup data blocks;
- backup system processor means for performing the steps of:
  - retrieving a backup data block from the backup mass storage means, the backup data block corresponding to a physical location within the backup mass storage means;
  - calculating a first digest based on data stored in the backup data block, the first digest being smaller than and having a high probability of uniquely correlating to the backup data block it represents; and
  - repeating the above steps until all backup data blocks have been considered; and
- primary system processor means for performing the steps of:
  - receiving the first digest from the backup system processor means;
  - retrieving a primary data block from the primary mass storage means, the primary data block corresponding to a physical location within the primary mass storage means and also corresponding to the backup data block associated with the first digest;
  - calculating a second digest based on data stored in the primary data block, the second digest being smaller than and having a high probability of uniquely correlating to the primary data block it represents;
  - comparing the first digest and the second digest;
  - interpreting any difference between the first and second digests to indicate a difference between the primary mass storage means and the backup mass storage means; and
  - repeating the above steps until all primary and backup data blocks have been considered.
Dependent claims: 20, 21, 22
23. A computer program product for implementing a method for use in a backup system including a backup mass storage device that stores a plurality of backup data blocks, the backup system being connected to a primary system including a primary mass storage device that stores a plurality of primary data blocks, the computer program product comprising:
- a computer-readable medium carrying computer-executable instructions for implementing the method, wherein the computer-executable instructions comprise:
  - means for retrieving a backup data block from the backup mass storage device, the backup data block corresponding to a physical location within the backup mass storage device;
  - means for calculating a first digest based on data stored in the backup data block, the first digest being smaller than and having a high probability of uniquely correlating to the backup data block it represents; and
  - means for transmitting the first digest to the primary system;
  - wherein retrieving the backup data block, calculating the first digest, and transmitting the first digest are conducted until all of the plurality of backup data blocks have been considered.
Dependent claims: 24, 25
26. A computer program product for implementing a method for use in a primary system including a primary mass storage device that stores a plurality of primary data blocks, the primary system connected to a backup system including a backup mass storage device that stores a plurality of backup data blocks, the computer program product comprising:
- a computer-readable medium carrying computer-executable instructions for implementing the method, wherein the computer-executable instructions comprise:
  - means for receiving a first digest from the backup system;
  - means for retrieving a primary data block from the primary mass storage device that corresponds to the first digest, the primary data block corresponding to a physical location within the primary mass storage device;
  - means for calculating a second digest based on data stored in the primary data block, the second digest being smaller than and having a high probability of uniquely correlating to the primary data block it represents;
  - means for comparing the first digest and the second digest; and
  - means for interpreting any difference between the first and second digests to indicate a difference between the backup mass storage device and the primary mass storage device;
  - wherein receiving the first digest, retrieving the primary data block, calculating the second digest, comparing the first digest and the second digest, and interpreting any difference are conducted until all backup data blocks and all primary data blocks have been considered.
Dependent claims: 27, 28
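Claims 23 and 26 split the work into a backup-side program and a primary-side program. One way the two halves could fit together is sketched below. The wire format (fixed 32-byte SHA-256 records, with the block index implied by record order) is an assumption and is not taken from the claims, `io.BytesIO` objects stand in for both the devices and the connection, and the function names are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO

BLOCK_SIZE = 4096  # assumed block size
DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for the assumed digest


def send_backup_digests(backup: BinaryIO, channel: BinaryIO) -> None:
    """Backup-side program: retrieve each block, digest it, write the digest to the channel."""
    while True:
        block = backup.read(BLOCK_SIZE)
        if not block:
            break  # all backup blocks have been considered
        channel.write(hashlib.sha256(block).digest())


def receive_and_compare(primary: BinaryIO, channel: BinaryIO) -> list[int]:
    """Primary-side program: read each digest, digest the matching block, record differences."""
    differences = []
    index = 0
    while True:
        first_digest = channel.read(DIGEST_SIZE)
        block = primary.read(BLOCK_SIZE)
        if not first_digest and not block:
            break  # both sides are exhausted
        # A missing digest or a missing block also counts as a difference.
        if hashlib.sha256(block).digest() != first_digest:
            differences.append(index)
        index += 1
    return differences


backup_device = io.BytesIO(b"a" * BLOCK_SIZE + b"b" * BLOCK_SIZE)
primary_device = io.BytesIO(b"a" * BLOCK_SIZE + b"c" * BLOCK_SIZE)
channel = io.BytesIO()  # stand-in for the connection between the two systems

send_backup_digests(backup_device, channel)
channel.seek(0)  # rewind so the primary side reads from the first digest
print(receive_and_compare(primary_device, channel))  # prints [1]
```

Relying on record order keeps each transmitted record to the digest alone; a format that carried explicit block addresses would cost a few more bytes per block but would tolerate out-of-order delivery.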
Specification