SYSTEMS AND METHODS FOR SELECTIVE DATA REPLICATION
Abstract
Systems and methods for performing data replication are disclosed. Determining whether to update replicated data typically involves comparison of readily obtainable attributes of a given source file and its corresponding replicated file. Such attributes can be obtained from, for example, metadata. In certain situations, an additional assessment of the source and replicated files can be beneficial. For example, if the integrity of an existing replicated file's content is maintained, one may not want to re-replicate the corresponding source file. For large source files, such a decision can provide substantial reductions in expenditures of available computing and network resources. In certain embodiments, a threshold for identifying such large files can be based on one or more operating parameters such as network type and available bandwidth. In certain embodiments, a replication file's integrity can be checked by calculating and comparing checksums for the replication file and its corresponding source file.
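The abstract states only that the large-file threshold can depend on operating parameters such as available bandwidth; it does not give a formula. The sketch below shows one possible policy, purely as an assumption: a file counts as "large" when re-sending it would take longer than a chosen time budget. The function name `select_threshold_bytes` and the parameter `max_resend_seconds` are hypothetical and do not come from the source.

```python
def select_threshold_bytes(bandwidth_bytes_per_s, max_resend_seconds=2.0):
    """Return a size threshold in bytes for treating a file as "large".

    Assumed policy (not from the source): a file is large when re-sending
    it over the link would take longer than max_resend_seconds.
    """
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return int(bandwidth_bytes_per_s * max_resend_seconds)
```

Under this assumed policy, a faster link yields a higher threshold, so more files are simply re-sent without computing checksums.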
17 Claims
1. A method for performing data replication, the method comprising:
    performing an assessment on first data stored on a first storage device and second data stored on a second storage device, at least a portion of the second data previously having been replicated from the first data, the assessment comprising,
        comparing one or more attributes of files in the first data with those of corresponding files in the second data, and
        identifying a file having at least one of the one or more attributes different in the first and second data;
    comparing the size of the identified file with a selected threshold value;
    if the size of the identified file is less than or equal to the selected threshold value, replicating the identified file from the first storage device to the second storage device; and
    if the size of the identified file is greater than the selected threshold value;
        obtaining checksums for the identified file in the first data and its corresponding file in the second data;
        comparing the checksums;
        if the checksums are different, replicating the identified file from the first storage device to the second storage device; and
        if the checksums are the same, synchronizing the one or more different attributes of the identified file in the first data and the corresponding file in the second data, and not replicating the identified file.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
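The decision flow recited in claim 1 can be illustrated with a minimal sketch. It is not the patented implementation. It assumes that both copies are reachable as local paths, that the compared attributes are size and modification time, that the checksum is SHA-256, and that "replicating" means a whole-file copy. The names `file_checksum` and `replicate_if_needed` and the returned strings are hypothetical.

```python
import hashlib
import os
import shutil


def file_checksum(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks (the hash choice is an assumption)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate_if_needed(src, dst, threshold_bytes):
    """Apply the size-then-checksum decision to one source/replica pair."""
    src_stat, dst_stat = os.stat(src), os.stat(dst)
    # Assessment: compare attributes (here, size and modification time).
    if (src_stat.st_size == dst_stat.st_size
            and src_stat.st_mtime_ns == dst_stat.st_mtime_ns):
        return "unchanged"
    # Small file: re-sending is cheaper than checking, so just replicate.
    if src_stat.st_size <= threshold_bytes:
        shutil.copy2(src, dst)
        return "replicated"
    # Large file: replicate only if the content really differs.
    if file_checksum(src) != file_checksum(dst):
        shutil.copy2(src, dst)
        return "replicated"
    # Same content: synchronize the differing attributes, skip the copy.
    shutil.copystat(src, dst)
    return "attributes_synchronized"
```

In a real deployment the replica's checksum would presumably be computed on the second storage device, so that only the checksum crosses the network; the local-path version above is only meant to show the branching.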
9. A data replication system, comprising:
    a data storage system configured to store replication of at least a portion of data from a client system, the client system capable of communicating with the data storage system to facilitate transfer of data therebetween; and
    a replication agent in communication with the client system and the data storage system and configured to obtain information about an identified file on the client system, the identified file having at least one metadata attribute that is different from that of an existing replicated copy of the identified file on the data storage system, the replication agent further configured to;
        obtain a size of the identified file,
        compare the size of the identified file with a threshold value,
        if the size is less than or equal to the threshold value, replicate the identified file so as to replace or update the existing replicated copy of the identified file, and
        if the size is greater than the threshold value, (1) obtaining and comparing checksums of the identified file and the replicated file, and (2) replicating the identified file so as to replace or update the existing replicated copy of the identified file if the checksums are different.
Dependent claims: 10, 11, 12, 13, 14
15. A replication system comprising:
    means for identifying a first file in a first system based on a comparison of one or more attributes of the first file and a second file on a second system, the second file representing an existing replicated copy of the first file;
    means for comparing the size of the identified file with a threshold value; and
    means for determining whether to replicate one or more blocks of the first file to the second file again based at least in part on the size comparison.
Dependent claims: 16, 17
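Claim 15 refers to replicating "one or more blocks" of a file again but does not say how the affected blocks would be found. One conceivable approach, shown here strictly as an assumption, is to hash fixed-size blocks of both copies and re-send only the blocks whose hashes differ. The function `changed_blocks`, the fixed block size, and the use of SHA-256 are all hypothetical and are not taken from the source.

```python
import hashlib


def changed_blocks(source, replica, block_size=4096):
    """Return indices of fixed-size blocks whose content differs.

    source and replica are bytes objects; a block missing from one side
    (because the copies differ in length) counts as changed.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    longest = max(len(source), len(replica))
    block_count = (longest + block_size - 1) // block_size
    changed = []
    for index in range(block_count):
        start = index * block_size
        src_block = source[start:start + block_size]
        dst_block = replica[start:start + block_size]
        # Compare per-block digests rather than the raw bytes, as a remote
        # replica would typically send back only its digests.
        if hashlib.sha256(src_block).digest() != hashlib.sha256(dst_block).digest():
            changed.append(index)
    return changed
```

In the spirit of the claim, such a block-level check would be worthwhile mainly for files above the size threshold, where re-sending the whole file is expensive.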
Specification