METHOD TO CHECK FILE DATA INTEGRITY AND REPORT INCONSISTENCIES WITH BULK DATA MOVEMENT
Abstract
In an embodiment, a method for validating the data integrity of a seeding process is described. The seeding process, which migrates data from a source tier to a target tier, persists a perfect hash vector (PHV) to disk whenever the process is suspended. The PHV includes bits for the fingerprints of the data segments corresponding to the data, and can be reloaded into memory when the seeding process resumes. Bits corresponding to fingerprints of data segments that have already been copied are reset before the copy phase starts in the resumed run. A checksum of the PHV is calculated after the seeding process finishes copying the data segments in the containers. A non-zero checksum indicates that one or more data segments are missing on the source tier or were not successfully copied to the target tier. The missing data segments and/or one or more related files are reported to a user via a user interface.
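The check described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation, and every name in it (`PerfectHashVector`, `reset_copied`, `seed`, `seed_with_retries`, `max_passes`) is hypothetical. It makes three assumptions: a dictionary stands in for the perfect hash function, the "checksum" is the count of bits still set (the source does not say which checksum function is used), and the source and target tiers are plain dictionaries keyed by fingerprint. Persisting the vector to disk is omitted.

```python
from typing import Dict, Iterable, List


class PerfectHashVector:
    """Toy PHV: one bit per fingerprint, indexed by a collision-free map."""

    def __init__(self, fingerprints: Iterable[str]) -> None:
        # Assumption: a dict plays the role of the perfect hash function.
        unique = sorted(set(fingerprints))
        self._index: Dict[str, int] = {fp: i for i, fp in enumerate(unique)}
        self._bits = bytearray((len(unique) + 7) // 8)
        for i in self._index.values():
            self._bits[i // 8] |= 1 << (i % 8)  # every segment starts pending

    def is_set(self, fp: str) -> bool:
        i = self._index[fp]
        return bool(self._bits[i // 8] & (1 << (i % 8)))

    def reset(self, fp: str) -> None:
        i = self._index[fp]
        self._bits[i // 8] &= ~(1 << (i % 8)) & 0xFF

    def checksum(self) -> int:
        # Assumption: the checksum is the number of set bits, so it is
        # zero exactly when every bit has been reset.
        return sum(bin(byte).count("1") for byte in self._bits)

    def pending(self) -> List[str]:
        return [fp for fp in self._index if self.is_set(fp)]


def reset_copied(phv: PerfectHashVector, target: Dict[str, bytes]) -> None:
    """On resume, clear the bits of segments already present on the target."""
    for fp in phv.pending():
        if fp in target:
            phv.reset(fp)


def seed(phv: PerfectHashVector, source: Dict[str, bytes],
         target: Dict[str, bytes]) -> int:
    """Copy each pending segment found on the source; return the checksum."""
    for fp in phv.pending():
        data = source.get(fp)
        if data is None:
            continue  # missing on the source tier: the bit stays set
        target[fp] = data
        phv.reset(fp)
    return phv.checksum()


def seed_with_retries(phv: PerfectHashVector, source: Dict[str, bytes],
                      target: Dict[str, bytes], max_passes: int = 3) -> List[str]:
    """Repeat the copy pass while the checksum is non-zero; return leftovers."""
    for _ in range(max_passes):
        if seed(phv, source, target) == 0:
            return []
    return phv.pending()  # fingerprints to report to the user
```

In this sketch a fingerprint whose bit is still set after the final pass is exactly one that was never copied, so the list returned by `seed_with_retries` is what would be reported. The retry limit is an addition for the example; the claims recite repeating the operations but do not state a bound.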
20 Claims
1. A computer-implemented method for detecting and reporting data inconsistency in a storage system, the method comprising:
resuming, by a data seeding module, a suspended process for migrating data from a source tier to a target tier, wherein the data corresponds to a plurality of data segments in a plurality of containers;
loading, by the data seeding module, a vector in memory, wherein the vector represents data segments in the plurality of containers;
resetting, in the vector, a bit corresponding to a fingerprint for each data segment that has been copied to the target tier;
determining that a checksum of the vector is non-zero after the data seeding module completes copying data segments in the plurality of containers; and
repeating the loading, the copying and the determining operations in response to determining that the checksum is non-zero.
Dependent claims: 2, 3, 4, 5, 6, 7.
8. A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform operations comprising:
resuming, by a data seeding module, a suspended process for migrating data from a source tier to a target tier, wherein the data corresponds to a plurality of data segments in a plurality of containers;
loading, by the data seeding module, a vector in memory, wherein the vector represents data segments in the plurality of containers;
resetting, in the vector, a bit corresponding to a fingerprint for each data segment that has been copied to the target tier;
determining that a checksum of the vector is non-zero after the data seeding module completes copying data segments in the plurality of containers; and
repeating the loading, the copying and the determining operations in response to determining that the checksum is non-zero.
Dependent claims: 9, 10, 11, 12, 13, 14.
15. A data processing system, comprising:
a processor;
a memory coupled to the processor to store instructions, which when executed by the processor, cause the processor to perform operations, the operations comprising:
resuming, by a data seeding module, a suspended process for migrating data from a source tier to a target tier, wherein the data corresponds to a plurality of data segments in a plurality of containers;
loading, by the data seeding module, a vector in memory, wherein the vector represents data segments in the plurality of containers;
resetting, in the vector, a bit corresponding to a fingerprint for each data segment that has been copied to the target tier;
determining that a checksum of the vector is non-zero after the data seeding module completes copying data segments in the plurality of containers; and
repeating the loading, the copying and the determining operations in response to determining that the checksum is non-zero.
Dependent claims: 16, 17, 18, 19, 20.
Specification