Verifying data consistency
Abstract
A method for verifying data consistency between update-in-place data structures and append-only data structures containing change histories associated with the update-in-place data structures is provided. The method includes loading data from an update-in-place data structure to a first set of hash buckets in a processing platform, loading data from append-only data structures to a second set of hash buckets in the processing platform, performing a bucket-level comparison between the data in the first set of hash buckets and the data in the second set of hash buckets, and generating a report based on the bucket-level comparison.
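The loading and bucket-level comparison steps summarized in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation. The following are all hypothetical: in-memory dicts stand in for the update-in-place table, the append-only history uses a `(key, op, values)` record format and is replayed into a per-key state before bucketing, and buckets are chosen by SHA-256. The function names are hypothetical too. Each bucket is summarized by a digest, so row-level work is done only for buckets whose digests disagree.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, ...]
ChangeRecord = Tuple[str, str, Optional[Row]]  # (key, op, values): hypothetical log format


def bucket_of(key: str, n_buckets: int) -> int:
    """Map a key to a bucket with a stable hash (built-in hash() of str is salted per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets


def load_buckets(rows: Dict[str, Row], n_buckets: int) -> List[Dict[str, Row]]:
    """Distribute rows into hash buckets based on the hash value of each key."""
    buckets: List[Dict[str, Row]] = [{} for _ in range(n_buckets)]
    for key, values in rows.items():
        buckets[bucket_of(key, n_buckets)][key] = values
    return buckets


def replay_history(log: List[ChangeRecord]) -> Dict[str, Row]:
    """Assumption: rebuild the per-key state implied by the append-only change history."""
    state: Dict[str, Row] = {}
    for key, op, values in log:
        if op == "delete":
            state.pop(key, None)
        elif values is not None:  # "insert" or "update"
            state[key] = values
    return state


def bucket_digest(bucket: Dict[str, Row]) -> str:
    """Summarize one bucket by hashing its rows in key order (insertion order is irrelevant)."""
    h = hashlib.sha256()
    for key in sorted(bucket):
        h.update(repr((key, bucket[key])).encode("utf-8"))
    return h.hexdigest()


def compare_buckets(table_buckets: List[Dict[str, Row]],
                    history_buckets: List[Dict[str, Row]]) -> List[Tuple[str, str]]:
    """Bucket-level comparison; drill down to rows only where bucket digests differ.

    Both bucket lists must be built with the same n_buckets so a key lands
    at the same bucket index on both sides.
    """
    report: List[Tuple[str, str]] = []
    for tb, hb in zip(table_buckets, history_buckets):
        if bucket_digest(tb) == bucket_digest(hb):
            continue  # identical buckets need no row-level work
        for key in sorted(tb.keys() | hb.keys()):
            if key not in hb:
                report.append((key, "missing_in_history"))
            elif key not in tb:
                report.append((key, "missing_in_table"))
            elif tb[key] != hb[key]:
                report.append((key, "value_mismatch"))
    return report


# Made-up data: k2 disagrees, k3 has no history, k4 has no table row.
N_BUCKETS = 4
table = {"k1": ("a",), "k2": ("b",), "k3": ("c",)}
history = [
    ("k1", "insert", ("a",)),
    ("k2", "insert", ("b",)),
    ("k2", "update", ("x",)),
    ("k4", "insert", ("d",)),
]
report = compare_buckets(load_buckets(table, N_BUCKETS),
                         load_buckets(replay_history(history), N_BUCKETS))
print(sorted(report))
```

Comparing per-bucket digests first means that, when most of the data agrees, only a few buckets need a row-by-row check. The digest function and bucket count here are arbitrary choices for the sketch.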
14 Claims
1. A method for verifying data consistency between update-in-place data structures and append-only data structures containing change histories associated with the update-in-place data structures, the method comprising:
loading data from a first update-in-place data structure to a first set of hash buckets in a processing platform, wherein the data from the first update-in-place data structure comprises a first set of key values that corresponds to rows of data in the first update-in-place data structure, and wherein loading the data from the first update-in-place data structure to the first set of hash buckets is based on a first set of hash values associated with the first set of key values;
loading data from the append-only data structures to a second set of hash buckets in the processing platform;
performing a bucket-level comparison between the data in the first set of hash buckets and the data in the second set of hash buckets;
generating an intermediate report based on the bucket-level comparison, wherein generating the intermediate report based on the bucket-level comparison comprises:
determining an update occurred to the first update-in-place data structure during the bucket-level comparison;
identifying transient differences between the first update-in-place data structure and the append-only data structures, wherein the transient differences comprise differences caused by either in-flight transactions, by rollback transactions, or by in-flight transactions and by rollback transactions committed at the first update-in-place data structure after loading the data from the first update-in-place data structure to the first set of hash buckets in the processing platform; and
removing the transient differences from the intermediate report listing differences between the first update-in-place data structure and the append-only data structures; and
generating a final report based on the intermediate report and removal of the identified transient differences, wherein the final report comprises persistent differences between the first update-in-place data structure and the append-only data structures and omits the identified transient differences removed from the intermediate report, wherein the final report is generated for live comparison of the first update-in-place data structure and the append-only data structures, and wherein the differences are inserted into a second update-in-place data structure that is associated with the first update-in-place data structure.
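The transient-difference filtering in claim 1 can be illustrated with a small sketch. The claim does not say how transient differences are detected, so the rule here is an assumption: each transaction that touched a row key has a hypothetical record with a status (`committed`, `in_flight`, `rolled_back`) and a timestamp. A reported difference is treated as transient when its key was touched by an in-flight or rolled-back transaction, or by a transaction that committed after the table data was loaded into the hash buckets (`load_ts`). The record layout, the key-level matching, and all names are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

Difference = Tuple[str, str]  # (row key, kind), e.g. the output of a bucket-level comparison


@dataclass(frozen=True)
class TxnRecord:
    """Hypothetical record of a transaction that touched one row key."""
    key: str
    status: str  # "committed", "in_flight", or "rolled_back"
    ts: float    # commit time, or last activity time if not committed


def transient_keys(txns: Iterable[TxnRecord], load_ts: float) -> Set[str]:
    """Keys touched by in-flight or rolled-back transactions, or by commits after the load."""
    return {
        t.key
        for t in txns
        if t.status in ("in_flight", "rolled_back") or t.ts > load_ts
    }


def finalize_report(intermediate: List[Difference], txns: Iterable[TxnRecord],
                    load_ts: float) -> Tuple[List[Difference], List[Difference]]:
    """Split the intermediate report into (persistent, removed transient) differences."""
    transient = transient_keys(txns, load_ts)
    if not transient:
        # No in-flight, rolled-back, or post-load transactions: every difference is persistent.
        return list(intermediate), []
    persistent = [d for d in intermediate if d[0] not in transient]
    removed = [d for d in intermediate if d[0] in transient]
    return persistent, removed


intermediate = [("k2", "value_mismatch"), ("k3", "missing_in_history"),
                ("k4", "missing_in_table")]
txns = [
    TxnRecord("k2", "committed", 5.0),    # committed after the load at t=2.0
    TxnRecord("k3", "committed", 1.0),    # committed before the load
    TxnRecord("k4", "rolled_back", 0.5),  # rolled back
]
persistent, removed = finalize_report(intermediate, txns, load_ts=2.0)
print("final report:", persistent)
```

Matching on whole keys is deliberately coarse. A real system might instead re-read the affected rows once the transactions settle and keep only differences that still reproduce.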
11. A computer program product for verifying data consistency between update-in-place data structures and append-only data structures containing change histories associated with the update-in-place data structures, the computer program product comprising at least one computer readable non-transitory storage medium having computer readable program instructions thereon for execution by a processor, the computer readable program instructions comprising program instructions for:
loading data from a first update-in-place data structure to a first set of hash buckets in a processing platform, wherein the data from the first update-in-place data structure comprises a first set of key values that corresponds to rows of data in the first update-in-place data structure, and wherein loading the data from the first update-in-place data structure to the first set of hash buckets is based on a first set of hash values associated with the first set of key values;
loading data from the append-only data structures to a second set of hash buckets in the processing platform;
performing a bucket-level comparison between the data in the first set of hash buckets and the data in the second set of hash buckets;
generating an intermediate report based on the bucket-level comparison, wherein generating the intermediate report based on the bucket-level comparison comprises:
determining an update occurred to the first update-in-place data structure during the bucket-level comparison;
identifying transient differences between the first update-in-place data structure and the append-only data structures, wherein the transient differences comprise differences caused by either in-flight transactions, by rollback transactions, or by in-flight transactions and by rollback transactions committed at the first update-in-place data structure after loading the data from the first update-in-place data structure to the first set of hash buckets in the processing platform; and
removing the transient differences from the intermediate report listing differences between the first update-in-place data structure and the append-only data structures; and
generating a final report based on the intermediate report and removal of the identified transient differences, wherein the final report comprises persistent differences between the first update-in-place data structure and the append-only data structures and omits the identified transient differences removed from the intermediate report, wherein the final report is generated for live comparison of the first update-in-place data structure and the append-only data structures, and wherein the differences are inserted into a second update-in-place data structure that is associated with the first update-in-place data structure.
13. A computer system for verifying data consistency between update-in-place data structures and append-only data structures containing change histories associated with the update-in-place data structures, the computer system comprising:
at least one processing unit;
at least one computer readable memory;
at least one computer readable tangible, non-transitory storage medium; and
program instructions stored on the at least one computer readable tangible, non-transitory storage medium for execution by the at least one processing unit via the at least one computer readable memory, wherein the program instructions comprise program instructions for:
loading data from a first update-in-place data structure to a first set of hash buckets in a processing platform, wherein the data from the first update-in-place data structure comprises a first set of key values that corresponds to rows of data in the first update-in-place data structure, and wherein loading the data from the first update-in-place data structure to the first set of hash buckets is based on a first set of hash values associated with the first set of key values;
loading data from the append-only data structures to a second set of hash buckets in the processing platform;
performing a bucket-level comparison between the data in the first set of hash buckets and the data in the second set of hash buckets;
generating an intermediate report based on the bucket-level comparison, wherein generating the intermediate report based on the bucket-level comparison comprises:
determining an update occurred to the first update-in-place data structure during the bucket-level comparison;
identifying transient differences between the first update-in-place data structure and the append-only data structures, wherein the transient differences comprise differences caused by either in-flight transactions, by rollback transactions, or by in-flight transactions and by rollback transactions committed at the first update-in-place data structure after loading the data from the first update-in-place data structure to the first set of hash buckets in the processing platform; and
removing the transient differences from the intermediate report listing differences between the first update-in-place data structure and the append-only data structures; and
generating a final report based on the intermediate report and removal of the identified transient differences, wherein the final report comprises persistent differences between the first update-in-place data structure and the append-only data structures and omits the identified transient differences removed from the intermediate report, wherein the final report is generated for live comparison of the first update-in-place data structure and the append-only data structures, and wherein the differences are inserted into a second update-in-place data structure that is associated with the first update-in-place data structure.
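Each independent claim ends by inserting the differences into a second update-in-place data structure associated with the first. The sketch below shows one way to do that with Python's built-in `sqlite3` module. It assumes a hypothetical first table named `orders` whose differences are written to a companion table `orders_diffs`, keyed by a comparison run identifier. The schema, table names, and run-ID scheme are illustrative assumptions, not taken from the source.

```python
import sqlite3
from typing import List, Tuple


def record_differences(conn: sqlite3.Connection, run_id: str,
                       diffs: List[Tuple[str, str]]) -> int:
    """Store final-report differences in a hypothetical companion table of `orders`."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS orders_diffs (
               run_id  TEXT NOT NULL,
               row_key TEXT NOT NULL,
               kind    TEXT NOT NULL,
               PRIMARY KEY (run_id, row_key)
           )"""
    )
    with conn:  # commits on success, rolls back on exception
        conn.executemany(
            "INSERT OR REPLACE INTO orders_diffs (run_id, row_key, kind) VALUES (?, ?, ?)",
            [(run_id, key, kind) for key, kind in diffs],
        )
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM orders_diffs WHERE run_id = ?", (run_id,)
    ).fetchone()
    return count


conn = sqlite3.connect(":memory:")
stored = record_differences(conn, "run-1", [("k3", "missing_in_history")])
# Re-running the same run replaces rows in place instead of appending duplicates.
stored_again = record_differences(conn, "run-1", [("k3", "missing_in_history")])
print(stored, stored_again)
```

With `INSERT OR REPLACE` and a primary key on `(run_id, row_key)`, the differences table is itself updated in place: repeating a run overwrites its rows rather than accumulating copies.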
Specification