Hierarchical drift detection of data sets
Abstract
The present invention leverages data hierarchies to provide a systematic means to determine data differences between equivalent data. This allows disparate data storage systems to efficiently determine divergent data locations by utilizing, for example, data signatures representative of varying degrees of data granularity. Comparative analysis can then be performed between the databases by employing an iterative approach until the desired level of data granularity is obtained. This allows, in one instance of the present invention, discrepant data to be determined without the transfer of large amounts of data and without requiring homogeneous data storage systems. Another instance of the present invention utilizes equivalent logical data views from non-identical data sets to determine data discrepancies. Yet another instance of the present invention determines discrepancies of a federated and/or integrated data system by employing reversible data statistical signatures, providing a simplistic transfer protocol and sheltering each data system from the other's complexities.
30 Claims
1. A system that facilitates data discrepancy determination, comprising:
a partitioning component that utilizes a hierarchical structure of a data set to partition data at various levels of the data structure;
a digest component that condenses at least one data partition provided by the partitioning component;
a signature component that determines at least one signature of at least one data partition digested by the digest component; and
a comparison component that compares a data digest signature with at least one other data digest signature to ascertain if mismatched data exists;
the other data digest signature representative of data that a user desires to be equivalent to data associated with the data digest signature.
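The four components recited in claim 1 can be illustrated with a minimal sketch. It is not taken from the specification: the function names, the tuple-shaped hierarchical keys, the text-based digest format, and the choice of SHA-256 as the signature are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict

# Hypothetical record shape: (hierarchical_key_tuple, value).

def partition(records, level):
    """Group records by the first `level` components of their hierarchical key."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key[:level]].append((key, value))
    return dict(groups)

def digest(rows):
    """Condense a partition into one canonical, order-independent byte string."""
    lines = (f"{'/'.join(key)}={value}" for key, value in sorted(rows))
    return "\n".join(lines).encode("utf-8")

def signature(condensed):
    """Fixed-size signature of a digested partition (SHA-256 is an assumption)."""
    return hashlib.sha256(condensed).hexdigest()

def signatures(records, level):
    """Signature per partition at the given hierarchy level."""
    return {prefix: signature(digest(rows))
            for prefix, rows in partition(records, level).items()}

def compare(sigs_a, sigs_b):
    """Partition prefixes whose signatures differ or exist on one side only."""
    return sorted(p for p in sigs_a.keys() | sigs_b.keys()
                  if sigs_a.get(p) != sigs_b.get(p))

source = [(("2024", "01", "a"), 10), (("2024", "02", "b"), 20), (("2025", "01", "c"), 30)]
replica = [(("2024", "01", "a"), 10), (("2024", "02", "b"), 21), (("2025", "01", "c"), 30)]

print(compare(signatures(source, 1), signatures(replica, 1)))  # [('2024',)]
print(compare(signatures(source, 2), signatures(replica, 2)))  # [('2024', '02')]
```

Comparing at level 1 flags only the `2024` partition, and comparing at level 2 narrows the mismatch to `2024/02`, without either side exchanging the underlying records.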
10. A method for facilitating data discrepancy determination, comprising:
partitioning data into chunks and assigning signatures to the respective chunks;
determining discrepancy in a subset of the chunks via a signature comparison;
further partitioning the chunk subset and assigning new signatures to the partitioned chunk subsets; and
repeating the discrepancy determination, partitioning, and assignment of new signatures until convergence upon specific non-matching records and/or data is achieved.
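The iterative method of claim 10 could be sketched as a recursive drill-down that re-partitions only the chunks whose signatures disagree. This is an illustrative sketch under stated assumptions: records are held in dictionaries keyed by three-component tuples, SHA-256 stands in for the signature, and both data sets live in one process, whereas two separate systems would exchange only the signatures.

```python
import hashlib

def sig(rows):
    """Signature of a chunk: SHA-256 over its sorted records (an assumed choice)."""
    h = hashlib.sha256()
    for key, value in sorted(rows.items()):
        h.update(repr((key, value)).encode("utf-8"))
    return h.hexdigest()

def split(rows, depth):
    """Partition a chunk into sub-chunks by the key component at `depth`."""
    chunks = {}
    for key, value in rows.items():
        chunks.setdefault(key[depth], {})[key] = value
    return chunks

def find_mismatches(a, b, depth=0, max_depth=3):
    """Return the keys of records that differ between chunk `a` and chunk `b`.

    `max_depth` is assumed to equal the number of components in every key.
    """
    if sig(a) == sig(b):
        return []  # signatures agree, so this chunk is not examined further
    if depth == max_depth:
        # Converged on individual records.
        return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
    parts_a, parts_b = split(a, depth), split(b, depth)
    found = []
    for name in sorted(parts_a.keys() | parts_b.keys()):
        found += find_mismatches(parts_a.get(name, {}), parts_b.get(name, {}),
                                 depth + 1, max_depth)
    return found

source = {("eu", "2024", "r1"): 10, ("eu", "2024", "r2"): 20,
          ("us", "2024", "r3"): 30}
replica = {("eu", "2024", "r1"): 10, ("eu", "2024", "r2"): 99,
           ("us", "2024", "r3"): 30, ("us", "2025", "r4"): 40}

print(find_mismatches(source, replica))
# [('eu', '2024', 'r2'), ('us', '2025', 'r4')]
```

Matching chunks such as `us/2024` are dismissed after a single signature comparison, so the work concentrates on the branches that actually diverge.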
26. A system that facilitates data discrepancy determination, comprising:
means for partitioning a data set at various levels of a hierarchical data structure;
means for digesting at least one partition of a data set;
means for determining at least one data signature of at least one digested data partition; and
means for comparing a data digest signature with at least one other data digest signature to ascertain if mismatched data exists, the other data digest signature representative of data that a user desires to be equivalent to data associated with the data digest signature.
27. A data packet, transmitted between two or more computer components, that facilitates data discrepancy determination, the data packet comprising, at least in part, information relating to a data discrepancy determination system that utilizes, at least in part, at least one data signature representative of at least one data partition based, at least in part, on a hierarchical structure of a data set and utilized in an iterative process to isolate mismatched data.
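One way to picture the data packet of claim 27 is as a small message carrying a partition's position in the hierarchy together with its signature. The sketch below is purely hypothetical: the class name `SignaturePacket`, its fields, and the JSON wire format are assumptions and are not described in the claims.

```python
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class SignaturePacket:
    """Hypothetical packet: identifies a partition and carries its signature."""
    partition_path: tuple  # position in the hierarchy, e.g. ("2024", "02")
    level: int             # depth of the partition within the hierarchy
    signature: str         # signature of the digested partition

    def to_bytes(self):
        """Serialize to JSON bytes (tuples are written as JSON arrays)."""
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw):
        """Rebuild a packet from JSON bytes, restoring the path as a tuple."""
        fields = json.loads(raw.decode("utf-8"))
        return cls(tuple(fields["partition_path"]), fields["level"],
                   fields["signature"])

packet = SignaturePacket(("2024", "02"), 2, "ab12cd34")
received = SignaturePacket.from_bytes(packet.to_bytes())
print(received == packet)  # True
```

A receiver would compare the `signature` field against its own signature for the same `partition_path` and request finer-grained packets only when the two differ.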
Specification