Data comparison system
First Claim
1. A computer-implemented method for performing a tolerance based comparison between a legacy data store and a new data store, the method comprising:
receiving, by a processor from a device of a user, a compare data structure comprising a plurality of data item pairs, each data item pair identifying a legacy data item of a legacy dataset and a new data item of a new dataset, wherein each data item pair comprises a data type of a table data type, a flat structure data type, a deep structure data type, or a field data type, and wherein the table data type, the flat structure data type and the deep structure data type each comprise a plurality of records;
receiving, by the processor from the device of the user, a plurality of tolerances, each tolerance being associated with one of the data item pairs and indicative of an acceptable difference between the data item pair according to the data type of the data item pair;
recursively comparing, by the processor, each data item pair of the plurality of data item pairs wherein recursively comparing comprises determining the data type of each data item pair, and;
when the data item pair is determined to be the table data type, calling a compare subroutine for each record in each table of the data item pair to form new data item pairs to compare;
when the data item pair is determined to be the flat structure data type or the deep structure data type, calling the compare subroutine for each record in the flat structure data or the deep structure of the data item pair to form new data item pairs to compare;
when the data item pair is determined to not be one of the table data type, the flat structure data type, the deep structure data type, or the field data type, writing a log entry indicating that the data item pair is an unknown data type;
determining, by the processor, that each of one or more of the plurality of data item pairs being compared comprises the field data type;
identifying a subset of the plurality of data item pairs comprising the one or more of the plurality of data item pairs determined to be of the field data type; and
for each of the one or more of the plurality of data item pairs determined to be of the subset of the plurality of data item pairs;
checking, by the processor, each legacy data item in relation to each new data item of each data item pair in accordance with the associated tolerance; and
assigning, by the processor, a category among a plurality of categories for each data item pair determined to be of the subset based on the difference of each data item pair within the tolerance associated with each data item pair, wherein the plurality of categories comprises an exact match category, a within tolerance category, and an outside of tolerance category;
transforming, by the processor, a result of the checking and assigning into a report, wherein the report describes a percentage of the data item pairs assigned the exact match category, a percentage of the data item pairs assigned the within tolerance category, and a percentage of the data item pairs assigned the outside of tolerance category; and
providing, by the processor to the device of the user, the report.
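The steps of this claim can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: tables are modeled as Python lists, flat and deep structures as dicts, and fields as numbers or strings; records are paired by position; a tolerance is a single absolute numeric threshold keyed by field name; and the names `categorize`, `compare`, and `report` are hypothetical.

```python
import logging
from collections import Counter

log = logging.getLogger("compare")

EXACT = "exact match"
WITHIN = "within tolerance"
OUTSIDE = "outside of tolerance"


def categorize(legacy, new, tolerance):
    """Assign one of the three categories to a field pair."""
    if legacy == new:
        return EXACT
    numeric = (int, float)
    # Assumption: tolerance is an absolute difference and only applies to numbers.
    if isinstance(legacy, numeric) and isinstance(new, numeric):
        if abs(legacy - new) <= tolerance:
            return WITHIN
    return OUTSIDE


def compare(legacy, new, tolerances, name="", results=None):
    """Recursively descend until field pairs are reached, then categorize them."""
    if results is None:
        results = []
    if isinstance(legacy, list) and isinstance(new, list):
        # Table: each pair of records (matched by position) is a new data item pair.
        for legacy_rec, new_rec in zip(legacy, new):
            compare(legacy_rec, new_rec, tolerances, name, results)
    elif isinstance(legacy, dict) and isinstance(new, dict):
        # Flat or deep structure: each component is a new data item pair.
        for key in legacy:
            compare(legacy[key], new.get(key), tolerances, key, results)
    elif isinstance(legacy, (int, float, str)) and isinstance(new, (int, float, str)):
        # Field: check against the tolerance for this field (default 0).
        results.append((name, categorize(legacy, new, tolerances.get(name, 0))))
    else:
        # Neither table, structure, nor field: log and move on.
        log.warning("unknown data type for data item pair %r", name)
    return results


def report(results):
    """Return the percentage of field pairs in each category."""
    counts = Counter(category for _, category in results)
    total = len(results) or 1  # avoid dividing by zero on empty input
    return {c: 100.0 * counts[c] / total for c in (EXACT, WITHIN, OUTSIDE)}


# Hypothetical sample data: a table of records, each holding a nested structure.
legacy = [
    {"id": 1, "amount": 100.00, "addr": {"city": "Oslo"}},
    {"id": 2, "amount": 50.00, "addr": {"city": "Rome"}},
]
new = [
    {"id": 1, "amount": 100.004, "addr": {"city": "Oslo"}},
    {"id": 2, "amount": 51.00, "addr": {"city": "Rome"}},
]
summary = report(compare(legacy, new, {"amount": 0.01}))
print(summary)
```

In this sample, four of the six field pairs match exactly, one `amount` pair differs by less than the 0.01 tolerance, and one differs by more, so the report splits roughly 67% / 17% / 17% across the three categories.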
Abstract
A data comparison system is described. The system may include a memory, an interface, and a processor. The memory may store a compare data structure containing multiple data item pairs, each pair including a legacy data item of a legacy dataset and a corresponding new data item of a new dataset, and a tolerance associated with each data item pair. The processor may receive the compare data structure and the associated tolerances. The processor may call a compare data subroutine to compare each data item pair in accordance with the associated tolerance if the data items are fields. Otherwise, the processor may recursively call the compare data subroutine for each record of the data items until the data items are fields. The processor may then compare the data items in accordance with the associated tolerance.
31 Claims
1. A computer-implemented method for performing a tolerance based comparison between a legacy data store and a new data store, the method comprising:
The elements of claim 1 are recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer-implemented method for performing a tolerance based comparison between a legacy data store and a new data store, the method comprising:
receiving, by a processor from a device of a user, a compare data structure, wherein the compare data structure describes a plurality of mappings between a first plurality of data items of a first dataset and a second plurality of data items of a second dataset, and wherein the data items comprise at least one of a table, a field, a deep structure, or a flat structure;
populating, by the processor, the compare data structure with a first plurality of data items from the first dataset and the second plurality of data items from the second dataset;
receiving, by the processor, a plurality of tolerances, each tolerance being associated with one of the mappings between the first plurality of data items and the second plurality of data items, wherein each tolerance is associated with a threshold variance value for each mapping of the plurality of mappings and indicative of an acceptable difference between the data item pair according to the data type of the data item pair;
recursively comparing, by the processor, each data item pair of the plurality of data item pairs wherein recursively comparing comprises determining the data type of each data item pair, and;
when the data item pair is determined to be the table data type, calling a compare subroutine for each record in each table of the data item pair to form new data item pairs to compare;
when the data item pair is determined to be the flat structure data type or the deep structure data type, calling the compare subroutine for each record in the flat structure data or the deep structure of the data item pair to form new data item pairs to compare;
when the data item pair is determined to not be one of the table data type, the flat structure data type, the deep structure data type, or the field data type, writing a log entry indicating that the data item pair is an unknown data type;
determining, by the processor, that each of one or more of the first plurality of data items and of the second plurality of data items comprises the field data type;
identifying a subset of the first plurality of data items comprising the one or more of the first plurality of data items determined to be of the field data type and identifying a subset of the second plurality of data items comprising the one or more of the second plurality of data items determined to be of the field data type;
for each of the one or more of the first and second plurality of data items determined to be of the subset of the plurality of data item pairs;
checking, by the processor, the data items of the subset of the first plurality of data items in the compare data structure with the data items of the subset of the second plurality of data items in the compare structure in accordance with the plurality of mappings and the plurality of tolerances; and
assigning, by the processor, a category among a plurality of categories for each data item pair determined to be of the subset based on the difference of each data item pair within the tolerance associated with each data item pair, wherein the plurality of categories comprises an exact match category, a within tolerance category, and an outside of tolerance category;
transforming, by the processor, a result of the checking and assigning into a report, wherein the report describes a percentage of the data items of the first and second plurality of data items assigned to the exact match category, a percentage of the data items of the first and second plurality of data items assigned to the within tolerance category, and a percentage of the data items of the first and second plurality of data items assigned to the outside of the tolerance category; and
providing, by the processor to the device of the user, the report.
Dependent claims: 12, 13, 14, 15, 16, 17, 18
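Claim 11 differs from claim 1 mainly in that the compare data structure begins as a set of mappings and is then populated from the two datasets. A minimal Python sketch of that populate-then-check sequence follows, limited to field-type items. The `Mapping` class, its attribute names, the sample field names, the dict-based datasets, and the use of an absolute difference as the "threshold variance value" are all assumptions made for illustration; the claim does not prescribe them.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Mapping:
    """One mapping between a field of the first dataset and a field of the second."""
    first_field: str
    second_field: str
    tolerance: float = 0.0          # assumed: absolute threshold variance value
    first_value: Any = None         # filled in by populate()
    second_value: Any = None        # filled in by populate()
    category: Optional[str] = None  # filled in by check()


def populate(mappings, first_dataset, second_dataset):
    """Copy the mapped values from both datasets into the compare structure."""
    for m in mappings:
        m.first_value = first_dataset[m.first_field]
        m.second_value = second_dataset[m.second_field]


def check(mappings):
    """Categorize each populated mapping against its own tolerance."""
    numeric = (int, float)
    for m in mappings:
        if m.first_value == m.second_value:
            m.category = "exact match"
        elif (isinstance(m.first_value, numeric)
              and isinstance(m.second_value, numeric)
              and abs(m.first_value - m.second_value) <= m.tolerance):
            m.category = "within tolerance"
        else:
            m.category = "outside of tolerance"


# Hypothetical mappings and datasets; field names differ between the two sides.
mappings = [
    Mapping("cust_no", "customer_id"),
    Mapping("net_val", "net_value", tolerance=0.05),
    Mapping("curr", "currency"),
]
first = {"cust_no": 4711, "net_val": 199.99, "curr": "EUR"}
second = {"customer_id": 4711, "net_value": 200.02, "currency": "USD"}

populate(mappings, first, second)
check(mappings)
for m in mappings:
    print(m.first_field, "->", m.second_field, ":", m.category)
```

Keeping the tolerance on the mapping itself, rather than in a separate lookup, is one way to reflect the claim's requirement that each tolerance be associated with one mapping.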
19. A data comparison system, the system comprising:
a memory to store a compare data structure comprising a plurality of data item pairs, each data item pair identifying a legacy data item of a legacy dataset and a new data item of a new dataset, wherein each data item pair comprises a data type of a field data type, a table data type, a flat structure data type, or a deep structure data type, and wherein the table data type, the flat structure data type and the deep structure data type each comprise a plurality of records, and a plurality of tolerances, each tolerance being associated with one of the data item pairs and indicative of an acceptable difference between the data item pair according to the data type of the data item pair;
an interface operatively connected to the memory, the interface operative to communicate with a device of a user; and
a processor operatively connected to the memory and the interface, the processor operative to receive, from the device of the user via the interface, the compare data structure and the plurality of tolerances, recursively call a compare data subroutine to compare each legacy data item and each new data item identified by each data item pair wherein recursively comparing comprises determining the data type of each data item pair, and;
when the data item pair is determined to be the table data type, calling a compare subroutine for each record in each table of the data item pair to form new data item pairs to compare;
when the data item pair is determined to be the flat structure data type or the deep structure data type, calling the compare subroutine for each record in the flat structure data or the deep structure of the data item pair to form new data item pairs to compare;
when the data item pair is determined to not be one of the table data type, the flat structure data type, the deep structure data type, or the field data type, writing a log entry indicating that the data item pair is an unknown data type;
determine that each of one or more of the data item pairs being compared comprises the field data type, identify a subset of the data item pairs comprising the one or more of the plurality of data item pairs determined to be of the field data type, and for each of the one or more of the data item pairs determined to be of the subset of the data item pairs;
check each legacy data item and each new data item in accordance with the associated tolerance if the data type of each data item pair in accordance with the associated tolerance; and
assign a category among a plurality of categories for each data item pair determined to be of the subset based on the difference of each data item pair within the tolerance associated with each data item pair, wherein the plurality of categories comprises an exact match category, a within tolerance category, and an outside of tolerance category.
Dependent claims: 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31
Specification