System and method for automatic weight generation for probabilistic matching
First Claim
1. A computer-implemented method of automatically generating weights for associating a plurality of data records from one or more data sources at one or more physical locations, comprising:
a) generating unmatched probabilities for a set of candidate data records, wherein the unmatched probabilities are computed per attribute for each pair of data records in the set of candidate data records;
b) comparing each pair of data records in the set of candidate data records using current weights for selected attributes;
c) determining a candidate matched set with results from the comparing step;
d) generating true discrepancy probabilities with scoring information from the candidate matched set;
e) calculating new weights for the selected attributes based upon the unmatched probabilities and the true discrepancy probabilities to adjust performance of the association of data records; and
repeating steps b)-e) using the new weights if a difference between the current weights and the new weights is larger than a predetermined amount.
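The loop in claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation:

- Attributes are compared by exact match only.
- The unmatched probability u for an attribute is estimated as the share of all candidate pairs that agree on it.
- Weights take the Fellegi-Sunter log-likelihood form. The agreement weight is log((1 - d)/u) and the disagreement weight is log(d/(1 - u)), where d is the discrepancy probability.
- The candidate matched set is every pair whose score reaches a fixed threshold.

The function names, the add-one smoothing of d, the default discrepancy of 0.1, the tolerance, the `max_iter` cap and the sample records are all hypothetical.

```python
import math
from itertools import combinations

# Hypothetical sketch of claim 1: exact-match comparison, Fellegi-Sunter-style weights.

def unmatched_probabilities(records, attrs):
    """Step a): per-attribute chance-agreement probability u, estimated (an assumption)
    as the share of all record pairs that agree on the attribute."""
    pairs = list(combinations(records, 2))
    u = {}
    for a in attrs:
        agree = sum(1 for r, s in pairs if r[a] == s[a])
        # Clamp away from 0 and 1 so the log weights stay finite.
        u[a] = min(max(agree / len(pairs), 1e-6), 1 - 1e-6)
    return u

def weights_from(u, d):
    """Assumed weight form: (agreement, disagreement) = (log((1-d)/u), log(d/(1-u)))."""
    return {a: (math.log((1 - d[a]) / u[a]), math.log(d[a] / (1 - u[a]))) for a in u}

def score(r, s, w):
    """Step b): the pair score is the sum of per-attribute weights."""
    return sum(w[a][0] if r[a] == s[a] else w[a][1] for a in w)

def discrepancy_probabilities(matched, attrs):
    """Step d): share of candidate matched pairs that disagree on each attribute,
    with add-one smoothing (an assumption) so d stays strictly between 0 and 1."""
    n = len(matched)
    return {a: (sum(1 for r, s in matched if r[a] != s[a]) + 1) / (n + 2) for a in attrs}

def generate_weights(records, attrs, threshold, default_d=0.1, tol=1e-3, max_iter=50):
    u = unmatched_probabilities(records, attrs)
    w = weights_from(u, {a: default_d for a in attrs})  # initial weights
    for _ in range(max_iter):  # hypothetical safety cap on iterations
        # Step c): candidate matched set = pairs scoring at or above the threshold.
        matched = [(r, s) for r, s in combinations(records, 2) if score(r, s, w) >= threshold]
        d = discrepancy_probabilities(matched, attrs)
        new_w = weights_from(u, d)  # step e)
        diff = max(abs(new_w[a][i] - w[a][i]) for a in attrs for i in (0, 1))
        w = new_w
        if diff <= tol:  # converged: stop repeating steps b)-e)
            break
    return w

records = [
    {"name": "ann lee", "dob": "1990-01-01", "zip": "10001"},
    {"name": "ann lee", "dob": "1990-01-01", "zip": "10002"},
    {"name": "bob ray", "dob": "1985-05-05", "zip": "20002"},
    {"name": "bob ray", "dob": "1985-05-05", "zip": "20002"},
    {"name": "cal fox", "dob": "1970-07-07", "zip": "30003"},
    {"name": "dee kim", "dob": "2000-02-02", "zip": "40004"},
]
weights = generate_weights(records, ["name", "dob", "zip"], threshold=0.0)
for attr, (agree_w, disagree_w) in weights.items():
    print(f"{attr}: agree {agree_w:+.3f}, disagree {disagree_w:+.3f}")
```

With these sample records, the first pass at the default discrepancy selects the two duplicate pairs. The second pass selects the same set, so the weights no longer change and the loop exits.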
Abstract
Embodiments of the invention provide a system and method of automatically generating weights for matching data records. Each field of a record may be compared by exact match and/or close match, and each field comparison yields a score; the score for a pair of records is the sum of its field comparison scores. So that these field scores sum to an accurate overall score, the weights are generated by an iterative process. In one embodiment, initial weights are computed from unmatched-set probabilities and default discrepancy weights associated with the attributes in the comparison algorithm. A bulk cross-match is performed across the records using the initial weights, and the resulting candidate matched set is used to update the discrepancy probabilities. New weights are then computed from the unmatched probabilities and the updated discrepancy probabilities and tested for convergence against the old weights. The process repeats with the new weight table until the weights converge to their final values.
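The field-by-field scoring described in the abstract can be illustrated with a small Python sketch. The abstract does not say what counts as a close match or what each comparison scores. This example therefore makes two assumptions:

- A Levenshtein distance of at most 1 counts as a close match.
- A hypothetical weight table gives each field an exact-match, close-match and mismatch score.

The pair score is then the sum of the field scores. The names `edit_distance`, `field_score`, `record_score` and `WEIGHTS` are illustrative only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def field_score(x: str, y: str, scores: tuple[float, float, float]) -> float:
    """Return the exact-match, close-match, or mismatch score for one field."""
    exact, close, mismatch = scores
    if x == y:
        return exact
    if edit_distance(x, y) <= 1:  # assumed close-match rule
        return close
    return mismatch

def record_score(r: dict, s: dict, table: dict) -> float:
    """Pair score = sum of the per-field comparison scores."""
    return sum(field_score(r[f], s[f], table[f]) for f in table)

# Hypothetical weight table: field -> (exact, close, mismatch).
WEIGHTS = {"name": (4.0, 2.0, -3.0), "zip": (3.0, 1.0, -2.0)}

a = {"name": "jon smith", "zip": "10001"}
b = {"name": "john smith", "zip": "10001"}
c = {"name": "ann lee", "zip": "20002"}
print(record_score(a, b, WEIGHTS))  # prints 5.0: close name + exact zip
print(record_score(b, c, WEIGHTS))  # prints -5.0: both fields mismatch
```

Because the pair score is a plain sum, the weight table decides how much each field contributes. Choosing those values is the task the iterative weight-generation process is meant to automate.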
6 Claims: independent claim 1 (reproduced above) and dependent claims 2, 3, 4, 5, and 6.
Specification