Probabilistic model for record linkage
Abstract
A method for probabilistic record linkage includes providing a record pair comprising a plurality of fields, providing a plurality of scenarios, each scenario relating to a distribution of patterns among a plurality of attribute statuses, and comparing the record pair to determine a record difference. The method includes determining a probability of a status for each of a plurality of attributes based on the distance metric of the plurality of fields, wherein each field corresponds to a respective attribute, wherein the field is observable and the attribute is hidden, determining a probability of each scenario based on the probability of the status for each attribute and the Bayesian net representing the probabilistic model on the relationship between scenarios and attributes, and outputting a probability of duplication or non-duplication of the record pair determined from the probabilities of the plurality of scenarios.
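The flow described in the abstract can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the patent: the two attribute statuses (`same`, `different`), the three scenarios with their priors and `p_same` values, the linear likelihoods in `distance_likelihood`, and the use of `difflib.SequenceMatcher` as the field distance. The sketch treats each attribute status as hidden, sums it out per scenario, normalizes to get scenario probabilities, and adds up the scenarios labeled as duplicates.

```python
from difflib import SequenceMatcher

# Hypothetical scenarios: prior probability, duplicate label, and
# P(attribute status = "same" | scenario), shared by all attributes.
SCENARIOS = {
    "duplicate_clean": {"prior": 0.2, "duplicate": True, "p_same": 0.95},
    "duplicate_noisy": {"prior": 0.1, "duplicate": True, "p_same": 0.60},
    "distinct": {"prior": 0.7, "duplicate": False, "p_same": 0.05},
}


def field_distance(a, b):
    """Observable distance in [0, 1] between two field values (assumed metric)."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def distance_likelihood(d):
    """Assumed densities P(d | status) on [0, 1]: small d favors 'same'."""
    return {"same": 2.0 * (1.0 - d), "different": 2.0 * d}


def duplicate_probability(rec_a, rec_b):
    """Return (P(duplicate), scenario posterior) for one record pair."""
    distances = {f: field_distance(rec_a[f], rec_b[f]) for f in rec_a}
    joint = {}
    for name, sc in SCENARIOS.items():
        p = sc["prior"]
        for d in distances.values():
            lik = distance_likelihood(d)
            # Sum out the hidden attribute status for this field.
            p *= sc["p_same"] * lik["same"] + (1.0 - sc["p_same"]) * lik["different"]
        joint[name] = p
    total = sum(joint.values())
    posterior = {name: p / total for name, p in joint.items()}
    p_dup = sum(p for name, p in posterior.items() if SCENARIOS[name]["duplicate"])
    return p_dup, posterior


if __name__ == "__main__":
    a = {"name": "Jon Smith", "city": "Boston"}
    b = {"name": "John Smith", "city": "Boston"}
    p_dup, posterior = duplicate_probability(a, b)
    print(f"P(duplicate) = {p_dup:.3f}")
    print(posterior)
```

In this sketch, "non-duplication" is simply one minus the returned duplicate probability. A fuller model would presumably give each attribute its own status distribution per scenario rather than a single shared `p_same`.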
19 Claims
1. A computer-implemented method for probabilistic record linkage comprising:
providing a record pair comprising a plurality of fields;
providing a plurality of scenarios, each scenario relating to a distribution of patterns among a plurality of attribute statuses;
comparing the record pair to determine a record difference;
determining a probability of a status for each of a plurality of attributes based on the distance metric of the plurality of fields, wherein each field corresponds to a respective attribute, wherein the field is observable and the attribute is hidden;
determining a probability of each scenario based on the probability of the status for each attribute and the Bayesian net representing the probabilistic model on the relationship between scenarios and attributes; and
outputting a probability of duplication or non-duplication of the record pair determined from the probabilities of the plurality of scenarios.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
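One plausible way to read the three determining and outputting steps of claim 1 as formulas is given below. The notation is assumed for illustration and does not come from the claims: $d_i$ is the observed distance for field $i$, $a_i$ is the hidden status of the corresponding attribute, $s$ is a scenario, and $S_{\text{dup}}$ is the subset of scenarios that represent duplicates. The factorization assumes a Bayesian net in which the scenario influences each attribute status and each status influences only its own field distance.

```latex
% Status of attribute i given the observed field distance (Bayes' rule)
P(a_i \mid d_i) \;=\; \frac{P(d_i \mid a_i)\,P(a_i)}{\sum_{a_i'} P(d_i \mid a_i')\,P(a_i')}

% Scenario posterior, with each hidden status summed out
P(s \mid d_1,\dots,d_n) \;\propto\; P(s)\,\prod_{i=1}^{n}\,\sum_{a_i} P(a_i \mid s)\,P(d_i \mid a_i)

% Probability of duplication; non-duplication is its complement
P(\text{dup} \mid d_1,\dots,d_n) \;=\; \sum_{s \in S_{\text{dup}}} P(s \mid d_1,\dots,d_n)
```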
9. A computer-implemented method comprising:
receiving a record pair; and
outputting a probability of duplication between the record pair from an observation of field values of the record pair and noisy characteristics of the record pair.
Dependent claims: 10, 11, 19
12. A program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for probabilistic record linkage, the method steps comprising:
providing a record pair comprising a plurality of fields;
providing a plurality of scenarios, each scenario relating to a distribution of patterns among a plurality of attribute statuses;
comparing the record pair to determine a record difference;
determining a probability of a status for each of a plurality of attributes based on the distance metric of the plurality of fields, wherein each field corresponds to a respective attribute, wherein the field is observable and the attribute is hidden;
determining a probability of each scenario based on the probability of the status for each attribute and the Bayesian net representing the probabilistic model on the relationship between scenarios and attributes; and
outputting a probability of duplication or non-duplication of the record pair determined from the probabilities of the plurality of scenarios.
Dependent claims: 13, 14, 15, 16, 17, 18
Specification