Computer-based method and system for linking records in data files
Abstract
The present invention relates to computer-based technology for linking or matching records in data files, based on at least one identifier in common, with a threshold probability that records are linked. The method uses a Bayesian probabilistic approach to determine the likelihood that the identified records are linked.
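As an illustration only, the Bayesian step described in the abstract can be sketched with Bayes' rule. The parameter names below (`prior`, `m`, `u`, `threshold`) are assumptions introduced for this sketch and do not come from the patent text: `prior` is the assumed probability that two records refer to the same entity before any comparison, `m` is the assumed probability that the identifier agrees when the records truly match, and `u` is the assumed probability that it agrees by coincidence.

```python
def posterior_link_probability(prior: float, m: float, u: float) -> float:
    """Return P(records are linked | identifier agrees) via Bayes' rule.

    prior, m and u are hypothetical parameters for this sketch; the
    source does not say how such quantities are estimated.
    """
    numerator = prior * m
    denominator = numerator + (1.0 - prior) * u
    if denominator == 0.0:
        return 0.0  # agreement is impossible under both hypotheses
    return numerator / denominator


def should_link(prior: float, m: float, u: float, threshold: float) -> bool:
    """Link only when the posterior exceeds the chosen threshold."""
    return posterior_link_probability(prior, m, u) > threshold
```

Under these assumptions, a highly discriminating identifier (small `u`) pushes the posterior above the threshold even with a modest prior, while a weak identifier such as gender (large `u`) generally does not.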
7 Claims
1. A computerized method for linking records in data files, based on at least one identifier in common, with a threshold probability that records are linked, the method comprising the steps of:
a. identifying records in the data files having a first identifier in common,
b. using a Bayesian probabilistic approach to determine likelihood that identified records, having a first identifier in common, are linked,
c. linking identified records, having a first identifier in common, whose likelihood exceeds the threshold for linking identified records having a first identifier in common.
2. The method of claim 1, further comprising the steps of:
a. identifying records in the data files, not already linked, having a second identifier in common,
b. using a Bayesian probabilistic approach to determine likelihood that identified records, not already linked and having a second identifier in common, are linked,
c. linking identified records, not already linked and having a second identifier in common, whose likelihood exceeds the threshold for linking identified records, not already linked and having a second identifier in common.
3. The method of claim 2, further comprising the steps of:
a. identifying records in the data files not already linked having a third identifier in common,
b. using a Bayesian probabilistic approach to determine likelihood that identified records, not already linked and having a third identifier in common, are linked,
c. linking identified records, not already linked and having a third identifier in common, whose likelihood exceeds the threshold for linking identified records, not already linked and having a third identifier in common.
4. The method of claim 3, further comprising the steps of continuing the steps of claim 3 for additional identifiers until either all records are linked or all identifiers have been used.
5. The method of claim 1, wherein the identifier is one of the following items:
social security number;
name (last name, first name and middle initial);
day and month of birth;
or gender.
6. A computerized system for linking records in data files, based on at least one identifier in common, with a threshold probability that records are correctly linked, comprising:
a. means for identifying records in the data files having a first identifier in common,
b. means for using a Bayesian probabilistic approach to determine likelihood that identified records, having a first identifier in common, are linked,
c. means for linking identified records, having a first identifier in common, whose likelihood exceeds the threshold for linking identified records having a first identifier in common.
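The stepwise procedure recited in the method claims (link on a first identifier, then repeat on further identifiers for records not already linked, until all records are linked or all identifiers are used) might be sketched as follows. This is a minimal sketch, not the patented implementation: the function `link_files`, the record layout, the per-identifier `(m, u)` values and the single shared `prior` are all hypothetical, and applying one posterior to every pair that agrees on a given identifier is a simplifying assumption.

```python
from typing import Dict, List, Set, Tuple

Record = Dict[str, str]


def posterior(prior: float, m: float, u: float) -> float:
    """P(linked | identifier agrees) under Bayes' rule (hypothetical parameters)."""
    num = prior * m
    den = num + (1.0 - prior) * u
    return num / den if den > 0 else 0.0


def link_files(
    file_a: List[Record],
    file_b: List[Record],
    passes: List[Tuple[str, float, float]],  # ordered (identifier, m, u)
    prior: float,
    threshold: float,
) -> List[Tuple[int, int, str]]:
    """Return (index_in_a, index_in_b, identifier_used) for each link made."""
    links: List[Tuple[int, int, str]] = []
    linked_a: Set[int] = set()
    linked_b: Set[int] = set()
    for identifier, m, u in passes:
        # Stop early once one file has no unlinked records left.
        if len(linked_a) == len(file_a) or len(linked_b) == len(file_b):
            break
        # Skip identifiers whose agreement does not clear the threshold.
        if posterior(prior, m, u) <= threshold:
            continue
        # Index the still-unlinked records of file_b by identifier value.
        index: Dict[str, List[int]] = {}
        for j, rec in enumerate(file_b):
            if j not in linked_b and rec.get(identifier):
                index.setdefault(rec[identifier], []).append(j)
        for i, rec in enumerate(file_a):
            if i in linked_a:
                continue
            value = rec.get(identifier)
            candidates = [j for j in index.get(value, []) if j not in linked_b]
            if candidates:
                j = candidates[0]  # simplification: take the first candidate
                links.append((i, j, identifier))
                linked_a.add(i)
                linked_b.add(j)
    return links
```

A usage example with made-up records, where the first pass uses social security number and the second uses name:

```python
file_a = [
    {"ssn": "111", "name": "DOE, JOHN A"},
    {"ssn": "", "name": "ROE, JANE B"},
    {"ssn": "333", "name": "POE, ED C"},
]
file_b = [
    {"ssn": "111", "name": "DOE, JOHN"},
    {"ssn": "222", "name": "ROE, JANE B"},
    {"ssn": "999", "name": "X"},
]
passes = [("ssn", 0.99, 0.0001), ("name", 0.9, 0.01), ("gender", 0.99, 0.5)]
links = link_files(file_a, file_b, passes, prior=0.1, threshold=0.9)
```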
Specification