Database systems and methods for linking records and entity representations with sufficiently high confidence
First Claim
1. A computer implemented method of linking a first record in a database to a second record in a database upon a determination that the first record and the second record correspond to a same individual, the method comprising:
calculating a plurality of match probabilities using an iterative process, each of the plurality of match probabilities corresponding to a different field common to the first record and the second record;
selecting a matching formula from the group consisting of a field weight matching formula, a field value weight matching formula, and a supplemental weight matching formula;
calculating a match score based on a plurality of terms using the selected matching formula, each of the plurality of terms corresponding to a different field common to the first record and the second record, each of the plurality of terms comprising:
(1) a probability that a field value in a corresponding field of the first record matches a field value in a corresponding field in the second record, and (2) a weight comprising a match probability, wherein the weight comprises a logarithm of the match probability, wherein the match probability comprises a probability that an arbitrary entity representation in the database comprises a particular field value, wherein the probability comprises a ratio of entity representations in the database that include the particular field value to a total number of entity representations in the database;
determining, based on the match score and a size of a population associated with the database, whether there is a sufficiently high confidence level that the first record and the second record correspond to the same individual; and
linking, in the database, the first record with the second record based on the determining.
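The scoring and determining steps recited above can be illustrated with a minimal Python sketch. It is one possible reading, not the patented implementation. The assumptions are: the weight is the negative base-2 logarithm of the value's frequency among entity representations (so rarer values contribute more), field agreement is a hypothetical exact-match probability of 1.0 or 0.0, and the confidence threshold adds the log-odds of the desired confidence to the base-2 logarithm of the population size. The function names and the record layout (dictionaries keyed by field name) are illustrative. The iterative calculation and the selection among the three matching formulas are not shown.

```python
import math
from collections import Counter


def field_value_weights(entities, field):
    """Map each value of `field` to -log2(share of entity representations holding it).

    The share is the ratio recited in the claim; the sign and the base of the
    logarithm are assumptions made for this sketch.
    """
    total = len(entities)
    counts = Counter(e[field] for e in entities if e.get(field) is not None)
    return {value: -math.log2(n / total) for value, n in counts.items()}


def match_score(rec_a, rec_b, weights_by_field):
    """Sum, over common fields, of (agreement probability) * (field value weight)."""
    score = 0.0
    for field, weights in weights_by_field.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # field is not common to both records
        # Hypothetical agreement probability: exact match only.
        p_same = 1.0 if a == b else 0.0
        # Values unseen among the entity representations get weight 0.0 (assumption).
        score += p_same * weights.get(a, 0.0)
    return score


def is_confident_match(score, population_size, confidence=0.99):
    """Assumed rule: the score must exceed log2(population) plus the log-odds of `confidence`."""
    threshold = math.log2(population_size) + math.log2(confidence / (1.0 - confidence))
    return score >= threshold
```

Under these assumptions, two records that agree on several rare field values clear the threshold more easily than records that agree only on common values, and a larger population raises the score needed to link.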
Abstract
Disclosed are a system for, and method of, determining whether records correspond to the same individual. The system and method provide such a determination with a known minimum level of confidence. That is, the system and method provide an indication that records correspond to the same individual along with an associated confidence level. The system and method may be used to link records in a database that correspond to the same individuals, creating entity representations in the database.
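As a rough illustration of how pairwise links might be turned into entity representations, the sketch below groups records with a union-find structure. Treating links as transitive is an assumption of this sketch, not something the abstract states, and `build_entities` and its inputs are hypothetical names.

```python
def build_entities(record_ids, links):
    """Group record ids into entity representations from confident pairwise links.

    `links` is an iterable of (id_a, id_b) pairs already judged to be the same
    individual. Links are treated as transitive (an assumption of this sketch).
    """
    parent = {r: r for r in record_ids}

    def find(x):
        # Follow parent pointers to the group root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for r in record_ids:
        groups.setdefault(find(r), []).append(r)
    return sorted(sorted(g) for g in groups.values())
```

For example, linking records 1 and 2 and then 2 and 3 yields a single three-record entity, while unlinked records each remain their own entity.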
24 Claims
1. A computer implemented method of linking a first record in a database to a second record in a database upon a determination that the first record and the second record correspond to a same individual, the method comprising:
calculating a plurality of match probabilities using an iterative process, each of the plurality of match probabilities corresponding to a different field common to the first record and the second record;
selecting a matching formula from the group consisting of a field weight matching formula, a field value weight matching formula, and a supplemental weight matching formula;
calculating a match score based on a plurality of terms using the selected matching formula, each of the plurality of terms corresponding to a different field common to the first record and the second record, each of the plurality of terms comprising:
(1) a probability that a field value in a corresponding field of the first record matches a field value in a corresponding field in the second record, and (2) a weight comprising a match probability, wherein the weight comprises a logarithm of the match probability, wherein the match probability comprises a probability that an arbitrary entity representation in the database comprises a particular field value, wherein the probability comprises a ratio of entity representations in the database that include the particular field value to a total number of entity representations in the database;
determining, based on the match score and a size of a population associated with the database, whether there is a sufficiently high confidence level that the first record and the second record correspond to the same individual; and
linking, in the database, the first record with the second record based on the determining.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A system for linking a first record in a database to a second record in a database upon a determination that the first record and the second record correspond to a same individual, the system comprising:
a first computing apparatus configured to calculate a plurality of match probabilities using an iterative process, each of the plurality of match probabilities corresponding to a different field common to the first record and the second record;
a second computing apparatus configured to select a matching formula from the group consisting of a field weight matching formula, a field value weight matching formula, and a supplemental weight matching formula;
a third computing apparatus configured to calculate a match score based on a plurality of terms using the selected matching formula, each of the plurality of terms corresponding to a different field common to the first record and the second record, each of the plurality of terms comprising:
(1) a probability that a field value in a corresponding field of the first record matches a field value in a corresponding field in the second record, and (2) a weight comprising a match probability, wherein the weight comprises a logarithm of the match probability, wherein the match probability comprises a probability that an arbitrary entity representation in the database comprises a particular field value, wherein the probability comprises a ratio of entity representations in the database that include the particular field value to a total number of entity representations in the database;
a fourth computing apparatus configured to determine, based on the match score and a size of a population associated with the database, whether there is a sufficiently high confidence level that the first record and the second record correspond to the same individual; and
a fifth computing apparatus configured to link, in the database, the first record with the second record based on the determining;
wherein each computing apparatus comprises at least one processor.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24
Specification