Database systems and methods for linking records and entity representations with sufficiently high confidence
First Claim
1. A computer implemented method of linking a first record in a database to a second record in the database upon a determination that the first record and the second record correspond to a same individual, wherein each record comprises at least one field and each of the at least one field comprises a field value or a null value, the method comprising:
calculating a plurality of match probabilities using an iterative process, each of the plurality of match probabilities corresponding to a different field common to the first record and the second record, the plurality of match probabilities comprising one or more of a field probability and a field value probability, wherein the field probability represents a probability that two randomly chosen records share a common field value in an associated field, and wherein the field value probability represents a probability that a record chosen at random contains an associated field value;
selecting a matching formula from the group consisting of a field weight matching formula, a field value weight matching formula, and a supplemental weight matching formula, wherein each of the matching formulas are derived, at least in part, from one or more match probabilities;
calculating a match score based on a plurality of terms using the selected matching formula, each of the plurality of terms corresponding to a different field common to the first record and the second record, each of the plurality of terms comprising:
(1) a probability that a field value in a corresponding field of the first record matches a field value in a corresponding field in the second record, and (2) a weight derived from a match probability, wherein the match probability comprises a probability that an arbitrary entity representation in the database comprises a particular field value;
determining, based on the match score and a size of a population associated with the database, whether there is a sufficiently high confidence level that the first record and the second record correspond to the same individual; and
linking, in the database, the first record with the second record based on the determining.
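The probability and scoring steps recited above can be illustrated with a minimal sketch. The claim does not fix the formulas, so everything here is an assumption: the function names are hypothetical, the field value weight is taken as the negative base-2 logarithm of the field value probability, the per-field match probability is reduced to an exact-match indicator, and the probabilities are estimated in a single pass rather than by the iterative process the claim recites.

```python
from collections import Counter
from math import log2


def field_value_probabilities(records, field):
    """Estimate P(a randomly chosen record holds value v in `field`).

    Null (None) values are ignored, which is an assumption of this sketch.
    """
    values = [r[field] for r in records if r.get(field) is not None]
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items()}


def field_probability(records, field):
    """Estimate P(two randomly chosen records share a value in `field`).

    Assumes sampling with replacement, so the result is the sum of the
    squared value frequencies.
    """
    probs = field_value_probabilities(records, field)
    return sum(p * p for p in probs.values())


def match_score(rec_a, rec_b, records, fields):
    """Sum one term per common field: (agreement) * (weight)."""
    score = 0.0
    for f in fields:
        a, b = rec_a.get(f), rec_b.get(f)
        if a is None or b is None:
            continue  # a null field contributes no term
        # Term (1): assumed here to be a 0/1 exact-match indicator.
        agreement = 1.0 if a == b else 0.0
        # Term (2): assumed weight, rarer values carry more evidence.
        weight = -log2(field_value_probabilities(records, f)[a])
        score += agreement * weight
    return score


records = [
    {"first": "ann", "city": "x"},
    {"first": "ann", "city": "y"},
    {"first": "bob", "city": "x"},
    {"first": "ann", "city": None},
]
score = match_score(records[0], records[2], records, ["first", "city"])
```

In this toy data only the `city` field agrees between the first and third records, so the score is the weight of the shared city value alone. A production system would precompute the probability tables instead of recomputing them for every pair.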
Abstract
Disclosed are a system for, and method of, determining whether records correspond to the same individual. The system and method provide such a determination with a known minimum level of confidence. That is, the system and method provide an indication that records correspond to the same individual along with an associated confidence level. The system and method may be used to link records in a database that correspond to the same individuals, creating entity representations in the database.
262 Citations
19 Claims
10. A system for linking a first record in a database to a second record in the database upon a determination that the first record and the second record correspond to a same individual, wherein each record comprises at least one field and each of the at least one field comprises a field value or a null value, the system comprising at least one computing apparatus, comprising at least one processor and at least one memory, wherein the at least one computing apparatus is collectively configured to:
calculate a plurality of match probabilities using an iterative process, each of the plurality of match probabilities corresponding to a different field common to the first record and the second record, the plurality of match probabilities comprising one or more of a field probability and a field value probability, wherein the field probability represents a probability that two randomly chosen records share a common field value in an associated field, and wherein the field value probability represents a probability that a record chosen at random contains an associated field value;
select a matching formula from the group consisting of a field weight matching formula, a field value weight matching formula, and a supplemental weight matching formula, wherein each of the matching formulas are derived, at least in part, from one or more match probabilities;
calculate a match score based on a plurality of terms using the selected matching formula, each of the plurality of terms corresponding to a different field common to the first record and the second record, each of the plurality of terms comprising:
(1) a probability that a field value in a corresponding field of the first record matches a field value in a corresponding field in the second record, and (2) a weight derived from a match probability, wherein the match probability comprises a probability that an arbitrary entity representation in the database comprises a particular field value;
determine, based on the match score and a size of a population associated with the database, whether there is a sufficiently high confidence level that the first record and the second record correspond to the same individual; and
link, in the database, the first record with the second record based on the determining.
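The determining and linking steps can be sketched as follows. The claims do not state how the population size enters the decision, so this is one plausible reading rather than the patented method: it assumes the match score is a base-2 log-likelihood ratio, that the prior odds of two arbitrary records belonging to the same individual are 1 to (N - 1) for a population of N, and that linked records are grouped with a union-find structure. The names `is_confident_match` and `EntityLinks` are hypothetical.

```python
from math import log2


def is_confident_match(score, population_size, confidence=0.99):
    """Return True if the score clears a population-dependent threshold.

    Assumed model: posterior odds = 2**score / (population_size - 1).
    Requires population_size > 1 and 0 < confidence < 1.
    """
    required = log2(population_size - 1) + log2(confidence / (1 - confidence))
    return score >= required


class EntityLinks:
    """Union-find over record ids; each set is one entity representation."""

    def __init__(self):
        self.parent = {}

    def find(self, rid):
        # Unknown ids start as their own single-record entity.
        self.parent.setdefault(rid, rid)
        while self.parent[rid] != rid:
            # Path halving keeps later lookups short.
            self.parent[rid] = self.parent[self.parent[rid]]
            rid = self.parent[rid]
        return rid

    def link(self, a, b):
        # Merge the entities that contain records a and b.
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_a] = root_b


links = EntityLinks()
if is_confident_match(score=30.0, population_size=1_000_001):
    links.link("r1", "r2")
```

Under these assumptions a larger population raises the required score, because a chance agreement between two different individuals becomes more likely as the number of candidate individuals grows.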
Specification