Database systems and methods for linking records and entity representations with sufficiently high confidence

  • US 8,495,077 B2
  • Filed: 07/11/2012
  • Issued: 07/23/2013
  • Est. Priority Date: 04/24/2008
  • Status: Active Grant
First Claim

1. A computer implemented method of linking a first record in a database to a second record in the database upon a determination that the first record and the second record correspond to a same individual, wherein each record comprises at least one field and each of the at least one field comprises a field value or a null value, the method comprising:

  • calculating a plurality of match probabilities using an iterative process, each of the plurality of match probabilities corresponding to a different field common to the first record and the second record, the plurality of match probabilities comprising one or more of a field probability and a field value probability, wherein the field probability represents a probability that two randomly chosen records share a common field value in an associated field, and wherein the field value probability represents a probability that a record chosen at random contains an associated field value;

    selecting a matching formula from the group consisting of a field weight matching formula, a field value weight matching formula, and a supplemental weight matching formula, wherein each of the matching formulas are derived, at least in part, from one or more match probabilities;

    calculating a match score based on a plurality of terms using the selected matching formula, each of the plurality of terms corresponding to a different field common to the first record and the second record, each of the plurality of terms comprising: (1) a probability that a field value in a corresponding field of the first record matches a field value in a corresponding field in the second record, and (2) a weight derived from a match probability, wherein the match probability comprises a probability that an arbitrary entity representation in the database comprises a particular field value;

    determining, based on the match score and a size of a population associated with the database, whether there is a sufficiently high confidence level that the first record and the second record correspond to the same individual; and

    linking, in the database, the first record with the second record based on the determining.
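
The steps recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation, and several choices in it are assumptions that the claim does not specify: probabilities are estimated in a single pass rather than by the claimed iterative process, the weight is taken as the negative base-2 logarithm of a field value probability, each term is the product of a 0-or-1 match probability and that weight, and the confidence test compares the score with log2 of the population size plus a hypothetical margin (`margin_bits`). All function and field names are illustrative.

```python
import math
from collections import Counter


def field_value_probabilities(records, field):
    """Estimate, per value, the chance that a randomly chosen record holds it."""
    counts = Counter(r[field] for r in records if r.get(field) is not None)
    return {value: n / len(records) for value, n in counts.items()}


def field_probability(value_probs):
    """Chance that two independently drawn records share a value in the field."""
    return sum(p * p for p in value_probs.values())


def weight(probability):
    """Assumed weighting: rarer values carry more evidence (in bits)."""
    return -math.log2(probability)


def match_score(a, b, value_probs_by_field):
    """Sum one term per common field: P(values match) times the value's weight."""
    score = 0.0
    for field, value_probs in value_probs_by_field.items():
        va, vb = a.get(field), b.get(field)
        if va is None or vb is None:
            continue  # a null value contributes no term
        p_match = 1.0 if va == vb else 0.0  # assumed exact comparison
        score += p_match * weight(value_probs[va])
    return score


def sufficiently_confident(score, population_size, margin_bits=5.0):
    """Assumed test: the score must exceed log2(population) by a margin."""
    return score >= math.log2(population_size) + margin_bits


# Toy data; the field names and values are made up for illustration.
records = [
    {"first": "Ann", "last": "Lee", "zip": "30301"},
    {"first": "Ann", "last": "Lee", "zip": None},
    {"first": "Bob", "last": "Lee", "zip": "30301"},
    {"first": "Ann", "last": "Kim", "zip": "10001"},
]
probs = {f: field_value_probabilities(records, f) for f in ("first", "last", "zip")}
score = match_score(records[0], records[1], probs)
link = sufficiently_confident(score, population_size=len(records))
```

In this toy data the two records agree only on common values ("Ann", "Lee"), so the score stays low and no link is made; agreement on rarer values would raise the score, and a larger population raises the bar the score has to clear.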
