
Statistical record linkage calibration for reflexive, symmetric and transitive distance measures at the field and field value levels without the need for human interaction

  • US 8,135,680 B2
  • Filed: 04/24/2009
  • Issued: 03/13/2012
  • Est. Priority Date: 04/24/2008
  • Status: Active Grant
First Claim

1. A computer implemented iterative process for generating entity representations in a computer implemented database using a record matching formula and for generating parameters for the record matching formula, each entity representation comprising at least one record, the database comprising a plurality of records, each record comprising a plurality of fields, each field capable of containing a field value, wherein at least a portion of parameters for the record matching formula are configured for a particular field value associated with a selected field, and wherein the process provides for linking records or entity representations with non-identical field values, the process comprising:

    applying a symmetric, reflexive and transitive function to each field value in the selected field of each of a plurality of records in the database, whereby applying the symmetric, reflexive and transitive function to each field value in the selected field of each of a plurality of records in the database defines a partition of the plurality of records, wherein the partition of the plurality of records comprises a plurality of parts, each of the parts associated with at least one field value appearing in the selected field;

    calculating a first logarithm of a first probability that an arbitrary record in the database is in a part associated with the particular field value, wherein the first probability comprises a ratio of records in the part associated with the particular field value to a total number of records in the database;

    forming a plurality of entity representations in the database, each entity representation comprising at least two records linked using a first instance of the record matching formula, at least one entity representation comprising at least two records linked using a first instance of the record matching formula that comprises the first logarithm of the first probability;

    calculating a second logarithm of a second probability that an arbitrary entity representation in the database comprises a record that is in the part associated with the particular field value, wherein the second probability comprises a ratio of entity representations in the part associated with the particular field value to a total number of entity representations in the database;

    linking at least two entity representations in the database based on a second instance of the record matching formula, wherein the second instance of the record matching formula comprises the second logarithm of the second probability, whereby a number of entity representations in the database is reduced by the linking entity representations relative to a number of entity representations in the database prior to the linking at least two entity representations; and

    retrieving information from at least one record in the database.
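
The two probability calculations recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the symmetric, reflexive and transitive function is "equal after lower-casing" (any equivalence relation would define a partition), that logarithms are base 2 (the claim does not fix a base), and that records and entity assignments live in plain dictionaries. The names `partition_by_field`, `record_log_probs`, `entity_log_probs`, `records` and `entity_of` are hypothetical. How the resulting logarithms enter the record matching formula is not specified in the claim and is not shown.

```python
import math
from collections import defaultdict


def partition_by_field(records, field, normalize=str.lower):
    """Group record ids into parts keyed by the normalized field value.

    Equality of normalized values is reflexive, symmetric and transitive,
    so the parts form a partition of the records (assumed relation).
    """
    parts = defaultdict(set)
    for rec_id, rec in records.items():
        parts[normalize(rec[field])].add(rec_id)
    return dict(parts)


def record_log_probs(parts, total_records):
    """First logarithm: log2(records in the part / total records)."""
    return {key: math.log2(len(ids) / total_records)
            for key, ids in parts.items()}


def entity_log_probs(parts, entity_of, total_entities):
    """Second logarithm: log2(entities with a record in the part / total entities)."""
    result = {}
    for key, ids in parts.items():
        entities_in_part = {entity_of[rec_id] for rec_id in ids}
        result[key] = math.log2(len(entities_in_part) / total_entities)
    return result


# Hypothetical data: four records, one selected field.
records = {
    1: {"last_name": "Smith"},
    2: {"last_name": "SMITH"},
    3: {"last_name": "smith"},
    4: {"last_name": "Zybrowski"},
}
parts = partition_by_field(records, "last_name")
first_logs = record_log_probs(parts, total_records=len(records))

# Hypothetical result of a first linking pass: records 1 and 2 were linked.
entity_of = {1: "A", 2: "A", 3: "B", 4: "C"}
second_logs = entity_log_probs(parts, entity_of,
                               total_entities=len(set(entity_of.values())))
```

In this sketch the rare value ("zybrowski") receives a more negative logarithm than the common one ("smith") at both the record level and the entity level, and the entity-level figures change as records are linked, which is why the claim recomputes them for the second instance of the formula.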
