STATISTICAL RECORD LINKAGE CALIBRATION FOR REFLEXIVE, SYMMETRIC AND TRANSITIVE DISTANCE MEASURES AT THE FIELD AND FIELD VALUE LEVELS WITHOUT THE NEED FOR HUMAN INTERACTION

  • US 20090271405A1
  • Filed: 04/24/2009
  • Published: 10/29/2009
  • Est. Priority Date: 04/24/2008
  • Status: Active Grant
First Claim

1. A computer implemented iterative process for generating entity representations in a computer implemented database using a record matching formula and for generating parameters for the record matching formula, the database comprising a plurality of records, each record comprising a plurality of fields, each field capable of containing a field value, wherein at least a portion of parameters for the record matching formula are configured for a particular field value associated with a selected field, and wherein the process provides for linking records or entity representations with non-identical field values, the process comprising:

    applying a symmetric, reflexive and transitive function to each field value in the selected field of each of a plurality of records in the database, whereby applying the symmetric, reflexive and transitive function to each field value in the selected field of each of a plurality of records in the database defines a partition of the plurality of records, wherein the partition of the plurality of records comprises a plurality of parts, each of the parts associated with at least one field value appearing in the selected field;

    calculating a first probability, the first probability reflecting a likelihood that an arbitrary record in the database is in a part associated with the particular field value;

    forming a plurality of entity representations in the database, each entity representation comprising at least two records linked using a first instance of the record matching formula, at least one entity representation comprising at least two records linked using a first instance of the record matching formula that comprises a first parameter derived from the first probability;

    calculating a second probability, the second probability reflecting a likelihood that an arbitrary entity representation in the database comprises a record that is in the part associated with the particular field value;

    linking at least two entity representations in the database based on a second instance of the record matching formula, wherein the second instance of the record matching formula comprises a second parameter derived from the second probability, whereby a number of entity representations in the database is reduced by the linking entity representations relative to a number of entity representations in the database prior to the linking at least two entity representations; and

    retrieving information from at least one record in the database.
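
The steps of the claim can be illustrated with a small sketch. This is not the patented implementation, only a toy reading of the claim language under stated assumptions: the equivalence function (trimmed, case-insensitive equality), the field name `last_name`, the sample records, the entity assignment `entity_of`, and the mapping from a probability to a matching parameter as `-log2(p)` are all hypothetical and do not come from the claim, which says only that the parameters are "derived from" the probabilities.

```python
import math
from collections import defaultdict


def equivalence_key(value):
    # Hypothetical equivalence: two field values are equivalent when their
    # trimmed, lower-cased forms are equal. Equality of keys is reflexive,
    # symmetric and transitive, so it partitions the records.
    return value.strip().lower()


def partition_by_field(records, field):
    # Map each equivalence key to the set of record ids in that part.
    parts = defaultdict(set)
    for record_id, record in records.items():
        parts[equivalence_key(record[field])].add(record_id)
    return dict(parts)


def first_probability(parts, key, n_records):
    # Likelihood that an arbitrary record lies in the part for this value.
    return len(parts[key]) / n_records


def second_probability(parts, key, entity_of, n_entities):
    # Likelihood that an arbitrary entity contains a record in this part.
    entities_with_value = {entity_of[rid] for rid in parts[key]}
    return len(entities_with_value) / n_entities


def weight(probability):
    # Assumed mapping from probability to a matching parameter:
    # rarer values yield larger weights. The claim does not fix this formula.
    return -math.log2(probability)


# Hypothetical sample data: eight records with one selected field.
records = {
    1: {"last_name": "Smith"},
    2: {"last_name": "smith "},
    3: {"last_name": "SMITH"},
    4: {"last_name": " Smith"},
    5: {"last_name": "Jones"},
    6: {"last_name": "jones"},
    7: {"last_name": "Jones"},
    8: {"last_name": "JONES"},
}

parts = partition_by_field(records, "last_name")

# First pass: a record-level probability and the parameter derived from it.
p1_smith = first_probability(parts, "smith", len(records))
w1_smith = weight(p1_smith)

# Hypothetical outcome of the first linking pass (record id -> entity id).
entity_of = {1: "A", 2: "A", 3: "B", 4: "B", 5: "C", 6: "C", 7: "C", 8: "C"}
n_entities = len(set(entity_of.values()))

# Second pass: entity-level probabilities and the parameters derived from them.
p2_smith = second_probability(parts, "smith", entity_of, n_entities)
p2_jones = second_probability(parts, "jones", entity_of, n_entities)
w2_smith = weight(p2_smith)
w2_jones = weight(p2_jones)
```

In this toy data both values cover half of the records, so their record-level weights are equal. Once records are grouped into entities, "smith" appears in two of three entities and "jones" in one, so the entity-level parameters diverge. That shift is what the second probability in the claim captures before entity representations are linked further.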
