Statistical record linkage calibration at the field and field value levels without the need for human interaction

  • US 8,135,719 B2
  • Filed: 04/24/2009
  • Issued: 03/13/2012
  • Est. Priority Date: 04/24/2008
  • Status: Active Grant
First Claim

1. A computer implemented iterative process for generating entity representations in a computer implemented database using a record matching formula and for generating parameters for the record matching formula, the database comprising a plurality of records, each record comprising a plurality of fields, each field capable of containing a field value, wherein at least a portion of parameters for the record matching formula are configured for a particular field value appearing in a selected field of at least one record, the process comprising:

    calculating, using the computer, a field value weight, the field value weight reflecting a first probability that an arbitrary record in the database comprises the particular field value in the selected field of the arbitrary record, wherein the first probability comprises a ratio of records in the database that include the particular field value in the selected field to a total number of records in the database;

    forming a plurality of entity representations in the database, each entity representation comprising at least two records linked using a first instance of the record matching formula, at least one entity representation comprising a first record comprising the particular field value, the first record linked to a second record using a first instance of the record matching formula comprising the field value weight;

    calculating, using the computer, a revised field value weight, the revised field value weight reflecting a second probability that an arbitrary entity representation in the database comprises the particular field value in the selected field of a record in the arbitrary entity representation, wherein the second probability comprises a ratio of entity representations in the database that include the particular field value in the selected field to a total number of entity representations in the database;

    linking at least two entity representations in the database based on a second instance of the record matching formula, wherein the second instance of the record matching formula comprises the revised field value weight, whereby a number of entity representations in the database is reduced by the linking at least two entity representations relative to a number of entity representations in the database prior to the linking at least two entity representations; and

    retrieving information from at least one record in the database.
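The claimed loop (a record-level field value weight, a first linking pass, an entity-level revised weight, then a second linking pass that reduces the entity count) can be illustrated with a small sketch. The claim does not specify the matching formula, the functional form of the weight, or a link threshold, so the following are assumptions: the weight is taken as -log2 of the claimed ratio, two clusters score the sum over fields of the largest weight among values they share, and pairs scoring at least a hypothetical `THRESHOLD` are linked. The toy records and all function names are illustrative only, and the sketch simply repeats passes until nothing more links.

```python
import math
from itertools import combinations

# Hypothetical toy data; field names, values and THRESHOLD are assumptions.
FIELDS = ("first", "last", "city")
RECORDS = [
    {"first": "ZEB", "last": "SMITH", "city": "BOCA"},
    {"first": "ZEB", "last": "SMITH", "city": "BOCA"},
    {"first": "ZEB", "last": "SMITH", "city": "BOCA"},
    {"first": "ZEB", "last": "SMITH", "city": "TAMPA"},
    {"first": "ANN", "last": "SMITH", "city": "BOCA"},
    {"first": "ANN", "last": "JONES", "city": "TAMPA"},
    {"first": "BOB", "last": "JONES", "city": "BOCA"},
    {"first": "BOB", "last": "LEE", "city": "TAMPA"},
]
THRESHOLD = 2.3  # assumed cutoff on the summed weights


def field_values(cluster, records):
    """Map each field to the set of values found in any record of the cluster."""
    return {f: {records[i][f] for i in cluster} for f in FIELDS}


def value_weights(clusters, records):
    """Weight per (field, value) = -log2(share of clusters containing the value).

    With singleton clusters this uses the record-level ratio; with linked
    clusters it uses the entity-level ratio (the "revised" weight).
    """
    counts = {}
    for cluster in clusters:
        for f, vals in field_values(cluster, records).items():
            for v in vals:
                counts[(f, v)] = counts.get((f, v), 0) + 1
    total = len(clusters)
    return {key: -math.log2(n / total) for key, n in counts.items()}


def match_score(a, b, weights, records):
    """Sum over fields of the largest weight among values both clusters share."""
    va, vb = field_values(a, records), field_values(b, records)
    score = 0.0
    for f in FIELDS:
        shared = va[f] & vb[f]
        if shared:
            score += max(weights[(f, v)] for v in shared)
    return score


def link_pass(clusters, records):
    """Recompute weights, then merge every cluster pair scoring >= THRESHOLD."""
    weights = value_weights(clusters, records)
    parent = list(range(len(clusters)))  # union-find over cluster indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(clusters)), 2):
        if match_score(clusters[i], clusters[j], weights, records) >= THRESHOLD:
            parent[find(i)] = find(j)
    merged = {}
    for i, cluster in enumerate(clusters):
        merged.setdefault(find(i), set()).update(cluster)
    return list(merged.values())


def resolve(records):
    """Repeat linking passes until one links nothing; return clusters and counts."""
    clusters = [{i} for i in range(len(records))]
    history = [len(clusters)]
    while True:
        new = link_pass(clusters, records)
        if len(new) == len(clusters):
            return clusters, history
        clusters = new
        history.append(len(clusters))
```

On this toy data the entity count goes 8, 6, 5. The first pass links only records 0 to 2, whose three agreeing fields clear the threshold. After that, "ZEB" occurs in 2 of 6 entity representations instead of 4 of 8 records, so its revised weight rises from 1 to log2(3) and record 3 joins the same entity in the second pass. The design choice worth noting is that the denominator shrinks as records collapse into entities, which is what lets a value that looked common at the record level become more discriminating at the entity level.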

  • 2 Assignments