Statistical record linkage calibration at the field and field value levels without the need for human interaction
First Claim
1. A computer implemented iterative process for generating entity representations in a computer implemented database using a record matching formula and for generating parameters for the record matching formula, the database comprising a plurality of records, each record comprising a plurality of fields, each field capable of containing a field value, wherein at least a portion of parameters for the record matching formula are configured for a particular field value appearing in a selected field of at least one record, the process comprising:
calculating, using the computer, a field value weight, the field value weight reflecting a first probability that an arbitrary record in the database comprises the particular field value in the selected field of the arbitrary record, wherein the first probability comprises a ratio of records in the database that include the particular field value in the selected field to a total number of records in the database;
forming a plurality of entity representations in the database, each entity representation comprising at least two records linked using a first instance of the record matching formula, at least one entity representation comprising a first record comprising the particular field value, the first record linked to a second record using a first instance of the record matching formula comprising the field value weight;
calculating, using the computer, a revised field value weight, the revised field value weight reflecting a second probability that an arbitrary entity representation in the database comprises the particular field value in the selected field of a record in the arbitrary entity representation, wherein the second probability comprises a ratio of entity representations in the database that include the particular field value in the selected field to a total number of entity representations in the database;
linking at least two entity representations in the database based on a second instance of the record matching formula, wherein the second instance of the record matching formula comprises the revised field value weight, whereby a number of entity representations in the database is reduced by the linking at least two entity representations relative to a number of entity representations in the database prior to the linking at least two entity representations; and
retrieving information from at least one record in the database.
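The two ratios recited in the claim can be illustrated with a short Python sketch. The record layout, the `entity_of` assignment, and the `-log2(p)` mapping from probability to weight are assumptions made for illustration only; the claim states merely that the weight reflects the probability.

```python
import math
from collections import defaultdict

def record_level_probability(records, field, value):
    """Ratio of records holding `value` in `field` to all records."""
    hits = sum(1 for rec in records if rec.get(field) == value)
    return hits / len(records)

def entity_level_probability(records, entity_of, field, value):
    """Ratio of entity representations in which at least one record holds
    `value` in `field` to all entity representations."""
    values_by_entity = defaultdict(set)
    for idx, rec in enumerate(records):
        values_by_entity[entity_of[idx]].add(rec.get(field))
    hits = sum(1 for vals in values_by_entity.values() if value in vals)
    return hits / len(values_by_entity)

def weight(p):
    # Assumption: rarer values get larger weights via -log2(p).
    return -math.log2(p)

# Hypothetical data: five records already linked into two entity representations.
records = [
    {"first": "John", "last": "Smith"},
    {"first": "John", "last": "Smith"},
    {"first": "J", "last": "Smith"},
    {"first": "Mary", "last": "Jones"},
    {"first": "M", "last": "Jones"},
]
entity_of = [0, 0, 0, 1, 1]  # record index -> entity id (assumed result of a first linking pass)

p_record = record_level_probability(records, "last", "Smith")
p_entity = entity_level_probability(records, entity_of, "last", "Smith")
first_weight = weight(p_record)
revised_weight = weight(p_entity)
```

Here 3 of 5 records but only 1 of 2 entity representations contain "Smith", so the probability moves from 0.6 to 0.5 and, under the assumed mapping, the revised weight is larger than the first weight.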
Abstract
Disclosed is a system for, and method of, calculating parameters used to determine whether records and entity representations should be linked. The system and method apply iterative techniques such that parameters from each linking iteration are used in the next linking iteration. The system and method need no human interaction in order to calibrate and utilize record matching formulas used for the linking decisions.
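The iterative calibration described in the abstract can be sketched as a loop in which each pass recomputes weights from the current entity representations and then links with them. This is a minimal sketch, not the patented implementation: the summed-weight score, the `-log2(p)` weight, the `threshold` value, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

def entity_weights(records, parent, fields):
    """Collect each entity's values per field, then weight every value by
    -log2(entities containing it / total entities) (assumed mapping)."""
    values = {}
    for i, rec in enumerate(records):
        per_field = values.setdefault(find(parent, i), {f: set() for f in fields})
        for f in fields:
            per_field[f].add(rec[f])
    n = len(values)
    weights = {}
    for f in fields:
        counts = Counter(v for per_field in values.values() for v in per_field[f])
        weights[f] = {v: -math.log2(c / n) for v, c in counts.items()}
    return weights, values

def link_until_stable(records, fields, threshold):
    """Repeat: recompute weights, link entities scoring >= threshold,
    and stop when a pass links nothing. No human input is involved."""
    parent = list(range(len(records)))  # every record starts as its own entity
    history = [len(records)]            # entity count before and after each linking pass
    while True:
        weights, values = entity_weights(records, parent, fields)
        merged = False
        for a, b in combinations(sorted(values), 2):
            # Assumed matching formula: sum the weights of shared field values.
            score = sum(weights[f][v] for f in fields
                        for v in values[a][f] & values[b][f])
            if score >= threshold:
                ra, rb = find(parent, a), find(parent, b)
                if ra != rb:
                    parent[rb] = ra
                    merged = True
        if not merged:
            return parent, history
        history.append(len({find(parent, i) for i in range(len(records))}))

# Hypothetical data: 4 duplicate records, 2 near-duplicates, 10 unrelated records.
fields = ["name", "city", "phone"]
records = [{"name": "Ann Lee", "city": "Paris", "phone": "111"} for _ in range(4)]
records += [{"name": "Bob Roy", "city": "Paris", "phone": "222"},
            {"name": "Bob Roy", "city": "Paris", "phone": "333"}]
records += [{"name": f"Name{i}", "city": f"City{i}", "phone": f"Phone{i}"}
            for i in range(10)]

parent, history = link_until_stable(records, fields, threshold=4.6)
print(history)  # [16, 13, 12]
```

In this toy data the two "Bob Roy" records fall short of the threshold on the first pass and are linked on the second, after the four duplicate "Ann Lee" records have collapsed into one entity representation and "Paris" has become rarer among entity representations.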
216 Citations
16 Claims
1. A computer implemented iterative process for generating entity representations in a computer implemented database using a record matching formula and for generating parameters for the record matching formula, the database comprising a plurality of records, each record comprising a plurality of fields, each field capable of containing a field value, wherein at least a portion of parameters for the record matching formula are configured for a particular field value appearing in a selected field of at least one record, the process comprising:
calculating, using the computer, a field value weight, the field value weight reflecting a first probability that an arbitrary record in the database comprises the particular field value in the selected field of the arbitrary record, wherein the first probability comprises a ratio of records in the database that include the particular field value in the selected field to a total number of records in the database;
forming a plurality of entity representations in the database, each entity representation comprising at least two records linked using a first instance of the record matching formula, at least one entity representation comprising a first record comprising the particular field value, the first record linked to a second record using a first instance of the record matching formula comprising the field value weight;
calculating, using the computer, a revised field value weight, the revised field value weight reflecting a second probability that an arbitrary entity representation in the database comprises the particular field value in the selected field of a record in the arbitrary entity representation, wherein the second probability comprises a ratio of entity representations in the database that include the particular field value in the selected field to a total number of entity representations in the database;
linking at least two entity representations in the database based on a second instance of the record matching formula, wherein the second instance of the record matching formula comprises the revised field value weight, whereby a number of entity representations in the database is reduced by the linking at least two entity representations relative to a number of entity representations in the database prior to the linking at least two entity representations; and
retrieving information from at least one record in the database.
Dependent Claims: 2, 3, 4
5. A computer implemented iterative process for generating entity representations in a computer implemented database using a record matching formula and for generating parameters for the record matching formula, the database comprising a plurality of records, each record comprising a plurality of fields, each field capable of containing a field value, wherein at least a portion of parameters for the record matching formula are configured for a selected field and independent of any field value in the selected field, the process comprising:
calculating, using the computer, a plurality of field value weights, each field value weight reflecting a probability that an arbitrary record in the database comprises a different field value in the selected field of the arbitrary record, wherein each probability comprises a ratio of a number of records in the database that include a different field value in the selected field to a total number of records in the database;
calculating, using the computer, a field weight for the selected field, the field weight for the selected field derived from each of the plurality of field value weights;
forming a plurality of entity representations in the database, each entity representation comprising at least two records linked using a first instance of the record matching formula, at least one entity representation comprising a first record linked to a second record using a first instance of the record matching formula comprising the field weight for the selected field;
calculating, using the computer, a plurality of revised field value weights, each revised field value weight reflecting a probability that an arbitrary entity representation in the database comprises a different field value in the selected field of a record in the arbitrary entity representation, wherein each probability comprises a ratio of a number of entity representations in the database that include a different field value in the selected field to a total number of entity representations in the database;
calculating, using the computer, a revised field weight for the selected field, the revised field weight for the selected field derived from each of the plurality of revised field value weights;
linking at least two entity representations in the database based on a second instance of the record matching formula, wherein the second instance of the record matching formula comprises the revised field value weight, whereby a number of entity representations in the database is reduced by the linking at least two entity representations relative to a number of entity representations in the database prior to the linking at least two entity representations; and
retrieving information from at least one record in the database.
Dependent Claims: 6, 7, 8
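Claim 5 recites a field weight "derived from each of the plurality of field value weights" without fixing the derivation. One possible reading, shown here purely as an assumption, is a frequency-weighted average of per-value weights, with each value weight taken as `-log2(p)`. The function names and sample data are hypothetical.

```python
import math
from collections import Counter

def field_value_probabilities(units):
    """`units` is a list of sets of field values, one set per record
    (a singleton) or one set per entity representation.
    Returns value -> ratio of units containing that value."""
    counts = Counter(v for values in units for v in values)
    return {v: c / len(units) for v, c in counts.items()}

def field_weight(probabilities):
    """Assumed derivation: average of -log2(p) over all values,
    weighted by p and normalised by the sum of the p's."""
    total = sum(probabilities.values())
    return sum(p * -math.log2(p) for p in probabilities.values()) / total

# Record level: eight records, last name held in the selected field.
record_units = [{"Smith"}] * 4 + [{"Jones"}] * 2 + [{"Lee"}] * 2
# Entity level: the same records after a hypothetical first linking pass.
entity_units = [{"Smith"}, {"Jones"}, {"Lee"}]

record_fw = field_weight(field_value_probabilities(record_units))
entity_fw = field_weight(field_value_probabilities(entity_units))
```

With these inputs the record-level field weight is 1.5 and the entity-level (revised) field weight is log2(3), about 1.585, because collapsing the duplicate "Smith" records makes the three last names equally common among entity representations.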
9. A computer system for iteratively generating entity representations in a computer implemented database using a record matching formula and for generating parameters for the record matching formula, the database comprising a plurality of records, each record comprising a plurality of fields, each field capable of containing a field value, wherein at least a portion of parameters for the record matching formula are configured for a particular field value appearing in a selected field of at least one record, the system comprising:
a computer implemented database comprising a plurality of records, each record comprising a plurality of fields, each field capable of containing a field value;
a processor programmed to calculate a field value weight, the field value weight reflecting a first probability that an arbitrary record in the database comprises the particular field value in the selected field of the arbitrary record, wherein the first probability comprises a ratio of records in the database that include the particular field value in the selected field to a total number of records in the database;
a processor programmed to form and store a plurality of entity representations in the database, each entity representation comprising at least two records linked using a first instance of the record matching formula, at least one entity representation comprising a first record comprising the particular field value, the first record linked to a second record using a first instance of the record matching formula comprising the field value weight;
a processor programmed to calculate a revised field value weight, the revised field value weight reflecting a second probability that an arbitrary entity representation in the database comprises the particular field value in the selected field of a record in the arbitrary entity representation, wherein the second probability comprises a ratio of entity representations in the database that include the particular field value in the selected field to a total number of entity representations in the database; and
a processor programmed to link and store at least two entity representations in the database based on a second instance of the record matching formula, wherein the second instance of the record matching formula comprises the revised field value weight, whereby a number of entity representations in the database is reduced by linking the at least two entity representations relative to a number of entity representations in the database prior to the linking at least two entity representations.
Dependent Claims: 10, 11, 12
13. A computer system for iteratively generating entity representations in a computer implemented database using a record matching formula and for generating parameters for the record matching formula, the database comprising a plurality of records, each record comprising a plurality of fields, each field capable of containing a field value, wherein at least a portion of parameters for the record matching formula are configured for a selected field and independent of any field value in the selected field, the system comprising:
a computer implemented database comprising a plurality of records, each record comprising a plurality of fields, each field capable of containing a field value;
a processor programmed to calculate a plurality of field value weights, each field value weight reflecting a probability that an arbitrary record in the database comprises a different field value in the selected field of the arbitrary record, wherein each probability comprises a ratio of a number of records in the database that include a different field value in the selected field to a total number of records in the database;
a processor programmed to calculate a field weight for the selected field, the field weight for the selected field derived from each of the plurality of field value weights;
a processor programmed to form and store a plurality of entity representations in the database, each entity representation comprising at least two records linked using a first instance of the record matching formula, at least one entity representation comprising a first record linked to a second record using a first instance of the record matching formula comprising the field weight for the selected field;
a processor programmed to calculate a plurality of revised field value weights, each revised field value weight reflecting a probability that an arbitrary entity representation in the database comprises a different field value in the selected field of a record in the arbitrary entity representation, wherein the second probability comprises a ratio of entity representations in the database that include the particular field value in the selected field to a total number of entity representations in the database;
a processor programmed to calculate a revised field weight for the selected field, the revised field weight for the selected field derived from each of the plurality of revised field value weights; and
a processor programmed to link and store at least two entity representations in the database based on a second instance of the record matching formula, wherein the second instance of the record matching formula comprises the revised field value weight, whereby a number of entity representations in the database is reduced by the linking at least two entity representations relative to a number of entity representations in the database prior to the linking at least two entity representations.
Dependent Claims: 14, 15, 16
Specification