AUTOMATED CALIBRATION OF NEGATIVE FIELD WEIGHTING WITHOUT THE NEED FOR HUMAN INTERACTION
Abstract
Disclosed is a system for, and method of, calculating parameters used to determine whether records and entity representations should be linked. Such parameters may be set as negative to account for fields that do not match. The system and method apply iterative techniques such that parameters from each linking iteration are used in the next linking iteration. The system and method need no human interaction in order to calibrate and utilize record matching formulas used for the linking decisions.
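The abstract's point that a parameter may be negative for fields that do not match can be sketched as a simple additive score. This is only an illustration under stated assumptions: the additive form, the names `match_score` and `should_link`, the treatment of missing values, and the threshold are all hypothetical and are not taken from the patent text.

```python
def match_score(rec_a, rec_b, match_weights, mismatch_weights):
    """Sum per-field contributions for a pair of records.

    Assumption: a field adds its positive match weight when both values
    agree and its negative mismatch weight when they differ.
    """
    score = 0.0
    for field, w_match in match_weights.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # assumption: a missing value contributes nothing
        score += w_match if a == b else mismatch_weights[field]
    return score


def should_link(rec_a, rec_b, match_weights, mismatch_weights, threshold):
    """Link the two records when the score reaches a (hypothetical) threshold."""
    return match_score(rec_a, rec_b, match_weights, mismatch_weights) >= threshold


if __name__ == "__main__":
    a = {"name": "ann lee", "phone": "555-0100"}
    b = {"name": "ann lee", "phone": "555-0199"}
    match_w = {"name": 10.0, "phone": 5.0}
    mismatch_w = {"name": -8.0, "phone": -3.0}  # negative: penalize disagreement
    print(match_score(a, b, match_w, mismatch_w))
    print(should_link(a, b, match_w, mismatch_w, threshold=6.0))
```

In this sketch a phone mismatch lowers the score without vetoing the link, which is the practical effect of a negative weight whose magnitude is smaller than the competing match weights.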
10 Claims
1. A computer implemented iterative process for generating entity representations in a computer implemented database using a record matching formula and for generating parameters for the record matching formula, the database comprising a plurality of records, each record comprising a plurality of fields, each field capable of containing a field value, the process comprising:
calculating a field weight for a selected field, the field weight for the selected field derived from each of a plurality of field value weights for the selected field;
forming a plurality of entity representations in the database, each entity representation comprising at least two records linked using a first instance of the record matching formula, at least one entity representation comprising a first record linked to a second record using a first instance of the record matching formula wherein the first record comprises a different field value in its selected field than that of the second record, the first instance of the record matching formula comprising a negative of the field weight for the selected field;
calculating a weight parameter for the selected field, the weight parameter for the selected field reflecting a likelihood that an arbitrary entity representation in the database comprises two different records each comprising a different field value in its respective selected field, the weight parameter being a negative number;
linking at least two entity representations in the database based on a second instance of the record matching formula, wherein the second instance of the record matching formula comprises the weight parameter, whereby a number of entity representations in the database is reduced by the linking at least two entity representations relative to a number of entity representations in the database prior to the linking at least two entity representations; and
retrieving information from at least one record in the database.
Dependent claims: 2, 3, 4, 5, 9.
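The two calculating steps of claim 1 can be illustrated with a short sketch. The claim does not give formulas, so everything below is an assumption made for illustration: field value weights are taken as -log2 of a value's relative frequency, the field weight as their record-weighted average, and the negative weight parameter as log2 of the share of entity representations that hold two or more distinct values in the selected field. The function names are hypothetical.

```python
import math
from collections import Counter


def field_value_weights(records, field):
    """Assumed form: rarer values weigh more, w(v) = -log2(freq(v))."""
    counts = Counter(r[field] for r in records if r.get(field) is not None)
    total = sum(counts.values())
    return {value: -math.log2(n / total) for value, n in counts.items()}


def field_weight(records, field):
    """Assumed form: average of the field value weights over populated records."""
    weights = field_value_weights(records, field)
    per_record = [weights[r[field]] for r in records if r.get(field) is not None]
    return sum(per_record) / len(per_record)


def mismatch_parameter(entities, field):
    """Assumed form: log2 of the share of entities with differing field values.

    Each entity is a list of record dicts. The share is clamped away from
    0 and 1 so the result is finite and strictly negative.
    """
    n = len(entities)
    if n == 0:
        raise ValueError("need at least one entity representation")
    mixed = sum(
        1
        for entity in entities
        if len({r[field] for r in entity if r.get(field) is not None}) > 1
    )
    share = min(max(mixed / n, 1 / (n + 1)), n / (n + 1))
    return math.log2(share)


if __name__ == "__main__":
    records = [{"city": "A"}, {"city": "A"}, {"city": "B"}, {"city": "B"}]
    w = field_weight(records, "city")
    print("first pass mismatch weight:", -w)  # negative of the field weight

    entities = [
        [{"phone": "1"}, {"phone": "2"}],
        [{"phone": "1"}, {"phone": "1"}],
        [{"phone": "3"}, {"phone": "3"}],
        [{"phone": "4"}, {"phone": "5"}],
    ]
    print("calibrated mismatch weight:", mismatch_parameter(entities, "phone"))
```

Under these assumptions the first linking pass penalizes a mismatch by the full field weight, and later passes replace that penalty with a value estimated from the entity representations already formed, with no manual tuning.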
6. A computer system for iteratively generating entity representations in a computer implemented database using a record matching formula and for generating parameters for the record matching formula, the database comprising a plurality of records, each record comprising a plurality of fields, each field capable of containing a field value, the system comprising:
a database comprising a plurality of records, each record comprising a plurality of fields, each field capable of containing a field value;
a processor programmed to calculate a field weight for a selected field, the field weight for the selected field derived from each of a plurality of field value weights for the selected field;
a processor programmed to form and store a plurality of entity representations in the database, each entity representation comprising at least two records linked using a first instance of the record matching formula, at least one entity representation comprising a first record linked to a second record using a first instance of the record matching formula wherein the first record comprises a different field value in its selected field than that of the second record, the first instance of the record matching formula comprising a negative of the field weight for the selected field;
a processor programmed to calculate a weight parameter for the selected field, the weight parameter for the selected field reflecting a likelihood that an arbitrary entity representation in the database comprises two different records each comprising a different field value in its respective selected field, the weight parameter being a negative number; and
a processor programmed to link and store at least two entity representations in the database based on a second instance of the record matching formula, wherein the second instance of the record matching formula comprises the weight parameter, whereby a number of entity representations in the database is reduced by the linking at least two entity representations relative to a number of entity representations in the database prior to the linking at least two entity representations.
Dependent claims: 7, 8, 10.
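The final linking element, in which merging entity representations reduces their number, might look like the sketch below. It is a rough illustration only: the greedy pairwise merge, the rule that one record pair at or above the threshold is enough to merge two entities, and the names `link_entities` and `toy_score` are assumptions rather than anything the claims specify.

```python
def link_entities(entities, score_fn, threshold):
    """Greedily merge entities until no cross-entity record pair reaches threshold.

    entities: list of lists of record dicts.
    score_fn: callable(record_a, record_b) -> float, standing in for the
              second instance of the record matching formula.
    """
    merged = [list(entity) for entity in entities]
    changed = True
    while changed:
        changed = False
        for i in range(len(merged)):
            for j in range(i + 1, len(merged)):
                if any(
                    score_fn(a, b) >= threshold
                    for a in merged[i]
                    for b in merged[j]
                ):
                    merged[i].extend(merged.pop(j))  # fold entity j into entity i
                    changed = True
                    break
            if changed:
                break  # restart the scan, since indices have shifted
    return merged


def toy_score(a, b):
    """Hypothetical formula: name dominates, phone mismatch carries a mild penalty."""
    score = 6.0 if a["name"] == b["name"] else -6.0
    score += 4.0 if a["phone"] == b["phone"] else -1.5
    return score


if __name__ == "__main__":
    entities = [
        [{"name": "ann lee", "phone": "555-0100"}, {"name": "ann lee", "phone": "555-0100"}],
        [{"name": "ann lee", "phone": "555-0199"}, {"name": "ann lee", "phone": "555-0199"}],
        [{"name": "bob ray", "phone": "555-0100"}, {"name": "bob ray", "phone": "555-0100"}],
    ]
    linked = link_entities(entities, toy_score, threshold=4.0)
    print(len(entities), "->", len(linked))
```

Here the mild phone penalty of -1.5 plays the role of the calibrated negative weight parameter: it is small enough that two "ann lee" entities with different phone numbers still merge, while the "bob ray" entity stays separate.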
Specification