STATISTICAL RECORD LINKAGE CALIBRATION FOR GEOGRAPHIC PROXIMITY MATCHING
Abstract
Disclosed is a system for, and method of, calculating parameters used to determine whether records and entity representations should be linked. The system and method use a symmetric and reflexive function to allow for linking records and entity representations whose field values differ. The system and method apply iterative techniques such that parameters from each linking iteration are used in the next linking iteration. The system and method need no human interaction in order to calibrate and utilize record matching formulas used for the linking decisions. These techniques may be used for geographic location proximity matching.
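As an illustration of the kind of function the abstract refers to, the following is a minimal sketch of a proximity test that is symmetric (the result does not depend on argument order) and reflexive (every point is within any non-negative distance of itself). The disclosure does not name a specific function; the haversine great-circle formula, the names haversine_km and within_distance, and the max_km threshold are all assumptions made here for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant for this sketch


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def within_distance(a, b, max_km):
    """Hypothetical proximity predicate: True when a and b are at most max_km apart.

    Symmetric because the distance is unchanged when a and b are swapped;
    reflexive because the distance from a point to itself is zero.
    """
    return haversine_km(a, b) <= max_km
```

Two records whose location fields hold different coordinates can still be treated as agreeing on that field when within_distance returns True, which is the sense in which non-identical field values may be linked.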
18 Claims
1. A method for generating entity representations in a computer implemented database using a record matching formula and for generating parameters for the record matching formula, the database comprising a plurality of records, each record comprising a plurality of fields, each field capable of containing a field value, wherein at least a portion of parameters for the record matching formula are generated using a symmetric and reflexive function and configured for a particular field value appearing in a selected field of at least one record, and wherein the process provides for linking records or entity representations with non-identical field values, the method comprising the steps of:
forming a notional grid over a geographic area of interest that contains a plurality of points, wherein the geographic area of interest comprises a plurality of squares and wherein each of the plurality of points is associated with one of the plurality of records in the database that contains absolute geographic location information;
calculating a plurality of match probabilities, wherein each match probability reflects a likelihood that an arbitrary point in the geographic area of interest lies within a distance of a square within which the arbitrary point lies as determined by the symmetric and reflexive function;
calculating a plurality of match weights based on the plurality of match probabilities;
linking at least two entity representations in the database based on one or more of the plurality of match weights using the record matching formula; and
retrieving information from at least one record in the database.
Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
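The grid, match probability, and match weight steps recited in claim 1 can be sketched as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: points are taken to be planar (x, y) coordinates in a common unit, "within a distance of a square" is approximated by the square plus its eight neighbours, and the weight is taken to be the negative base-2 logarithm of the probability. The names grid_cell, match_probabilities, and match_weights are hypothetical.

```python
import math
from collections import Counter


def grid_cell(x, y, cell_size):
    """Map a planar coordinate to the (column, row) index of its grid square."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def match_probabilities(points, cell_size):
    """For each occupied square, estimate the chance that an arbitrary point
    from the data falls in that square or one of its 8 neighbours (assumed
    stand-in for "within a distance of the square")."""
    counts = Counter(grid_cell(x, y, cell_size) for x, y in points)
    total = len(points)
    probs = {}
    for (cx, cy) in counts:
        nearby = sum(counts.get((cx + dx, cy + dy), 0)
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1))
        probs[(cx, cy)] = nearby / total
    return probs


def match_weights(probs):
    """Assumed weighting: rarer neighbourhoods get larger weights (-log2 p)."""
    return {cell: -math.log2(p) for cell, p in probs.items()}


# Example data: three points clustered together and one isolated point.
points = [(0.5, 0.5), (0.6, 0.4), (1.5, 0.5), (10.5, 10.5)]
probs = match_probabilities(points, cell_size=1.0)
weights = match_weights(probs)
```

Under these assumptions, a proximity match in a sparsely populated square carries more weight than one in a dense square, because two arbitrary records are less likely to be near each other there by chance.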
10. A system for generating entity representations in a computer implemented database using a record matching formula and for generating parameters for the record matching formula, the database comprising a plurality of records, each record comprising a plurality of fields, each field capable of containing a field value, wherein at least a portion of parameters for the record matching formula are generated using a symmetric and reflexive function and configured for a particular field value appearing in a selected field of at least one record, and wherein the process provides for linking records or entity representations with non-identical field values, the system comprising:
a processor programmed to form a notional grid over a geographic area of interest that contains a plurality of points, wherein the geographic area of interest comprises a plurality of squares and wherein each of the plurality of points is associated with one of the plurality of records in the database that contains absolute geographic location information;
a processor programmed to calculate a plurality of match probabilities, wherein each match probability reflects a likelihood that an arbitrary point in the geographic area of interest lies within a distance of a square within which the arbitrary point lies as determined by the symmetric and reflexive function;
a processor programmed to calculate a plurality of match weights based on the plurality of match probabilities;
a processor programmed to link at least two entity representations in the database based on one or more of the plurality of match weights using the record matching formula; and
a processor programmed to retrieve information from at least one record in the database.
Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18)
Specification