System and method for record linkage
First Claim
1. One or more computer-readable storage devices having computer-usable instructions embodied thereon that, when executed by a processor, perform a method of determining that a plurality of health records are related to the same human patient, the method comprising:
receiving a target record from a first health-records system, the target record comprising a date-time variable associated with the target record;
receiving one or more candidate records from a second health-records system, the one or more candidate records comprising information of a plurality of episodes associated with a candidate patient, said information including a date-time value associated with each episode; and
for the one or more candidate records;
based on said date-time variable associated with each episode, determining a timeseries of time intervals representing a time between each episode, and further determining a time duration between each date-time variable associated with each candidate record episode and the target record date-time variable;
determining a normalized power-spectrum likelihood weight (“power spectra weight”) based on the timeseries of time intervals;
determining a record linkage weight based on a measure of lexical similarity between the one or more candidate records and the target record;
based on the determined record linkage weight and power spectra weight, determining that the one or more candidate records are related to the target record; and
storing an indication that the one or more candidate records are to be linked to the target record.
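The temporal steps of the claim can be illustrated with a minimal sketch. It assumes the episodes are available as Python datetime values, measures intervals in days, and uses a plain periodogram scaled to sum to one as the "normalized" power spectrum. The function names and the normalization are illustrative assumptions; the claim itself does not fix a formula, and converting the spectrum into a likelihood weight would additionally need a population reference that is not shown here.

```python
import cmath
from datetime import datetime


def interval_timeseries(episode_times, target_time):
    """Return (gaps, offsets), both in days.

    gaps:    time between consecutive episodes of the candidate record.
    offsets: time between each episode and the target record's date-time.
    """
    ordered = sorted(episode_times)
    day = 86400.0
    gaps = [(b - a).total_seconds() / day for a, b in zip(ordered, ordered[1:])]
    offsets = [abs((target_time - t).total_seconds()) / day for t in ordered]
    return gaps, offsets


def normalized_power_spectrum(series):
    """Periodogram of the mean-centred series, scaled to sum to 1 (assumed normalization)."""
    n = len(series)
    if n == 0:
        return []
    mean = sum(series) / n
    centred = [x - mean for x in series]
    power = []
    for k in range(n // 2 + 1):  # non-negative frequencies only
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(centred))
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0:  # constant series: no spectral content to normalize
        return [0.0] * len(power)
    return [p / total for p in power]


episodes = [datetime(2020, 1, 1), datetime(2020, 1, 22), datetime(2020, 1, 8)]
gaps, offsets = interval_timeseries(episodes, datetime(2020, 1, 29))
spectrum = normalized_power_spectrum(gaps)
```

Here `gaps` holds the inter-episode intervals and `offsets` the durations between each episode and the target record; only `gaps` is passed to the spectrum in this sketch.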
Abstract
Methods, systems, and computer-readable media are provided for facilitating record matching and entity resolution and for enabling improvements in record linkage. A power-spectrum-based temporal pattern-specific weight may be incorporated into record linkage methods to enhance the record linkage accuracy and statistical performance. For example, in embodiments, a value-specific weight may be calculated from a population-based frequency of field-specific values and provides an opportunity to capture and measure the relative importance of specific values found in a field. A timeseries-derived Bayesian power spectrum weight may be calculated from the population-based frequency of temporal pattern-specific values in terms of intensities at various frequencies of the power spectrum computed from the timeseries, and further provides an opportunity to capture and measure the relative importance of specific sequences of care episodes.
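As a rough illustration of a value-specific weight derived from population frequencies, the sketch below uses the common frequency-based form log2(1 / relative frequency), so that rare values carry more weight than common ones. This formula and the function name are assumptions chosen for illustration; the abstract does not state which formula the embodiments use.

```python
import math
from collections import Counter


def value_specific_weights(population_values):
    """Map each observed field value to log2(N / count) (assumed weighting)."""
    counts = Counter(population_values)
    total = sum(counts.values())
    return {value: math.log2(total / count) for value, count in counts.items()}


# Hypothetical surname field drawn from a small population.
weights = value_specific_weights(["SMITH", "SMITH", "SMITH", "NGUYEN"])
```

Under the same assumption, a temporal-pattern weight could be built by applying this function to discretized power-spectrum patterns instead of field values.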
211 Citations
19 Claims
1. One or more computer-readable storage devices having computer-usable instructions embodied thereon that, when executed by a processor, perform a method of determining that a plurality of health records are related to the same human patient, the method comprising:
receiving a target record from a first health-records system, the target record comprising a date-time variable associated with the target record;
receiving one or more candidate records from a second health-records system, the one or more candidate records comprising information of a plurality of episodes associated with a candidate patient, said information including a date-time value associated with each episode; and
for the one or more candidate records;
based on said date-time variable associated with each episode, determining a timeseries of time intervals representing a time between each episode, and further determining a time duration between each date-time variable associated with each candidate record episode and the target record date-time variable;
determining a normalized power-spectrum likelihood weight (“power spectra weight”) based on the timeseries of time intervals;
determining a record linkage weight based on a measure of lexical similarity between the one or more candidate records and the target record;
based on the determined record linkage weight and power spectra weight, determining that the one or more candidate records are related to the target record; and
storing an indication that the one or more candidate records are to be linked to the target record.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A computer-implemented method of determining related records, the method comprising:
receiving a reference record from a first records system, the reference record including a reference record date;
receiving one or more candidate records from a second records system, each candidate record associated with an entity and comprising information of a plurality of episodes, each episode associated with an episode date; and
for each of the one or more candidate records;
based on the episode date associated with each episode, determining a timeseries comprising time intervals, the time intervals representing a time between each episode date and a time between each episode date of the candidate record and the reference record date;
determining a normalized power-spectrum likelihood weight (“power spectra weight”) based on the timeseries;
determining a record linkage weight based on a measure of lexical similarity between the one or more candidate records and the reference record;
based on the determined power spectra weight and record linkage weight, designating the candidate record as related to the reference record; and
storing an indication that the one or more candidate records are to be linked to the reference record.
Dependent claims: 14, 15, 16, 17
18. A computer-implemented method of determining related records, the method comprising:
receiving a plurality of records from at least one database, each record comprising information of one or more episodes, said information including a date-time value associated with each episode;
determining a value of a linkage indicator variable, wherein the value of the linkage indicator variable indicates whether information in a first record is associated with information in one or more other records in the at least one database;
based on the linkage indicator variable, determining a subset of records from the plurality of records that are plausibly related to the first record;
for the subset of plausibly related records and based on said date-time variable associated with each episode, determining a timeseries of time intervals representing a time between each episode, and further determining a time duration between each date-time variable associated with each candidate record episode and the target record date-time variable;
determining a normalized power-spectrum likelihood weight (“power spectra weight”) for each record in the subset based on the timeseries of time intervals;
determining a record linkage weight for each record in the subset and the first record based on a measure of lexical similarity between the records;
based on the determined power spectra weight and record linkage weight, identifying the first record as related to one or more records of the subset of records; and
storing an indication that the one or more candidate records are to be linked to the target record.
Dependent claims: 19
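The subset selection and the combined decision in claim 18 might be sketched as follows. Several things here are assumptions made only for illustration: the linkage indicator is modelled as equality on a blocking key, lexical similarity is taken from Python's difflib, the two weights are combined by a simple weighted average against a fixed threshold, and the power spectra weight is supplied by a caller-provided function. The record fields (`id`, `name`, `birth_year`) are hypothetical.

```python
from difflib import SequenceMatcher


def lexical_similarity(a, b):
    """Similarity in [0, 1] between two strings (difflib ratio, case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def plausible_subset(first, records, key):
    """Keep records that share the blocking key value with the first record."""
    return [r for r in records if r is not first and r[key] == first[key]]


def link_decisions(first, records, key, spectra_weight, threshold=0.8, mix=0.5):
    """Return (record id, combined score) for records judged related to `first`."""
    linked = []
    for r in plausible_subset(first, records, key):
        lex = lexical_similarity(first["name"], r["name"])
        # Assumed combination rule: weighted average of the two weights.
        score = mix * lex + (1.0 - mix) * spectra_weight(r)
        if score >= threshold:
            linked.append((r["id"], round(score, 3)))
    return linked


first = {"id": 1, "name": "Jon Smith", "birth_year": 1980}
records = [
    first,
    {"id": 2, "name": "John Smith", "birth_year": 1980},
    {"id": 3, "name": "Mary Jones", "birth_year": 1980},
    {"id": 4, "name": "Jon Smith", "birth_year": 1975},
]
# Placeholder temporal weight; a real one would come from the episode timeseries.
links = link_decisions(first, records, "birth_year", lambda r: 0.9)
```

In this example record 4 is excluded by the blocking key before any scoring, and record 3 passes the blocking step but falls below the threshold on lexical similarity.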
Specification