System and method for record linkage
First Claim
1. One or more computer-readable storage devices having computer-usable instructions embodied thereon that, when executed, enable a given processor to perform a method of determining that a plurality of health records are related to the same human patient, the method comprising:
receiving a target record from a first health-records system, the target record comprising at least one target record blocking variable and a date-time variable associated with the target record;
receiving one or more candidate records from a second health-records system, each candidate record comprising at least one candidate record blocking variable and information of a plurality of episodes associated with a candidate patient, said information including a date-time value associated with each episode of the plurality of episodes, wherein the at least one candidate record blocking variable is the same as the at least one target record blocking variable and comprises a birth day, birth month, birth year, or an indication of a condition, treatment, diagnosis, or other context; and
for each candidate record:
(1) based on said date-time variable associated with each episode, determining a timeseries of time intervals representing the time between each episode and a time duration from the date-time variable associated with a last candidate record episode to the target record date-time variable;
(2) for each timeseries, determining a normalized power-spectrum likelihood weight (“power spectra weight”), based on the timeseries;
(3) determining a record linkage weight based on a measure of lexical similarity between the candidate record and target record;
(4) based on the determined record linkage weight and power spectra weight, determining a composite candidate record score using a root-mean-square transformation, a cosine transformation, or correlation coefficient;
(5) performing a comparison of the candidate record score to a threshold;
(6) based on the comparison, determining that the candidate record score satisfies the threshold, designating the candidate record as related to the target record, adding an indication of the target record to the candidate record thereby creating an updated candidate record, and storing the updated candidate record in the first or second health-records system.
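Steps (1) and (2) can be illustrated with a minimal sketch. The claim does not fix units, a transform, or a normalization, so the choices here are assumptions: intervals measured in days, a plain discrete Fourier transform of the mean-centered intervals, and scaling so that the spectral intensities sum to 1. The function names `interval_timeseries` and `normalized_power_spectrum` are hypothetical, and the sketch stops at the normalized spectrum; it does not derive the likelihood weight itself.

```python
import cmath
from datetime import datetime


def interval_timeseries(episode_times, target_time):
    """Days between consecutive episodes, followed by the days from the
    last episode to the target record's date-time (assumed unit: days)."""
    points = sorted(episode_times) + [target_time]
    return [(b - a).total_seconds() / 86400.0 for a, b in zip(points, points[1:])]


def normalized_power_spectrum(series):
    """Intensities at the positive DFT frequencies of the mean-centered
    series, scaled to sum to 1 (assumed normalization)."""
    n = len(series)
    if n < 2:
        return []
    mean = sum(series) / n
    centered = [x - mean for x in series]
    powers = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        powers.append(abs(coeff) ** 2)
    total = sum(powers)
    # A perfectly regular series has no spectral content after centering.
    return [p / total for p in powers] if total > 0 else powers


# Hypothetical candidate record with three episodes and a later target record.
episodes = [datetime(2020, 1, 1), datetime(2020, 1, 8), datetime(2020, 1, 15)]
intervals = interval_timeseries(episodes, datetime(2020, 1, 29))  # [7.0, 7.0, 14.0]
spectrum = normalized_power_spectrum(intervals)
```

Centering the series before the transform removes the zero-frequency term, so the spectrum reflects the rhythm of the care episodes rather than their average spacing.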
Abstract
Methods, systems, and computer-readable media are provided for facilitating record matching and entity resolution and for enabling improvements in record linkage. A power-spectrum-based temporal pattern-specific weight may be incorporated into record linkage methods to enhance the record linkage accuracy and statistical performance. For example, in embodiments, a value-specific weight may be calculated from a population-based frequency of field-specific values and provides an opportunity to capture and measure the relative importance of specific values found in a field. A timeseries-derived Bayesian power spectrum weight may be calculated from the population-based frequency of temporal pattern-specific values in terms of intensities at various frequencies of the power spectrum computed from the timeseries, and further provides an opportunity to capture and measure the relative importance of specific sequences of care episodes.
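As an illustration of a value-specific weight derived from population frequency, the sketch below scores each field value by the negative base-2 logarithm of its relative frequency, so rarer values carry more weight. The abstract does not state a formula; the log-inverse-frequency form and the name `value_specific_weights` are assumptions made for this example.

```python
import math
from collections import Counter


def value_specific_weights(values):
    """Map each observed field value to -log2 of its relative frequency
    (assumed weighting; rarer values receive larger weights)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: -math.log2(count / total) for value, count in counts.items()}


# Hypothetical surname field drawn from a small population of records.
surnames = ["SMITH", "SMITH", "SMITH", "ZELENKA"]
weights = value_specific_weights(surnames)
# Agreement on the rare surname is weighted more heavily than on the common one.
```

The same idea could in principle be applied to binned spectral intensities rather than field values, which is roughly how the abstract describes the temporal pattern-specific weight.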
165 Citations
19 Claims
1. One or more computer-readable storage devices having computer-usable instructions embodied thereon that, when executed, enable a given processor to perform a method of determining that a plurality of health records are related to the same human patient, the method comprising:
receiving a target record from a first health-records system, the target record comprising at least one target record blocking variable and a date-time variable associated with the target record;
receiving one or more candidate records from a second health-records system, each candidate record comprising at least one candidate record blocking variable and information of a plurality of episodes associated with a candidate patient, said information including a date-time value associated with each episode of the plurality of episodes, wherein the at least one candidate record blocking variable is the same as the at least one target record blocking variable and comprises a birth day, birth month, birth year, or an indication of a condition, treatment, diagnosis, or other context; and
for each candidate record:
(1) based on said date-time variable associated with each episode, determining a timeseries of time intervals representing the time between each episode and a time duration from the date-time variable associated with a last candidate record episode to the target record date-time variable;
(2) for each timeseries, determining a normalized power-spectrum likelihood weight (“power spectra weight”), based on the timeseries;
(3) determining a record linkage weight based on a measure of lexical similarity between the candidate record and target record;
(4) based on the determined record linkage weight and power spectra weight, determining a composite candidate record score using a root-mean-square transformation, a cosine transformation, or correlation coefficient;
(5) performing a comparison of the candidate record score to a threshold;
(6) based on the comparison, determining that the candidate record score satisfies the threshold, designating the candidate record as related to the target record, adding an indication of the target record to the candidate record thereby creating an updated candidate record, and storing the updated candidate record in the first or second health-records system.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
13. A computer-implemented method of determining related records, the method comprising:
receiving a reference record from a first records system, the reference record including a reference record date;
receiving one or more candidate records from a second records system, each candidate record associated with an entity and comprising information of a plurality of episodes, each episode associated with an episode date; and
for each candidate record:
(1) based on the episode date associated with each episode, determining a timeseries comprising time intervals, the time intervals representing the time between each episode date and the time between a last episode date of the candidate record and the reference record date;
(2) for each timeseries, determining a normalized power-spectrum likelihood weight (“power spectra weight”), based on the timeseries;
(3) based on the determined power spectra weight, determining a candidate record score;
(4) performing a comparison of the candidate record score to a threshold; and
(5) based on the comparison determining that the candidate record score satisfies the threshold, designating the candidate record as related to the reference record and adding an indication of the reference record to the candidate record thereby creating an updated candidate record.
Dependent claims: 14, 15, 16, 17, 18
19. A computer-implemented method of determining related records, the method comprising:
receiving a plurality of records from at least one database, each record comprising at least one blocking variable and information of one or more episodes, wherein date-time information is associated with each episode, and wherein each blocking variable comprises a birth day, birth month, birth year, or indication of a condition, treatment, diagnosis, or other context;
for each record, determining a value of a linkage indicator variable, wherein the value of the linkage indicator variable indicates whether information in the record is associated with information in one or more other records in the at least one database;
based on the record linkage variable and blocking variable of each record, determining a subset of plausibly related records from the plurality of records;
determining a normalized power-spectrum likelihood weight (“power spectra weight”) for each record in the subset;
determining a record linkage weight for each record in the subset based on a measure of lexical similarity;
determining a composite score based on the determined power spectra weight and record linkage weight and using a root-mean-square transformation, a cosine transformation, or correlation coefficient;
comparing the composite score to a threshold value; and
if the composite score is greater than the threshold value, identifying a record as a related record and adding a related-record indication to the related record, but if the composite score is less than the threshold, identifying the record as not related.
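The final scoring and thresholding steps can be sketched as follows, using the root-mean-square option named in the claim. The claim gives no formula, so the RMS of the two weights shown here is one plausible reading; the assumption that both weights are already scaled to the range 0 to 1, the threshold value, and the names `rms_composite` and `classify` are all hypothetical.

```python
import math


def rms_composite(linkage_weight, spectra_weight):
    """Root-mean-square of the two weights (assumed both scaled to [0, 1])."""
    return math.sqrt((linkage_weight ** 2 + spectra_weight ** 2) / 2.0)


def classify(records, threshold):
    """Split (record_id, linkage_weight, spectra_weight) tuples into related
    and not-related lists by comparing the composite score to the threshold."""
    related, not_related = [], []
    for record_id, linkage_weight, spectra_weight in records:
        score = rms_composite(linkage_weight, spectra_weight)
        # The claim leaves a score exactly equal to the threshold unspecified;
        # this sketch treats it as not related.
        bucket = related if score > threshold else not_related
        bucket.append((record_id, score))
    return related, not_related


# Hypothetical subset of plausibly related records.
subset = [("rec-a", 0.9, 0.9), ("rec-b", 0.1, 0.2)]
related, not_related = classify(subset, threshold=0.5)
```

Because the RMS never exceeds the larger of the two inputs, a record with strong lexical agreement but a poorly matching episode rhythm scores lower than it would on lexical similarity alone.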
Specification