Method for record linkage from multiple sources
First Claim
1. A computer implemented method of association of data records, comprising:
receiving a first specification for a distance measure for each attribute in a subset of attributes of data records in one or more databases where each said distance measure is between zero and one inclusive;
receiving a second specification for a set of non-negative weights for each attribute in said subset of attributes where the sum of said set of weights is equal to one;
receiving a third specification for one or more threshold values;
calculating a pairwise record distance between a plurality of considered records, wherein the pairwise record distance is the weighted sum based on said non-negative weights and each said distance measure for the entire subset of said subset of attributes, and wherein the pairwise record distance falls between zero and one inclusive;
associating a subset of said considered records where said pairwise record distance for said subset of records is within the applicable said threshold value; and
measuring physical distance between record items based on latitude and longitude, and wherein said physical distance is normalized to be compliant.
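The weighted-sum and threshold steps recited above can be illustrated with a minimal Python sketch. The names `exact_distance`, `record_distance`, and `associate` are hypothetical and do not come from the patent, and reading "within the applicable threshold" as "less than or equal to" is an assumption.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Distance = Callable[[str, str], float]  # expected to return a value in [0, 1]


def exact_distance(a: str, b: str) -> float:
    """Hypothetical attribute distance: 0.0 when equal, 1.0 otherwise."""
    return 0.0 if a == b else 1.0


def record_distance(r1: Record, r2: Record,
                    distances: Dict[str, Distance],
                    weights: Dict[str, float]) -> float:
    """Weighted sum of per-attribute distances over the chosen attributes."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    # With each distance in [0, 1] and weights summing to one,
    # the result also lies in [0, 1].
    return sum(w * distances[attr](r1[attr], r2[attr])
               for attr, w in weights.items())


def associate(records: List[Record],
              distances: Dict[str, Distance],
              weights: Dict[str, float],
              threshold: float) -> List[Tuple[int, int]]:
    """Return index pairs whose record distance is within the threshold
    (assumed here to mean <= threshold)."""
    return [(i, j)
            for (i, r1), (j, r2) in combinations(enumerate(records), 2)
            if record_distance(r1, r2, distances, weights) <= threshold]


records = [
    {"name": "Ann", "city": "Oslo"},
    {"name": "Ann", "city": "Bergen"},
    {"name": "Bob", "city": "Rome"},
]
distances = {"name": exact_distance, "city": exact_distance}
weights = {"name": 0.75, "city": 0.25}
pairs = associate(records, distances, weights, threshold=0.3)
```

In this example the first two records differ only in `city`, so their distance is 0.25 and they are associated; any pair that differs in `name` scores at least 0.75 and is not.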
Abstract
Embodiments of methods for record linkage and comparing attributes are presented herein. Broadly speaking, embodiments of the present invention associate data records using distance measures and weights. More particularly, embodiments of the present invention generate a weight-based comparison of attributes. More specifically, embodiments of the present invention involve deduplication of records in a single database and linkage of records in multiple databases. In some embodiments of the invention, records may be merged into a master database based on the linkage and comparison of attributes. In addition, embodiments of the present invention may calculate a quality measure to be used in comparing attributes and records.
78 Citations
16 Claims
1. A computer implemented method of association of data records, comprising:
receiving a first specification for a distance measure for each attribute in a subset of attributes of data records in one or more databases where each said distance measure is between zero and one inclusive;
receiving a second specification for a set of non-negative weights for each attribute in said subset of attributes where the sum of said set of weights is equal to one;
receiving a third specification for one or more threshold values;
calculating a pairwise record distance between a plurality of considered records, wherein the pairwise record distance is the weighted sum based on said non-negative weights and each said distance measure for the entire subset of said subset of attributes, and wherein the pairwise record distance falls between zero and one inclusive;
associating a subset of said considered records where said pairwise record distance for said subset of records is within the applicable said threshold value; and
measuring physical distance between record items based on latitude and longitude, and wherein said physical distance is normalized to be compliant.
Dependent claims: 2, 3, 4
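As an illustration of the final element of claim 1, the sketch below computes a great-circle distance from latitude and longitude and scales it into the zero-to-one range required of the other distance measures. The haversine formula, the cap `cap_km`, and the function names are assumptions made for this example; the claim does not state how the normalization is done.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def normalized_geo_distance(p1: Tuple[float, float],
                            p2: Tuple[float, float],
                            cap_km: float = 50.0) -> float:
    """Scale the physical distance into [0, 1].

    Assumption: distances at or beyond cap_km count as maximally distant (1.0).
    """
    return min(haversine_km(p1[0], p1[1], p2[0], p2[1]) / cap_km, 1.0)
```

A function of this shape could be supplied as the distance measure for a location attribute, since its output respects the same bounds as the other per-attribute distances.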
5. A computer implemented method of association of data records, comprising:
receiving a first specification for a distance measure for each attribute in a subset of attributes of data records in one or more databases where each said distance measure is between zero and one inclusive;
receiving a second specification for a set of non-negative weights for each attribute in said subset of attributes where the sum of said set of weights is equal to one;
receiving a third specification for one or more threshold values;
calculating a pairwise record distance between a plurality of considered records, wherein the pairwise record distance is the weighted sum based on said non-negative weights and each said distance measure for the entire subset of said subset of attributes, and wherein the pairwise record distance falls between zero and one inclusive;
associating a subset of said considered records where said pairwise record distance for said subset of records is within the applicable said threshold value; and
standardizing text attributes in a first stage and a second stage, wherein said first stage is standardizing character-level matching considerations, and wherein said second stage is standardizing content of each attribute.
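The two-stage standardization in claim 5 might look like the following sketch. The specific character-level rules (accent stripping, lower-casing, punctuation removal) and the abbreviation table are illustrative assumptions; the claim does not list them.

```python
import re
import unicodedata
from typing import Dict

# Hypothetical content-level table for an address attribute.
STREET_ABBREVIATIONS: Dict[str, str] = {
    "st": "street",
    "ave": "avenue",
    "rd": "road",
}


def standardize_characters(text: str) -> str:
    """Stage 1: character-level cleanup (accents, case, punctuation, spacing)."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)      # punctuation becomes a space
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace


def standardize_content(text: str, table: Dict[str, str]) -> str:
    """Stage 2: content-level cleanup, here expanding known abbreviations."""
    return " ".join(table.get(token, token) for token in text.split())


def standardize(text: str,
                table: Dict[str, str] = STREET_ABBREVIATIONS) -> str:
    """Apply stage 1, then stage 2."""
    return standardize_content(standardize_characters(text), table)
```

Running stage 1 first means the stage 2 table only needs lower-case, punctuation-free keys.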
6. A computer implemented method of association of data records, comprising:
receiving a first specification for a distance measure for each attribute in a subset of attributes of data records in one or more databases where each said distance measure is between zero and one inclusive;
receiving a second specification for a set of non-negative weights for each attribute in said subset of attributes where the sum of said set of weights is equal to one;
receiving a third specification for one or more threshold values;
calculating a pairwise record distance between a plurality of considered records, wherein the pairwise record distance is the weighted sum based on said non-negative weights and each said distance measure for the entire subset of said subset of attributes, and wherein the pairwise record distance falls between zero and one inclusive;
associating a subset of said considered records where said pairwise record distance for said subset of records is within the applicable said threshold value; and
wherein said threshold value differs based on how many said records are found similar using key parameters.
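One possible reading of claim 6 is that the threshold tightens when many records share the same key values. The sketch below follows that reading only; the key attributes, the cutoff `crowd`, and the two threshold values are all hypothetical, and the claim does not say in which direction the threshold moves.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]


def candidate_counts(records: List[Record],
                     key_attrs: Sequence[str]) -> Counter:
    """Count how many records share each combination of key attribute values."""
    return Counter(tuple(r[a] for a in key_attrs) for r in records)


def applicable_threshold(record: Record,
                         counts: Counter,
                         key_attrs: Sequence[str],
                         loose: float = 0.3,
                         strict: float = 0.1,
                         crowd: int = 3) -> float:
    """Pick a threshold from the number of records with the same key.

    Assumption: a crowded key group calls for the stricter (smaller) threshold.
    """
    key: Tuple[str, ...] = tuple(record[a] for a in key_attrs)
    return strict if counts[key] >= crowd else loose
```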
7. A computer implemented method of association of data records, comprising:
receiving a first specification for a distance measure for each attribute in a subset of attributes of data records in one or more databases where each said distance measure is between zero and one inclusive;
receiving a second specification for a set of non-negative weights for each attribute in said subset of attributes where the sum of said set of weights is equal to one;
receiving a third specification for one or more threshold values;
calculating a pairwise record distance between a plurality of considered records, wherein the pairwise record distance is the weighted sum based on said non-negative weights and each said distance measure for the entire subset of said subset of attributes, and wherein the pairwise record distance falls between zero and one inclusive;
associating a subset of said considered records where said pairwise record distance for said subset of records is within the applicable said threshold value;
receiving a fourth specification for a database quality measure;
receiving a fifth specification for a record quality measure;
calculating an entry quality measure for each attribute as the product of said database quality measure and said record quality measure;
calculating a final quality measure for each unique attribute value as the sum of said entry qualities for each identical attribute value; and
selecting one of the unique attribute values for merger into a record in a database, wherein the unique attribute value selected is based on a comparison of said final quality measures.
Dependent claims: 8, 9, 10, 11, 12
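The quality calculation in claim 7 can be worked through with a small sketch for a single attribute. The `Entry` type and function names are hypothetical, and choosing the value with the highest final quality is an assumption; the claim only says the selection is "based on a comparison" of the final quality measures.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Entry(NamedTuple):
    """One candidate value for an attribute, with the quality of its source."""
    value: str
    database_quality: float
    record_quality: float


def final_qualities(entries: Iterable[Entry]) -> Dict[str, float]:
    """Sum entry qualities (database quality x record quality) per unique value."""
    totals: Dict[str, float] = defaultdict(float)
    for e in entries:
        totals[e.value] += e.database_quality * e.record_quality
    return dict(totals)


def select_value(entries: Iterable[Entry]) -> str:
    """Assumed comparison rule: pick the value with the highest final quality."""
    totals = final_qualities(entries)
    return max(totals, key=totals.get)


phone_entries: List[Entry] = [
    Entry("555-0100", 0.5, 0.5),   # entry quality 0.25
    Entry("555-0100", 0.5, 0.5),   # entry quality 0.25
    Entry("555-0199", 0.75, 0.5),  # entry quality 0.375
]
```

Here "555-0100" wins with a final quality of 0.5 even though each of its entries scores lower than the single "555-0199" entry, because identical values pool their entry qualities.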
13. A computer implemented method of association of data records, comprising:
receiving a first specification for a distance measure for each attribute in a subset of attributes of data records in one or more databases where each said distance measure is between zero and one inclusive;
receiving a second specification for a set of non-negative weights for each attribute in said subset of attributes where the sum of said set of weights is equal to one;
receiving a third specification for one or more threshold values;
calculating a pairwise record distance between a plurality of considered records, wherein the pairwise record distance is the weighted sum based on said non-negative weights and each said distance measure for the entire subset of said subset of attributes, and wherein the pairwise record distance falls between zero and one inclusive;
associating a subset of said considered records where said pairwise record distance for said subset of records is within the applicable said threshold value;
receiving a fourth specification for a database quality measure;
receiving a fifth specification for one or more attribute quality measures;
calculating an entry quality measure for each attribute as the product of said database quality measure and said attribute quality measure;
calculating a final quality measure for each unique attribute value as the sum of said entry qualities for each identical attribute value;
selecting one of the unique attribute values for merger into a record in a database, wherein the unique attribute value selected is based on a comparison of said final quality measures; and
merging said unique attribute value into a record in a database base.
Dependent claims: 14
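Claim 13 replaces the record quality measure with attribute quality measures and adds a merge step. The sketch below merges a group of linked records into one master record. Keying attribute quality by (database, attribute) pair, the database names, and the highest-total selection rule are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Dict[str, str]


def merge_linked(linked: List[Tuple[str, Record]],
                 db_quality: Dict[str, float],
                 attr_quality: Dict[Tuple[str, str], float]) -> Record:
    """Build a master record from linked (database name, record) pairs.

    For each attribute, the entry quality is database quality times attribute
    quality; identical values pool their entry qualities, and the value with
    the highest total (an assumed comparison rule) is merged into the master.
    """
    master: Record = {}
    attributes = sorted({a for _, rec in linked for a in rec})
    for attr in attributes:
        totals: Dict[str, float] = defaultdict(float)
        for db, rec in linked:
            if attr in rec:
                totals[rec[attr]] += db_quality[db] * attr_quality[(db, attr)]
        master[attr] = max(totals, key=totals.get)
    return master


linked = [
    ("crm", {"phone": "555-0100", "city": "Oslo"}),
    ("billing", {"phone": "555-0199", "city": "Oslo"}),
]
db_quality = {"crm": 0.5, "billing": 1.0}
attr_quality = {
    ("crm", "phone"): 1.0, ("crm", "city"): 0.5,
    ("billing", "phone"): 0.25, ("billing", "city"): 0.5,
}
master = merge_linked(linked, db_quality, attr_quality)
```

In this example the lower-quality database still supplies the phone number, because its phone attribute is rated as more reliable than the other source's.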
15. A computer readable non-transitory storage medium comprising instructions executable by a processor for:
receiving a first specification for a distance measure for each attribute in a subset of attributes of data records in one or more databases where each said distance measure is between zero and one inclusive;
receiving a second specification for a set of non-negative weights for each attribute in said subset of attributes where the sum of said set of weights is equal to one;
receiving a third specification for one or more threshold values;
calculating a pairwise record distance between a plurality of considered records, wherein the pairwise record distance is the weighted sum based on said non-negative weights and each said distance measure for the entire subset of said subset of attributes, and wherein the pairwise record distance falls between zero and one inclusive;
associating a subset of said considered records where said pairwise record distance for said subset of records is within the applicable said threshold value;
receiving a fourth specification for a database quality measure;
receiving a fifth specification for a record quality measure;
calculating an entry quality measure for each attribute as the product of said database quality measure and said record quality measure;
calculating a final quality measure for each unique attribute value as the sum of said entry qualities for each identical attribute value; and
selecting one of the unique attribute values for merger into a record in a database, wherein the unique attribute value selected is based on a comparison of said final quality measures.
16. A computer readable non-transitory storage medium comprising instructions executable by a processor for:
receiving a first specification for a distance measure for each attribute in a subset of attributes of data records in one or more databases where each said distance measure is between zero and one inclusive;
receiving a second specification for a set of non-negative weights for each attribute in said subset of attributes where the sum of said set of weights is equal to one;
receiving a third specification for one or more threshold values;
calculating a pairwise record distance between a plurality of considered records, wherein the pairwise record distance is the weighted sum based on said non-negative weights and each said distance measure for the entire subset of said subset of attributes, and wherein the pairwise record distance falls between zero and one inclusive;
associating a subset of said considered records where said pairwise record distance for said subset of records is within the applicable said threshold value;
receiving a fourth specification for a database quality measure;
receiving a fifth specification for one or more attribute quality measures;
calculating an entry quality measure for each attribute as the product of said database quality measure and said attribute quality measure;
calculating a final quality measure for each unique attribute value as the sum of said entry qualities for each identical attribute value;
selecting one of the unique attribute values for merger into a record in a database, wherein the unique attribute value selected is based on a comparison of said final quality measures; and
merging said unique attribute value into a record in a database base.
Specification