Methods and Systems for Discovery of Linkage Points Between Data Sources
Abstract
Data records are linked across a plurality of datasets. Each dataset contains at least one data record, and each data record is associated with an entity and includes one or more attributes of that entity and a value for each attribute. Values associated with attributes are compared across datasets, and matching attributes having values that satisfy a predetermined similarity threshold are identified. In addition, linkage points between pairs of datasets are identified. Each linkage point links one or more pairs of data records. Each data record in the pair of data records is contained in one of a given pair of datasets, and each pair of data records is associated with a common entity having matching attributes in the given pair of datasets. Data records associated with the common entities are linked across datasets using the identified linkage points.
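The comparison step described above can be sketched as follows. This is a minimal illustration, not the patented method: the abstract does not name a similarity measure, so Jaccard similarity over normalized value sets is an assumption, as are the function names, the sample datasets, and the 0.5 threshold.

```python
from itertools import product


def value_set(records, attribute):
    """Collect the normalized values a dataset holds for one attribute."""
    return {str(r[attribute]).strip().lower() for r in records if attribute in r}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def matching_attributes(left, right, threshold=0.5):
    """Return (left_attr, right_attr, score) triples that meet the threshold.

    The threshold value and the choice of Jaccard are illustrative assumptions.
    """
    left_attrs = sorted({k for r in left for k in r})
    right_attrs = sorted({k for r in right for k in r})
    matches = []
    for la, ra in product(left_attrs, right_attrs):
        score = jaccard(value_set(left, la), value_set(right, ra))
        if score >= threshold:
            matches.append((la, ra, score))
    return matches


# Hypothetical datasets: each record is a dict of attribute -> value.
companies = [
    {"name": "Acme Corp", "ticker": "ACM"},
    {"name": "Globex", "ticker": "GBX"},
]
filings = [
    {"symbol": "acm", "filer": "Acme Corporation"},
    {"symbol": "GBX", "filer": "Globex Inc"},
]

result = matching_attributes(companies, filings)
```

In this toy data only `ticker` and `symbol` share values after normalization, so that pair is the only candidate returned. The company names differ textually across the two datasets, so a set-overlap measure does not match them; a string-similarity measure could, at higher cost.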
19 Claims
1. A method for linking data records across datasets, the method comprising:
identifying a plurality of datasets, each dataset comprising at least one data record, each data record associated with an entity and comprising one or more attributes of that entity and a value associated with each attribute;
comparing values associated with attributes across datasets;
identifying matching attributes having values that satisfy a predetermined similarity threshold;
identifying linkage points between pairs of datasets, each linkage point linking one or more pairs of data records, each data record in each pair of data records contained in one of a given pair of datasets and each pair of data records associated with a common entity having matching attributes in the given pair of datasets; and
linking data records associated with the common entities across datasets using the identified linkage points.
Dependent claims: 2-14.
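The final two steps of claim 1, identifying a linkage point and linking records through it, might look like the sketch below. It assumes a linkage point is represented as a pair of attribute names (one per dataset) and that two records refer to a common entity when their normalized values are equal. The claim only requires a predetermined similarity threshold, so exact equality after normalization is a simplifying stand-in, and all names and data here are hypothetical.

```python
from collections import defaultdict


def normalize(value):
    """Lowercase and trim a value so trivially different spellings compare equal."""
    return str(value).strip().lower()


def link_records(left, right, linkage_point):
    """Pair records whose values agree on the linkage point's attributes.

    linkage_point is an assumed (left_attribute, right_attribute) tuple.
    Returns (left_index, right_index) pairs.
    """
    left_attr, right_attr = linkage_point

    # Index the right-hand dataset by normalized value for constant-time lookup.
    index = defaultdict(list)
    for j, record in enumerate(right):
        if right_attr in record:
            index[normalize(record[right_attr])].append(j)

    links = []
    for i, record in enumerate(left):
        if left_attr in record:
            for j in index.get(normalize(record[left_attr]), []):
                links.append((i, j))
    return links


# Hypothetical datasets describing overlapping sets of entities.
companies = [
    {"name": "Acme Corp", "ticker": "ACM"},
    {"name": "Globex", "ticker": "GBX"},
]
filings = [
    {"symbol": "acm", "filer": "Acme Corporation"},
    {"symbol": "GBX", "filer": "Globex Inc"},
    {"symbol": "INI", "filer": "Initech"},
]

links = link_records(companies, filings, ("ticker", "symbol"))
```

Each pair in `links` holds the positions of two records, one from each dataset, that are treated as describing the same entity. The third filing has no counterpart among the companies and is left unlinked.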
15. A computer-readable storage medium containing a computer-readable code that when read by a computer causes the computer to perform a method for linking data records across datasets, the method comprising:
identifying a plurality of datasets, each dataset comprising at least one data record, each data record associated with an entity and comprising one or more attributes of that entity and a value associated with each attribute;
comparing values associated with attributes across datasets;
identifying matching attributes having values that satisfy a predetermined similarity threshold;
identifying linkage points between pairs of datasets, each linkage point linking one or more pairs of data records, each data record in each pair of data records contained in one of a given pair of datasets and each pair of data records associated with a common entity having matching attributes in the given pair of datasets; and
linking data records associated with the common entities across datasets using the identified linkage points.
Dependent claims: 16-19.
Specification