System and method for data integration using multi-dimensional, associative unique identifiers
First Claim
1. A computer implemented method including a plurality of data objects stored in a plurality of databases, a method implemented in instructions executed by a computer processor for processing data stored on a database, the method comprising:
- associating each data object in the plurality of data objects in the plurality of databases with a data object ontology corresponding to a hierarchy of data object dimensions of the data object, wherein the hierarchy comprises at least one source dimension and one or more target dimensions dependent on each source dimension, and wherein each dimension is associated with a specific attribute data; and
for each data object of the plurality of data objects:
based upon values of specific attribute data of a subset of data object dimensions associated with the data object comprising a source dimension and one or more dependent target dimensions;
providing the unique identifier to identify the data object;
calculating and associating a unique identifier for each data object based on applying a hashing algorithm executed by the computer processor to data stored on the database to a selected set of the objects' dimensions having the highest affinity metric values, and wherein the affinity metric for each dimension is calculated as a weighted sum of temporal invariance of a dimension and a uniqueness metric, wherein:
the temporal invariance of the object dimension is calculated as the minimum of the temporal invariance of any value appearing in the object dimension, the temporal invariance being calculated as the ratio of the number of times the value of the object dimension has changed over the total number of times the value of the dimension is observed in the databases over a period of time; and
the uniqueness metric capturing whether data values in an object dimension are unique.
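The affinity metric recited above can be illustrated with a minimal Python sketch. It follows the claim's literal wording: a per-series ratio of value changes to observations, the minimum of those ratios across the dimension, and a weighted sum with a uniqueness metric. Everything else is an assumption made for illustration only: the function names, the representation of a dimension as a time-ordered series of observed values per data object, the equal default weights, and the choice of "fraction of distinct current values" as the uniqueness metric (the claim only says the metric captures whether values are unique).

```python
from typing import Dict, Sequence


def change_ratio(observations: Sequence[str]) -> float:
    """Number of value changes divided by number of observations.

    `observations` is assumed to be time-ordered for one data object.
    """
    if not observations:
        raise ValueError("need at least one observation")
    changes = sum(
        1 for prev, cur in zip(observations, observations[1:]) if prev != cur
    )
    return changes / len(observations)


def temporal_invariance(series_by_object: Dict[str, Sequence[str]]) -> float:
    """Minimum of the per-series ratios across the dimension (claim wording)."""
    return min(change_ratio(series) for series in series_by_object.values())


def uniqueness(current_values: Sequence[str]) -> float:
    """Assumed uniqueness metric: fraction of distinct values."""
    if not current_values:
        raise ValueError("need at least one value")
    return len(set(current_values)) / len(current_values)


def affinity(
    series_by_object: Dict[str, Sequence[str]],
    w_invariance: float = 0.5,  # hypothetical weight
    w_uniqueness: float = 0.5,  # hypothetical weight
) -> float:
    """Weighted sum of temporal invariance and uniqueness for one dimension."""
    latest = [series[-1] for series in series_by_object.values()]
    return (
        w_invariance * temporal_invariance(series_by_object)
        + w_uniqueness * uniqueness(latest)
    )


if __name__ == "__main__":
    # Hypothetical observation history for an "email" dimension.
    email_history = {
        "object-1": ["a@x.org", "a@x.org", "a2@x.org"],
        "object-2": ["b@y.org", "b@y.org", "b@y.org"],
    }
    print(affinity(email_history))
```

The claim text does not fix the weights or their signs, so how the change ratio influences the ranking of dimensions depends on values this sketch simply takes as parameters.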
Abstract
A system and method for associating data objects utilizing unique identifiers is provided. Data objects are modeled utilizing a data object ontology. Unique identifiers for instances of each data object are calculated utilizing a selection of unique attributes of the data object ontology. Data objects from multiple data sources can be integrated utilizing the unique identifiers for each data object.
21 Claims
1. Independent claim, reproduced in full under First Claim. Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computer implemented method including a plurality of data objects stored in a plurality of databases, a method implemented in instructions executed by a computer processor for processing data stored on a database, the method comprising:
associating each data object in the plurality of databases with a data object ontology corresponding to a hierarchy of data object dimensions of the data object, wherein the hierarchy comprises at least one source dimension and one or more target dimensions dependent on each source dimension, wherein each dimension is associated with a specific attribute data, and wherein the data object ontology includes an invariance strength identifier computed using a hashing function for a source and target dimension in the hierarchy of data object dimensions executed by the computer processor; and
for each of the plurality of data objects:
calculating a unique identifier for a data object based upon values of specific attribute data of a subset of data object dimensions associated with the data object, said dimensions comprising a source dimension and one or more dependent target dimensions having a high affinity metric; and
providing the unique identifier to identify the data object;
wherein the affinity metric for each dimension is calculated as a weighted sum of a temporal invariance of a dimension and a uniqueness metric, wherein:
the temporal invariance of the dimension is calculated as the minimum of the temporal invariance of any value appearing in the dimension, the temporal invariance being calculated as the ratio of the number of times the value of the dimension has changed over the total number of times the value of the dimension is observed in a period of time; and
the uniqueness metric capturing whether data values in the dimension are unique.
Dependent claims: 10, 11, 12, 13, 14, 15
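As an illustration of the identifier calculation in claim 9, the following Python sketch models a source dimension with dependent target dimensions, keeps the source plus the targets with the highest affinity, and hashes the corresponding attribute values. The `Dimension` class, the `top_k` cutoff, the field names, the value normalization, and the use of SHA-256 are all assumptions for the sake of the example; the claim only requires a hashing function and a "high" affinity metric.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Dimension:
    """One node of a hypothetical ontology hierarchy."""
    name: str
    affinity: float
    targets: List["Dimension"] = field(default_factory=list)


def select_dimensions(source: Dimension, top_k: int) -> List[str]:
    """Source dimension plus its top_k dependent targets by affinity."""
    ranked = sorted(source.targets, key=lambda d: d.affinity, reverse=True)
    return [source.name] + [d.name for d in ranked[:top_k]]


def unique_identifier(record: Dict[str, str], dims: List[str]) -> str:
    """Hash the selected attribute values into a hex identifier.

    Values are trimmed and lower-cased first (an assumed normalization),
    and joined with a separator that is unlikely to occur in the data.
    """
    parts = [f"{name}={record[name].strip().lower()}" for name in dims]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical hierarchy: tax_id is the source dimension.
    source = Dimension(
        "tax_id",
        0.9,
        [
            Dimension("email", 0.4),
            Dimension("birth_date", 0.8),
            Dimension("phone", 0.3),
        ],
    )
    dims = select_dimensions(source, top_k=1)
    record = {
        "tax_id": "12-345",
        "birth_date": "1980-01-02",
        "email": "a@x.org",
        "phone": "555-0100",
    }
    print(dims, unique_identifier(record, dims))
```

Because only the selected dimensions feed the hash, two records that differ in a low-affinity attribute such as the phone number still receive the same identifier in this sketch.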
16. A computing system comprised of modules stored on a database including a plurality of data objects modules for processing data objects stored in databases, the system comprising:
a processor;
a plurality of databases including data object dimension data corresponding to the plurality of data objects;
a data object ontology corresponding to a hierarchy of data object dimensions, wherein the hierarchy comprises at least one source dimension and one or more target dimensions dependent on each source dimension and wherein each dimension is associated with a specific attribute data of each data object; and
a data integration application implemented in instructions accessed from the database and executed by a computer processor configured to:
obtain the data object dimension data; and
for each data object:
calculate a unique identifier for a data object based upon the temporal invariance and uniqueness of the values of specific attribute data of a subset of data object dimensions associated with the data object comprising a source dimension and one or more dependent target dimensions; wherein the temporal invariance of an object dimension is calculated as the minimum of the temporal invariance of any value appearing in the object dimension, the temporal invariance being calculated as the ratio of the number of times the value of the dimension has changed over the total number of times the value of the dimension is observed in a period of time; and
integrate the data object dimension data according to the calculated unique identifier; and
provide the unique identifier to identify the data object.
Dependent claims: 17, 18, 19, 20, 21
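The integration step of claim 16 can be sketched as grouping records from several databases under their calculated identifier. This is a minimal Python illustration, not the patented implementation: the `uid` helper, the database and field names, and the merge policy (the first source to supply a field wins) are all hypothetical choices, and the identifying dimensions are passed in as if they had already been selected.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def uid(record: Dict[str, str], dims: List[str]) -> str:
    """Hash the values of the identifying dimensions (SHA-256 is assumed)."""
    key = "\x1f".join(f"{d}={record[d]}" for d in dims)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def integrate(
    databases: Dict[str, List[Dict[str, str]]], dims: List[str]
) -> Dict[str, Dict[str, str]]:
    """Merge records that share an identifier across all databases.

    Assumed policy: the first database to provide a field keeps it.
    """
    merged: Dict[str, Dict[str, str]] = defaultdict(dict)
    for records in databases.values():
        for record in records:
            target = merged[uid(record, dims)]
            for name, value in record.items():
                target.setdefault(name, value)
    return dict(merged)


if __name__ == "__main__":
    # Two hypothetical sources describing overlapping data objects.
    databases = {
        "crm": [
            {"tax_id": "12-345", "birth_date": "1980-01-02", "email": "a@x.org"},
        ],
        "billing": [
            {"tax_id": "12-345", "birth_date": "1980-01-02", "balance": "10.00"},
            {"tax_id": "99-000", "birth_date": "1975-05-05", "balance": "0.00"},
        ],
    }
    for identifier, obj in integrate(databases, ["tax_id", "birth_date"]).items():
        print(identifier[:12], obj)
```

Here the two records for the same data object collapse into one merged object carrying both the email and the balance, while the unrelated billing record stays separate.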
Specification