System, service, and method for automatically discovering universal data objects
Abstract
A universal data object discovery system automatically identifies candidate universal data objects, ranks the candidate universal data objects according to predetermined criteria, and merges source schemas into unified universal data objects within a set of data sources. From data inputs and a set of control parameters, the system computes a degree of sharing score for composite structures in the source schemas. The data inputs comprise source schemas, similarity values for data structures, and foreign key relationships. The system identifies as candidate universal data objects those structures whose degree of sharing score exceeds a threshold. The system calculates a similarity between candidate universal data objects and merges candidate universal data objects that are similar. The merged universal data objects are the output of the system.
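The flow described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented method. The abstract does not define how the degree of sharing score is computed, so the formula used here (incoming foreign keys plus summed similarity values) is an assumption. The function names, the sample structures, and the two control parameters `threshold` and `min_similarity` are likewise hypothetical. Similar candidates are grouped with single-link union-find, chosen only for brevity.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical inputs: composite structures named "schema.structure".
structures = ["crm.customer", "billing.client", "crm.order",
              "billing.invoice", "hr.badge"]
# Foreign key relationships as (referencing, referenced) pairs.
foreign_keys = [("crm.order", "crm.customer"),
                ("billing.invoice", "billing.client")]
# Symmetric similarity values between pairs of structures.
similarities = {
    frozenset(["crm.customer", "billing.client"]): 0.9,
    frozenset(["crm.order", "billing.invoice"]): 0.4,
}


def degree_of_sharing(structures, foreign_keys, similarities):
    """Assumed score: incoming foreign keys plus summed similarity values."""
    score = {s: 0.0 for s in structures}
    for _referencing, referenced in foreign_keys:
        score[referenced] += 1.0
    for pair, value in similarities.items():
        for s in pair:
            score[s] += value
    return score


def select_candidates(score, threshold):
    """Keep the structures whose score exceeds the threshold."""
    return sorted(s for s, v in score.items() if v > threshold)


def cluster_and_merge(candidates, similarities, min_similarity):
    """Group candidates that are similar; each group is one merged object."""
    parent = {c: c for c in candidates}

    def find(x):
        # Follow parent links to the group representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(candidates, 2):
        if similarities.get(frozenset([a, b]), 0.0) >= min_similarity:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for c in candidates:
        groups[find(c)].append(c)
    return sorted(sorted(g) for g in groups.values())


def discover(structures, foreign_keys, similarities, threshold, min_similarity):
    """Score, select, then cluster and merge, in the order the abstract gives."""
    score = degree_of_sharing(structures, foreign_keys, similarities)
    candidates = select_candidates(score, threshold)
    return cluster_and_merge(candidates, similarities, min_similarity)


result = discover(structures, foreign_keys, similarities,
                  threshold=1.0, min_similarity=0.8)
print(result)
```

With the sample data, only the two customer-like structures pass the threshold, and because their similarity value is above `min_similarity` they end up in a single merged group.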
20 Claims
1. A method of automatically discovering a plurality of universal data objects, comprising:
generating an object graph from a set of source schemas, a plurality of similarities between objects in the set of source schemas, and a plurality of additional metadata describing the set of source schemas;
calculating a degree of sharing score for a plurality of objects in the object graph;
selecting a plurality of candidate universal data objects from the objects in the object graph;
clustering the candidate universal data objects to select a plurality of universal data objects; and
merging the selected universal data objects to allow sharing of data between the set of source schemas.
10. The method of claim wherein selecting candidate universal data objects comprises filtering objects with respect to control parameters.
17. A system for automatically discovering a plurality of universal data objects, comprising:
a schema processing module for generating an object graph from a set of source schemas, a plurality of similarities between objects in the set of source schemas, and a plurality of additional metadata describing the set of source schemas;
the schema processing module further calculating a degree of sharing score for a plurality of objects in the object graph;
a selection module for selecting a plurality of candidate universal data objects from the objects in the object graph;
a clustering module for clustering the candidate universal data objects to select a plurality of universal data objects; and
a merging module for merging the selected universal data objects to allow sharing of data between the set of source schemas.
19. A computer program product having a plurality of executable instruction codes embedded on a computer-readable medium, for automatically discovering a plurality of universal data objects, comprising:
a first set of instruction codes for generating an object graph from a set of source schemas, a plurality of similarities between objects in the set of source schemas, and a plurality of additional metadata describing the set of source schemas;
a second set of instruction codes for calculating a degree of sharing score for a plurality of objects in the object graph;
a third set of instruction codes for selecting a plurality of candidate universal data objects from the objects in the object graph;
a fourth set of instruction codes for clustering the candidate universal data objects to select a plurality of universal data objects; and
a fifth set of instruction codes for merging the selected universal data objects to allow sharing of data between the set of source schemas.
20. A method of providing a service for automatically discovering a plurality of universal data objects, comprising:
specifying a set of data sources for which universal data objects are identified;
specifying a set of control parameters and additional metadata;
invoking an automatic universal data object discovery utility, wherein the specified set of data sources, the specified control parameters, and the additional metadata are made available to the automatic universal data object discovery utility for consideration; and
receiving an object graph with identified universal data objects from the automatic universal data object discovery utility.
Specification