DATA VIRTUALIZATION ACROSS HETEROGENEOUS FORMATS
First Claim
1. A method for virtualizing data across heterogeneous formats, the method comprising:
receiving, as input, a plurality of heterogeneous data sources;
generating, for each of the plurality of heterogeneous data sources, a local schema graph comprising a set of attribute nodes and a set of type nodes, wherein an attribute node corresponds to a schema element in the heterogeneous data source comprising a domain with at least one value and is annotated with the value in the local schema graph, and wherein a type node corresponds to a schema element in the heterogeneous data source whose domain is defined recursively through at least one of one or more attribute nodes and one or more other type nodes; and
generating a global schema graph based on each local schema graph that has been generated, wherein the global schema graph comprises each of the local schema graphs and at least one edge between at least one of two or more attribute nodes and two or more type nodes from different local schema graphs, and wherein the edge indicates a relationship between the data sources represented by the different local schema graphs comprising the two or more attribute nodes based on a computed similarity between at least one value associated with each of the two or more attribute nodes.
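The claim does not say how a source's schema is parsed into attribute and type nodes. As a minimal sketch, assuming each source arrives as JSON-like records (dicts whose values are nested dicts, lists, or hashable scalars), leaf fields could become attribute nodes annotated with the values observed for them, and nested objects could become type nodes whose domain is defined by their child attribute and type nodes. The class and function names (`AttributeNode`, `TypeNode`, `LocalSchemaGraph`, `build_local_schema_graph`) and the dotted-path naming are hypothetical illustrations, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AttributeNode:
    """Schema element with a value domain, annotated with observed values."""
    path: str
    values: set = field(default_factory=set)


@dataclass
class TypeNode:
    """Schema element whose domain is defined by child attribute/type nodes."""
    path: str
    children: list = field(default_factory=list)  # paths of child nodes


@dataclass
class LocalSchemaGraph:
    source: str
    attributes: dict = field(default_factory=dict)  # path -> AttributeNode
    types: dict = field(default_factory=dict)       # path -> TypeNode


def build_local_schema_graph(source: str, records: list) -> LocalSchemaGraph:
    """Hypothetical builder for JSON-like records (dicts, lists, hashable leaves)."""
    graph = LocalSchemaGraph(source)
    root = graph.types[source] = TypeNode(source)  # the source is the root type

    def visit(value: Any, path: str, parent: TypeNode) -> None:
        if isinstance(value, list):
            # Array elements share one schema path with their container.
            for item in value:
                visit(item, path, parent)
        elif isinstance(value, dict):
            # Nested object: a type node defined recursively by its children.
            node = graph.types.get(path)
            if node is None:
                node = graph.types[path] = TypeNode(path)
                parent.children.append(path)
            for key, child in value.items():
                visit(child, f"{path}.{key}", node)
        else:
            # Leaf value: an attribute node, annotated with the value seen.
            node = graph.attributes.get(path)
            if node is None:
                node = graph.attributes[path] = AttributeNode(path)
                parent.children.append(path)
            node.values.add(value)

    for record in records:
        for key, value in record.items():
            visit(value, f"{source}.{key}", root)
    return graph
```

For instance, records such as `{"id": 1, "address": {"city": "Austin"}}` yield attribute nodes `crm.id` and `crm.address.city` and type nodes `crm` and `crm.address` when the source is named `crm`. A relational table would produce a flat graph of this kind: one type node for the table and one attribute node per column.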
1 Assignment
0 Petitions
Abstract
Various embodiments virtualize data across heterogeneous formats. In one embodiment, a plurality of heterogeneous data sources is received as input. A local schema graph including a set of attribute nodes and a set of type nodes is generated for each of the plurality of heterogeneous data sources. A global schema graph is generated based on each local schema graph that has been generated. The global schema graph comprises each of the local schema graphs and at least one edge between at least one of two or more attribute nodes and two or more type nodes from different local schema graphs. The edge indicates a relationship between the data sources represented by the different local schema graphs comprising the two or more attribute nodes based on a computed similarity between at least one value associated with each of the two or more attribute nodes.
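The abstract leaves the "computed similarity" unspecified. The sketch below assumes Jaccard similarity over the value sets of attribute nodes and a hypothetical threshold of 0.5; it links only attribute nodes (not type nodes), and reduces each local schema graph to a mapping from attribute path to its set of values. The names `jaccard` and `build_global_schema_graph` are illustrative only, not the patent's method.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Share of distinct values that two attribute nodes have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_global_schema_graph(local_graphs: dict, threshold: float = 0.5) -> dict:
    """Link attribute nodes from different local graphs whose values overlap.

    local_graphs maps a source name to {attribute path: set of values}.
    The 0.5 threshold is an assumed default, not taken from the source.
    """
    nodes = [
        (source, path, values)
        for source, attributes in local_graphs.items()
        for path, values in attributes.items()
    ]
    edges = []
    for (s1, p1, v1), (s2, p2, v2) in combinations(nodes, 2):
        if s1 == s2:
            continue  # edges only connect nodes of different local schema graphs
        score = jaccard(v1, v2)
        if score >= threshold:
            edges.append((p1, p2, score))
    # The global graph keeps every local graph plus the cross-source edges.
    return {"local_graphs": local_graphs, "edges": edges}
```

For example, if `crm.customer.city` holds `{"Austin", "Boston", "Denver"}` and `orders.ship_city` holds `{"Austin", "Boston", "Chicago"}`, the two sets share 2 of 4 distinct values, a Jaccard score of 0.5, so one edge links those attribute nodes and suggests the two sources can be related on city. Comparing every pair is quadratic in the number of attribute nodes; a production system would likely need blocking or sketching to scale.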
188 Citations
20 Claims
1. A method for virtualizing data across heterogeneous formats, the method comprising:
receiving, as input, a plurality of heterogeneous data sources;
generating, for each of the plurality of heterogeneous data sources, a local schema graph comprising a set of attribute nodes and a set of type nodes, wherein an attribute node corresponds to a schema element in the heterogeneous data source comprising a domain with at least one value and is annotated with the value in the local schema graph, and wherein a type node corresponds to a schema element in the heterogeneous data source whose domain is defined recursively through at least one of one or more attribute nodes and one or more other type nodes; and
generating a global schema graph based on each local schema graph that has been generated, wherein the global schema graph comprises each of the local schema graphs and at least one edge between at least one of two or more attribute nodes and two or more type nodes from different local schema graphs, and wherein the edge indicates a relationship between the data sources represented by the different local schema graphs comprising the two or more attribute nodes based on a computed similarity between at least one value associated with each of the two or more attribute nodes.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. An information processing system for virtualizing data across heterogeneous formats, the information processing system comprising:
a memory;
a processor communicatively coupled to the memory; and
a data processor communicatively coupled to the memory and the processor, wherein the data processor is configured to perform a method comprising:
receiving, as input, a plurality of heterogeneous data sources;
generating, for each of the plurality of heterogeneous data sources, a local schema graph comprising a set of attribute nodes and a set of type nodes, wherein an attribute node corresponds to a schema element in the heterogeneous data source comprising a domain with at least one value and is annotated with the value in the local schema graph, and wherein a type node corresponds to a schema element in the heterogeneous data source whose domain is defined recursively through at least one of one or more attribute nodes and one or more other type nodes; and
generating a global schema graph based on each local schema graph that has been generated, wherein the global schema graph comprises each of the local schema graphs and at least one edge between at least one of two or more attribute nodes and two or more type nodes from different local schema graphs, and wherein the edge indicates a relationship between the data sources represented by the different local schema graphs comprising the two or more attribute nodes based on a computed similarity between at least one value associated with each of the two or more attribute nodes.
Dependent claims: 11, 12, 13, 14, 15
16. A computer program product for virtualizing data across heterogeneous formats, the computer program product comprising:
a storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method comprising:
receiving, as input, a plurality of heterogeneous data sources;
generating, for each of the plurality of heterogeneous data sources, a local schema graph comprising a set of attribute nodes and a set of type nodes, wherein an attribute node corresponds to a schema element in the heterogeneous data source comprising a domain with at least one value and is annotated with the value in the local schema graph, and wherein a type node corresponds to a schema element in the heterogeneous data source whose domain is defined recursively through at least one of one or more attribute nodes and one or more other type nodes; and
generating a global schema graph based on each local schema graph that has been generated, wherein the global schema graph comprises each of the local schema graphs and at least one edge between at least one of two or more attribute nodes and two or more type nodes from different local schema graphs, and wherein the edge indicates a relationship between the data sources represented by the different local schema graphs comprising the two or more attribute nodes based on a computed similarity between at least one value associated with each of the two or more attribute nodes.
Dependent claims: 17, 18, 19, 20
Specification