Method and apparatus for processing electronic data
Abstract
A system (100) for generating a computer readable data file representative of a mapping between a first representation of a set of concepts or of a data structure (e.g. a database schema) and a second representation of a set of concepts or of a data structure (e.g. an ontology), each representation comprising a plurality of complex representational elements (e.g. tables in a database schema and concepts in an ontology) each of which may itself include a number of associated subordinate representational elements (e.g. columns/fields of a table in a database schema and attributes of a concept in an ontology). The system (100) includes a semantic similarity calculation module (134) for calculating a semantic similarity measure between a subordinate element of the first representation and each of the subordinate elements in the second representation and a mapping generation module (137) for generating a mapping between the subordinate element of the first representation and one of the subordinate elements of the second representation selected in dependence upon the calculated semantic similarity measures between the subordinate elements.
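By way of illustration only, the selection step summarised above (score one subordinate element against every candidate, then map it to the best-scoring candidate) might be sketched as follows. The function names `token_overlap`, `generate_mapping` and `to_data_file_text`, the JSON output format, and the token-overlap score are all assumptions introduced for this sketch; the token-overlap score is only a stand-in for the semantic similarity measure and is not the measure recited in the claims.

```python
import json
from typing import Callable, Dict, List, Set


def name_tokens(name: str) -> Set[str]:
    """Split an element name such as 'cust_name' into lower-case tokens."""
    return set(name.lower().replace("-", "_").split("_"))


def token_overlap(a: str, b: str) -> float:
    """Placeholder similarity (an assumption): shared tokens / all tokens."""
    ta, tb = name_tokens(a), name_tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def generate_mapping(
    first: List[str],
    second: List[str],
    similarity: Callable[[str, str], float] = token_overlap,
) -> Dict[str, str]:
    """Map each element of `first` to its best-scoring element of `second`."""
    mapping: Dict[str, str] = {}
    if not second:
        return mapping
    for element in first:
        # Score the element against every candidate in the second representation.
        scores = {candidate: similarity(element, candidate) for candidate in second}
        best = max(scores, key=scores.get)
        # Elements with no similar candidate are left unmapped.
        if scores[best] > 0:
            mapping[element] = best
    return mapping


def to_data_file_text(mapping: Dict[str, str]) -> str:
    """Serialise the mapping as JSON text (the file format is an assumption)."""
    return json.dumps(mapping, indent=2, sort_keys=True)
```

For example, `generate_mapping(["cust_name", "cust_phone"], ["name", "phone_number", "address"])` pairs each hypothetical column with the hypothetical attribute that shares a token with it, and `to_data_file_text` turns the result into text that could be written to a file.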
6 Claims
1. A data integration method of integrating data from a first and a second heterogeneous data source, each heterogeneous data source taking the form of an electronic database, the method comprising implementing a first wrapper around the first heterogeneous data source, the first wrapper being configured to convert requests and responses between a common format and one specific to the first data source and implementing a second wrapper around the second heterogeneous data source, the second wrapper being configured to convert requests and responses between the common format and one specific to the second data source;
wherein each wrapper includes a mapping in the form of a computer readable data file automatically generated according to a method of generating a computer readable data file, on a computer system comprising a digital processor and a memory, the computer readable data file being representative of a mapping between a first representation of a set of concepts or of a data structure associated with the common format and a second representation of a set of concepts or of a data structure associated with a respective one of the first and second data sources, each representation comprising a plurality of complex representational elements which include a number of associated subordinate representational elements, the method of generating a computer readable data file comprising;
the computer system calculating a semantic similarity measure between a subordinate element of the first representation and each of the subordinate elements in the second representation; and
the computer system generating a mapping between the subordinate element of the first representation and one of the subordinate elements of the second representation selected in dependence upon the calculated semantic similarity measures between the subordinate elements;
wherein calculation of a semantic similarity measure by the computer system includes;
the computer system using a linked top ontology data structure stored within the memory of the computer system, the stored data structure comprising a plurality of concept nodes arranged to form a top ontology, the top ontology being a partial subset of a full ontology having at least twice as many nodes as the top ontology, the nodes in the top ontology being selected from the full ontology based on their ancestral closeness to a root node and/or their ancestral remoteness from a leaf node of the full ontology, the linked top ontology further comprising a plurality of pre-processed vocabulary terms each of which is linked to one or more of the nodes in the top ontology, the linked top ontology data structure being used by the computer system as follows;
the names of the subordinate elements between whom a semantic similarity is being calculated being compared by the computer system with the vocabulary terms and for any vocabulary terms which match the names of the subordinate elements, the computer system identifying the top ontology nodes associated with the matched vocabulary terms and comparing the identified top ontology nodes associated with each name of the subordinate elements, and the computer system determining a semantic similarity based on the degree of commonality between the top ontology nodes associated with each of the subordinate elements.
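As a non-limiting sketch of the similarity calculation recited in claim 1, the linked top ontology may be pictured as a table from pre-processed vocabulary terms to the top ontology nodes they are linked to. The vocabulary, the node names, the underscore-based name splitting, and the use of a Jaccard ratio as the "degree of commonality" are all assumptions made for this example; the claim does not prescribe a particular formula.

```python
from typing import Dict, Set

# Hypothetical linked top ontology: vocabulary term -> linked top ontology nodes.
LINKED_TOP_ONTOLOGY: Dict[str, Set[str]] = {
    "phone": {"artifact", "communication"},
    "telephone": {"artifact", "communication"},
    "name": {"communication"},
    "address": {"location", "communication"},
    "city": {"location"},
}


def nodes_for(name: str, linked: Dict[str, Set[str]]) -> Set[str]:
    """Collect the top ontology nodes of every vocabulary term matched in `name`."""
    nodes: Set[str] = set()
    for token in name.lower().split("_"):
        # Tokens that match no vocabulary term contribute no nodes.
        nodes |= linked.get(token, set())
    return nodes


def semantic_similarity(
    a: str, b: str, linked: Dict[str, Set[str]] = LINKED_TOP_ONTOLOGY
) -> float:
    """Degree of commonality between the node sets, here a Jaccard ratio."""
    nodes_a, nodes_b = nodes_for(a, linked), nodes_for(b, linked)
    if not nodes_a or not nodes_b:
        return 0.0  # at least one name matched no vocabulary term
    return len(nodes_a & nodes_b) / len(nodes_a | nodes_b)
```

Because names are compared through shared top ontology nodes rather than by spelling, `phone` and `telephone` in this toy vocabulary score as fully similar even though the strings differ.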
5. A data integration system including a mapping generating computer system and further including first and second heterogeneous data sources, each data source taking the form of an electronic database, and a first wrapper for wrapping around the first data source and a second wrapper for wrapping around the second data source, wherein the first wrapper is configured to convert requests and responses between a common format and one specific to the first data source and the second wrapper is configured to convert requests and responses between the common format and one specific to the second data source;
and wherein each wrapper includes a mapping in the form of a computer readable data file automatically generated by the mapping generating system, and wherein the mapping generating computer system is configured to generate a computer readable data file representative of a mapping between a first representation of a set of concepts or of a data structure associated with the common format and a second representation of a set of concepts or of a data structure associated with a respective one of the first and second data sources, each representation comprising a plurality of complex representational elements which include a number of associated subordinate representational elements, the computer system including;
a semantic similarity calculation module which is executable by the computer system for calculating a semantic similarity measure between a subordinate element of the first representation and each of the subordinate elements in the second representation; and
a mapping generation module which is executable by the computer system for generating a mapping between the subordinate element of the first representation and one of the subordinate elements of the second representation selected in dependence upon the calculated semantic similarity measures between the subordinate elements;
wherein the system further includes a linked top ontology module which is executable by the computer system for storing a linked top ontology data structure which comprises a plurality of concept nodes arranged into a top ontology, the top ontology being a partial subset of a full ontology having at least twice as many nodes as the top ontology, the nodes in the top ontology being selected from the full ontology based on their ancestral closeness to a root node and/or their ancestral remoteness from a leaf node of the full ontology, the linked top ontology data structure further comprising a plurality of pre-processed vocabulary terms each of which is linked to one or more of the nodes in the top ontology; and
wherein the semantic similarity calculation module is configured to compare the names of the subordinate elements between whom a semantic similarity is being calculated with the vocabulary terms and, for any vocabulary terms which match the names of the subordinate elements, to identify the top ontology nodes associated with the matched vocabulary terms and to compare the identified top ontology nodes associated with each name of the subordinate elements, and to determine a semantic similarity based on the degree of commonality between the top ontology nodes associated with each of the subordinate elements.
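The wrapper arrangement of claim 5 might be pictured, purely as an illustrative sketch, as follows. The class name `Wrapper`, the `integrate` helper, the field names, and the use of in-memory lists of dictionaries in place of electronic databases are assumptions made for brevity; each wrapper holds a mapping between common-format field names and the field names of its own data source, and uses it to translate requests one way and responses the other.

```python
from typing import Dict, List


class Wrapper:
    """Converts requests and responses between a common format and one source."""

    def __init__(self, mapping: Dict[str, str], source_rows: List[Dict[str, object]]):
        self.to_source = dict(mapping)                        # common -> source field
        self.to_common = {v: k for k, v in mapping.items()}   # source -> common field
        self.rows = source_rows                               # stand-in for a database

    def query(self, common_fields: List[str]) -> List[Dict[str, object]]:
        # Request: translate common-format field names to source-specific names.
        source_fields = [self.to_source[f] for f in common_fields]
        # Response: translate each source row back into the common format.
        return [
            {self.to_common[s]: row[s] for s in source_fields}
            for row in self.rows
        ]


def integrate(common_fields: List[str], wrappers: List[Wrapper]) -> List[Dict[str, object]]:
    """Send one common-format request to every wrapper and merge the responses."""
    results: List[Dict[str, object]] = []
    for wrapper in wrappers:
        results.extend(wrapper.query(common_fields))
    return results
```

In this sketch a caller only ever sees common-format field names such as `name` and `phone`; the source-specific names stay inside each wrapper's mapping.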
6. Non-transitory computer readable storage medium carrying instructions which upon execution by a processor provide a method of integrating data from a first and a second heterogeneous data source, each heterogeneous data source taking the form of an electronic database, the method comprising implementing a first wrapper around the first heterogeneous data source, the first wrapper being configured to convert requests and responses between a common format and one specific to the first data source and implementing a second wrapper around the second heterogeneous data source to convert requests and responses between the common format and one specific to the second data source;
wherein each wrapper includes a mapping in the form of a computer readable data file automatically generated according to a method of generating a computer readable data file, on a computer system comprising a digital processor and a memory, the computer readable data file being representative of a mapping between a first representation of a set of concepts or of a data structure associated with the common format and a second representation of a set of concepts or of a data structure associated with a respective one of the first and second data sources, each representation comprising a plurality of complex representational elements which include a number of associated subordinate representational elements, the method of generating a computer readable data file comprising;
the computer system calculating a semantic similarity measure between a subordinate element of the first representation and each of the subordinate elements in the second representation; and
the computer system generating a mapping between the subordinate element of the first representation and one of the subordinate elements of the second representation selected in dependence upon the calculated semantic similarity measures between the subordinate elements;
wherein calculation of a semantic similarity measure by the computer system includes;
the computer system using a linked top ontology data structure stored within the memory of the computer system, the stored data structure comprising a plurality of concept nodes arranged to form a top ontology, the top ontology being a partial subset of a full ontology having at least twice as many nodes as the top ontology, the nodes in the top ontology being selected from the full ontology based on their ancestral closeness to a root node and/or their ancestral remoteness from a leaf node of the full ontology, the linked top ontology further comprising a plurality of pre-processed vocabulary terms each of which is linked to one or more of the nodes in the top ontology, the linked top ontology data structure being used by the computer system as follows;
the names of the subordinate elements between whom a semantic similarity is being calculated being compared by the computer system with the vocabulary terms and for any vocabulary terms which match the names of the subordinate elements, the computer system identifying the top ontology nodes associated with the matched vocabulary terms and comparing the identified top ontology nodes associated with each name of the subordinate elements, and the computer system determining a semantic similarity based on the degree of commonality between the top ontology nodes associated with each of the subordinate elements.
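One possible way of selecting a top ontology from a full ontology by ancestral closeness to the root node is sketched below, for illustration only. The toy ontology `FULL_ONTOLOGY`, the function name `select_top_ontology` and the `max_depth` cut-off are assumptions of this example; selection by remoteness from a leaf node, which the claims also contemplate, is not shown.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical full ontology, stored as parent -> children (15 nodes in total).
FULL_ONTOLOGY: Dict[str, List[str]] = {
    "entity": ["object", "abstraction"],
    "object": ["artifact", "organism"],
    "abstraction": ["communication", "measure"],
    "artifact": ["phone", "vehicle"],
    "organism": ["person", "animal"],
    "communication": ["message", "name"],
    "measure": ["quantity", "time"],
}


def select_top_ontology(
    children: Dict[str, List[str]], root: str, max_depth: int
) -> Set[str]:
    """Keep every node that lies within `max_depth` links of the root node."""
    top: Set[str] = {root}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not descend below the cut-off depth
        for child in children.get(node, []):
            if child not in top:
                top.add(child)
                queue.append((child, depth + 1))
    return top
```

With `max_depth=1` the toy ontology yields a three-node top ontology, so the full ontology has well over twice as many nodes as the selected subset, consistent with the size relationship recited in the claims.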
Specification