Method and apparatus for processing electronic data

US 9,773,053 B2
Filed: 12/23/2011
Issued: 09/26/2017
Est. Priority Date: 12/23/2010
Status: Active Grant

1 Assignment

0 Petitions

Abstract

A system (100) for generating a computer readable data file representative of a mapping between a first representation of a set of concepts or of a data structure (e.g. a database schema) and a second representation of a set of concepts or of a data structure (e.g. an ontology), each representation comprising a plurality of complex representational elements (e.g. tables in a database schema and concepts in an ontology) each of which may itself include a number of associated subordinate representational elements (e.g. columns/fields of a table in a database schema and attributes of a concept in an ontology). The system (100) includes a semantic similarity calculation module (134) for calculating a semantic similarity measure between a subordinate element of the first representation and each of the subordinate elements in the second representation and a mapping generation module (137) for generating a mapping between the subordinate element of the first representation and one of the subordinate elements of the second representation selected in dependence upon the calculated semantic similarity measures between the subordinate elements.
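The two modules named in the abstract can be illustrated with a minimal Python sketch. The similarity function is pluggable. `token_jaccard`, `generate_mapping` and the `threshold` parameter are hypothetical names chosen for illustration, and the token-overlap score is only a stand-in for the ontology-based measure recited in the claims. For each subordinate element of the first representation, the sketch picks the highest-scoring subordinate element of the second representation.

```python
from typing import Callable, Dict, List, Optional

# Any function scoring two element names; higher means more similar.
Similarity = Callable[[str, str], float]


def token_jaccard(a: str, b: str) -> float:
    """Stand-in similarity (hypothetical): overlap of lower-cased name tokens."""
    ta = set(a.lower().replace("_", " ").split())
    tb = set(b.lower().replace("_", " ").split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def generate_mapping(first: List[str], second: List[str], similarity: Similarity,
                     threshold: float = 0.0) -> Dict[str, Optional[str]]:
    """Map each element of `first` to its best-scoring element of `second`."""
    mapping: Dict[str, Optional[str]] = {}
    for a in first:
        scores = [(similarity(a, b), b) for b in second]
        if not scores:
            mapping[a] = None
            continue
        best_score, best = max(scores, key=lambda s: s[0])
        # Leave the element unmapped unless the best score beats the threshold.
        mapping[a] = best if best_score > threshold else None
    return mapping


columns = ["cust_name", "cust_phone"]             # e.g. fields of a database table
attributes = ["name", "phone number", "address"]  # e.g. attributes of an ontology concept
print(generate_mapping(columns, attributes, token_jaccard))
# {'cust_name': 'name', 'cust_phone': 'phone number'}
```

Taking the single best match per element is the simplest selection rule consistent with "selected in dependence upon the calculated semantic similarity measures". A real system might instead enforce one-to-one assignments.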

7 Citations

6 Claims

1. A data integration method of integrating data from a first and a second heterogeneous data source, each heterogeneous data source taking the form of an electronic database, the method comprising implementing a first wrapper around the first heterogeneous data source, the first wrapper being configured to convert requests and responses between a common format and one specific to the first data source and implementing a second wrapper around the second heterogeneous data source, the second wrapper being configured to convert requests and responses between the common format and one specific to the second data source;
- wherein each wrapper includes a mapping in the form of a computer readable data file automatically generated according to a method of generating a computer readable data file, on a computer system comprising a digital processor and a memory, the computer readable data file being representative of a mapping between a first representation of a set of concepts or of a data structure associated with the common format and a second representation of a set of concepts or of a data structure associated with a respective one of the first and second data sources, each representation comprising a plurality of complex representational elements which include a number of associated subordinate representational elements, the method of generating a computer readable data file comprising;
  
  the computer system calculating a semantic similarity measure between a subordinate element of the first representation and each of the subordinate elements in the second representation; and
  
  the computer system generating a mapping between the subordinate element of the first representation and one of the subordinate elements of the second representation selected in dependence upon the calculated semantic similarity measures between the subordinate elements;
  
  wherein calculation of a semantic similarity measure by the computer system includes;
  
  the computer system using a linked top ontology data structure stored within the memory of the computer system, the stored data structure comprising a plurality of concept nodes arranged to form a top ontology, the top ontology being a partial subset of a full ontology having at least twice as many nodes as the top ontology, the nodes in the top ontology being selected from the full ontology based on their ancestral closeness to a root node and/or their ancestral remoteness from a leaf node of the full ontology, the linked top ontology further comprising a plurality of pre-processed vocabulary terms each of which is linked to one or more of the nodes in the top ontology, the linked top ontology data structure being used by the computer system as follows;
  
  the names of the subordinate elements between whom a semantic similarity is being calculated being compared by the computer system with the vocabulary terms and for any vocabulary terms which match the names of the subordinate elements, the computer system identifying the top ontology nodes associated with the matched vocabulary terms and comparing the identified top ontology nodes associated with each name of the subordinate elements, and the computer system determining a semantic similarity based on the degree of commonality between the top ontology nodes associated with each of the subordinate elements.
- Dependent Claims (2, 3, 4)
  - 2. The method according to claim 1 wherein the calculation of a semantic similarity further includes the computer system comparing the names of the complex representational elements with vocabulary terms and identifying the top ontology nodes associated with any matched names and determining the degree of commonality between on the one hand the identified top ontology nodes associated with either one of the subordinate elements and its associated complex representational element and, on the other hand, the other subordinate element and its associated complex representational element.
  - 3. The method according to claim 1 further comprising the computer system performing steps of matching names to vocabulary terms, identifying the top ontology nodes associated with any matched vocabulary terms and determining a degree of commonality between the so identified top ontology nodes in respect of the names of the complex representational elements associated with or which include the respective subordinate elements between which the semantic similarity is being calculated and the converse subordinate elements, and using the degree of commonality determined between these complex elements and their converse subordinate elements as a factor in the determination of overall semantic distance.
  - 4. The data integration method according to claim 1 further comprising:
    - receiving a complex query from a human user or from a computer application, the complex query being expressed in the common format, processing the complex query to form a first sub-query for sending to the first heterogeneous data source and a second sub-query for sending to the second heterogeneous data source, sending the first sub-query to the first data source via the first wrapper, the first wrapper converting the first sub-query from the common format to the format specific to the first data source, and sending the second sub-query to the second data source via the second wrapper, the second wrapper converting the second sub-query from the common format to the format specific to the second data source, receiving a first reply to the first sub-query from the first data source in the format specific to the first data source via the first wrapper which converts the first reply from the format specific to the first data source into the common format, receiving a second reply to the second sub-query from the second data source in the format specific to the second data source via the second wrapper which converts the second reply from the format specific to the second data source into the common format, combining the first and second responses together to form a complex response expressed in the common format, and returning the complex response to the requesting human user or computer application.
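The similarity calculation of claims 1 and 2 might be sketched as follows. The sketch rests on several assumptions, none of which is specified by the claims. The linked top ontology is reduced to a dictionary from pre-processed vocabulary terms to sets of top ontology node names. Element names are lower-cased and matched term by term after splitting on underscores and hyphens. The "degree of commonality" is taken to be the Jaccard overlap of the two node sets. The names `LinkedVocab`, `nodes_for_name`, `commonality` and `similarity`, and the sample vocabulary, are hypothetical.

```python
from typing import Dict, Optional, Set

# Hypothetical linked top ontology: vocabulary term -> linked top ontology nodes.
LinkedVocab = Dict[str, Set[str]]


def nodes_for_name(name: str, vocab: LinkedVocab) -> Set[str]:
    """Top ontology nodes linked to any vocabulary term matching a token of `name`."""
    nodes: Set[str] = set()
    for token in name.lower().replace("-", "_").split("_"):
        nodes |= vocab.get(token, set())
    return nodes


def commonality(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap, assumed here as the 'degree of commonality'."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(sub_a: str, sub_b: str, vocab: LinkedVocab,
               complex_a: Optional[str] = None, complex_b: Optional[str] = None) -> float:
    nodes_a = nodes_for_name(sub_a, vocab)
    nodes_b = nodes_for_name(sub_b, vocab)
    # Claim 2 variant: add the nodes matched by each element's complex (parent) element.
    if complex_a is not None:
        nodes_a |= nodes_for_name(complex_a, vocab)
    if complex_b is not None:
        nodes_b |= nodes_for_name(complex_b, vocab)
    return commonality(nodes_a, nodes_b)


vocab = {
    "phone": {"Communication", "Device"},
    "telephone": {"Communication", "Device"},
    "tel": {"Communication"},
    "number": {"Quantity", "Identifier"},
    "name": {"Identifier"},
    "customer": {"Agent", "Person"},
    "client": {"Agent", "Person"},
}
print(similarity("phone_number", "telephone", vocab))            # 0.5
print(similarity("name", "phone", vocab))                         # 0.0
print(similarity("name", "phone", vocab, "customer", "client"))   # 0.4
```

The last call shows the effect of claim 2. Two unrelated column names gain some similarity because their parent tables ("customer" and "client") map to the same top ontology nodes.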

5. A data integration system including a mapping generating computer system and further including first and second heterogeneous data sources, each data source taking the form of an electronic database, and a first wrapper for wrapping around the first data source and a second wrapper for wrapping around the second data source, wherein the first wrapper is configured to convert requests and responses between a common format and one specific to the first data source and the second wrapper is configured to convert requests and responses between the common format and one specific to the second data source;
- and wherein each wrapper includes a mapping in the form of a computer readable data file automatically generated by the mapping generating system, and wherein the mapping generating computer system is configured to generate a computer readable data file representative of a mapping between a first representation of a set of concepts or of a data structure associated with the common format and a second representation of a set of concepts or of a data structure associated with a respective one of the first and second data sources, each representation comprising a plurality of complex representational elements which include a number of associated subordinate representational elements, the computer system including;
  
  a semantic similarity calculation module which is executable by the computer system for calculating a semantic similarity measure between a subordinate element of the first representation and each of the subordinate elements in the second representation; and
  
  a mapping generation module which is executable by the computer system for generating a mapping between the subordinate element of the first representation and one of the subordinate elements of the second representation selected in dependence upon the calculated semantic similarity measures between the subordinate elements;
  
  wherein the system further includes a linked top ontology module which is executable by the computer system for storing a linked top ontology data structure which comprises a plurality of concept nodes arranged into a top ontology, the top ontology being a partial subset of a full ontology having at least twice as many nodes as the top ontology, the nodes in the top ontology being selected from the full ontology based on their ancestral closeness to a root node and/or their ancestral remoteness from a leaf node of the full ontology, the linked top ontology data structure further comprising a plurality of pre-processed vocabulary terms each of which is linked to one or more of the nodes in the top ontology; and
  
  wherein the semantic similarity calculation module is configured to compare the names of the subordinate elements between whom a semantic similarity is being calculated with the vocabulary terms and, for any vocabulary terms which match the names of the subordinate elements, to identify the top ontology nodes associated with the matched vocabulary terms and to compare the identified top ontology nodes associated with each name of the subordinate elements, and to determine a semantic similarity based on the degree of commonality between the top ontology nodes associated with each of the subordinate elements.
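Claim 5's linked top ontology module presupposes that the top ontology has been cut from a larger full ontology. The following is one hypothetical way to do this, using only the "ancestral closeness to a root node" criterion; the alternative "remoteness from a leaf" criterion is not shown. The sketch keeps every node within a fixed number of edges of the root, checks the claimed size ratio, and links each vocabulary term to the nearest top-ontology ancestor of each full-ontology node it names. The tree-shaped `children` dictionary, the `max_depth` cut-off, the ancestor-lifting rule and all function names are assumptions made for this sketch.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical full ontology, stored as a tree: parent concept -> child concepts.
Tree = Dict[str, List[str]]


def all_nodes(children: Tree) -> Set[str]:
    """Every concept that appears as a parent or a child."""
    nodes = set(children)
    for kids in children.values():
        nodes.update(kids)
    return nodes


def select_top_ontology(children: Tree, root: str, max_depth: int) -> Set[str]:
    """Keep the nodes within `max_depth` edges of the root (breadth-first)."""
    top: Set[str] = set()
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth > max_depth or node in top:
            continue
        top.add(node)
        for child in children.get(node, []):
            queue.append((child, depth + 1))
    # The claims require the full ontology to have at least twice as many nodes.
    if len(all_nodes(children)) < 2 * len(top):
        raise ValueError("top ontology too large relative to the full ontology")
    return top


def link_vocabulary(term_to_nodes: Dict[str, Set[str]], children: Tree,
                    top: Set[str]) -> Dict[str, Set[str]]:
    """Link each term to the nearest top-ontology ancestor of each node it names."""
    parent = {kid: p for p, kids in children.items() for kid in kids}
    linked: Dict[str, Set[str]] = {}
    for term, nodes in term_to_nodes.items():
        targets: Set[str] = set()
        for node in nodes:
            # Walk up the tree until a top-ontology node (or the root) is reached.
            while node not in top and node in parent:
                node = parent[node]
            if node in top:
                targets.add(node)
        linked[term] = targets
    return linked


children = {
    "Thing": ["Agent", "Object", "Abstract"],
    "Agent": ["Person", "Organisation"],
    "Person": ["Customer", "Employee"],
    "Object": ["Device"],
    "Device": ["Telephone", "Computer"],
    "Abstract": ["Quantity"],
    "Quantity": ["Number", "Price"],
}
top = select_top_ontology(children, "Thing", max_depth=1)
linked = link_vocabulary(
    {"client": {"Customer"}, "phone": {"Telephone"}, "amount": {"Number", "Price"}},
    children, top)
print(sorted(top))  # ['Abstract', 'Agent', 'Object', 'Thing']
print(linked)       # {'client': {'Agent'}, 'phone': {'Object'}, 'amount': {'Abstract'}}
```

Here the 14-node full ontology yields a 4-node top ontology, which satisfies the "at least twice as many nodes" condition. Linking specific terms to general ancestors is what lets "client" and "customer" compare as similar at the top-ontology level.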

6. Non-transitory computer readable storage medium carrying instructions which upon execution by a processor provide a method of integrating data from a first and a second heterogeneous data source, each heterogeneous data source taking the form of an electronic database, the method comprising implementing a first wrapper around the first heterogeneous data source, the first wrapper being configured to convert requests and responses between a common format and one specific to the first data source and implementing a second wrapper around the second heterogeneous data source to convert requests and responses between the common format and one specific to the second data source;
- wherein each wrapper includes a mapping in the form of a computer readable data file automatically generated according to a method of generating a computer readable data file, on a computer system comprising a digital processor and a memory, the computer readable data file being representative of a mapping between a first representation of a set of concepts or of a data structure associated with the common format and a second representation of a set of concepts or of a data structure associated with a respective one of the first and second data sources, each representation comprising a plurality of complex representational elements which include a number of associated subordinate representational elements, the method of generating a computer readable data file comprising;
  
  the computer system calculating a semantic similarity measure between a subordinate element of the first representation and each of the subordinate elements in the second representation; and
  
  the computer system generating a mapping between the subordinate element of the first representation and one of the subordinate elements of the second representation selected in dependence upon the calculated semantic similarity measures between the subordinate elements;
  
  wherein calculation of a semantic similarity measure by the computer system includes;
  
  the computer system using a linked top ontology data structure stored within the memory of the computer system, the stored data structure comprising a plurality of concept nodes arranged to form a top ontology, the top ontology being a partial subset of a full ontology having at least twice as many nodes as the top ontology, the nodes in the top ontology being selected from the full ontology based on their ancestral closeness to a root node and/or their ancestral remoteness from a leaf node of the full ontology, the linked top ontology further comprising a plurality of pre-processed vocabulary terms each of which is linked to one or more of the nodes in the top ontology, the linked top ontology data structure being used by the computer system as follows;
  
  the names of the subordinate elements between whom a semantic similarity is being calculated being compared by the computer system with the vocabulary terms and for any vocabulary terms which match the names of the subordinate elements, the computer system identifying the top ontology nodes associated with the matched vocabulary terms and comparing the identified top ontology nodes associated with each name of the subordinate elements, and the computer system determining a semantic similarity based on the degree of commonality between the top ontology nodes associated with each of the subordinate elements.
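The wrapper arrangement shared by claims 1, 5 and 6, together with the query handling of claim 4, can be pictured as a small federated-query sketch. For illustration only, it assumes that a common-format query is a list of attribute names and that each wrapper's mapping file has already been reduced to a dictionary from common attributes to native column names. It also assumes that each source returns a single record. `Wrapper`, `answer_complex_query` and the sample CRM and billing sources are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical native query API: takes source-specific column names, returns one record.
NativeSource = Callable[[List[str]], Dict[str, object]]


class Wrapper:
    """Converts requests and responses between the common format and one source."""

    def __init__(self, mapping: Dict[str, str], source: NativeSource) -> None:
        self.mapping = mapping  # common attribute -> native column
        self.reverse = {v: k for k, v in mapping.items()}
        self.source = source

    def supports(self, attr: str) -> bool:
        return attr in self.mapping

    def query(self, common_attrs: List[str]) -> Dict[str, object]:
        native_request = [self.mapping[a] for a in common_attrs]  # common -> native
        native_reply = self.source(native_request)
        # native -> common
        return {self.reverse[col]: val for col, val in native_reply.items()}


def answer_complex_query(attrs: List[str], wrappers: List[Wrapper]) -> Dict[str, object]:
    """Split a common-format query into per-source sub-queries and merge the replies."""
    response: Dict[str, object] = {}
    for wrapper in wrappers:
        sub_query = [a for a in attrs if wrapper.supports(a)]
        if sub_query:
            # Later sources overwrite earlier ones if both answer the same attribute.
            response.update(wrapper.query(sub_query))
    return response


crm_row = {"CUST_NM": "Ada", "TEL_NO": "555-0100"}
billing_row = {"acct_balance": 42.0}
crm = Wrapper({"name": "CUST_NM", "phone": "TEL_NO"},
              lambda cols: {c: crm_row[c] for c in cols})
billing = Wrapper({"balance": "acct_balance"},
                  lambda cols: {c: billing_row[c] for c in cols})
print(answer_complex_query(["name", "balance"], [crm, billing]))
# {'name': 'Ada', 'balance': 42.0}
```

Combining replies by merging records is the simplest reading of "combining the first and second responses". Multi-row sources would need a join on a shared key instead.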

Current Assignee
British Telecommunications PLC (BT Group PLC)
Original Assignee
British Telecommunications PLC (BT Group PLC)
Inventors
Lee, Beum Seuk; Cui, Zhan
Primary Examiner(s)
Lu, Charles

Application Number

US13/996,840
Publication Number

US 20130290338A1
Time in Patent Office

2,104 Days
CPC Class Codes

G06F 16/212   with details for data model...

G06F 16/256   in federated or virtual dat...

G06F 16/258   Data format conversion from...

G06F 16/285   Clustering or classification

G06F 16/36   Creation of semantic tools,...

G06F 16/84   Mapping; Conversion
