System for linking diverse data systems
First Claim
1. A method comprising:
receiving a dataset at a communication interface, the dataset including dataset context information comprising metadata for the dataset;
reviewing, by a processor, the metadata for the dataset to determine a type of data of the dataset;
selecting, by the processor, a core model comprising a schema of structured relationships correlating to the type of data of the dataset, the core model comprising:
a first type node corresponding to a first datatype;
a first database node corresponding to a first database;
a first relationship between the first type node and the first database node establishing that data of the first datatype is stored in the first database;
a second type node corresponding to a second datatype;
a second database node corresponding to a second database; and
a second relationship between the second type node and the second database node establishing that data of the second datatype is stored in the second database;
the first database and the second database identified as part of a data storage architecture extending across multiple databases including the first and second databases;
determining, by the processor, that a first portion of the dataset has the first datatype;
reviewing, by the processor, the first relationship between the first type node and the first database node to determine to store the first portion of the dataset in the first database;
transmitting, by the communication interface, the first portion of the dataset to the first database for storage;
determining, by the processor, that a second portion of the dataset has the second datatype;
reviewing, by the processor, the second relationship between the second type node and the second database node to determine to store the second portion of the dataset in the second database;
transmitting, by the communication interface, the second portion of the dataset to the second database for storage;
creating a linked representation separate from the core model that links the first portion of the dataset to the second portion of the dataset, the linked representation comprising:
a representation of the first portion of the dataset as an instance of the first type node of the core model;
a representation of the first database as an instance of the first database node of the core model;
a representation of the second portion of the dataset as an instance of the second type node of the core model; and
a representation of the second database as an instance of the second database node of the core model;
creating, by the processor, a domain knowledge graph from the linked representation, the domain knowledge graph including multiple linked nodes corresponding to core model instances; and
receiving, by content aware routing circuitry, data from a new data source; and
performing, by the content aware routing circuitry, an onboarding procedure on the data from the new data source, the onboarding procedure including:
identifying a type of data for the data from the new data source;
determining a correct database into which to store the data from the new data source responsive to the type of data for the data from the new data source; and
instantiating a new node in the domain knowledge graph corresponding to the new data source.
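The routing and linking steps recited above can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `CoreModel`, `LinkedRepresentation`, and `route_dataset`, the datatype labels `timeseries` and `document`, and the database names `tsdb` and `docstore` are all hypothetical, and in-memory lists stand in for the databases and for transmission over a communication interface. The sketch also assumes each portion already carries its datatype.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CoreModel:
    """Schema: each type node is related to the database node that stores it."""
    stored_in: dict  # hypothetical mapping: datatype name -> database name


@dataclass
class LinkedRepresentation:
    """Instances of core-model nodes, kept separate from the core model itself."""
    nodes: dict = field(default_factory=dict)  # instance id -> (kind, core-model node)
    edges: list = field(default_factory=list)  # (source id, label, target id)


def route_dataset(portions, model, databases):
    """Store each portion per the core model and link the portions together.

    portions: list of (portion_id, datatype, payload) tuples.
    databases: dict of database name -> list, standing in for real databases.
    """
    linked = LinkedRepresentation()
    for portion_id, datatype, payload in portions:
        db_name = model.stored_in[datatype]   # consult the type-to-database relationship
        databases[db_name].append(payload)    # stand-in for transmitting to storage
        linked.nodes[portion_id] = ("type", datatype)
        linked.nodes[db_name] = ("database", db_name)
        linked.edges.append((portion_id, "stored_in", db_name))
    # Link consecutive portions of the same dataset to one another.
    ids = [pid for pid, _, _ in portions]
    for a, b in zip(ids, ids[1:]):
        linked.edges.append((a, "linked_to", b))
    return linked


model = CoreModel(stored_in={"timeseries": "tsdb", "document": "docstore"})
databases = {"tsdb": [], "docstore": []}
portions = [
    ("sensor-readings", "timeseries", [20.5, 21.0]),
    ("maintenance-report", "document", "Pump inspected."),
]
linked = route_dataset(portions, model, databases)
```

Keeping `LinkedRepresentation` as its own object mirrors the claim's requirement that the linked representation be separate from the core model: the schema stays fixed while instances accumulate.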
Abstract
A system creates an abstraction layer surrounding a diverse data system including multiple different databases. Data is received from data sources and ingested into the various databases according to a core model. New instances of the core model are created and added to a larger linked data model (LDM) when new data sources are added to the system. The LDM captures the linkages between different linked data objects and links across different databases. Accordingly, applications are able to access or explore the linked data stored in different databases without prior knowledge of the linking relationships.
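As a rough illustration of exploring linked data without prior knowledge of the links, the sketch below walks a toy LDM starting from a single object. The adjacency-list layout, the object identifiers, the database labels, and the `explore` helper are assumptions made for this example; the abstract does not specify a storage format or query interface.

```python
from collections import deque

# Hypothetical LDM: each linked data object records its database and its links.
ldm = {
    "customer:42": {"database": "relational", "links": ["order:7"]},
    "order:7": {"database": "document", "links": ["shipment:3"]},
    "shipment:3": {"database": "graph", "links": []},
}


def explore(ldm, start):
    """Breadth-first walk returning (object, database) for everything reachable."""
    seen = {start}
    queue = deque([start])
    found = []
    while queue:
        obj = queue.popleft()
        found.append((obj, ldm[obj]["database"]))
        for nxt in ldm[obj]["links"]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return found


reachable = explore(ldm, "customer:42")
```

The caller supplies only a starting object; which databases hold the related objects is discovered from the LDM during the walk.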
12 Claims
7. A system comprising:
a communication interface configured to receive a dataset including dataset context information comprising metadata for the dataset;
a processor coupled to the communication interface, the processor configured to:
review the metadata to determine a type of data of the dataset;
determine a core model comprising a schema of structured relationships correlating to the type of data of the dataset, the core model comprising:
a first type node corresponding to a first datatype;
a first database node corresponding to a first database;
a first relationship between the first type node and the first database node establishing that data of the first datatype is stored in the first database;
a second type node corresponding to a second datatype;
a second database node corresponding to a second database; and
a second relationship between the second type node and the second database node establishing that data of the second datatype is stored in the second database;
the first database and the second database identified as part of a data storage architecture extending across multiple databases including the first and second databases;
determine that a first portion of the dataset has the first datatype;
review the first relationship between the first type node and the first database node in the core model to determine to store the first portion of the dataset in the first database;
determine that a second portion of the dataset has the second datatype;
review the second relationship between the second type node and the second database node in the core model to determine to store the second portion of the dataset in the second database;
create a linked representation separate from the core model that links the first portion of the dataset to the second portion of the dataset, the linked representation comprising:
a representation of the first portion of the dataset as an instance of the first type node of the core model;
a representation of the first database as an instance of the first database node of the core model;
a representation of the second portion of the dataset as an instance of the second type node of the core model; and
a representation of the second database as an instance of the second database node of the core model; and
create a domain knowledge graph from the linked representation, the domain knowledge graph including multiple linked nodes corresponding to core model instances; and
the communication interface further configured to:
transmit the first portion of the dataset to the first database for storage; and
transmit the second portion of the dataset to the second database for storage; and
content aware routing circuitry configured to:
receive data from a new data source; and
perform an onboarding procedure on the data from the new data source, the onboarding procedure including:
identifying a type of data for the data from the new data source;
determining a correct database into which to store the data from the new data source responsive to the type of data for the data from the new data source; and
instantiating a new node in the domain knowledge graph corresponding to the new data source.
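The onboarding procedure attributed to the content aware routing circuitry might look roughly like the following sketch. Everything named here is hypothetical: the `identify_type` heuristic (treating records with a `timestamp` key as time series), the `onboard` function, the dict-based knowledge graph, and the source and database names. The claim leaves open how the type of data is actually identified.

```python
def identify_type(record):
    """Hypothetical content check: infer a datatype from the shape of the data."""
    if isinstance(record, dict) and "timestamp" in record:
        return "timeseries"
    return "document"


def onboard(source_name, records, type_to_database, knowledge_graph):
    """Onboard a new data source and return the database chosen for its data."""
    datatype = identify_type(records[0])    # identify the type of data
    database = type_to_database[datatype]   # choose the database for that type
    # Instantiate a new node for the data source in the knowledge graph.
    knowledge_graph["nodes"][source_name] = {"datatype": datatype}
    knowledge_graph["edges"].append((source_name, "stored_in", database))
    return database


type_to_database = {"timeseries": "tsdb", "document": "docstore"}
knowledge_graph = {"nodes": {}, "edges": []}
chosen = onboard(
    "plant-b-sensors",
    [{"timestamp": "2020-01-01T00:00:00Z", "value": 3.2}],
    type_to_database,
    knowledge_graph,
)
```

The sketch inspects only the first record for brevity; a real onboarding step would presumably examine more of the incoming data before committing to a database.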
Specification