Unified data model for integration between relational and non-relational databases
First Claim
1. A method comprising:
parsing a set of records retrieved from a first database to identify a set of one or more data fields in each of the set of records, wherein the first database does not have a defined schema;
determining statistically significant data fields of the set of data fields across the set of records, wherein statistically significant data fields are determined relative to a number of records in the set of records or determined relative to a number of indications for each of the set of data fields;
identifying a first plurality of database entities from the statistically significant data fields;
determining, from a defined schema of a second database, a second plurality of database entities; and
generating a unified data model for the first database and the second database based, at least in part, on the first and the second plurality of database entities.
Abstract
Schema-less databases can make data modeling and data management difficult and can detrimentally affect integration with an RDBMS. Inferring a schema from a schema-less database can improve integration by indicating a structure or organization of data in the schema-less database. A schema analyzer can infer a schema by processing data of the schema-less database to identify statistically significant data fields. The schema analyzer then creates a schema that comprises the statistically significant data fields. A data modeler can use the resulting schema along with a schema for an RDBMS to generate a unified data model. A user may submit a query based on the unified data model to obtain results from both databases. The data modeler translates the query from the unified model to be compatible with each of the schemas so that data may be written to or retrieved from each of the schema-less database and the RDBMS.
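As an illustration of the schema-inference step described in the abstract, the following is a minimal sketch, not the patented implementation. It assumes records are plain dictionaries and treats a field as "statistically significant" when it appears in at least a chosen fraction of the records. The function name `infer_schema`, the parameter `min_fraction`, and the sample records are all hypothetical.

```python
from collections import Counter


def infer_schema(records, min_fraction=0.5):
    """Return {field: fraction_of_records} for fields that appear in at
    least `min_fraction` of the records (an assumed significance test)."""
    if not records:
        return {}
    counts = Counter()
    for record in records:
        # Count each field once per record in which it appears.
        counts.update(record.keys())
    total = len(records)
    return {
        field: n / total
        for field, n in counts.items()
        if n / total >= min_fraction
    }


# Hypothetical records from a schema-less store.
records = [
    {"name": "Ada", "email": "ada@example.com"},
    {"name": "Bob", "email": "bob@example.com", "nickname": "bobby"},
    {"name": "Cy"},
    {"name": "Di", "email": "di@example.com"},
]

schema = infer_schema(records, min_fraction=0.5)
```

Here `name` and `email` pass the threshold and `nickname` does not, so only the first two would be carried into the inferred schema. The threshold value is an arbitrary choice for the example; the source does not state one.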
4 Citations
20 Claims
1. A method comprising:
parsing a set of records retrieved from a first database to identify a set of one or more data fields in each of the set of records, wherein the first database does not have a defined schema;
determining statistically significant data fields of the set of data fields across the set of records, wherein statistically significant data fields are determined relative to a number of records in the set of records or determined relative to a number of indications for each of the set of data fields;
identifying a first plurality of database entities from the statistically significant data fields;
determining, from a defined schema of a second database, a second plurality of database entities; and
generating a unified data model for the first database and the second database based, at least in part, on the first and the second plurality of database entities.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. One or more non-transitory machine-readable media having program code for a data modeler stored therein, the program code comprising instructions to:
infer a first schema corresponding to a first database based, at least in part, on identification of statistically significant data fields determined from entities of the first database, wherein the first database does not have a defined schema, wherein statistically significant data fields are determined relative to a number of records in the first database or determined relative to a number of indications for each data field in the first database;
retrieve, from a second database, a second schema, wherein the second database has a defined schema;
generate a unified data model based, at least in part, on the first schema and the second schema; and
migrate data from the second database to the first database using the unified data model.
Dependent claims: 11
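The last two steps of claim 10 (generating a unified data model and migrating data from the schema-defined database to the schema-less one) can be sketched roughly as follows. This is an assumption-laden illustration only: the claim does not say how the model is represented or how migration is performed. The names `build_unified_model` and `migrate_rows`, the store labels, and the sample data are hypothetical.

```python
def build_unified_model(inferred_fields, relational_columns):
    """Map each logical field to the set of stores that hold it.
    The store labels "schemaless" and "relational" are assumptions."""
    model = {}
    for field in inferred_fields:
        model.setdefault(field, set()).add("schemaless")
    for column in relational_columns:
        model.setdefault(column, set()).add("relational")
    return model


def migrate_rows(rows, columns, model):
    """Turn relational rows (tuples) into documents for the schema-less
    store, keeping only fields known to the model and dropping NULLs."""
    documents = []
    for row in rows:
        doc = {
            col: val
            for col, val in zip(columns, row)
            if col in model and val is not None
        }
        documents.append(doc)
    return documents


# Hypothetical inputs: an inferred schema and a relational table.
model = build_unified_model(["name", "email"], ["id", "name"])
documents = migrate_rows([(1, "Ada"), (2, None)], ["id", "name"], model)
```

In this sketch, `name` is recognized as shared by both stores, and a NULL column is omitted from the migrated document instead of being written as an empty field, which suits a store without a fixed schema.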
12. An apparatus comprising:
a processor; and
a machine-readable medium having program code executable by the processor to cause the apparatus to,
parse a set of records retrieved from a first database to identify a set of one or more data fields in each of the set of records, wherein the first database does not have a defined schema;
determine statistically significant data fields of the set of data fields across the set of records, wherein statistically significant data fields are determined relative to a number of records in the set of records or determined relative to a number of indications for each of the set of data fields;
identify a first plurality of database entities from the statistically significant data fields;
determine, from a defined schema of a second database, a second plurality of database entities; and
generate a unified data model for the first database and the second database based, at least in part, on the first and the second plurality of database entities.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20
Specification