Automatic data warehouse generation using automatically generated schema
First Claim
Patent Images
1. A system, comprising:
- one or more processors configured to;
receive a source data set, the source data set including a plurality of tables and a plurality of keys;
derive metadata from the received source data set, wherein deriving the metadata from the source data set includes;
determining, for a table in the source data set, a grain of the table;
determining, for a column in the source data set, a target dimension and level;
determining whether the column is a measure; and
determining one or more dimensional hierarchies, wherein one or more levels of the one or more dimensional hierarchies are designated to be transformed into dimension tables;
subsequent to deriving the metadata from the source data set, using the metadata derived from the source data set to automatically generate a corresponding database schema, wherein generating the corresponding database schema includes;
generating a dimension table, wherein the dimension table is generated based at least in part on one or more columns in the source data set determined to be targeted to a level associated with the dimension table; and
generating a measure table, wherein the measure table is generated for the determined grain;
load a data warehouse associated with the automatically generated database schema, wherein loading the data warehouse includes;
generating one or more inserts into the generated dimension table;
moving data from one or more tables in the source data set into the generated dimension table at least in part by using the generated one or more inserts; and
identifying one or more sources for one or more measure grains at least in part by scanning tables in the source data set; and
a memory coupled to the one or more processors and configured to provide the one or more processors with instructions.
3 Assignments
0 Petitions
Accused Products
Abstract
Techniques for automatic data warehouse generation are disclosed herein. A first data set is received. The first data set includes a plurality of tables and includes a plurality of keys. A dimensional model is mapped onto the first data set. A schema is automatically generated based on the dimensional model.
90 Citations
13 Claims
-
1. A system, comprising:
-
one or more processors configured to; receive a source data set, the source data set including a plurality of tables and a plurality of keys; derive metadata from the received source data set, wherein deriving the metadata from the source data set includes; determining, for a table in the source data set, a grain of the table; determining, for a column in the source data set, a target dimension and level; determining whether the column is a measure; and determining one or more dimensional hierarchies, wherein one or more levels of the one or more dimensional hierarchies are designated to be transformed into dimension tables; subsequent to deriving the metadata from the source data set, using the metadata derived from the source data set to automatically generate a corresponding database schema, wherein generating the corresponding database schema includes; generating a dimension table, wherein the dimension table is generated based at least in part on one or more columns in the source data set determined to be targeted to a level associated with the dimension table; and generating a measure table, wherein the measure table is generated for the determined grain; load a data warehouse associated with the automatically generated database schema, wherein loading the data warehouse includes; generating one or more inserts into the generated dimension table; moving data from one or more tables in the source data set into the generated dimension table at least in part by using the generated one or more inserts; and identifying one or more sources for one or more measure grains at least in part by scanning tables in the source data set; and a memory coupled to the one or more processors and configured to provide the one or more processors with instructions. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
-
-
12. A method, comprising:
-
receiving a source data set, the source data set including a plurality of tables and a plurality of keys; deriving metadata from the received source data set, wherein deriving the metadata from the source data set includes; determining, for a table in the source data set, a grain of the table; determining, for a column in the source data set, a target dimension and level; determining whether the column is a measure; and determining one or more dimensional hierarchies, wherein one or more levels of the one or more dimensional hierarchies are designated to be transformed into dimension tables; subsequent to deriving the metadata from the source data set, using the metadata derived from the source data set to automatically generate, using one or more processors, a corresponding database schema, wherein generating the corresponding database schema includes; generating a dimension table, wherein the dimension table is generated based at least in part on one or more columns in the source data set determined to be targeted to a level associated with the dimension table; and generating a measure table, wherein the measure table is generated for the determined grain; loading a data warehouse associated with the automatically generated database schema, wherein loading the data warehouse includes; generating one or more inserts into the generated dimension table; moving data from one or more tables in the source data set into the generated dimension table at least in part by using the generated one or more inserts; and identifying one or more sources for one or more measure grains at least in part by scanning tables in the source data set.
-
-
13. A computer program product embodied in a non-transitory computer readable storage medium and comprising computer instructions for:
-
receiving a source data set, the source data set including a plurality of tables and a plurality of keys; deriving metadata from the received source data set, wherein deriving the metadata from the source data set includes; determining, for a table in the source data set, a grain of the table; determining, for a column in the source data set, a target dimension and level; determining whether the column is a measure; and determining one or more dimensional hierarchies, wherein one or more levels of the one or more dimensional hierarchies are designated to be transformed into dimension tables; subsequent to deriving the metadata from the source data set, using the metadata derived from the source data set to automatically generate, using one or more processors, a corresponding database schema, wherein generating the corresponding database schema includes; generating a dimension table, wherein the dimension table is generated based at least in part on one or more columns in the source data set determined to be targeted to a level associated with the dimension table; and generating a measure table, wherein the measure table is generated for the determined grain; loading a data warehouse associated with the automatically generated database schema, wherein loading the data warehouse includes; generating one or more inserts into the generated dimension table; moving data from one or more tables in the source data set into the generated dimension table at least in part by using the generated one or more inserts; and identifying one or more sources for one or more measure grains at least in part by scanning tables in the source data set.
-
Specification