Method and system for providing business intelligence data
Abstract
A computer implemented method for data mining and providing business intelligence data including generating by an analytics server one or more dimensions from source data imported from a computer readable medium, wherein the one or more dimensions define categories into which portions of the normalized data can be grouped; generating by the analytics server one or more measures from the source data linked to the one or more dimensions; storing by the analytics server the one or more dimensions and the one or more measures in a plurality of tables arranged in one of a snowflake and a star schema; determining by the analytics server relationship information between one or more measures and one or more dimensions in each of the plurality of tables; storing by the analytics server the relationship information on the computer readable medium; calculating by the analytics server a total cost of at least one product based on the relationship information; and, querying by a computer system in communication with the analytics server for the change in total cost of the at least one product based on a change in any one of the measures.
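A minimal Python sketch can illustrate the total-cost calculation described in the abstract. For illustration only, it assumes that each component's cost is a rate measure multiplied by an allocation measure. The names `Component`, `unit_rate`, `units_per_product`, and the sample figures are hypothetical and do not come from the patent. The sketch computes each component's cost relationship as a percentage of the total cost. It then answers the "change in total cost" query two ways: by recomputing from the measures, and by using the stored percentages alone.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical component of an end product.

    unit_rate stands in for a rate measure (cost per unit) and
    units_per_product for an allocation measure (units used per end product).
    """
    name: str
    unit_rate: float
    units_per_product: float

    def cost(self) -> float:
        return self.unit_rate * self.units_per_product


def total_cost(components: list[Component]) -> float:
    """Total cost of one end product: the sum of its component costs."""
    return sum(c.cost() for c in components)


def cost_shares(components: list[Component]) -> dict[str, float]:
    """Cost relationship of each component as a percentage of the total."""
    total = total_cost(components)
    return {c.name: 100.0 * c.cost() / total for c in components}


def cost_change(components: list[Component], name: str, new_rate: float) -> float:
    """Change in total cost when one component's rate measure is replaced."""
    after = sum(
        (new_rate if c.name == name else c.unit_rate) * c.units_per_product
        for c in components
    )
    return after - total_cost(components)


def cost_change_from_shares(total: float, shares: dict[str, float],
                            name: str, rate_ratio: float) -> float:
    """The same change estimated from the stored percentages alone.

    Valid here because each component's cost is linear in its rate.
    """
    return total * shares[name] / 100.0 * (rate_ratio - 1.0)


parts = [Component("steel", 2.0, 10.0), Component("labor", 15.0, 2.0)]
total = total_cost(parts)      # 50.0
shares = cost_shares(parts)    # {'steel': 40.0, 'labor': 60.0}
print(cost_change(parts, "steel", 2.5))                        # 5.0
print(cost_change_from_shares(total, shares, "steel", 1.25))   # 5.0
```

Under the linearity assumption, the two answers agree. If a component's cost is not linear in the changed measure, recomputing from the measures is the safer route.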
10 Claims
1. A system for improved efficiency in the retrieval of business intelligence data used in data mining, the system comprising:
an analytics server including a computer readable medium having a data file stored thereon, said data file consisting of source data aggregated from one or more data sources;
wherein said analytics server includes computer readable instructions stored on said computer readable medium for:
normalizing the source data to produce normalized data;
generating one or more dimensions from said source data, wherein said one or more dimensions define categories into which portions of said normalized data can be grouped in a snowflake schema;
generating one or more measures for each component used in producing an end product from said source data linked to said one or more dimensions in said snowflake schema;
said measures comprising rate measures, allocation measures, and hierarchical structure measures;
storing said one or more dimensions and said one or more measures in a plurality of tables arranged in a star schema;
determining relationship information between said one or more measures and said one or more dimensions in each of said plurality of tables;
filtering said plurality of tables to generate a plurality of independent fact tables and adding said independent fact tables to a fact table pool;
each of said plurality of independent fact tables selected from a category fact table, a time aggregated fact table, and a generalized fact table;
generating, from the normalized data, a master facts table containing data for two or more categories;
generating into the pool a plurality of baby fact tables, each comprising a subset of the master facts table;
generating a plurality of cubes from the baby fact tables, the cubes aggregating data in the baby fact tables by at least one of the categories;
aggregating data in the baby fact tables by at least one of the categories;
receiving a query;
upon receiving the query, searching for the most specific baby fact table available in the pool to satisfy the query;
failing to find the most specific baby fact table;
upon said failing to find the most specific baby fact table, recording a miss, creating a new cube and recording the cube creation time for the new cube; and
based on said recording of the miss, pre-generating the most specific baby fact table for use in subsequent queries, wherein the cube creation time using baby fact tables is smaller than the cube creation time using the master fact tables, thereby speeding up the generation of the cubes;
creating additional independent fact tables based on said pool statistics and adding said additional independent fact tables to said fact table pool;
determining a plurality of relationships between each component and said end product and storing said relationships in each of said plurality of tables;
each relationship comprises a cost relationship as a percentage of a total cost required to produce said end product;
storing said relationship information on said computer readable medium;
calculating a total cost of at least one product based on said cost relationship information;
one or more computing devices in communication with said analytics server, and including a module stored on a further computer readable medium having instructions thereon for:
submitting said at least one query and receiving data from said most specific fact table from said analytics server;
wherein said at least one query comprises querying for the change in total cost of said at least one product based on a change in any one of said measures;
and wherein a clustered index of each of said independent fact tables is cached in memory for faster query processing.
Dependent claims: 2, 3, 4, 5, 10.
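The fact table pool steps of claim 1 can be sketched in Python. These steps are: search for the most specific baby fact table, record a miss, build a cube, and pre-generate the missing table. The sketch is one possible reading under stated assumptions:

- a baby fact table is identified by a set of (category, value) filters over the master fact table;
- "most specific" means a table whose filters exactly match the query;
- a cube is a per-category sum of a `cost` measure;
- pre-generation is triggered once a hypothetical `miss_threshold` is reached.

`FactTablePool`, `matches`, `aggregate`, and the sample rows are illustrative names, not taken from the patent.

```python
import time
from collections import Counter


def matches(row: dict, filters: dict) -> bool:
    """True if the row has every (category, value) pair in filters."""
    return all(row[k] == v for k, v in filters.items())


def aggregate(rows: list[dict], group_by: str, measure: str = "cost") -> dict:
    """Build a one-dimensional cube: sum a measure per category value."""
    cube: dict = {}
    for r in rows:
        cube[r[group_by]] = cube.get(r[group_by], 0) + r[measure]
    return cube


class FactTablePool:
    """Hypothetical pool of baby fact tables cut from one master fact table."""

    def __init__(self, master: list[dict], miss_threshold: int = 1):
        self.master = master
        self.babies: dict[frozenset, list[dict]] = {}  # filter set -> rows
        self.misses: Counter = Counter()              # filter set -> miss count
        self.cube_times: list[tuple[frozenset, float]] = []
        self.miss_threshold = miss_threshold

    def add_baby(self, filters: dict) -> None:
        """Pre-generate a baby table holding the master rows matching filters."""
        self.babies[frozenset(filters.items())] = [
            r for r in self.master if matches(r, filters)
        ]

    def _smallest_superset(self, key: frozenset) -> list[dict]:
        # A baby table whose filters are a subset of the query's filters holds
        # every row the query needs; more filters means fewer rows to scan.
        usable = [k for k in self.babies if k <= key]
        return self.babies[max(usable, key=len)] if usable else self.master

    def query(self, filters: dict, group_by: str) -> dict:
        key = frozenset(filters.items())
        if key in self.babies:  # hit: the most specific baby table exists
            return aggregate(self.babies[key], group_by)
        self.misses[key] += 1   # miss: record it and time the cube build
        start = time.perf_counter()
        rows = [r for r in self._smallest_superset(key) if matches(r, filters)]
        cube = aggregate(rows, group_by)
        self.cube_times.append((key, time.perf_counter() - start))
        if self.misses[key] >= self.miss_threshold:
            self.babies[key] = rows  # pre-generate for subsequent queries
        return cube


master = [
    {"region": "east", "product": "A", "month": "jan", "cost": 10},
    {"region": "east", "product": "B", "month": "jan", "cost": 5},
    {"region": "west", "product": "A", "month": "feb", "cost": 7},
    {"region": "east", "product": "A", "month": "feb", "cost": 3},
]
pool = FactTablePool(master)
pool.add_baby({"region": "east"})
print(pool.query({"region": "east", "product": "A"}, "month"))  # {'jan': 10, 'feb': 3}
```

On a miss, the sketch builds the cube from the smallest available baby table whose filters are a subset of the query's filters. It falls back to the master table only when no such baby table exists. This reflects the claim's statement that building cubes from baby fact tables takes less time than building them from the master table.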
6. A computer implemented method for improving efficiency and retrieval of business intelligence data used in data mining comprising:
normalizing the source data to produce normalized data;
generating by an analytics server one or more dimensions from source data imported from a computer readable medium, wherein said one or more dimensions define categories into which portions of said normalized data can be grouped in a snowflake schema;
generating by said analytics server one or more measures for each component used in producing an end product from said source data linked to said one or more dimensions in said snowflake schema;
said measures comprising rate measures, allocation measures, and hierarchical structure measures;
storing by said analytics server said one or more dimensions and said one or more measures in a plurality of tables arranged in a star schema;
determining by said analytics server relationship information between said one or more measures and said one or more dimensions in each of said plurality of tables;
filtering said plurality of tables to generate a plurality of independent fact tables and adding said independent fact tables to a fact table pool;
each of said plurality of independent fact tables selected from a category fact table, a time aggregated fact table, and a generalized fact table;
generating, from the normalized data, a master facts table containing data for two or more categories;
generating into the pool a plurality of baby fact tables, each comprising a subset of the master facts table;
generating a plurality of cubes from the baby fact tables, the cubes aggregating data in the baby fact tables by at least one of the categories;
aggregating data in the baby fact tables by at least one of the categories;
receiving a query;
upon receiving the query, searching for the most specific baby fact table available in the pool to satisfy the query;
failing to find the most specific baby fact table;
upon said failing to find the most specific baby fact table, recording a miss, creating a new cube and recording the cube creation time for the new cube; and
based on said recording of the miss, pre-generating the most specific baby fact table for use in subsequent queries, wherein the cube creation time using baby fact tables is smaller than the cube creation time using the master fact tables, thereby speeding up the generation of the cubes;
creating additional independent fact tables based on said pool statistics and adding said additional independent fact tables to said fact table pool;
determining a plurality of relationships between each component and said end product and storing said relationships in each of said plurality of tables;
each relationship comprises a cost relationship as a percentage of a total cost required to produce said end product;
storing by said analytics server said relationship information on said computer readable medium;
calculating by said analytics server a total cost of at least one product based on said cost relationship information;
submitting said at least one query by a computer system in communication with said analytics server for the change in total cost of said at least one product based on said most specific fact table; and
caching a clustered index of each of said independent fact tables in memory for faster query processing.
Dependent claims: 7, 8, 9.
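The final step of claim 6, caching a clustered index of each independent fact table in memory, can be approximated as follows. The sketch assumes the clustered index key is a single column, here a hypothetical integer `month` in YYYYMM form. In place of the B-tree that a database engine would typically use, it uses a sorted Python list with a cached key column. `CachedClusteredIndex` and `between` are hypothetical names.

```python
from bisect import bisect_left, bisect_right


class CachedClusteredIndex:
    """Hypothetical in-memory stand-in for a fact table's clustered index.

    A clustered index keeps rows ordered by the index key, so a sorted copy
    of the rows plus a cached key column lets range lookups use binary
    search instead of a full scan.
    """

    def __init__(self, rows: list[dict], key: str):
        self.rows = sorted(rows, key=lambda r: r[key])  # stable sort by key
        self.keys = [r[key] for r in self.rows]         # cached key column

    def between(self, low, high) -> list[dict]:
        """Rows with low <= key <= high, located by binary search."""
        start = bisect_left(self.keys, low)
        end = bisect_right(self.keys, high)
        return self.rows[start:end]


facts = [
    {"month": 202303, "cost": 4},
    {"month": 202301, "cost": 10},
    {"month": 202302, "cost": 6},
    {"month": 202301, "cost": 2},
]
index = CachedClusteredIndex(facts, "month")
print(sum(r["cost"] for r in index.between(202301, 202302)))  # 18
```

Because the rows are held in key order, a range query touches only the matching slice. This makes an in-memory clustered index attractive for range-heavy queries such as month-by-month cost lookups.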
Specification