Distributed data warehouse
Abstract
Methods and data structures are provided for allowing data mining with improved efficiency. During processing of a usage log (or multiple logs) for an activity, such as a usage logfile of network search activity, a common fact table is generated. The common fact table allows a plurality of auxiliary data structures to be formed from the common fact table. These auxiliary data structures are designed to allow users to submit queries against the contents of the data structure in order to investigate the data. The efficiency of access of the common fact table is improved by allowing users to access auxiliary data structures other than the auxiliary data structures that are associated with a user. Optionally, the common fact table and/or the auxiliary data structures can include dimension values that correspond to both pre-identified dimension values as well as dimension values that are identified during processing of the activity logfiles.
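The flow the abstract describes, from usage log to common fact table to auxiliary data structures, can be sketched roughly as follows. This is only an illustration under stated assumptions: the log fields (`market`, `browser`, `clicks`, `queries`), the structure names, and all function names are hypothetical and do not come from the patent. It also assumes that each auxiliary data structure is described by a definition listing the dimensions and measures it needs.

```python
from collections import defaultdict

# Hypothetical usage-log records; the field names are illustrative only.
LOG = [
    {"market": "en-US", "browser": "A", "clicks": 3, "queries": 1},
    {"market": "en-US", "browser": "B", "clicks": 1, "queries": 1},
    {"market": "fr-FR", "browser": "A", "clicks": 2, "queries": 1},
]

# Hypothetical definitions: one entry per auxiliary data structure.
DEFINITIONS = {
    "clicks_by_market": {"dimensions": ["market"], "measures": ["clicks"]},
    "queries_by_browser": {"dimensions": ["browser"], "measures": ["queries"]},
}


def aggregate_definitions(definitions):
    """Collect every dimension and measure requested by any definition."""
    dims, measures = set(), set()
    for definition in definitions.values():
        dims.update(definition["dimensions"])
        measures.update(definition["measures"])
    return sorted(dims), sorted(measures)


def build_fact_table(log, dims, measures):
    """Keep only the requested columns, one fact row per log record."""
    return [{name: rec[name] for name in dims + measures} for rec in log]


def build_auxiliary(fact_table, definition):
    """Sum the definition's measures, grouped by its dimensions."""
    totals = defaultdict(lambda: dict.fromkeys(definition["measures"], 0))
    for row in fact_table:
        key = tuple(row[d] for d in definition["dimensions"])
        for measure in definition["measures"]:
            totals[key][measure] += row[measure]
    return dict(totals)


dims, measures = aggregate_definitions(DEFINITIONS)
fact_table = build_fact_table(LOG, dims, measures)
auxiliary = {
    name: build_auxiliary(fact_table, definition)
    for name, definition in DEFINITIONS.items()
}
```

In this sketch the log is read once to build the fact table, and every auxiliary structure is then derived from that table rather than from the raw log.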
26 Claims
1. One or more computer-storage media storing computer-useable instructions that, when executed by a computing device, mine data, the instructions causing the computing device to:
identify definitions defining which measures and dimensions are desired for inclusion in a plurality of auxiliary data structures, an auxiliary data structure representing groupings of measures and dimensions of interest from a common fact table;
aggregate the definitions for the auxiliary data structures in a centralized data location, the definition of each dimension being associated with at least one of the measures;
process one or more initial data files to extract measure values of the measures and dimension keys of the dimensions using the aggregated definitions;
construct the common fact table from the extracted measure values and the extracted dimension keys from the processed one or more initial data files;
construct one or more dimension tables corresponding to the one or more dimensions, the one or more dimension tables being stored separately from the common fact table;
create the auxiliary data structures from the common fact table, each auxiliary data structure comprising the identified definitions desired for inclusion in the auxiliary data structure, each auxiliary data structure including a different subset of the measures and the dimensions of the common fact table, a first user being associated with a first subset of the auxiliary data structures;
receive a user data query from the first user, the user data query comprising a combination of a measure and a dimension;
identify an auxiliary data structure from the auxiliary data structures that includes the combination of the measure and the dimension, the identified auxiliary data structure being different from the first subset of auxiliary data structures;
generate a responsive result to the user data query based on the identified auxiliary data structure; and
provide the generated responsive result to the first user.
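The query-handling steps of claim 1 (receive a measure and dimension combination, then identify an auxiliary data structure containing it, possibly outside the user's own subset) might look like the sketch below. The structure names, the user name, and the `find_structure` helper are hypothetical. Checking the user's own structures first is an assumption of this sketch, not something the claim specifies.

```python
# Hypothetical catalog of auxiliary data structures and their contents.
AUX_STRUCTURES = {
    "search_clicks": {"dimensions": {"market"}, "measures": {"clicks"}},
    "ads_revenue": {"dimensions": {"market", "advertiser"}, "measures": {"revenue"}},
}

# Hypothetical association between users and their own subset of structures.
USER_STRUCTURES = {"alice": {"search_clicks"}}


def find_structure(user, measure, dimension):
    """Return (structure_name, is_own) for a measure/dimension combination.

    Structures associated with the user are preferred (an assumption of this
    sketch); otherwise any structure holding the combination is returned.
    """
    own = USER_STRUCTURES.get(user, set())
    candidates = [
        name
        for name, contents in AUX_STRUCTURES.items()
        if measure in contents["measures"] and dimension in contents["dimensions"]
    ]
    for name in candidates:
        if name in own:
            return name, True
    if candidates:
        return candidates[0], False
    return None, False


# The combination (revenue, advertiser) is absent from alice's own structures,
# so the lookup falls through to a structure outside her subset.
result = find_structure("alice", "revenue", "advertiser")
```

The second element of the returned pair records whether the query was answered from the user's own subset, which is the case the claim distinguishes.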
12. A computer-implemented method for mining data, comprising:
identifying definitions defining which measures and dimensions are desired for inclusion in auxiliary data structures, an auxiliary data structure representing groupings of measures and dimensions of interest from a common fact table;
aggregating the definitions for the auxiliary data structures, the definitions comprising a plurality of managed dimension values for at least one dimension;
processing one or more initial data files to extract values for the measures and the dimensions using the aggregated definitions, the extracted values including one or more unmanaged dimension values for the at least one dimension;
validating the one or more unmanaged dimension values;
constructing the common fact table from the extracted values of the measures and dimension keys of the dimensions from the processed one or more initial data files;
constructing one or more dimension tables corresponding to the plurality of dimensions based on the extracted values, the one or more dimension tables being stored separately from the common fact table;
creating the auxiliary data structures from the common fact table, each auxiliary data structure including a different subset of the measures and the dimensions of the common fact table, the subset of the measures and the dimensions of the common fact table corresponding to the measures and dimensions of interest of the auxiliary data structure, at least one auxiliary dimension table including a dimension having validated unmanaged dimension values;
receiving a user data query, the user data query comprising one or more combinations of measures and dimensions;
generating a responsive result to the user data query based on at least one of the auxiliary data structures; and
providing the generated responsive result.
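Claim 12 distinguishes managed dimension values (pre-identified in the definitions) from unmanaged values that appear during processing and must be validated. The claim does not state a validation rule, so the sketch below invents one purely for illustration: an unmanaged value is accepted if it matches a locale-like pattern and occurs at least `min_count` times. The pattern, the threshold, and all names are assumptions.

```python
import re
from collections import Counter

# Hypothetical managed (pre-identified) values for a "market" dimension.
MANAGED_MARKETS = {"en-US", "fr-FR"}

# Hypothetical validation rule: values must look like a locale code.
LOCALE_PATTERN = re.compile(r"^[a-z]{2}-[A-Z]{2}$")


def validate_unmanaged(values, managed, min_count=2):
    """Split unmanaged values into (validated, rejected) sets.

    A value is unmanaged if it is not in `managed`. It is validated here only
    if it matches LOCALE_PATTERN and occurs at least `min_count` times; both
    criteria are assumptions of this sketch.
    """
    counts = Counter(values)
    unmanaged = {value for value in counts if value not in managed}
    validated = {
        value
        for value in unmanaged
        if LOCALE_PATTERN.match(value) and counts[value] >= min_count
    }
    return validated, unmanaged - validated


extracted = ["en-US", "de-DE", "de-DE", "??", "fr-FR", "ja-JP"]
validated, rejected = validate_unmanaged(extracted, MANAGED_MARKETS)
```

Only the validated values would then be added to the dimension alongside the managed ones.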
20. A computer-implemented method for mining data, comprising:
identifying definitions defining which measures and dimensions are desired for inclusion in auxiliary data structures, an auxiliary data structure representing groupings of measures and dimensions of interest from a common fact table;
aggregating the definitions for the auxiliary data structures, the definitions comprising a plurality of managed dimension values for at least one dimension, the definition of each dimension being associated with at least one measure from the plurality of measures;
processing one or more initial data files to extract values for the measures and the dimensions using the aggregated definitions, the extracted values including one or more unmanaged dimension values for the at least one dimension, each of the unmanaged dimension values being an unexpected value extracted in the processing, and each of the managed dimension values being a pre-identified value extracted in the processing;
validating the one or more unmanaged dimension values as being proper values for the at least one dimension;
constructing the common fact table from the extracted values of the measures and dimension keys of the dimensions from the processed one or more initial data files;
constructing one or more dimension tables corresponding to the plurality of dimensions based on the extracted values, the one or more dimension tables being stored separately from the common fact table;
creating the auxiliary data structures from the common fact table, each auxiliary data structure including a different subset of the extracted values of the measures from the common fact table and the dimensions, the different subset of the extracted values of the measures from the common fact table and the dimensions corresponding to the measures and dimensions of interest of the auxiliary data structure, at least one auxiliary dimension table including a dimension having validated unmanaged dimension values, a first user being associated with a first subset of the auxiliary data structures;
receiving a user data query from the first user, the user data query including a combination of a measure and a dimension;
identifying an auxiliary data structure from the auxiliary data structures that includes the combination of the measure and the dimension, the identified auxiliary data structure being different from the first subset of the auxiliary data structures; and
generating a responsive result to the user data query based on the identified auxiliary data structure.
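The claims recite a common fact table built from measure values and dimension keys, with dimension tables stored separately. One conventional way to realize that separation is sketched below: each distinct dimension value gets an integer key in its own table, and fact rows carry only the keys. The integer-key scheme, the `_key` column suffix, and the function name are assumptions of this sketch.

```python
def build_star(records, dimensions, measures):
    """Return (dimension_tables, fact_rows) for the given records.

    dimension_tables maps each dimension name to a {value: key} table.
    fact_rows hold one "<dimension>_key" column per dimension plus measures.
    """
    dimension_tables = {name: {} for name in dimensions}
    fact_rows = []
    for rec in records:
        row = {}
        for name in dimensions:
            table = dimension_tables[name]
            # Assign the next integer key the first time a value is seen.
            row[name + "_key"] = table.setdefault(rec[name], len(table) + 1)
        for measure in measures:
            row[measure] = rec[measure]
        fact_rows.append(row)
    return dimension_tables, fact_rows


# Hypothetical records; field names are illustrative only.
records = [
    {"market": "en-US", "clicks": 3},
    {"market": "fr-FR", "clicks": 2},
    {"market": "en-US", "clicks": 1},
]
dimension_tables, fact_rows = build_star(records, ["market"], ["clicks"])
```

Because the fact rows reference dimension values only by key, a dimension table can grow (for example, with newly validated values) without rewriting existing fact rows.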
21. A system for mining data, comprising:
a processing script configured to control processing of initial data files using aggregated definitions of auxiliary data structures to construct a common fact table, an auxiliary data structure representing a subset of measures and dimensions of interest from the common fact table, the aggregated definitions defining which measures and dimensions are desired for inclusion in the auxiliary data structures, the definition of each dimension being associated with at least one measure from the plurality of measures, the common fact table being constructed from measure values of the measures and dimension keys for the dimensions based on the measures and the dimensions being defined as desired for inclusion in the auxiliary data structures in the aggregated definitions using the processing script;
one or more dimension tables corresponding to the one or more dimensions, the one or more dimension tables stored separately from the common fact table;
the auxiliary data structures constructed from the common fact table, each auxiliary data structure including a different subset of the measures and the dimensions of the common fact table, the different subset of the measures and the dimensions of the common fact table corresponding to the measures and dimensions of interest of the auxiliary data structure, a first user being associated with a first subset of the auxiliary data structures;
a data query processing component configured to:
receive a user data query from the first user, the user data query comprising a combination of a measure and a dimension;
identify an auxiliary data structure from the auxiliary data structures that includes the combination of the measure and the dimension, the identified auxiliary data structure being different from the first subset of auxiliary data structures;
generate a responsive result to the user data query based on the identified auxiliary data structure; and
provide the generated responsive result to the first user.
Specification