Multivariate Insight Discovery Approach
Abstract
A raw dataset including measures and dimensions is processed, by a preprocessing module, using an algorithm that produces a preprocessed dataset such that at least one type of statistical analysis of the preprocessed dataset yields equal results to the same type of statistical analysis of the raw dataset. The preprocessed dataset is then analyzed by a statistical analysis module to identify subsets of the preprocessed dataset that include a non-random structure or pattern. The analysis of the preprocessed dataset includes the at least one type of statistical analysis that produces the same results for both the preprocessed and raw datasets. The identified subsets are then ranked by a statistical ranker based on the analysis of the preprocessed dataset, and a subset is selected for visualization based on the rankings. A visualization module then generates a visualization of the selected identified subset that highlights a non-random structure of the selected subset.
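One way to read the "equal results" requirement is that preprocessing keeps the sufficient statistics of the analysis that follows it. The sketch below is an illustration under that assumption, not the patented algorithm: it assumes the analysis is a per-group mean and population variance of one measure, and the names `preprocess` and `group_stats` are hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pvariance

def preprocess(rows, dimension, measure):
    """Collapse raw rows into one (count, sum, sum of squares) record per dimension value."""
    agg = defaultdict(lambda: [0, 0.0, 0.0])
    for row in rows:
        rec = agg[row[dimension]]
        x = row[measure]
        rec[0] += 1
        rec[1] += x
        rec[2] += x * x
    return {key: tuple(rec) for key, rec in agg.items()}

def group_stats(preprocessed):
    """Mean and population variance per group, computed from the aggregates alone."""
    stats = {}
    for key, (n, s, ss) in preprocessed.items():
        mean = s / n
        stats[key] = (mean, ss / n - mean * mean)
    return stats

# Hypothetical sample data: one dimension ("region") and one measure ("sales").
raw = [
    {"region": "north", "sales": 10.0},
    {"region": "north", "sales": 14.0},
    {"region": "south", "sales": 3.0},
    {"region": "south", "sales": 5.0},
    {"region": "south", "sales": 7.0},
]
compact = preprocess(raw, "region", "sales")
stats = group_stats(compact)

# The same statistics computed directly on the raw rows agree with the
# ones derived from the preprocessed (aggregated) dataset.
for region, (mean, var) in stats.items():
    values = [r["sales"] for r in raw if r["region"] == region]
    assert abs(mean - fmean(values)) < 1e-9
    assert abs(var - pvariance(values)) < 1e-9
```

Here the preprocessed dataset has one record per region instead of one per row, yet the chosen analysis cannot tell the two apart. Other analyses (a median, for instance) would not be preserved by this particular aggregation.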
20 Claims
1. A method comprising:
accessing a dataset including measures and dimensions by a preprocessing module including at least one processor;
processing the dataset, by the preprocessing module, to generate a preprocessed dataset such that at least one type of statistical analysis of the preprocessed dataset yields equal results to the same type of statistical analysis of the dataset;
analyzing the preprocessed dataset, by a statistical analysis module including at least one processor, to identify subsets of the preprocessed dataset that include a non-random structure, the analyzing including the at least one type of statistical analysis;
generating a score for each of the identified subsets, by the statistical analysis module, based on the non-random structures included in each of the identified subsets;
ranking each of the identified subsets, by a statistical ranking module including at least one processor, based on the score generated for each of the identified subsets and selecting an identified subset based on the ranking of the identified subset; and
generating, by a visualization module including at least one processor, a visualization that highlights a non-random structure of the selected identified subset.
Dependent claims: 2, 3, 4, 5, 6, 7
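The scoring, ranking, and selecting steps of claim 1 can be sketched as follows. The claim does not name a scoring function, so this example assumes one: eta squared, the share of a measure's variance explained by grouping on a dimension, with each (dimension, measure) pair treated as a candidate subset. The function names and sample rows are hypothetical.

```python
from collections import defaultdict

def eta_squared(rows, dimension, measure):
    """Share of the measure's variance explained by grouping on the dimension (0..1)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row[measure])
    values = [x for g in groups.values() for x in g]
    grand = sum(values) / len(values)
    ss_total = sum((x - grand) ** 2 for x in values)
    if ss_total == 0:
        return 0.0  # constant measure: no structure to find
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
    return ss_between / ss_total

def rank_subsets(rows, dimensions, measures):
    """Score every (dimension, measure) pair and sort best-first."""
    scored = [((d, m), eta_squared(rows, d, m)) for d in dimensions for m in measures]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical data: sales differ strongly by region and barely by channel.
rows = [
    {"region": "north", "channel": "web",   "sales": 10.0},
    {"region": "north", "channel": "store", "sales": 12.0},
    {"region": "south", "channel": "web",   "sales": 30.0},
    {"region": "south", "channel": "store", "sales": 32.0},
]
ranking = rank_subsets(rows, ["region", "channel"], ["sales"])
selected, score = ranking[0]  # the top-ranked subset is the one passed on for visualization
```

With this data, the region and sales pair ranks first because nearly all of the variance in sales lies between regions. One caveat of this assumed score: a dimension with one row per value always scores 1, so a practical ranker would need to penalize high-cardinality dimensions.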
8. A system comprising:
a preprocessing module including a processor and configured to access a dataset including measures and dimensions and process the dataset to generate a preprocessed dataset such that at least one type of statistical analysis of the preprocessed dataset yields equal results to the same type of statistical analysis of the dataset;
a statistical analysis module including a processor and configured to:
  analyze the preprocessed dataset to identify subsets of the preprocessed dataset that include a non-random structure, the analyzing including the at least one type of statistical analysis; and
  generate a score for each of the identified subsets based on the non-random structures included in each of the identified subsets;
a statistical ranking module including a processor and configured to:
  rank each of the identified subsets based on the score generated for each of the identified subsets; and
  select an identified subset based on the ranking of the identified subset; and
a visualization module including a processor and configured to generate a visualization of the selected identified subset that highlights a non-random structure of the selected identified subset.
Dependent claims: 9, 10, 11, 12, 13, 14
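As a rough illustration of what "a visualization that highlights a non-random structure" might mean for the visualization module, the sketch below renders a text bar chart of per-group means and marks the group farthest from the mean of the group means. The highlighting rule, the `render_bars` name, and the sample values are assumptions made for this example; the claim does not prescribe any of them.

```python
def render_bars(group_means, width=20):
    """Return text bar-chart lines; the group farthest from the overall mean is marked."""
    # "overall" is the unweighted mean of the group means (an assumed baseline).
    overall = sum(group_means.values()) / len(group_means)
    standout = max(group_means, key=lambda k: abs(group_means[k] - overall))
    # Scale bars to the largest absolute value; fall back to 1.0 if all values are zero.
    top = max(abs(v) for v in group_means.values()) or 1.0
    lines = []
    for name, value in group_means.items():
        bar = "#" * round(abs(value) / top * width)
        marker = " <-- stands out" if name == standout else ""
        lines.append(f"{name:<8}{bar} {value:g}{marker}")
    return lines

chart = render_bars({"north": 11.0, "south": 31.0, "east": 12.0, "west": 10.0})
print("\n".join(chart))
```

A graphical front end would replace the text marker with color or annotation, but the idea is the same: the chart is built around the structure that earned the subset its rank.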
15. A non-transitory machine-readable storage medium including instructions that, when executed on at least one processor of a machine, cause the machine to perform operations comprising:
accessing a dataset including measures and dimensions by a preprocessing module;
processing the dataset, by the preprocessing module, to generate a preprocessed dataset such that at least one type of statistical analysis of the preprocessed dataset yields equal results to the same type of statistical analysis of the dataset;
analyzing the preprocessed dataset, by a statistical analysis module, to identify subsets of the preprocessed dataset that include a non-random structure, the analyzing including the at least one type of statistical analysis;
generating a score for each of the identified subsets, by the statistical analysis module, based on the non-random structures included in each of the identified subsets;
ranking each of the identified subsets, by a statistical ranking, based on the score generated for each of the identified subsets and selecting an identified subset based on the ranking of the identified subset; and
generating, by a visualization module, a visualization that highlights a non-random structure of the selected identified subset.
Dependent claims: 16, 17, 18, 19, 20
Specification