Multivariate insight discovery approach
1 Assignment
0 Petitions
Abstract
A raw dataset including measures and dimensions is processed, by a preprocessing module, using an algorithm that produces a preprocessed dataset such that at least one type of statistical analysis of the preprocessed dataset yields equal results to the same type of statistical analysis of the raw dataset. The preprocessed dataset is then analyzed by a statistical analysis module to identify subsets of the preprocessed dataset that include a non-random structure or pattern. The analysis of the preprocessed dataset includes the at least one type of statistical analysis that produces the same results for both the preprocessed and raw datasets. The identified subsets are then ranked by a statistical ranker based on the analysis of the preprocessed dataset and a subset is selected for visualization based on the rankings. A visualization module then generates a visualization of the selected identified subset that highlights a non-random structure of the selected subset.
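The abstract's central property is that a statistical analysis gives the same result on the preprocessed dataset as on the raw one. A minimal Python sketch can illustrate it. The sketch assumes that the analysis of interest is the mean and population variance of a single measure, both overall and per region, and that aggregation keeps one (count, sum, sum of squares) record per remaining attribute value. The patent does not say which statistics or aggregation scheme it uses. The data and the names `RAW`, `aggregate_over_product`, `mean_and_variance_from_aggregates` and `same_result` are hypothetical.

```python
from collections import defaultdict
from math import isclose

# Hypothetical raw rows: (region, product, revenue). Illustrative values only.
RAW = [
    ("north", "a", 10.0), ("north", "a", 12.0), ("north", "b", 7.0),
    ("south", "a", 3.0), ("south", "b", 4.0), ("south", "b", 5.0),
]


def aggregate_over_product(rows):
    """Aggregate over (sum out) 'product': one (count, sum, sum_sq) record per region."""
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for region, _product, revenue in rows:
        record = stats[region]
        record[0] += 1
        record[1] += revenue
        record[2] += revenue * revenue
    return {region: tuple(record) for region, record in stats.items()}


def mean_and_variance(values):
    """Population mean and variance computed directly on raw values."""
    n = len(values)
    mean = sum(values) / n
    return mean, sum((v - mean) ** 2 for v in values) / n


def mean_and_variance_from_aggregates(records):
    """The same two statistics rebuilt from (count, sum, sum_sq) records alone."""
    n = sum(count for count, _, _ in records)
    total = sum(s for _, s, _ in records)
    total_sq = sum(sq for _, _, sq in records)
    mean = total / n
    return mean, total_sq / n - mean * mean


def same_result(raw_values, records):
    """True if mean and variance agree (up to float rounding) on raw and aggregated data."""
    raw = mean_and_variance(raw_values)
    agg = mean_and_variance_from_aggregates(records)
    return all(isclose(r, a) for r, a in zip(raw, agg))


if __name__ == "__main__":
    pre = aggregate_over_product(RAW)
    print(pre)  # {'north': (3, 29.0, 293.0), 'south': (3, 12.0, 50.0)}
    # Whole dataset: 6 raw rows versus 2 aggregated records.
    print(same_result([r[2] for r in RAW], list(pre.values())))  # True
    # Each region on its own.
    for region, record in pre.items():
        print(region, same_result([r[2] for r in RAW if r[0] == region], [record]))
```

Aggregation preserves only the statistics that can be rebuilt from these running totals. A median, for example, cannot be recovered from counts and sums. The one-pass `total_sq / n - mean * mean` formula is also prone to cancellation when values are large. A production version would more likely merge per-group means and sums of squared deviations instead.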
44 Citations
20 Claims
1. A method comprising:
identifying, by the at least one processor, data types of a raw dataset and models of data of the raw dataset to determine attribute hierarchies;
generating, by the at least one processor, a reduced dataset from the raw dataset based on the determined attribute hierarchies by:
mapping, by the at least one processor, the attribute hierarchies to identify sets of equivalent attributes and measures;
for each set of equivalent attributes, selecting, by the at least one processor, one of the equivalent attributes and discarding the remaining equivalent attributes; and
for each set of equivalent attributes, selecting, by the at least one processor, one of the equivalent measures and discarding the remaining equivalent measures;
aggregating over at least one attribute of the reduced dataset, by the at least one processor, to generate a preprocessed dataset with the same relevant statistical properties of the raw dataset, such that at least one type of statistical analysis produces the same results when applied to the preprocessed dataset as when applied to the raw dataset;
identifying, by the at least one processor, subsets of the preprocessed dataset that include data that exhibits non-random patterns by performing the at least one type of statistical analysis;
generating a score for each of the identified subsets of the preprocessed dataset, by the at least one processor, based on the data that exhibits non-random patterns included in each of the identified subsets;
ranking each of the identified subsets for presence of non-random data structures, by the at least one processor, based on the score generated for each of the identified subsets;
selecting, by the at least one processor, an identified subset based on the ranking of the identified subset; and
generating, by the at least one processor, a visualization that highlights a non-random structure of the selected identified subset.
(Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9)
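The reduction step of claim 1 can be sketched in Python. The claim does not define when attributes or measures are "equivalent", so the sketch makes two assumptions. Two attribute columns are treated as equivalent when their values map one-to-one across rows, such as a country code and a country name. Two measures are treated as equivalent when one is a constant non-zero multiple of the other, such as the same revenue in euros and in cents. The sketch groups measures into their own equivalence sets and keeps the first column of each set. It does not model how the attribute hierarchies are derived from data types and data models. The data and the functions `is_one_to_one`, `is_scaled_copy`, `group_equivalent` and `reduce_dataset` are hypothetical.

```python
from math import isclose

# Hypothetical columns: attributes are categorical, measures are numeric.
ATTRIBUTES = {
    "country_code": ["DE", "FR", "DE", "US"],
    "country_name": ["Germany", "France", "Germany", "United States"],
    "channel": ["web", "store", "store", "web"],
}
MEASURES = {
    "revenue_eur": [10.0, 20.0, 5.0, 8.0],
    "revenue_cents": [1000.0, 2000.0, 500.0, 800.0],
    "units": [1.0, 3.0, 1.0, 2.0],
}


def is_one_to_one(xs, ys):
    """True if the two attribute columns determine each other row by row."""
    pairs = set(zip(xs, ys))
    return len(pairs) == len(set(xs)) == len(set(ys))


def is_scaled_copy(xs, ys):
    """True if one measure is a constant non-zero multiple of the other (e.g. EUR vs cents)."""
    ratio = None
    for x, y in zip(xs, ys):
        if x == 0 or y == 0:
            if x != y:  # a zero paired with a non-zero value breaks proportionality
                return False
            continue
        if ratio is None:
            ratio = y / x
        elif not isclose(y / x, ratio):
            return False
    return True


def group_equivalent(columns, equivalent):
    """Partition column names into sets of mutually equivalent columns, in first-seen order."""
    groups, assigned = [], set()
    names = list(columns)
    for i, first in enumerate(names):
        if first in assigned:
            continue
        group = [first] + [
            other for other in names[i + 1:]
            if other not in assigned and equivalent(columns[first], columns[other])
        ]
        assigned.update(group)
        groups.append(group)
    return groups


def reduce_dataset(attributes, measures):
    """Keep one representative per equivalence set and discard the rest."""
    keep_a = [g[0] for g in group_equivalent(attributes, is_one_to_one)]
    keep_m = [g[0] for g in group_equivalent(measures, is_scaled_copy)]
    return {n: attributes[n] for n in keep_a}, {n: measures[n] for n in keep_m}


if __name__ == "__main__":
    reduced_attributes, reduced_measures = reduce_dataset(ATTRIBUTES, MEASURES)
    print(list(reduced_attributes))  # ['country_code', 'channel']
    print(list(reduced_measures))    # ['revenue_eur', 'units']
```

Keeping the first column of each set is an arbitrary choice. A real system might instead prefer the most readable label, such as the country name rather than its code.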
10. A system comprising:
one or more processors; and
a machine-readable storage medium storing a set of instructions that, when executed by the one or more processors, cause the system to perform operations comprising:
identifying data types of a raw dataset and models of data of the raw dataset to determine attribute hierarchies;
generating a reduced dataset from the raw dataset based on the determined attribute hierarchies by:
mapping the attribute hierarchies to identify sets of equivalent attributes and measures;
for each set of equivalent attributes, selecting one of the equivalent attributes and discarding the remaining equivalent attributes; and
for each set of equivalent attributes, selecting one of the equivalent measures and discarding the remaining equivalent measures;
aggregating over at least one attribute of the reduced dataset to generate a preprocessed dataset with the same relevant statistical properties of the raw dataset, such that at least one type of statistical analysis produces the same results when applied to each of the preprocessed dataset and the raw dataset;
identifying subsets of the preprocessed dataset that include data that exhibits non-random patterns by performing the at least one type of statistical analysis;
generating a score for each of the identified subsets of the preprocessed dataset based on the data that exhibits the non-random patterns included in each of the identified subsets;
ranking each of the identified subsets for presence of non-random data structures, based on the score generated for each of the identified subsets;
selecting an identified subset based on the ranking of the identified subset; and
generating a visualization of the selected identified subset that highlights a non-random structure of the selected identified subset.
(Dependent claims: 11, 12, 13, 14, 15)
16. A non-transitory machine-readable storage medium including instructions that, when executed on at least one processor of a machine, cause the machine to perform operations comprising:
identifying data types of a raw dataset and models of data of the raw dataset to determine attribute hierarchies;
generating a reduced dataset from the raw dataset based on the determined attribute hierarchies by:
mapping the attribute hierarchies to identify sets of equivalent attributes and measures;
for each set of equivalent attributes, selecting one of the equivalent attributes and discarding the remaining equivalent attributes; and
for each set of equivalent attributes, selecting one of the equivalent measures and discarding the remaining equivalent measures;
aggregating over at least one attribute of the reduced dataset to generate a preprocessed dataset with the same relevant statistical properties of the raw dataset, such that at least one type of statistical analysis produces the same results when applied to each of the preprocessed dataset and the raw dataset;
identifying subsets of the preprocessed dataset that include data that exhibits non-random patterns by performing the at least one type of statistical analysis;
generating a score for each of the identified subsets of the preprocessed dataset based on the data that exhibits the non-random patterns included in each of the identified subsets;
ranking each of the identified subsets for presence of non-random data structures, based on the score generated for each of the identified subsets;
selecting an identified subset based on the ranking of the identified subset; and
generating a visualization of the selected identified subset that highlights a non-random structure of the selected identified subset.
(Dependent claims: 17, 18, 19, 20)
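The scoring, ranking, selection and visualization steps recited in each independent claim can be sketched in the same hedged way. Each candidate subset is the split of one measure by one attribute. Its score is eta squared, the share of the measure's variance explained by the split. This serves as a stand-in for the unspecified "non-random pattern" score. The "visualization" is a text bar chart that flags the group whose mean deviates most from the overall mean. The rows, the names `ROWS`, `group_values`, `eta_squared`, `rank_candidates` and `text_chart`, and the scoring choice are assumptions for illustration, not the patented method.

```python
from collections import defaultdict

# Hypothetical preprocessed records: one row per (region, channel) with a revenue measure.
ROWS = [
    {"region": "north", "channel": "web", "revenue": 10.0},
    {"region": "north", "channel": "store", "revenue": 12.0},
    {"region": "south", "channel": "web", "revenue": 3.0},
    {"region": "south", "channel": "store", "revenue": 4.0},
    {"region": "west", "channel": "web", "revenue": 6.0},
    {"region": "west", "channel": "store", "revenue": 7.0},
]


def group_values(rows, attribute, measure):
    """Split the measure into subsets, one per distinct value of the attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[measure])
    return dict(groups)


def eta_squared(groups):
    """Share of the measure's total variance explained by the grouping (0 to 1)."""
    values = [v for vs in groups.values() for v in vs]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # constant measure: no structure to find
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in groups.values()
    )
    return ss_between / ss_total


def rank_candidates(rows, attributes, measure):
    """Score every attribute's split of the measure, highest score first."""
    scored = [(eta_squared(group_values(rows, a, measure)), a) for a in attributes]
    return sorted(scored, reverse=True)


def text_chart(groups):
    """Text bar chart of group means, flagging the group farthest from the overall mean."""
    values = [v for vs in groups.values() for v in vs]
    grand_mean = sum(values) / len(values)
    means = {key: sum(vs) / len(vs) for key, vs in groups.items()}
    flagged = max(means, key=lambda key: abs(means[key] - grand_mean))
    lines = []
    for key in sorted(means):
        marker = "  <- largest deviation" if key == flagged else ""
        lines.append(f"{key:>6} {'#' * round(means[key])} {means[key]:.1f}{marker}")
    return lines


if __name__ == "__main__":
    ranking = rank_candidates(ROWS, ["region", "channel"], "revenue")
    print(ranking)
    best_score, best_attribute = ranking[0]  # select the top-ranked subset
    print("\n".join(text_chart(group_values(ROWS, best_attribute, "revenue"))))
```

Eta squared is an effect size, not a significance test. On a handful of rows, a high score can arise by chance, so a fuller implementation would pair it with something like an F-test p-value before ranking.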
Specification