Identifying contributors that explain differences between a data set and a subset of the data set
Abstract
Methods for analyzing and rendering business intelligence data scale efficiently as datasets grow in size. Human intervention is minimized by augmenting decision-making in selecting which aspects of large datasets to focus on to drive key business outcomes. Variable value combinations that are the predominant drivers of key observations are automatically determined from several competing variable value combinations. The identified variable value combinations can then be used to predict future trends underlying the business intelligence data. In another embodiment, an observed outcome is decomposed into multiple contributing drivers, and the impact of each contributing driver can be analyzed and numerically quantified, either as a static snapshot or as a time-varying evolution. Similarly, differences in observations between two groups can be decomposed into multiple contributing sub-groups for each group, and pairwise differences among sub-groups can be quantified and analyzed.
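The decomposition of an observed outcome into contributing drivers can be illustrated with a minimal sketch. It assumes a single categorical driver, a numeric outcome per observation, and a standard share-times-rate split (an assumption here, not a formula stated in the abstract), so that the per-value contributions add up to the overall mean outcome. Grouping by an optional period column gives the time-varying view. The function name `driver_contributions` and the sample columns are hypothetical.

```python
from collections import defaultdict


def driver_contributions(rows, driver, outcome, period=None):
    """Hypothetical sketch: split the mean outcome into per-value contributions.

    Each value v of `driver` contributes share(v) * rate(v), where share(v) is
    the fraction of observations with that value and rate(v) is their mean
    outcome. If `period` names a column, the split is repeated per period;
    otherwise all rows fall under the key None.
    """
    by_period = defaultdict(list)
    for row in rows:
        key = row[period] if period is not None else None
        by_period[key].append(row)

    result = {}
    for key, group in by_period.items():
        counts = defaultdict(int)
        sums = defaultdict(float)
        for row in group:
            counts[row[driver]] += 1
            sums[row[driver]] += row[outcome]
        n = len(group)
        # share * rate = (count / n) * (sum / count); these add up to the
        # period's overall mean outcome.
        result[key] = {v: (counts[v] / n) * (sums[v] / counts[v]) for v in counts}
    return result


rows = [
    {"month": "Jan", "region": "east", "churned": 1},
    {"month": "Jan", "region": "east", "churned": 0},
    {"month": "Jan", "region": "west", "churned": 0},
    {"month": "Jan", "region": "west", "churned": 0},
    {"month": "Feb", "region": "east", "churned": 1},
    {"month": "Feb", "region": "west", "churned": 1},
]
contrib = driver_contributions(rows, "region", "churned", period="month")
print(contrib)
# {'Jan': {'east': 0.25, 'west': 0.0}, 'Feb': {'east': 0.5, 'west': 0.5}}
```

In January the whole 0.25 churn rate is attributed to the east region; in February both regions contribute equally. Comparing the per-period dictionaries shows how each driver's share of the outcome evolves.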
174 Citations
20 Claims
1. A method for analyzing differences in an outcome between a data set for a process and a subset of the data set, the method comprising a computer system automatically performing the following:
processing a data set containing observations of the process, the observations expressed as values for a plurality of variables and for the outcome, wherein processing the data set determines behaviors for different variable combinations with respect to the outcome, the variable combinations defined by values for one or more of the variables, the subset defined as those observations for which one or more test variables take trial values;
for pairs of a first variable combination and a second variable combination, wherein the test variables take the trial values in the second variable combination and the first variable combination is the same as the second variable combination except that the test variables are not specified as part of the first variable combination, estimating contributions of the pair to differences in the outcome between the data set and the subset, based on differences in the behaviors of the pair and also based on differences in populations of the pair; and
reporting differences in the outcome between the data set and the subset based on the estimated contributions for the variable combinations.
Dependent claims: 2 to 19.
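The pairing step of claim 1 can be sketched as follows. This is a hypothetical illustration, not the claimed implementation. It assumes that the behavior of a variable combination is its mean outcome, that the population of the first combination is its share of the data set while the population of the second is its share of the subset, and that each pair's contribution is split into a population effect and a behavior effect in the style of a standard rate/mix (shift-share) decomposition. `pair_contributions`, `matches`, `mean`, and the sample columns are invented for illustration.

```python
from itertools import combinations


def matches(row, combo):
    """True if the observation takes every value fixed by the combination."""
    return all(row[var] == val for var, val in combo.items())


def mean(values):
    return sum(values) / len(values)


def pair_contributions(rows, outcome, test, variables, max_order=1):
    """Hypothetical sketch of the pairwise contribution estimate.

    `test` maps test variables to trial values and defines the subset. Each
    first combination fixes up to `max_order` non-test variables; the paired
    second combination adds the test condition. The two effects of a pair
    add up to p2 * b2 - p1 * b1 (population times behavior).
    """
    subset = [r for r in rows if matches(r, test)]
    if not subset:
        raise ValueError("no observations take the trial values")
    candidates = [v for v in variables if v not in test]
    results = []
    for k in range(1, max_order + 1):
        for names in combinations(candidates, k):
            # Distinct value tuples for these variables, in first-seen order.
            seen = dict.fromkeys(tuple(r[n] for n in names) for r in rows)
            for vals in seen:
                first = dict(zip(names, vals))
                y1 = [r[outcome] for r in rows if matches(r, first)]
                y2 = [r[outcome] for r in subset if matches(r, first)]
                p1, p2 = len(y1) / len(rows), len(y2) / len(subset)
                b1 = mean(y1)  # never empty: vals were taken from rows
                b2 = mean(y2) if y2 else b1  # absent in subset: no behavior gap
                results.append({
                    "first": first,
                    "second": {**first, **test},
                    "population_effect": (p2 - p1) * b1,
                    "behavior_effect": p2 * (b2 - b1),
                })
    return results


rows = [
    {"region": "east", "channel": "web", "converted": 1},
    {"region": "east", "channel": "store", "converted": 1},
    {"region": "west", "channel": "web", "converted": 1},
    {"region": "west", "channel": "web", "converted": 0},
    {"region": "west", "channel": "store", "converted": 0},
]
test = {"region": "west"}
pairs = pair_contributions(rows, "converted", test, ["region", "channel"])
for p in pairs:
    print(p["second"], round(p["population_effect"], 4), round(p["behavior_effect"], 4))
```

For any fixed set of variables, the first combinations partition the data set and the subset, so their pair contributions sum exactly to the difference in mean outcome between the subset and the data set. With `max_order` above 1, combinations built from different variable sets overlap, so their contributions should be compared side by side rather than summed.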
20. A computer program product for analyzing differences in an outcome between a data set for a process and a subset of the data set, the computer program product comprising a non-transitory machine-readable medium storing computer program code for performing a method, the method comprising:
processing a data set containing observations of the process, the observations expressed as values for a plurality of variables and for the outcome, wherein processing the data set determines behaviors for different variable combinations with respect to the outcome, the variable combinations defined by values for one or more of the variables, the subset defined as those observations for which one or more test variables take trial values;
for pairs of a first variable combination and a second variable combination, wherein the test variables take the trial values in the second variable combination and the first variable combination is the same as the second variable combination except that the test variables are not specified as part of the first variable combination, estimating contributions of the pair to differences in the outcome between the data set and the subset, based on differences in the behaviors of the pair and also based on differences in populations of the pair; and
reporting differences in the outcome between the data set and the subset based on the estimated contributions for the variable combinations.
Specification