Identifying contributors that explain differences between a data set and a subset of the data set

US 10,127,130 B2
Filed: 03/27/2015
Issued: 11/13/2018
Est. Priority Date: 03/18/2005
Status: Active Grant

Abstract

Methods for analyzing and rendering business intelligence data allow for efficient scalability as datasets grow in size. Human intervention is minimized by augmented decision making ability in selecting what aspects of large datasets should be focused on to drive key business outcomes. Variable value combinations that are predominant drivers of key observations are automatically determined from several competing variable value combinations. The identified variable value combinations can then be used to predict future trends underlying the business intelligence data. In another embodiment, an observed outcome is decomposed into multiple contributing drivers and the impact of each of the contributing drivers can be analyzed and numerically quantified, as a static snapshot or as a time-varying evolution. Similarly, differences in observations between two groups can be decomposed into multiple contributing sub-groups for each of the groups and pairwise differences among sub-groups can be quantified and analyzed.
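
The decomposition of a difference between two groups into sub-group contributions can be illustrated with a short identity. This is a sketch under assumptions the abstract does not state: the sub-groups k partition both groups, b_k and b'_k denote the mean outcome of sub-group k in the first and second group, and p_k and p'_k denote the sub-group's share of each group's observations. The symbols and the two effect labels are introduced here for illustration only.

```latex
\bar{y}' - \bar{y}
  = \sum_k \left( b'_k\, p'_k - b_k\, p_k \right)
  = \sum_k \Big[
      \underbrace{(b'_k - b_k)\, p'_k}_{\text{behavior effect}}
      + \underbrace{b_k\, (p'_k - p_k)}_{\text{population effect}}
    \Big]
```

Each summand is one sub-group's contribution. The first term reflects a change in how the sub-group behaves, the second a change in how large the sub-group is, which mirrors the two bases named in claim 1 (differences in behaviors and differences in populations).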

174 Citations

20 Claims

1. A method for analyzing differences in an outcome between a data set for a process and a subset of the data set, the method comprising a computer system automatically performing the following:
- processing a data set containing observations of the process, the observations expressed as values for a plurality of variables and for the outcome, wherein processing the data set determines behaviors for different variable combinations with respect to the outcome, the variable combinations defined by values for one or more of the variables, the subset defined as those observations for which one or more test variables take trial values;
  
  for pairs of a first variable combination and a second variable combination, wherein the test variables take the trial values in the second variable combination and the first variable combination is the same as the second variable combination except that the test variables are not specified as part of the first variable combination, estimating contributions of the pair to differences in the outcome between the data set and the subset, based on differences in the behaviors of the pair and also based on differences in populations of the pair; and
  
  reporting differences in the outcome between the data set and the subset based on the estimated contributions for the variable combinations.
- - 2. The method of claim 1, further comprising, for a second subset defined as those observations for which one or more second test variables take second trial values, without re-determining behaviors for different variable combinations:
    - for pairs of a first variable combination and a second variable combination, wherein the second test variables take the second trial values in the second variable combination and the first variable combination is the same as the second variable combination except that the second test variables are not specified as part of the first variable combination, estimating contributions of the pair to differences in the outcome between the data set and the second subset, based on differences in the behaviors of the pair and also based on differences in populations of the pair; and
      
      reporting differences in the outcome between the data set and the second subset based on the estimated contributions for the variable combinations.
  - 3. The method of claim 1, further comprising receiving from a user an identification of the test variables and the trial values.
  - 4. The method of claim 1, wherein the reported differences do not fully account for the difference in the outcome between the data set and the subset.
  - 5. The method of claim 1, wherein the estimated contributions for the variable combinations do not fully account for the difference in the outcome between the data set and the subset.
  - 6. The method of claim 1, further comprising recommending further analysis when the reported differences do not fully account for the difference in the outcome between the data set and the subset.
  - 7. The method of claim 1, wherein reporting differences in the outcome between the data set and the subset comprises reporting the estimated contributions for different variable combinations.
  - 8. The method of claim 1, wherein estimating contributions of the pair to differences in the outcome comprises:
    - estimating a contribution of the first variable combination by multiplying the behavior of the first variable combination by the population of the first variable combination;
      
      estimating a contribution of the second variable combination by multiplying the behavior of the second variable combination by the population of the second variable combination; and
      
      computing a difference between the estimated contributions.
  - 9. The method of claim 1, wherein reporting differences in the outcome between the data set and the subset comprises:
    - based on the estimated contributions, automatically generating an animated briefing comprising a sequence of graphs describing the reported differences.
  - 10. The method of claim 9, wherein graphs correspond to different reported differences, the graphs depicting first outcome values for the data set, second outcome values for the subset, and predicted differences in the outcome between the data set and the subset.
  - 11. The method of claim 9, wherein the animated briefing is interactive, and the sequence of graphs depends on a user's interaction with the animated briefing.
  - 12. The method of claim 1, wherein reporting differences comprises, upon a user's activation of a reported difference, presenting a supplementary graph explaining contributions of different variable combinations to the reported difference.
  - 13. The method of claim 12, wherein the supplementary graph is presented as a waterfall bar graph.
  - 14. The method of claim 1, wherein reporting differences in the outcome between the data set and the subset based on the estimated contributions for the variable combinations comprises first reporting a difference based on a single-variable combination for the first variable combination.
  - 15. The method of claim 14, wherein reporting differences in the outcome between the data set and the subset based on the estimated contributions for the variable combinations further comprises:
    - then reporting differences based on multiple-variable combinations containing a same variable as the single-variable combination.
  - 16. The method of claim 1, wherein behaviors for variable combinations with respect to the outcome are expressed as regression coefficients, as correlation coefficients or as net-effect impact net of all other variables.
  - 17. The method of claim 1, wherein population is expressed as counts of observations, as frequency of observations, as percentage of overall population, or as relative frequencies of observations.
  - 18. The method of claim 1, further comprising:
    - eliminating from the data set, observations for which the outcome falls outside a user-specified range.
  - 19. The method of claim 1, further comprising:
    - eliminating from the data set, observations for which one or more user-specified variables fall outside user-specified ranges.
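
The steps of claims 1, 2, 8 and 17 can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: it assumes that a "behavior" is simply the mean outcome of a variable combination (claim 16 lists other options such as regression coefficients) and that a "population" is a relative frequency. The function names `summarize` and `pair_contributions`, the sample rows, and the column names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def summarize(rows, variables, outcome, max_order=2):
    """Single pass over the data: outcome sum and row count per combination.

    A variable combination is a frozenset of (variable, value) pairs; the
    empty frozenset stands for the whole data set. max_order must be at
    least one more than the number of test variables used later.
    """
    stats = defaultdict(lambda: [0.0, 0])
    for row in rows:
        for order in range(max_order + 1):
            for names in combinations(variables, order):
                key = frozenset((name, row[name]) for name in names)
                stats[key][0] += row[outcome]
                stats[key][1] += 1
    return {key: (total, count) for key, (total, count) in stats.items()}


def pair_contributions(stats, test):
    """Estimate each pair's contribution for the subset where `test` holds.

    `test` maps test variables to trial values, e.g. {"region": "West"}.
    Assumption: behavior = mean outcome, population = relative frequency.
    """
    trial = frozenset(test.items())
    total_rows = stats[frozenset()][1]
    subset_rows = stats[trial][1]
    contributions = {}
    for second, (sum2, count2) in stats.items():
        if not trial < second:  # second must add variables to the trial values
            continue
        first = second - trial  # same combination, test variables unspecified
        sum1, count1 = stats[first]
        behavior1, population1 = sum1 / count1, count1 / total_rows
        behavior2, population2 = sum2 / count2, count2 / subset_rows
        # Claim 8: difference between the two behavior-times-population products.
        contributions[first] = behavior2 * population2 - behavior1 * population1
    return contributions


# Hypothetical observations: two variables and one outcome.
rows = [
    {"region": "West", "product": "A", "sales": 10.0},
    {"region": "West", "product": "B", "sales": 20.0},
    {"region": "East", "product": "A", "sales": 30.0},
    {"region": "East", "product": "B", "sales": 40.0},
    {"region": "West", "product": "A", "sales": 14.0},
    {"region": "East", "product": "B", "sales": 36.0},
]

stats = summarize(rows, ["region", "product"], "sales")  # behaviors determined once
west = pair_contributions(stats, {"region": "West"})
east = pair_contributions(stats, {"region": "East"})     # second subset reuses stats (claim 2)

for combination, value in west.items():
    print(dict(combination), round(value, 3))
```

Under these assumptions, the contributions over all values of one variable add up to the difference between the subset's mean outcome and the data set's mean outcome. With other behavior measures the sum need not be exact, which is the situation claims 4 to 6 anticipate.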

20. A computer program product for analyzing differences in an outcome between a data set for a process and a subset of the data set, the computer program product comprising a non-transitory machine-readable medium storing computer program code for performing a method, the method comprising:
- processing a data set containing observations of the process, the observations expressed as values for a plurality of variables and for the outcome, wherein processing the data set determines behaviors for different variable combinations with respect to the outcome, the variable combinations defined by values for one or more of the variables, the subset defined as those observations for which one or more test variables take trial values;
  
  for pairs of a first variable combination and a second variable combination, wherein the test variables take the trial values in the second variable combination and the first variable combination is the same as the second variable combination except that the test variables are not specified as part of the first variable combination, estimating contributions of the pair to differences in the outcome between the data set and the subset, based on differences in the behaviors of the pair and also based on differences in populations of the pair; and
  
  reporting differences in the outcome between the data set and the subset based on the estimated contributions for the variable combinations.
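
The reporting step that closes claims 1 and 20 is refined by claims 4 to 6 (the reported differences may leave part of the difference unexplained) and claims 12 and 13 (a supplementary waterfall bar graph). The numbers behind such a graph might be prepared as sketched below. The function name `waterfall_steps`, its parameters, and the "unexplained" label are hypothetical, and the contributions passed in are assumed not to overlap, for example the values of a single variable.

```python
def waterfall_steps(baseline, target, contributions, top_n=3, tolerance=1e-9):
    """Return (bars, needs_more_analysis) for a waterfall from baseline to target.

    baseline: outcome for the whole data set.
    target: outcome for the subset.
    contributions: dict mapping a label to a signed estimated contribution.
    Only the top_n contributions by magnitude are reported. If they do not
    reach the target, a final "unexplained" bar closes the gap and the flag
    signals that further analysis may be worth recommending.
    """
    reported = sorted(contributions.items(), key=lambda item: -abs(item[1]))[:top_n]
    bars = []
    level = baseline
    for label, delta in reported:
        bars.append((label, level, level + delta))  # each bar starts where the last ended
        level += delta
    remainder = target - level
    needs_more_analysis = abs(remainder) > tolerance
    if needs_more_analysis:
        bars.append(("unexplained", level, target))
    return bars, needs_more_analysis


# Hypothetical figures for illustration only.
bars, flag = waterfall_steps(
    baseline=10.0,
    target=7.0,
    contributions={"a": -2.0, "b": -0.5, "c": 1.0},
    top_n=2,
)
for label, start, end in bars:
    print(f"{label}: {start:.2f} -> {end:.2f}")
if flag:
    print("Reported contributions leave a remainder; further analysis suggested.")
```

Each bar is a (label, start, end) triple, so any charting tool can draw the graph directly from the returned list.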

Current Assignee: Salesforce.com, Inc.
Original Assignee: Salesforce.com, Inc.
Inventors: Sengupta, Arijit; Stronger, Brad A.; Chronis, Griffin
Primary Examiner(s): Paula, Cesar B
Assistant Examiner(s): Huang, Jian
Application Number: US14/672,026
Publication Number: US 20150205695A1
Time in Patent Office: 1,327 Days
Field of Search: 715255, 707758, 600300
US Class Current
CPC Class Codes:
G06F 11/324   Display of status information
G06F 11/3409   for performance assessment
G06F 16/2365   Ensuring data consistency a...
G06Q 10/06   Resources, workflows, human...
G06Q 10/0637   Strategic management or ana...
G06Q 10/0639   Performance analysis of emp...
G06Q 10/067   Enterprise or organisation ...
