Methods and Systems for Determining the Importance of Individual Variables in Statistical Models
Abstract
Methods and systems are presented for determining the importance of each of the variables, or combinations of variables, that contribute to the overall score generated by a predictive statistical model. In a specialized case, the importance of each variable in the model is calculated from the slope of the score with respect to that variable and the variable's deviance from its mean. In a more general case, the importance of each variable is calculated by setting that variable to its average value for the data set and then calculating the resulting change in score, Δscore. The variables (or combinations thereof) are then ranked by Δscore or by its magnitude, |Δscore|.
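The two calculations described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the sign convention Δscore = original score − modified score, and the toy linear model are all hypothetical. For a score that is a weighted sum, $S = w_0 + \sum_j w_j x_j$, setting $x_i$ to its mean $\mu_i$ changes the score by exactly $w_i (x_i - \mu_i)$, which is the slope $\partial S / \partial x_i$ times the deviance $(x_i - \mu_i)$, so the general and specialized cases agree. For a nonlinear model, slope times deviance is only a first-order approximation of Δscore.

```python
from typing import Callable, Sequence

# Hypothetical alias: a scoring model maps one record (variable values) to a score.
ScoreFn = Callable[[Sequence[float]], float]


def population_means(population: Sequence[Sequence[float]]) -> list[float]:
    """Average value of each variable over the defined population (data set)."""
    n = len(population)
    return [sum(row[j] for row in population) / n for j in range(len(population[0]))]


def delta_scores(score: ScoreFn, record: Sequence[float],
                 means: Sequence[float]) -> list[float]:
    """General case: score(record) minus score(record with x_j set to its mean)."""
    original = score(record)
    deltas = []
    for j, mu in enumerate(means):
        modified = list(record)
        modified[j] = mu  # change only variable j; all other variables are kept
        deltas.append(original - score(modified))
    return deltas


def slope_times_deviance(weights: Sequence[float], record: Sequence[float],
                         means: Sequence[float]) -> list[float]:
    """Specialized case for a weighted-sum score: the slope of the score with
    respect to x_j is the weight w_j, and the deviance is x_j - mu_j."""
    return [w * (x - mu) for w, x, mu in zip(weights, record, means)]


def rank_by_magnitude(values: Sequence[float]) -> list[int]:
    """Variable indices ordered by |value|, largest first (most important first)."""
    return sorted(range(len(values)), key=lambda j: abs(values[j]), reverse=True)


if __name__ == "__main__":
    # Made-up linear model and data, for illustration only.
    weights, intercept = [2.0, -1.0, 0.5], 1.0

    def linear_score(x: Sequence[float]) -> float:
        return intercept + sum(w * v for w, v in zip(weights, x))

    population = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
    record = [4.0, 1.0, 4.0]
    means = population_means(population)                 # [2.0, 3.0, 4.0]
    general = delta_scores(linear_score, record, means)  # [4.0, 2.0, 0.0]
    special = slope_times_deviance(weights, record, means)
    print(general == special, rank_by_magnitude(general))  # True [0, 1, 2]
```

Because `delta_scores` only calls the model as a black box, it applies to any scoring function, whereas `slope_times_deviance` relies on knowing the slope; here the slope is read directly from the weights because the toy model is linear.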
41 Citations
27 Claims
1. A method of calculating the contribution of an individual term to a multivariate expression which includes that term, comprising:
obtaining an original result of the multivariate expression;
modifying an individual term of the multivariate expression to an average value for a defined population, and keeping all other terms of the multivariate expression unchanged;
using a data processor, calculating a modified result of the multivariate expression using the modified individual term;
using a data processor, calculating the difference between the original and the modified result, Δresult; and
using a data processor, outputting Δresult to a user as the contribution of the individual term to the result of the multivariate expression.
13. A system for calculating the contribution of an individual term to a multivariate expression which includes that term, comprising:
a database for storing values for various input variables; a display; and at least one data processor configured to:
receive a multivariate scoring formula, said scoring formula comprising a sum of a plurality of predictive input variables each having a weighting co-efficient, values for at least some of said variables being stored in the database;
calculate a score using said scoring formula and a set of input variable values;
calculate a partial derivative of the scoring formula with respect to each of the input variables in said set;
calculate a deviance value for each of the input variables in said set, said deviance for a variable $x_i$ being $(x_i - \mu_i)$, where $\mu_i$ is the mean for predictive input variable $x_i$;
calculate a contribution of one or more of the input variables in said set to the score by multiplying the partial derivative and deviance values for that variable;
create a rank for each of said one or more input variables and display the value of the variable, the score and the rank of the variable to a user.
18. A non-transitory computer readable medium containing instructions that, when executed by at least one processor of a computing device, cause the computing device to:
obtain an original result of the multivariate expression;
modify an individual term of the multivariate expression to an average value for a defined population, keeping all other terms of the multivariate expression unchanged;
calculate a modified result of the multivariate expression using the modified individual term;
calculate the difference between the original and the modified result, Δresult; and
output Δresult to a user as the contribution of the individual term to the result of the multivariate expression.
22. A method for dealing with potential collinearity of variables in a multivariate expression of N variables, comprising:
partitioning the set of variables into a set of M mutually exclusive and completely exhaustive variable clusters;
mathematically creating composite indices to summarize all of the variables within a cluster into a single composite measure;
performing a regression analysis to approximate an output of the multivariate expression as a combination of the composite indices;
rank ordering by absolute value of the composite indices and their co-efficients; and
outputting the combination of composite indices and the ranked order to a user.
23. The method of claim 22, wherein the composite indices are substantially independent.
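The collinearity method of claim 22 can be illustrated with a short NumPy sketch. The claim does not say how the clusters are formed or how each composite index is built, so the following are assumptions: the partition is supplied by the caller, each composite index is the mean of the cluster's standardized (z-scored) variables (a first principal component would be another common choice), the regression is ordinary least squares with an intercept, and clusters are ranked by the absolute value of their regression coefficients. The function names are hypothetical.

```python
import numpy as np


def composite_indices(x: np.ndarray, clusters: list[list[int]]) -> np.ndarray:
    """Summarize each cluster of variables (columns of x) by one composite index.

    Assumption: the index is the mean of the cluster's z-scored variables.
    Columns of x must not be constant (their standard deviation is a divisor).
    """
    n_vars = x.shape[1]
    # The clusters must be mutually exclusive and together cover every variable.
    flat = sorted(i for idx in clusters for i in idx)
    if flat != list(range(n_vars)):
        raise ValueError("clusters must partition the variables 0..N-1")
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    return np.column_stack([z[:, idx].mean(axis=1) for idx in clusters])


def rank_clusters(x: np.ndarray, y: np.ndarray,
                  clusters: list[list[int]]) -> list[tuple[int, float]]:
    """Regress y on the composite indices (OLS with an intercept) and rank the
    clusters by the absolute value of their coefficients, largest first."""
    indices = composite_indices(x, clusters)
    design = np.column_stack([np.ones(len(y)), indices])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    slopes = coef[1:]  # drop the intercept
    order = np.argsort(-np.abs(slopes))
    return [(int(k), float(slopes[k])) for k in order]


if __name__ == "__main__":
    # Toy data: x0 and x1 are perfectly collinear, so they share cluster 0.
    t = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    s = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
    x = np.column_stack([t, t, s])
    y = 2.0 * x[:, 0] + 2.0 * x[:, 1] + 0.1 * x[:, 2]  # output of the expression
    for cluster, coefficient in rank_clusters(x, y, [[0, 1], [2]]):
        print(cluster, round(coefficient, 3))
```

Because x0 and x1 are identical in the toy data, a regression on the raw variables would have a rank-deficient design and no unique coefficients; summarizing them as a single index avoids that. Both columns have a population standard deviation of √2, so the fitted coefficients are 4√2 ≈ 5.657 for cluster 0 and 0.1√2 ≈ 0.141 for cluster 1, and cluster 0 is ranked first.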
Specification