Detecting interesting decision rules in tree ensembles
First Claim
1. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a computing device, causes the computing device to:
- traverse each tree in a tree ensemble in order to assign each individual data record from a set of data records in an evaluation data set to an identified leaf node in a set of leaf nodes in each tree;
determine predicted values defined by the tree ensemble based on predictions provided by each leaf node to which each individual data record is assigned;
determine interesting sub-indices for decision rules from a set of decision rules corresponding to the leaf nodes in the tree ensemble, wherein the computer readable program to determine the interesting sub-indices for decision rules corresponding to the leaf nodes in the tree ensemble for categorical targets further causes the computing device to;
identify a proportion of data records P(R(t)) determined by the leaf node t where the decision rule is accurate, R(t) being the event that the decision rule based on the node t is accurate;
identify a proportion of data records P(E(t) determined by the leaf node t where the tree ensemble model is accurate, E(t) being an event that the ensemble prediction is accurate based on a data record determined by node t;
determine a proportion P(Ē
(t)) that the ensemble prediction is inaccurate based on a data record determined by node t by defining P(Ē
(t))=1−
P(E(t));
identify a proportion of data records P(E(t)R(t))determined by the node t that are predicted accurately by both the ensemble model and the decision rule;
identify a proportion of data records P(Ē
(t)(R(t)) determined by the node t that are predicted inaccurately by both the ensemble model and the decision rule;
determine a first sub-index of interestingness I1t based on prediction agreement between the ensemble model and the decision rule as I1t=P(E(t)R(t))+P(Ē
(t)R(t));
determine a second sub-index of interestingness I2t on the decision rule accuracy as I2t=P(R(t));
anddetermine a third sub-index of interestingness I3t he ensemble model accuracy as I3t=P(E(t));
for each decision rule corresponding to the leaf nodes in the tree ensemble, combine the sub-indices into interestingness index It;
rank the decision rules corresponding to the leaf nodes in the tree ensemble according to the associated value of the interestingness index It; and
report a subset of the decision rules corresponding to the leaf nodes in the tree ensemble in order to provide a notification of the interesting decision rules in the tree ensemble.
1 Assignment
0 Petitions
Accused Products
Abstract
Mechanisms are provided for detecting interesting decision rules from a set of decision rules in a tree ensemble. Each tree in the tree ensemble is traversed in order to assign each individual data record from a set of data records to an identified leaf node in each tree. Predicted values are determined for the tree ensemble based on predictions provided by each leaf node to which each individual data record is assigned. Interesting sub-indices for decision rules from the set of decision rules are determined and, for each decision rule corresponding to the leaf nodes in the tree ensemble, the sub-indices are combined into interestingness index It. The decision rules are ranked corresponding to the leaf nodes in the tree ensemble according to the associated value of the interestingness index It and a subset of the decision rules corresponding to the leaf nodes in the tree ensemble are reported.
7 Citations
10 Claims
-
1. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a computing device, causes the computing device to:
-
traverse each tree in a tree ensemble in order to assign each individual data record from a set of data records in an evaluation data set to an identified leaf node in a set of leaf nodes in each tree; determine predicted values defined by the tree ensemble based on predictions provided by each leaf node to which each individual data record is assigned; determine interesting sub-indices for decision rules from a set of decision rules corresponding to the leaf nodes in the tree ensemble, wherein the computer readable program to determine the interesting sub-indices for decision rules corresponding to the leaf nodes in the tree ensemble for categorical targets further causes the computing device to; identify a proportion of data records P(R(t)) determined by the leaf node t where the decision rule is accurate, R(t) being the event that the decision rule based on the node t is accurate; identify a proportion of data records P(E(t) determined by the leaf node t where the tree ensemble model is accurate, E(t) being an event that the ensemble prediction is accurate based on a data record determined by node t; determine a proportion P(Ē
(t)) that the ensemble prediction is inaccurate based on a data record determined by node t by defining P(Ē
(t))=1−
P(E(t));identify a proportion of data records P(E(t)R(t))determined by the node t that are predicted accurately by both the ensemble model and the decision rule; identify a proportion of data records P(Ē
(t)(R (t)) determined by the node t that are predicted inaccurately by both the ensemble model and the decision rule;determine a first sub-index of interestingness I1t based on prediction agreement between the ensemble model and the decision rule as I1t=P(E(t)R(t))+P(Ē
(t)R (t));determine a second sub-index of interestingness I2t on the decision rule accuracy as I2t=P(R(t));
anddetermine a third sub-index of interestingness I3t he ensemble model accuracy as I3t=P(E(t)); for each decision rule corresponding to the leaf nodes in the tree ensemble, combine the sub-indices into interestingness index It; rank the decision rules corresponding to the leaf nodes in the tree ensemble according to the associated value of the interestingness index It; and report a subset of the decision rules corresponding to the leaf nodes in the tree ensemble in order to provide a notification of the interesting decision rules in the tree ensemble. - View Dependent Claims (2, 3, 4, 5)
-
-
6. An apparatus comprising:
-
a processor; and a memory coupled to the processor, wherein the memory comprises instructions which, when executed by the processor, cause the processor to; traverse each tree in a tree ensemble in order to assign each individual data record from a set of data records in an evaluation data set to an identified leaf node in a set of leaf nodes in each tree; determine predicted values defined by the tree ensemble based on predictions provided by each leaf node to which each individual data record is assigned; determine interesting sub-indices for decision rules from a set of decision rules corresponding to the leaf nodes in the tree ensemble, wherein the instructions to determine the interesting sub-indices for decision rules corresponding to the leaf nodes in the tree ensemble for categorical targets further cause the processor to; identify a proportion of data records P(R(t)) determined by the leaf node t where the decision rule is accurate, R(t) being the event that the decision rule based on the node t is accurate; identify a proportion of data records P(E(t)) determined by the leaf node t where the tree ensemble model is accurate, E(t) being an event that the ensemble prediction is accurate based on a data record determined by node t; determine a proportion P(Ē
(t)) that the ensemble prediction is inaccurate based on a data record determined by node t by defining P(Ē
(t))=1−
P(E(t));identify a proportion of data records P(E(t)R(t)) determined by the node t that are predicted accurately by both the ensemble model and the decision rule; identify a proportion of data records P(Ē
(t)R (t)) determined by the node t that are predicted accurately by both the ensemble model and the decision rule;determine a first sub-index of interestingness I1t based on prediction agreement between the ensemble model and the decision rule as I1t=P(E(t)R(t))+P(Ē
(t)R (t));determine a second sub-index of interesting I2t on the decision rule accuracy I2t=P(R(t)); and determine a third sub-index of interestingness I3t on the ensemble model accuracy as I3t=P(E(t)); for each decision rule corresponding to the leaf nodes in the tree ensemble, combine the sub-indices into interestingness index It; rank the decision rules corresponding to the leaf nodes in the tree ensemble according to the associated value of the interestingness index It; and report a subset of the decision rules corresponding to the leaf nodes in the tree ensemble in order to provide a notification of the interesting decision rules in the tree ensemble. - View Dependent Claims (7, 8, 9, 10)
-
Specification