×

Detecting interesting decision rules in tree ensembles

  • US 10,394,779 B2
  • Filed: 09/14/2015
  • Issued: 08/27/2019
  • Est. Priority Date: 09/14/2015
  • Status: Active Grant
First Claim
Patent Images

1. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program, when executed on a computing device, causes the computing device to:

  • traverse each tree in a tree ensemble in order to assign each individual data record from a set of data records in an evaluation data set to an identified leaf node in a set of leaf nodes in each tree;

    determine predicted values defined by the tree ensemble based on predictions provided by each leaf node to which each individual data record is assigned;

    determine interesting sub-indices for decision rules from a set of decision rules corresponding to the leaf nodes in the tree ensemble, wherein the computer readable program to determine the interesting sub-indices for decision rules corresponding to the leaf nodes in the tree ensemble for categorical targets further causes the computing device to;

    identify a proportion of data records P(R(t)) determined by the leaf node t where the decision rule is accurate, R(t) being the event that the decision rule based on the node t is accurate;

    identify a proportion of data records P(E(t) determined by the leaf node t where the tree ensemble model is accurate, E(t) being an event that the ensemble prediction is accurate based on a data record determined by node t;

    determine a proportion P(Ē

    (t)) that the ensemble prediction is inaccurate based on a data record determined by node t by defining P(Ē

    (t))=1−

    P(E(t));

    identify a proportion of data records P(E(t)R(t))determined by the node t that are predicted accurately by both the ensemble model and the decision rule;

    identify a proportion of data records P(Ē

    (t)(R(t)) determined by the node t that are predicted inaccurately by both the ensemble model and the decision rule;

    determine a first sub-index of interestingness I1t based on prediction agreement between the ensemble model and the decision rule as I1t=P(E(t)R(t))+P(Ē

    (t)R(t));

    determine a second sub-index of interestingness I2t on the decision rule accuracy as I2t=P(R(t));

    anddetermine a third sub-index of interestingness I3t he ensemble model accuracy as I3t=P(E(t));

    for each decision rule corresponding to the leaf nodes in the tree ensemble, combine the sub-indices into interestingness index It;

    rank the decision rules corresponding to the leaf nodes in the tree ensemble according to the associated value of the interestingness index It; and

    report a subset of the decision rules corresponding to the leaf nodes in the tree ensemble in order to provide a notification of the interesting decision rules in the tree ensemble.

View all claims
  • 1 Assignment
Timeline View
Assignment View
    ×
    ×