OPTIMIZED DECISION TREE BASED MODELS
First Claim
Patent Images
1. A system, comprising:
- one or more computing devices configured to;
identify one or more run-time optimization goals for a decision-tree based machine learning model to be trained using a data set, including at least a goal for a memory footprint of an execution of the machine learning model subsequent to a training phase of the machine learning model;
store, in a depth-first order at one or more persistent storage devices during a tree-construction pass of the training phase, respective representations of a plurality of nodes generated for a particular decision tree using at least a portion of the data set;
determine, for one or more nodes of the particular decision tree during the tree-construction pass, a respective value of a predictive utility metric (PUM), wherein a particular PUM value associated with a particular node of the one or more nodes is a measure of an expected contribution of the particular node to a prediction generated using the machine learning model;
generate, during a tree-pruning pass of the training phase, a modified version of the particular decision tree, wherein to generate the modified version, at least the particular node is removed from the particular decision tree, wherein the particular node is selected for removal based at least in part on the one or more run-time optimization goals and based at least in part on the particular PUM value;
store a representation of the modified version of the particular decision tree; and
subsequent to the training phase, execute the machine learning model using at least the modified version of the particular decision tree to obtain a particular prediction.
1 Assignment
0 Petitions
Accused Products
Abstract
During a training phase of a machine learning model, representations of at least some nodes of a decision tree are generated and stored on persistent storage in depth-first order. A respective predictive utility metric (PUM) value is determined for one or more nodes, indicating expected contributions of the nodes to a prediction of the model. A particular node is selected for removal from the tree based at least partly on its PUM value. A modified version of the tree, with the particular node removed, is stored for obtaining a prediction.
-
Citations
21 Claims
-
1. A system, comprising:
one or more computing devices configured to; identify one or more run-time optimization goals for a decision-tree based machine learning model to be trained using a data set, including at least a goal for a memory footprint of an execution of the machine learning model subsequent to a training phase of the machine learning model; store, in a depth-first order at one or more persistent storage devices during a tree-construction pass of the training phase, respective representations of a plurality of nodes generated for a particular decision tree using at least a portion of the data set; determine, for one or more nodes of the particular decision tree during the tree-construction pass, a respective value of a predictive utility metric (PUM), wherein a particular PUM value associated with a particular node of the one or more nodes is a measure of an expected contribution of the particular node to a prediction generated using the machine learning model; generate, during a tree-pruning pass of the training phase, a modified version of the particular decision tree, wherein to generate the modified version, at least the particular node is removed from the particular decision tree, wherein the particular node is selected for removal based at least in part on the one or more run-time optimization goals and based at least in part on the particular PUM value; store a representation of the modified version of the particular decision tree; and subsequent to the training phase, execute the machine learning model using at least the modified version of the particular decision tree to obtain a particular prediction. - View Dependent Claims (2, 3, 4, 5)
-
6. A method, comprising:
performing, by one or more computing devices; storing, in a depth-first order at one or more persistent storage devices during a tree-construction pass of a training phase of a machine learning model, respective representations of a plurality of nodes generated for a particular decision tree; determining, for one or more nodes of the particular decision tree, a respective value of a predictive utility metric (PUM), wherein a particular PUM value associated with a particular node of the one or more nodes is a measure of an expected contribution of the particular node to a prediction generated using the machine learning model; generating, during a tree-pruning pass of the training phase, a modified version of the particular decision tree, wherein said generating comprises removing at least the particular node from the particular decision tree, wherein the particular node is selected for removal based at least in part on the particular PUM value; and executing the machine learning model using at least the modified version of the particular decision tree to obtain a particular prediction. - View Dependent Claims (7, 8, 9, 10, 11, 12, 13, 14, 15)
-
16. A non-transitory computer-accessible storage medium storing program instructions that when executed on one or more processors:
-
store, in a depth-first order at one or more persistent storage devices during a first tree-construction period of one or more tree-construction periods of a training phase of a machine learning model, respective representations of a plurality of nodes generated for a particular decision tree; determine, for one or more nodes of the particular decision tree, a respective value of a predictive utility metric (PUM), wherein a particular PUM value associated with a particular node of the one or more nodes is a measure of an expected contribution of the particular node to a prediction generated using the machine learning model; select, during a first tree-pruning period of one or more tree-pruning periods of the training phase, the particular node for removal from the particular decision tree based at least in part on the particular PUM value; and store a modified version of the particular decision tree, wherein the modified version excludes the particular node. - View Dependent Claims (17, 18, 19, 20, 21)
-
Specification