Systems and techniques for determining the predictive value of a feature
First Claim
1. A computer-implemented method for building a predictive model, comprising:
determining a multi-model predictive value of a feature of an initial dataset representing a prediction problem, wherein the initial dataset includes a plurality of observations and each observation includes respective values for a plurality of features, including:
(a) performing one or more predictive modeling procedures, wherein each of the predictive modeling procedures is associated with a different type of predictive model, wherein performing each modeling procedure comprises fitting the associated predictive model to the initial dataset;
(b) reducing the multi-model predictive value of the feature by shuffling values of the feature across respective observations included in the initial dataset, thereby generating a modified dataset;
(c) for each of the fitted predictive models:
(c1) determining a first accuracy score representing an accuracy with which the fitted model generates predictions for data in the initial dataset;
(c2) determining a second accuracy score representing an accuracy with which the fitted model generates predictions for data in the modified dataset in which the multi-model predictive value of the feature has been reduced; and
(c3) determining a model-specific predictive value of the feature based on the first and second accuracy scores of the fitted model; and
(d) determining, based on the model-specific predictive values of the feature, that the multi-model predictive value of the feature is low;
performing feature engineering on the initial dataset based on the multi-model predictive value of the feature, including pruning the feature having the low multi-model predictive value from the initial dataset, thereby generating a pruned dataset; and
building a predictive model for the prediction problem, including:
performing a plurality of predictive modeling procedures on the pruned dataset, selecting a fitted predictive model generated by the plurality of predictive modeling procedures, and deploying the selected predictive model to predict outcomes of the prediction problem without using the pruned feature.
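The steps recited in the claim above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the two toy model types (`Stump`, `NearestCentroid`), the synthetic data from `make_dataset`, the use of a plain accuracy difference as the model-specific value, the averaging across models, and the `threshold` of 0.05 for "low" are all hypothetical choices that the claim leaves open.

```python
import random

def make_dataset(n=200, seed=0):
    """Hypothetical toy data: feature 0 decides the label, feature 1 is noise."""
    rng = random.Random(seed)
    X = [[rng.random(), rng.random()] for _ in range(n)]
    y = [int(row[0] > 0.5) for row in X]
    return X, y

class Stump:
    """Toy model type 1: best single-feature threshold rule."""
    def fit(self, X, y):
        best_acc = -1.0
        for j in range(len(X[0])):
            for t in sorted({row[j] for row in X}):
                acc = sum(int(row[j] > t) == lab for row, lab in zip(X, y)) / len(y)
                if acc > best_acc:
                    best_acc, self.j, self.t = acc, j, t
        return self

    def predict(self, X):
        return [int(row[self.j] > self.t) for row in X]

class NearestCentroid:
    """Toy model type 2: assign the label of the closest class mean."""
    def fit(self, X, y):
        self.centroids = {}
        for label in sorted(set(y)):
            rows = [row for row, lab in zip(X, y) if lab == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, X):
        def dist(row, c):
            return sum((a - b) ** 2 for a, b in zip(row, c))
        return [min(self.centroids, key=lambda k: dist(row, self.centroids[k])) for row in X]

def accuracy(model, X, y):
    return sum(p == t for p, t in zip(model.predict(X), y)) / len(y)

def shuffle_feature(X, j, rng):
    """Step (b): permute column j across observations, leaving other columns intact."""
    col = [row[j] for row in X]
    rng.shuffle(col)
    return [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]

def multi_model_value(models, X, y, j, rng):
    X_mod = shuffle_feature(X, j, rng)
    per_model = []
    for m in models:
        first = accuracy(m, X, y)        # (c1) score on the initial dataset
        second = accuracy(m, X_mod, y)   # (c2) score on the modified dataset
        per_model.append(first - second) # (c3) assumption: plain difference
    return sum(per_model) / len(per_model)  # assumption: average across models

def build_model(X, y, threshold=0.05, seed=1):
    rng = random.Random(seed)
    model_types = [Stump, NearestCentroid]
    fitted = [cls().fit(X, y) for cls in model_types]  # step (a)
    values = [multi_model_value(fitted, X, y, j, rng) for j in range(len(X[0]))]
    kept = [j for j, v in enumerate(values) if v >= threshold]  # (d) and pruning
    X_pruned = [[row[j] for j in kept] for row in X]
    candidates = [cls().fit(X_pruned, y) for cls in model_types]
    # Selection by training accuracy is a simplification for this sketch.
    best = max(candidates, key=lambda m: accuracy(m, X_pruned, y))
    return best, kept, values

X, y = make_dataset()
model, kept, values = build_model(X, y)
print("multi-model values per feature:", values)
print("kept features:", kept, "selected model:", type(model).__name__)
```

In this sketch the modified dataset is generated once per feature and every fitted model is scored on the same shuffled copy, mirroring the order of steps (b) and (c). The selected model is then refit and used on the pruned columns only, which corresponds to predicting "without using the pruned feature".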
Abstract
A method for determining the predictive value of a feature may include: (a) performing predictive modeling procedures associated with respective predictive models, wherein performing each modeling procedure includes fitting the associated model to an initial dataset representing an initial prediction problem; (b) determining a first accuracy score of each of the fitted models, representing an accuracy with which the fitted model predicts an outcome of the initial prediction problem; (c) shuffling values of a feature across observations included in the initial dataset, thereby generating a modified dataset representing a modified prediction problem; (d) determining a second accuracy score of each of the fitted models, representing an accuracy with which the fitted model predicts an outcome of the modified prediction problem; and (e) determining a model-specific predictive value of the feature for each of the fitted models based on the first and second accuracy scores of the fitted model.
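The abstract says only that the model-specific predictive value is "based on" the first and second accuracy scores. One simple way to write this down, offered here as an assumption rather than the formula the patent prescribes, uses hypothetical symbols $s_m^{(1)}$ and $s_m^{(2)}(f)$ for the accuracy of fitted model $m$ on the initial dataset and on the dataset with feature $f$ shuffled:

```latex
\[
  v_m(f) \;=\; s_m^{(1)} - s_m^{(2)}(f), \qquad m = 1, \dots, M .
\]
```

For example, under this assumed definition a model that scores 0.90 on the initial dataset and 0.60 after the feature is shuffled gives $v_m(f) = 0.30$, while a feature whose shuffling leaves the score unchanged gives $v_m(f) = 0$.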
30 Claims
1. A computer-implemented method for building a predictive model, comprising:
determining a multi-model predictive value of a feature of an initial dataset representing a prediction problem, wherein the initial dataset includes a plurality of observations and each observation includes respective values for a plurality of features, including:
(a) performing one or more predictive modeling procedures, wherein each of the predictive modeling procedures is associated with a different type of predictive model, wherein performing each modeling procedure comprises fitting the associated predictive model to the initial dataset;
(b) reducing the multi-model predictive value of the feature by shuffling values of the feature across respective observations included in the initial dataset, thereby generating a modified dataset;
(c) for each of the fitted predictive models:
(c1) determining a first accuracy score representing an accuracy with which the fitted model generates predictions for data in the initial dataset;
(c2) determining a second accuracy score representing an accuracy with which the fitted model generates predictions for data in the modified dataset in which the multi-model predictive value of the feature has been reduced; and
(c3) determining a model-specific predictive value of the feature based on the first and second accuracy scores of the fitted model; and
(d) determining, based on the model-specific predictive values of the feature, that the multi-model predictive value of the feature is low;
performing feature engineering on the initial dataset based on the multi-model predictive value of the feature, including pruning the feature having the low multi-model predictive value from the initial dataset, thereby generating a pruned dataset; and
building a predictive model for the prediction problem, including:
performing a plurality of predictive modeling procedures on the pruned dataset, selecting a fitted predictive model generated by the plurality of predictive modeling procedures, and deploying the selected predictive model to predict outcomes of the prediction problem without using the pruned feature.
Dependent claims: 2 to 28.
29. A predictive modeling apparatus comprising:
a memory configured to store processor-executable instructions; and
a processor configured to execute the processor-executable instructions, wherein executing the processor-executable instructions causes the apparatus to perform steps including:
determining a multi-model predictive value of a feature of an initial dataset representing a prediction problem, wherein the initial dataset includes a plurality of observations and each observation includes respective values for a plurality of features, including:
(a) performing one or more predictive modeling procedures, wherein each of the predictive modeling procedures is associated with a different type of predictive model, wherein performing each modeling procedure comprises fitting the associated predictive model to the initial dataset;
(b) reducing the multi-model predictive value of the feature by shuffling values of the feature across respective observations included in the initial dataset, thereby generating a modified dataset;
(c) for each of the fitted predictive models:
(c1) determining a first accuracy score representing an accuracy with which the fitted model generates predictions for data in the initial dataset;
(c2) determining a second accuracy score representing an accuracy with which the fitted model generates predictions for data in the modified dataset in which the multi-model predictive value of the feature has been reduced; and
(c3) determining a model-specific predictive value of the feature based on the first and second accuracy scores of the fitted model; and
(d) determining, based on the model-specific predictive values of the feature, that the multi-model predictive value of the feature is low;
performing feature engineering on the initial dataset based on the multi-model predictive value of the feature, including pruning the feature having the low multi-model predictive value from the initial dataset, thereby generating a pruned dataset; and
building a predictive model for the prediction problem, including:
performing a plurality of predictive modeling procedures on the pruned dataset, selecting a fitted predictive model generated by the plurality of predictive modeling procedures, and deploying the selected predictive model to predict outcomes of the prediction problem without using the pruned feature.
30. An article of manufacture having computer-readable instructions stored thereon that, when executed by a processor, cause the processor to perform operations for building a predictive model, including:
determining a multi-model predictive value of a feature of an initial dataset representing a prediction problem, wherein the initial dataset includes a plurality of observations and each observation includes respective values for a plurality of features, including:
(a) performing one or more predictive modeling procedures, wherein each of the predictive modeling procedures is associated with a different type of predictive model, wherein performing each modeling procedure comprises fitting the associated predictive model to the initial dataset;
(b) reducing the multi-model predictive value of the feature by shuffling values of the feature across respective observations included in the initial dataset, thereby generating a modified dataset;
(c) for each of the fitted predictive models:
(c1) determining a first accuracy score representing an accuracy with which the fitted model generates predictions for data in the initial dataset;
(c2) determining a second accuracy score representing an accuracy with which the fitted model generates predictions for data in the modified dataset in which the multi-model predictive value of the feature has been reduced; and
(c3) determining a model-specific predictive value of the feature based on the first and second accuracy scores of the fitted model; and
(d) determining, based on the model-specific predictive values of the feature, that the multi-model predictive value of the feature is low;
performing feature engineering on the initial dataset based on the multi-model predictive value of the feature, including pruning the feature having the low multi-model predictive value from the initial dataset, thereby generating a pruned dataset; and
building a predictive model for the prediction problem, including:
performing a plurality of predictive modeling procedures on the pruned dataset, selecting a fitted predictive model generated by the plurality of predictive modeling procedures, and deploying the selected predictive model to predict outcomes of the prediction problem without using the pruned feature.
Specification