Data mining model building using attribute importance
Abstract
A system, method, and computer program product use attribute importance (AI) to reduce the time and computational resources required to build data mining models, with a corresponding reduction in the cost of data mining. Attribute importance involves choosing a subset of the original predictive attributes by eliminating redundant, irrelevant, or uninformative ones and identifying those predictor attributes that may be most helpful in making predictions. A new algorithm, Predictor Variance, is proposed. A method of selecting predictive attributes for a data mining model comprises the steps of: receiving a dataset having a plurality of predictor attributes; for each predictor attribute, determining a predictive quality of the predictor attribute; selecting at least one predictor attribute based on the determined predictive quality of the predictor attribute; and building a data mining model including only the selected at least one predictor attribute.
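The four steps named in the abstract (receive, score, select, build) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: `stand_in_score` is a hypothetical scoring function, not the Predictor Variance formula of the claims (which this text does not reproduce), and `build_model` is a toy lookup-table classifier standing in for whatever data mining model would actually be built.

```python
from collections import Counter, defaultdict


def stand_in_score(column, target):
    """Hypothetical score (NOT the patent's Predictor Variance formula):
    how much the rate of each target class varies across predictor values."""
    groups = defaultdict(Counter)
    for p, t in zip(column, target):
        groups[p][t] += 1
    total = 0.0
    for t in set(target):
        # Rate of target class t within each predictor value.
        rates = [counts[t] / sum(counts.values()) for counts in groups.values()]
        mean = sum(rates) / len(rates)
        total += sum((r - mean) ** 2 for r in rates) / len(rates)
    return total


def select_attributes(dataset, target, k, score=stand_in_score):
    """Score every predictor attribute and keep the k highest-scoring ones."""
    scores = {name: score(col, target) for name, col in dataset.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


def build_model(dataset, target, selected):
    """Toy model using only the selected attributes: majority target
    for each observed combination of selected attribute values."""
    votes = defaultdict(Counter)
    for i, t in enumerate(target):
        key = tuple(dataset[name][i] for name in selected)
        votes[key][t] += 1
    default = Counter(target).most_common(1)[0][0]
    table = {key: c.most_common(1)[0][0] for key, c in votes.items()}

    def predict(row):
        return table.get(tuple(row[name] for name in selected), default)

    return predict


# Illustrative data: column name -> list of values (all names are made up).
dataset = {
    "weather": ["sun", "sun", "rain", "rain", "sun", "rain"],
    "day_id": ["a", "b", "a", "b", "a", "b"],
    "constant": ["x"] * 6,
}
target = ["yes", "yes", "no", "no", "yes", "no"]

selected, scores = select_attributes(dataset, target, k=1)
model = build_model(dataset, target, selected)
```

In this toy data the uninformative `constant` column scores zero and only `weather` is kept, so the model is built from a single attribute instead of three.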
46 Claims
1. A method of selecting predictive attributes for a data mining model comprising the steps of:
receiving a dataset having a plurality of predictor attributes;
for each predictor attribute, determining a predictive quality of the predictor attribute based on a predictor variance of the predictor attribute;
selecting at least one predictor attribute based on the determined predictive quality of the predictor attribute; and
building a data mining model including only the selected at least one predictor attribute.
15. A system for selecting predictive attributes for a data mining model comprising:
a processor operable to execute computer program instructions;
a memory operable to store computer program instructions executable by the processor; and
computer program instructions stored in the memory and executable to perform the steps of:
receiving a dataset having a plurality of predictor attributes;
for each predictor attribute, determining a predictive quality of the predictor attribute based on a predictor variance of the predictor attribute;
selecting at least one predictor attribute based on the determined predictive quality of the predictor attribute; and
building a data mining model including only the selected at least one predictor attribute.
30. A computer program product for selecting predictive attributes for a data mining model, comprising:
a computer readable medium;
computer program instructions, recorded on the computer readable medium, executable by a processor, for performing the steps of:
receiving a dataset having a plurality of predictor attributes;
for each predictor attribute, determining a predictive quality of the predictor attribute based on a predictor variance of the predictor attribute;
selecting at least one predictor attribute based on the determined predictive quality of the predictor attribute; and
building a data mining model including only the selected at least one predictor attribute.
45. A method of determining a predictive quality of a predictor attribute for a data mining model comprising the steps of:
receiving a dataset having a plurality of predictor attributes, wherein the predictor attributes are conditionally independent;
for each predictor attribute, determining a predictive quality of the predictor attribute by determining a predictor variance PV according to:
[equation not reproduced in the source text]
wherein P is the predictor and T is the target, P has values 1 . . . m, and T has values 1 . . . n.
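The notation in this claim (a predictor P with values 1 . . . m and a target T with values 1 . . . n) suggests an m-by-n table of co-occurrence counts as the raw input to any such score. The Python sketch below builds that table and the per-value target rates. It stops there because the PV formula itself is not reproduced in this text; the function names are hypothetical, and treating the score as a function of these rates is an assumption.

```python
def contingency_table(p_values, t_values, m, n):
    """Count co-occurrences of predictor value i (1..m) and target value j (1..n).
    counts[i - 1][j - 1] is the number of rows with P == i and T == j."""
    counts = [[0] * n for _ in range(m)]
    for p, t in zip(p_values, t_values):
        counts[p - 1][t - 1] += 1
    return counts


def conditional_target_rates(counts):
    """Estimate P(T = j | P = i) from the counts, row by row.
    Rows with no observations are reported as all zeros."""
    rates = []
    for row in counts:
        row_total = sum(row)
        rates.append([c / row_total if row_total else 0.0 for c in row])
    return rates


# Illustrative data: P takes values 1..3 and T takes values 1..2.
p_values = [1, 1, 2, 2, 3, 3, 3]
t_values = [1, 1, 2, 2, 1, 2, 2]

counts = contingency_table(p_values, t_values, m=3, n=2)
rates = conditional_target_rates(counts)
```

A predictor whose rows of `rates` differ strongly from one another separates the target values well, while a predictor whose rows are nearly identical carries little information about the target.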
46. A method of determining a predictive quality of a predictor attribute for a data mining model comprising the steps of:
receiving a dataset having a plurality of predictor attributes, wherein the predictor attributes have at least some inter-correlations;
for each predictor attribute, determining a predictive quality of the predictor attribute by determining a variance Q of all predictors ignoring a predictor Pa according to:
[equation not reproduced in the source text]
determining a predictor variance PV according to:
[equation not reproduced in the source text]
wherein P is the predictor and T is the target, P has values 1 . . . m, and T has values 1 . . . n.
Specification