Data mining application with improved data mining algorithm selection
Abstract
A data mining system includes a training database in memory (including data mining algorithm descriptions and metafeatures characterizing probability density functions of features) and computer readable program code (i) to extract features that classify data, (ii) to calculate metafeatures describing the case probability density function, and (iii) to select a data mining algorithm by using the training database to map the calculated metafeatures describing the case probability density function to the selected data mining algorithm. The frequency of the occurrence of features with respect to datum in the data defines the case probability density function.
60 Claims
1. A data mining algorithm selection method for selecting a data mining algorithm for data mining analysis of a problem set, the data mining algorithm selection method comprising:
providing data to be analyzed by data mining;
providing a training database comprising a list of data mining algorithm instances, each data mining algorithm instance comprising a data mining algorithm description and a set of training metafeatures characterizing probability density functions of features;
extracting features that classify the data, the frequency of the occurrence of features with respect to datum in the data defining a case probability density function;
calculating metafeatures describing the case probability density function; and
selecting a data mining algorithm by using the training database to map the calculated metafeatures describing the case probability density function to the selected data mining algorithm.
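The extraction and metafeature steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the word-based feature extractor `words`, the helper names `case_pdf` and `metafeatures`, the sample data, and the particular statistics chosen as metafeatures (count, mean, variance, entropy) are all assumptions made for illustration, since the claim does not specify them.

```python
from collections import Counter
from math import log2


def case_pdf(data, extract_features):
    """Relative frequency of each extracted feature across every datum.

    This normalized frequency table stands in for the claim's
    "case probability density function".
    """
    counts = Counter()
    for datum in data:
        counts.update(extract_features(datum))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no features were extracted from the data")
    return {feature: n / total for feature, n in counts.items()}


def metafeatures(pdf):
    """Summary statistics describing the PDF (an assumed, illustrative set)."""
    probs = list(pdf.values())
    n = len(probs)
    mean = sum(probs) / n
    variance = sum((p - mean) ** 2 for p in probs) / n
    entropy = -sum(p * log2(p) for p in probs if p > 0)
    return {"n_features": n, "mean": mean, "variance": variance, "entropy": entropy}


def words(datum):
    """Hypothetical feature extractor: lower-cased words of a text datum."""
    return datum.lower().split()


# Made-up sample data, used only to exercise the sketch.
data = ["red apple", "green apple", "red car"]
pdf = case_pdf(data, words)
mf = metafeatures(pdf)
```

Here `pdf` maps each word to its share of all extracted features, and `mf` condenses that distribution into a few numbers that could then be compared against the training metafeatures stored with each algorithm instance.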
13. A data mining product embedded in a computer readable medium, comprising:
at least one computer readable medium having a training database embedded therein and having a computer readable program code embedded therein to select a data mining algorithm, the training database comprising a list of data mining algorithm instances, each data mining algorithm instance comprising a data mining algorithm description and a set of metafeatures characterizing probability density functions of features;
the computer readable program code comprising:
computer readable program code to extract features that classify data, the frequency of the occurrence of features with respect to datum in the data defining a probability density function;
computer readable program code to calculate metafeatures describing the probability density function; and
computer readable program code to select a data mining algorithm by using the training database to map the calculated metafeatures describing the probability density function to the selected data mining algorithm.
25. A data mining system with improved data mining algorithm selection for data mining analysis of data, the data mining system comprising:
a general purpose computer comprising a memory and a central processing unit;
a training database in the memory, the training database comprising a list of data mining algorithm instances, each data mining algorithm instance comprising a data mining algorithm description and a set of metafeatures characterizing probability density functions of features;
computer readable program code to extract features that classify data, the frequency of the occurrence of features with respect to datum in the data defining a case probability density function;
computer readable program code to calculate metafeatures describing the case probability density function; and
computer readable program code to select a data mining algorithm by using the training database to map the calculated metafeatures describing the case probability density function to the selected data mining algorithm.
37. A data mining system with improved data mining algorithm selection for data mining analysis of data, the data mining system comprising:
a distributed network of computers;
a training database on the network, the training database comprising a list of data mining algorithm instances, each data mining algorithm instance comprising a data mining algorithm description and a set of metafeatures characterizing probability density functions of features;
computer readable program code to extract features that classify data, the frequency of the occurrence of features with respect to datum in the data defining a case probability density function;
computer readable program code to calculate metafeatures describing the case probability density function; and
computer readable program code to select a data mining algorithm by using the training database to map the calculated metafeatures describing the case probability density function to the selected data mining algorithm.
49. A data mining application with improved data mining algorithm selection for data mining analysis of a problem set, the data mining application comprising:
a training database means for storing a list of data mining algorithm instances, each data mining algorithm instance comprising a data mining algorithm description and a set of metafeatures characterizing probability density function of features over a problem data set;
a means for extracting features that classify problem set data, wherein the frequency of the occurrence of features with respect to datum in the problem data set defines a probability density function;
a means for computing metafeatures describing the probability density function; and
a means for directly mapping the metafeatures describing the probability density function to a selected data mining algorithm using the training database means.
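One way to picture the training database and the mapping step is the Python sketch below. The claims do not state how the mapping is performed, so the nearest-neighbor lookup by Euclidean distance is an assumption, as are the names `AlgorithmInstance`, `TRAINING_DB`, and `select_algorithm`, and the metafeature values, which are invented purely for illustration.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class AlgorithmInstance:
    """One training database entry: an algorithm description plus the
    training metafeatures recorded for it."""
    description: str
    metafeatures: tuple


# Hypothetical training database; the numbers are made up, not taken
# from the patent. Each tuple is (feature count, mean, entropy).
TRAINING_DB = [
    AlgorithmInstance("decision tree", (4.0, 0.25, 1.9)),
    AlgorithmInstance("k-means clustering", (50.0, 0.02, 5.2)),
    AlgorithmInstance("naive Bayes", (12.0, 0.08, 3.1)),
]


def select_algorithm(case_metafeatures, training_db):
    """Return the description of the instance whose training metafeatures
    lie closest (Euclidean distance) to the calculated case metafeatures.

    Nearest-neighbor matching is an assumed mapping, chosen for brevity.
    """
    if not training_db:
        raise ValueError("training database is empty")
    nearest = min(
        training_db,
        key=lambda inst: dist(inst.metafeatures, case_metafeatures),
    )
    return nearest.description


choice = select_algorithm((5.0, 0.2, 2.0), TRAINING_DB)
```

With these invented values, `choice` is `"decision tree"`. In practice the metafeatures would likely need to be scaled to comparable ranges first, since an unscaled component such as the feature count dominates the distance.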
Specification