Data mining technique with diversity promotion
First Claim
1. A computer-implemented data mining system, for use with a data mining training database containing training data, comprising:
- a memory storing a candidate gene database having a pool of candidate individuals, each candidate individual identifying a plurality of conditions and at least one corresponding proposed output in dependence upon the conditions, each candidate individual further having associated therewith a respective testing experience level and an indication of a respective fitness estimate, wherein the memory further identifies layer parameters for each of a plurality of gene pool experience layers L1-LT in an elitist pool, T>1, the layer parameters for each i'th one of the layers L1-L(T−1) identifying a range of testing experience [ExpMin(Li) . . . ExpMax(Li)], and wherein each ExpMin(Li)>ExpMax(L(i−1)) for i>1;
- a gene pool processor which:
  - tests individuals from the candidate gene pool on the training data, each individual being tested undergoing a respective battery of at least one trial, each trial applying the conditions of the respective individual to the training data to propose an output, and
  - updates the fitness estimate associated with each of the individuals being tested in dependence upon both the training data and the outputs proposed by the respective individual in the battery of trials; and
- a gene harvesting module providing for deployment selected ones of the individuals from the gene pool,
wherein the gene pool processor includes a competition module which selects individuals for discarding from the gene pool in dependence upon both their testing experience level and a diversity measure of individuals in the gene pool,
and wherein the diversity measure of individuals in the gene pool comprises a first value being a diversity measure of only those individuals having an experience level within a first one of the experience layers and a second value being a diversity measure of only those individuals having an experience level within a second one of the experience layers.
Abstract
Roughly described, a computer-implemented evolutionary data mining system includes a memory storing a candidate gene database in which each candidate individual has a respective fitness estimate; a gene pool processor which tests individuals from the candidate gene pool on training data and updates the fitness estimate associated with the individuals in dependence upon the tests; and a gene harvesting module for deploying selected individuals from the gene pool, wherein the gene pool processor includes a competition module which selects individuals for discarding in dependence upon both their testing experience level and a diversity measure of individuals in the gene pool.
33 Claims
1. A computer-implemented data mining system, for use with a data mining training database containing training data, comprising:
- a memory storing a candidate gene database having a pool of candidate individuals, each candidate individual identifying a plurality of conditions and at least one corresponding proposed output in dependence upon the conditions, each candidate individual further having associated therewith a respective testing experience level and an indication of a respective fitness estimate, wherein the memory further identifies layer parameters for each of a plurality of gene pool experience layers L1-LT in an elitist pool, T>1, the layer parameters for each i'th one of the layers L1-L(T−1) identifying a range of testing experience [ExpMin(Li) . . . ExpMax(Li)], and wherein each ExpMin(Li)>ExpMax(L(i−1)) for i>1;
- a gene pool processor which:
  - tests individuals from the candidate gene pool on the training data, each individual being tested undergoing a respective battery of at least one trial, each trial applying the conditions of the respective individual to the training data to propose an output, and
  - updates the fitness estimate associated with each of the individuals being tested in dependence upon both the training data and the outputs proposed by the respective individual in the battery of trials; and
- a gene harvesting module providing for deployment selected ones of the individuals from the gene pool,
wherein the gene pool processor includes a competition module which selects individuals for discarding from the gene pool in dependence upon both their testing experience level and a diversity measure of individuals in the gene pool,
and wherein the diversity measure of individuals in the gene pool comprises a first value being a diversity measure of only those individuals having an experience level within a first one of the experience layers and a second value being a diversity measure of only those individuals having an experience level within a second one of the experience layers.
Dependent claims: 2, 3, 4, 5, 6, 7
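The layered diversity measure recited in claim 1 can be illustrated with a minimal sketch. The claim does not say how diversity is computed or how individuals are encoded, so everything concrete here is an assumption made for illustration: the `Individual` fields, the representation of conditions as a set of strings, the open-ended top layer (written as `None`), and the use of mean pairwise Jaccard distance as the diversity value.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Individual:
    conditions: FrozenSet[str]  # hypothetical encoding of the rule's conditions
    output: str                 # proposed output when the conditions hold
    experience: int             # testing experience level (trials undergone)
    fitness: float              # current fitness estimate


# One (ExpMin, ExpMax) pair per layer; ExpMax=None marks an open-ended top layer.
Layer = Tuple[int, Optional[int]]


def validate_layers(layers: Sequence[Layer]) -> None:
    """Check ExpMin(Li) > ExpMax(L(i-1)) for every layer after the first."""
    for prev, cur in zip(layers, layers[1:]):
        if prev[1] is None or cur[0] <= prev[1]:
            raise ValueError("experience ranges must be increasing and disjoint")


def layer_of(ind: Individual, layers: Sequence[Layer]) -> Optional[int]:
    """Return the index of the layer whose range contains the experience level."""
    for idx, (lo, hi) in enumerate(layers):
        if ind.experience >= lo and (hi is None or ind.experience <= hi):
            return idx
    return None  # not yet experienced enough for any elitist-pool layer


def jaccard_distance(a: Individual, b: Individual) -> float:
    """Assumed dissimilarity: 1 - |intersection| / |union| of condition sets."""
    union = a.conditions | b.conditions
    if not union:
        return 0.0
    return 1.0 - len(a.conditions & b.conditions) / len(union)


def layer_diversity(members: Sequence[Individual]) -> float:
    """Mean pairwise distance within one layer; 0.0 if fewer than two members."""
    pairs = list(combinations(members, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


def diversity_by_layer(pool: Sequence[Individual],
                       layers: Sequence[Layer]) -> List[float]:
    """One diversity value per layer, each computed only from that layer."""
    validate_layers(layers)
    buckets: List[List[Individual]] = [[] for _ in layers]
    for ind in pool:
        idx = layer_of(ind, layers)
        if idx is not None:
            buckets[idx].append(ind)
    return [layer_diversity(bucket) for bucket in buckets]
```

The point of the sketch is that the result is a list with one value per layer, not a single pool-wide number, which is the distinction the final "wherein" clause of claim 1 draws.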

8. A computer-implemented data mining system, for use with a data mining training database containing training data, comprising:
- a memory storing a candidate gene database having a pool of candidate individuals, each candidate individual identifying a plurality of conditions and at least one corresponding proposed output in dependence upon the conditions, each candidate individual further having associated therewith a respective testing experience level and an indication of a respective fitness estimate, wherein the memory further identifies layer parameters for each of a plurality of gene pool experience layers L1-LT in an elitist pool, T>1, the layer parameters for each i'th one of the layers L1-L(T−1) identifying a range of testing experience [ExpMin(Li) . . . ExpMax(Li)], and wherein each ExpMin(Li)>ExpMax(L(i−1)) for i>1;
- a gene pool processor which:
  - tests individuals from the candidate gene pool on the training data, each individual being tested undergoing a respective battery of at least one trial, each trial applying the conditions of the respective individual to the training data to propose an output, and
  - updates the fitness estimate associated with each of the individuals being tested in dependence upon both the training data and the outputs proposed by the respective individual in the battery of trials; and
- a gene harvesting module providing for deployment selected ones of the individuals from the gene pool,
wherein the gene pool processor includes a competition module which selects individuals for discarding from the gene pool in dependence upon both their testing experience level and a diversity measure of individuals in the gene pool,
and wherein in the selection of individuals for discarding, for a j'th one of the layers in the elitist pool, the gene pool processor:
  - selects a pair of individuals in the j'th layer which the gene pool processor determines to satisfy a predetermined measure of similarity better than another pair in the j'th layer; and
  - discards the least fit individual in the selected pair.
Dependent claims: 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
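The pair-based discarding step of claim 8 might look like the following sketch. The claim only requires a pair that satisfies a similarity measure better than some other pair; picking the single most similar pair is one way to meet that, chosen here for simplicity. The `Individual` fields and the Jaccard similarity over condition sets are assumptions for illustration, not something the claim specifies.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence


@dataclass(frozen=True)
class Individual:
    name: str                   # hypothetical label, used only for readability
    conditions: FrozenSet[str]  # hypothetical encoding of the rule's conditions
    fitness: float              # current fitness estimate


def jaccard_similarity(a: Individual, b: Individual) -> float:
    """Assumed similarity measure: |intersection| / |union| of condition sets."""
    union = a.conditions | b.conditions
    return len(a.conditions & b.conditions) / len(union) if union else 1.0


def discard_from_most_similar_pair(
    layer: Sequence[Individual],
    similarity: Callable[[Individual, Individual], float] = jaccard_similarity,
) -> List[Individual]:
    """Return the layer without the less fit member of its most similar pair."""
    if len(layer) < 2:
        return list(layer)  # no pair to compare, nothing to discard
    # Compare every pair in the layer and keep the one scoring highest.
    a, b = max(combinations(layer, 2), key=lambda pair: similarity(*pair))
    loser = a if a.fitness <= b.fitness else b
    return [ind for ind in layer if ind is not loser]
```

Note that the discarded individual need not be the least fit in the layer: a low-fitness individual that resembles nothing else can survive while a fitter near-duplicate is removed, which is how this step promotes diversity.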

20. A computer-implemented data mining system, for use with a data mining training database containing training data, comprising:
- a memory storing a candidate gene database having a pool of candidate individuals, each candidate individual identifying a plurality of conditions and at least one corresponding proposed output in dependence upon the conditions, each candidate individual further having associated therewith a respective testing experience level and an indication of a respective fitness estimate, wherein the memory further identifies layer parameters for each of a plurality of gene pool experience layers L1-LT in an elitist pool, T>1, the layer parameters for each i'th one of at least the layers L1-L(T−1) identifying a range of testing experience [ExpMin(Li) . . . ExpMax(Li)], a first gene capacity quota QuotaL(Li), and a second gene capacity quota QuotaH(Li), and wherein each QuotaH(Li)>QuotaL(Li), and wherein for i>1, each ExpMin(Li)>ExpMax(L(i−1));
- a gene pool processor which:
  - tests individuals from the candidate gene pool on the training data, each individual being tested undergoing a respective battery of at least one trial, each trial applying the conditions of the respective individual to the training data to propose an output, and
  - updates the fitness estimate associated with each of the individuals being tested in dependence upon both the training data and the outputs proposed by the respective individual in the battery of trials; and
- a gene harvesting module providing for deployment selected ones of the individuals from the gene pool,
wherein the gene pool processor includes a competition module which selects individuals for discarding from the gene pool in dependence upon both their testing experience level and a diversity measure of individuals in the gene pool,
and wherein in the selection of individuals for discarding, the gene pool processor:
  - identifies each j'th layer in the elitist pool for which (a) the number of individuals in the j'th layer exceeds QuotaL(Lj), and either (b) the number of individuals in the j'th layer exceeds QuotaH(Lj) or (c) the individuals in the j'th layer fail to satisfy a predetermined measure of sufficient diversity;
  - selects a pair of individuals in each identified j'th layer which the gene pool processor determines to satisfy a predetermined measure of similarity better than another pair in the j'th layer; and
  - discards the least fit individual in each selected pair.
Dependent claims: 21
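The layer-identification test of claim 20 combines conditions (a), (b) and (c) as "(a) and ((b) or (c))". A minimal sketch of that predicate follows. The `LayerState` container and the `min_diversity` threshold are hypothetical: the claim speaks only of "a predetermined measure of sufficient diversity" and does not say it is a numeric threshold.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class LayerState:
    count: int        # number of individuals currently in the layer
    diversity: float  # diversity value computed for this layer only
    quota_low: int    # QuotaL(Lj)
    quota_high: int   # QuotaH(Lj), required to exceed QuotaL(Lj)


def needs_culling(layer: LayerState, min_diversity: float) -> bool:
    """True when (a) holds together with at least one of (b) or (c)."""
    if layer.quota_high <= layer.quota_low:
        raise ValueError("QuotaH must be greater than QuotaL")
    over_low = layer.count > layer.quota_low       # condition (a)
    over_high = layer.count > layer.quota_high     # condition (b)
    too_uniform = layer.diversity < min_diversity  # condition (c), assumed form
    return over_low and (over_high or too_uniform)


def layers_to_cull(layers: Sequence[LayerState],
                   min_diversity: float) -> List[int]:
    """Indices j of the layers identified for pair-based discarding."""
    return [j for j, layer in enumerate(layers)
            if needs_culling(layer, min_diversity)]
```

Under this reading, a layer between its two quotas is thinned only while it lacks diversity, and a layer above the higher quota is thinned regardless of how diverse it is.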

22. A computer-implemented data mining method, for use with a data mining training database containing training data, comprising the steps of:
- providing, in a memory, a candidate gene database having a pool of candidate individuals, each candidate individual identifying a plurality of conditions and at least one corresponding proposed output in dependence upon the conditions, each candidate individual further having associated therewith a respective testing experience level and an indication of a respective fitness estimate, wherein the memory further identifies layer parameters for each of a plurality of gene pool experience layers L1-LT in an elitist pool, T>1, the layer parameters for each i'th one of the layers L1-L(T−1) identifying a range of testing experience [ExpMin(Li) . . . ExpMax(Li)], and wherein each ExpMin(Li)>ExpMax(L(i−1)) for i>1;
- a computer system testing individuals from the candidate gene pool on the training data, each individual being tested undergoing a respective battery of at least one trial, each trial applying the conditions of the respective individual to the training data to propose an output;
- updating the fitness estimate associated with each of the individuals being tested in dependence upon both the training data and the outputs proposed by the respective individual in the battery of trials;
- selecting individuals for discarding from the gene pool in dependence upon both their testing experience level and a diversity measure of individuals in the gene pool; and
- providing for deployment selected ones of the individuals from the gene pool,
wherein the diversity measure of individuals in the gene pool comprises a first value being a diversity measure of only those individuals having an experience level within a first one of the experience layers and a second value being a diversity measure of only those individuals having an experience level within a second one of the experience layers.
Dependent claims: 23, 24
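The testing and fitness-updating steps of claim 22 can be sketched as follows. The claim does not define how a trial is scored or how the estimate is updated, so the details here are assumptions for illustration only: conditions are modelled as predicates over a feature dictionary, a trial scores 1.0 when the individual proposes an output that matches the sample's label and 0.0 otherwise, experience is counted in trials, and the fitness estimate is a running average over all trials so far.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Features = Dict[str, float]
Sample = Tuple[Features, str]  # (features, expected output), hypothetical shape


@dataclass
class Individual:
    conditions: List[Callable[[Features], bool]]  # all must hold to fire
    output: str                                   # proposed output when firing
    experience: int = 0                           # trials undergone so far
    fitness: float = 0.0                          # running-average estimate


def propose(ind: Individual, features: Features) -> Optional[str]:
    """Apply the individual's conditions; propose its output only if all hold."""
    if all(cond(features) for cond in ind.conditions):
        return ind.output
    return None


def run_battery(ind: Individual, battery: Sequence[Sample]) -> None:
    """Run one battery of trials, then update experience and fitness in place."""
    if not battery:
        raise ValueError("a battery must contain at least one trial")
    # Assumed scoring: 1.0 for a proposal matching the label, else 0.0.
    total = sum(1.0 if propose(ind, features) == label else 0.0
                for features, label in battery)
    trials = len(battery)
    # Fold the new results into the running average over all trials to date.
    ind.fitness = (ind.fitness * ind.experience + total) / (ind.experience + trials)
    ind.experience += trials
```

Because experience grows with every battery, an individual tested this way would move up through the experience layers over time, which is what makes the layer ranges in the first step meaningful.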

25. A computer-implemented data mining method, for use with a data mining training database containing training data, comprising the steps of:
- providing, in a memory, a candidate gene database having a pool of candidate individuals, each candidate individual identifying a plurality of conditions and at least one corresponding proposed output in dependence upon the conditions, each candidate individual further having associated therewith a respective testing experience level and an indication of a respective fitness estimate, wherein the memory further identifies layer parameters for each of a plurality of gene pool experience layers L1-LT in an elitist pool, T>1, the layer parameters for each i'th one of the layers L1-L(T−1) identifying a range of testing experience [ExpMin(Li) . . . ExpMax(Li)], and wherein each ExpMin(Li)>ExpMax(L(i−1)) for i>1;
- a computer system testing individuals from the candidate gene pool on the training data, each individual being tested undergoing a respective battery of at least one trial, each trial applying the conditions of the respective individual to the training data to propose an output;
- updating the fitness estimate associated with each of the individuals being tested in dependence upon both the training data and the outputs proposed by the respective individual in the battery of trials;
- selecting individuals for discarding from the gene pool in dependence upon both their testing experience level and a diversity measure of individuals in the gene pool; and
- providing for deployment selected ones of the individuals from the gene pool,
wherein the step of selecting individuals for discarding comprises, for a j'th one of the layers in the elitist pool:
  - selecting a pair of individuals in the j'th layer which the computer system determines to satisfy a predetermined measure of similarity better than another pair in the j'th layer; and
  - discarding the least fit individual in the selected pair.
Dependent claims: 26, 27, 28, 29, 30, 31, 32, 33
Specification