Method and apparatus to model the variables of a data set
Abstract
The present invention relates to modeling the variables of a data set by means of a probabilistic network including data nodes and causal links. The term ‘probabilistic networks’ includes Bayesian networks, belief networks, causal networks and knowledge maps. The variables of an input data set are registered and a population of genomes is generated, each of which individually models the input data set. Each genome has a chromosome to represent the data nodes in a probabilistic network and a chromosome to represent the causal links between the data nodes. A crossover operation is performed between the chromosome data of parent genomes in the population to generate offspring genomes. The offspring genomes are then added to the genome population. A scoring operation is performed on genomes in the population to derive scores representing the correspondence between the genomes and the input data. Genomes are selected from the population according to their scores, and the crossover, scoring, addition and selecting operations are repeated for a plurality of generations of the genomes. Finally, a genome is selected from the last generation according to the best score. A mutation operation may be performed on the genomes. The mutation may consist of the addition or deletion of a data node and the addition or deletion of a causal link.
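The two-chromosome genome and the four mutations named in the abstract can be illustrated with a minimal Python sketch. The names `Genome`, `creates_cycle` and `mutate` are hypothetical, and so is the cycle check: the abstract does not say how invalid networks are avoided, so the check is included only on the assumption that the network should remain a directed acyclic graph, as Bayesian networks are.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Genome:
    # Node chromosome: the variables that appear as data nodes.
    nodes: set = field(default_factory=set)
    # Link chromosome: causal links stored as (cause, effect) pairs.
    links: set = field(default_factory=set)


def creates_cycle(links, cause, effect):
    """Return True if adding cause -> effect would close a directed cycle."""
    # Walk forward from `effect`; reaching `cause` means a cycle would form.
    stack, seen = [effect], set()
    while stack:
        current = stack.pop()
        if current == cause:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(b for a, b in links if a == current)
    return False


def mutate(genome, variables, rng):
    """Apply one randomly chosen mutation and return a new Genome."""
    nodes, links = set(genome.nodes), set(genome.links)
    op = rng.choice(["add_node", "delete_node", "add_link", "delete_link"])
    if op == "add_node":
        unused = sorted(set(variables) - nodes)
        if unused:
            nodes.add(rng.choice(unused))
    elif op == "delete_node" and len(nodes) > 1:
        victim = rng.choice(sorted(nodes))
        nodes.remove(victim)
        # A deleted node takes its incoming and outgoing links with it.
        links = {(a, b) for a, b in links if victim not in (a, b)}
    elif op == "add_link" and len(nodes) > 1:
        cause, effect = rng.sample(sorted(nodes), 2)
        if not creates_cycle(links, cause, effect):  # assumed validity rule
            links.add((cause, effect))
    elif op == "delete_link" and links:
        links.remove(rng.choice(sorted(links)))
    return Genome(nodes, links)
```

A mutation that cannot be applied (for example, deleting the only remaining node) leaves the genome unchanged in this sketch; other policies are equally plausible.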
28 Claims
1. A method of modeling the variables in an input data set by means of a probabilistic network including data nodes and causal links, said input data set containing statistical data relating to at least a portion of a population group, the variables being personal to customers of a business enterprise, the method comprising the steps of:
registering the input data set;
generating a population of genomes each individually modeling the input data set by means of chromosome data to represent the data nodes in a probabilistic network and the causal links between the data nodes;
performing a crossover operation between the chromosome data of parent genomes in the population to generate offspring genomes;
performing an addition operation to add the offspring genomes to the population;
performing a scoring operation on genomes in the population to derive scores representing the correspondence between the genomes and the input data set;
performing a selecting operation to select genomes from the population according to the scores;
repeating the crossover, scoring, addition and selecting operations for a plurality of generations of the genomes;
selecting, as an output model, a genome from the last generation; and
forming predictions relating to future behavior of said population group.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
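The steps of claim 1, up to the selection of an output model, can be sketched as the loop below. This is an illustration under stated assumptions, not the patented implementation: the uniform crossover, the truncation selection, the fixed variable ordering used to keep networks acyclic, and all function names and parameters are hypothetical choices that the claim does not specify. The `score` function is supplied by the caller, and the prediction step is not shown.

```python
import random
from itertools import combinations


def random_genome(variables, rng):
    """Build one genome as (node chromosome, link chromosome)."""
    nodes = [v for v in variables if rng.random() < 0.7] or [variables[0]]
    # Links only run from earlier to later variables, so no cycle can form
    # (a simplifying assumption, not taken from the claim).
    links = [pair for pair in combinations(nodes, 2) if rng.random() < 0.3]
    return frozenset(nodes), frozenset(links)


def crossover(parent_a, parent_b, rng):
    """Uniform crossover (assumed): shared genes are kept, the rest are
    inherited from either parent with probability 0.5."""
    (nodes_a, links_a), (nodes_b, links_b) = parent_a, parent_b
    nodes = set(nodes_a & nodes_b)
    nodes |= {n for n in sorted(nodes_a ^ nodes_b) if rng.random() < 0.5}
    if not nodes:
        nodes = set(nodes_a)  # never produce an empty network
    links = set(links_a & links_b)
    links |= {l for l in sorted(links_a ^ links_b) if rng.random() < 0.5}
    # Drop links whose endpoints did not survive in the node chromosome.
    links = {(a, b) for a, b in links if a in nodes and b in nodes}
    return frozenset(nodes), frozenset(links)


def evolve(variables, score, generations=30, pop_size=20, seed=0):
    """Run crossover, addition, scoring and selection for several generations."""
    rng = random.Random(seed)
    population = [random_genome(variables, rng) for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            parent_a, parent_b = rng.sample(population, 2)
            offspring.append(crossover(parent_a, parent_b, rng))  # crossover
        population = population + offspring                       # addition
        ranked = sorted(population, key=score, reverse=True)      # scoring
        population = ranked[:pop_size]                            # selection
    # Output model: the best-scoring genome of the last generation.
    return max(population, key=score)
```

Any callable that maps a genome to a number can serve as `score`. Because truncation selection always retains the best genomes seen so far, the best score in this sketch never decreases from one generation to the next.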
12. Apparatus for modeling the variables in an input data set by means of a probabilistic network including data nodes and causal links, said input data set containing statistical data relating to a population group, the variables being personal to customers of a business enterprise, the apparatus comprising:
data register means for registering the input data set;
generating means for generating a population of genomes each individually modeling the input data set by means of chromosome data to represent the data nodes in a probabilistic network and the causal links between the data nodes;
crossover means for performing a crossover operation between the chromosome data of parent genomes in the population to generate offspring genomes;
adding means for performing an addition operation to add the offspring genomes to the population;
scoring means for performing a scoring operation on genomes in the population to derive scores representing the correspondence between the genomes and the input data set;
selecting means for performing a selecting operation to select genomes from the population according to the scores;
control means for controlling the crossover, scoring, addition and selecting means to repeat their operations for a plurality of generations of the genomes;
output means for selecting, as an output model, a genome from the last generation; and
means for forming predictions relating to future behavior of said population group.
Dependent claims: 13, 14, 15, 16, 17, 18, 19, 20, 21, 22
23. A method of data mining employing modeling the variables in an input data set to enable the reduction of the knowledge about relationships between the variables by means of a probabilistic network including data nodes and causal links, said input data set containing statistical data relating to at least a portion of a population group, the method comprising the steps of:
registering the input data set for a plurality of customers of an enterprise;
generating a population of genomes each individually modeling the input data set by means of chromosome data to represent the data nodes in a probabilistic network and the causal links between the data nodes;
performing a crossover operation between the chromosome data of parent genomes in the population to generate offspring genomes;
performing an addition operation to add the offspring genomes to the population;
performing a scoring operation on genomes in the population to derive scores representing the correspondence between the genomes and the input data set;
performing a selecting operation to select genomes from the population according to the scores;
repeating the crossover, scoring, addition and selecting operations for a plurality of generations of the genomes;
selecting, as an output model, a genome from the last generation; and
forming predictions relating to future behavior of said population group to mine data from the input data set.
Dependent claims: 24, 25
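The scoring step requires only that each score represent the correspondence between a genome and the input data set; the claim does not name a measure. One possible measure, shown here purely as an assumption, is the log-likelihood of the customer records under the network, with conditional probabilities estimated as relative frequencies. Because raw likelihood never decreases when a link is added, the sketch subtracts a per-link penalty; the function names, the `penalty` parameter and the record layout (one dictionary per customer) are all hypothetical.

```python
import math
from collections import Counter


def log_likelihood(records, nodes, links):
    """Log-likelihood of the records given the network structure, using
    relative frequencies as the conditional probability estimates."""
    total = 0.0
    for node in sorted(nodes):
        # Parents of a node are the causes of the links that point at it.
        parents = sorted(a for a, b in links if b == node)
        joint, margin = Counter(), Counter()
        for record in records:
            parent_values = tuple(record[p] for p in parents)
            joint[(parent_values, record[node])] += 1
            margin[parent_values] += 1
        for (parent_values, _), count in joint.items():
            total += count * math.log(count / margin[parent_values])
    return total


def score(records, nodes, links, penalty=1.0):
    """Fit to the data minus an assumed complexity penalty per causal link."""
    return log_likelihood(records, nodes, links) - penalty * len(links)
```

With this measure, a link between two variables that always move together raises the score, while a link between two independent variables lowers it because the penalty is not offset by any gain in fit.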
26. Apparatus for data mining employing modeling the variables in an input data set to enable the reduction of the knowledge about relationships between the variables by means of a probabilistic network including data nodes and causal links, said input data set containing statistical data relating to a population group, the apparatus comprising:
data register means for registering the input data set for a plurality of customers of an enterprise;
generating means for generating a population of genomes each individually modeling the input data set by means of chromosome data to represent the data nodes in a probabilistic network and the causal links between the data nodes;
crossover means for performing a crossover operation between the chromosome data of parent genomes in the population to generate offspring genomes;
adding means for performing an addition operation to add the offspring genomes to the population;
scoring means for performing a scoring operation on genomes in the population to derive scores representing the correspondence between the genomes and the input data set;
selecting means for performing a selecting operation to select genomes from the population according to the scores;
control means for controlling the crossover, scoring, addition and selecting means to repeat their operations for a plurality of generations of the genomes;
output means for selecting, as an output model, a genome from the last generation; and
means for forming predictions relating to future behavior of said population group to mine data from the input data set.
Dependent claims: 27, 28
Specification