Gene expression programming algorithm
First Claim
1. A genetic algorithm based method for determining a mathematical expression that describes a set of real world data, the method comprising the steps of:
- generating an initial population of representations of mathematical expressions that include a plurality of prime numbers; and
evolving a succession of generations derived from the initial population.
2 Assignments
0 Petitions
Accused Products
Abstract
A gene expression programming genetic algorithm for performing symbolic regression is provided. The algorithm avoids expression bloating and over fitting by employing a fitness function that depends inversely on the mathematical expression complexity. Members of a population that are evolved by the algorithm are represented as a set arrays (e.g., in the form of a matrix) of indexes that reference operands and operators, thus facilitating selection, mutation, and cross over operations conducted in the course of evolving the population. The algorithm comprises a syntax checking part that may be applied to population members without their having to be converted to executable programs first. An object-oriented programming language data structure is providing for encapsulating basic data for each codon (e.g., operand, operator) used by the algorithm.
52 Citations
31 Claims
-
1. A genetic algorithm based method for determining a mathematical expression that describes a set of real world data, the method comprising the steps of:
-
generating an initial population of representations of mathematical expressions that include a plurality of prime numbers; and
evolving a succession of generations derived from the initial population. - View Dependent Claims (2, 3, 4, 5, 6)
-
-
7. A genetic algorithm based method for determining a mathematical expression that describes a set of real world data, the method comprising the steps of:
-
recursively generating a succession of populations of representations of mathematical expressions, wherein each representation includes;
identifications of one or more operators selected from a set of operators; and
identifications of one or more operands selected from a set of operands that includes constants and variables;
wherein each operand in the set of operands, and each operator in the set of operators is associated with a cost;
for each kth population in the succession of populations generating a set of programs that embody mathematical expressions represented in the kth population of representations;
deriving a first measure of fitness of each program in the set of programs by a process including the step of;
running the program on one or more sets of test data and comparing output values to predetermined values to obtain a measure of residual error;
deriving a second measure of fitness of each program in the set of programs by a process including the step of;
summing the cost of each operator and operand in the program; and
selectively duplicating individual representations of mathematical expressions in each successive generation based on the first and second measures of fitness. - View Dependent Claims (8)
-
-
9. A genetic algorithm based method for determining a mathematical expression that describes a set of real world data, the method comprising the steps of:
-
representing each of a succession of populations of mathematical expressions as a set of arrays of indexes, wherein each array represents a mathematical expression and one or more indexes in the set of arrays of indexes point to representations of expression elements selected from a group of elements consisting of operators and operands;
for each kth population of the succession of populations;
generating a program that embodies the mathematical expressions represented by each of one or more of the set of arrays in the population;
deriving a measure of fitness of each program;
selecting one or more high fitness arrays from the set of arrays based on the measure of fitness for carry over to a (k+1)th generation;
performing a cross over operation between one or more of the high fitness arrays carried over to the (k+1)th generation. - View Dependent Claims (10, 11, 12)
-
-
13. A genetic algorithm based method for determining a mathematical expression that describes a set of real-world data, the method comprising the steps of:
-
recursively generating a succession of generations of a population of sequences of symbols wherein each sequence of symbols represents a mathematical expression;
for each generation determining if each sequence of symbols can be translated into a viable program by;
for each sequence of symbols initializing a required length variable;
starting with a first symbol of each sequence, and for asuccession of kth symbols within each sequence;
increasing the required length variable by a number of symbols necessitated by syntax rules to form a viable operation with the kth symbol;
in the case that the required length variable after incrementing is equal to k reporting valid syntax; and
in the case that the required length variable when added to k exceeds a maximum sequence length reporting invalid syntax. - View Dependent Claims (14, 15)
-
-
16. A computer readable medium containing programming instructions for determining a mathematical expression that describes a set of real world data, including programming instructions for:
-
generating an initial population of representations of mathematical expressions that include a plurality of prime numbers; and
evolving a succession of generations derived from the initial population. - View Dependent Claims (17, 18, 19, 20, 21)
-
-
22. A computer readable medium containing programming instructions for determining a mathematical expression that describes a set of real world data, including programming instructions for:
-
recursively generating a succession of populations of representations of mathematical expressions, wherein each representation includes;
identifications of one or more operators selected from a set of operators; and
identifications of one or more operands selected from a set of operands that includes constants and variables;
wherein each operand in the set of operands, and each operator in the set of operators is associated with a cost;
for each kth population in the succession of populations generating a set of programs that embody mathematical expressions represented in the kth population of representations;
deriving a first measure of fitness of each program in the set of programs by a process including the step of;
running the program on one or more sets of test data and comparing output values to predetermined values to obtain a measure of residual error;
deriving a second measure of fitness of each program in the set of programs by a process including the step of;
summing the cost of each operator and operand in the program; and
selectively duplicating individual representations of mathematical expressions in each successive generation based on the first and second measures of fitness. - View Dependent Claims (23)
-
-
24. A computer readable medium containing programming instructions for determining a mathematical expression that describes a set of real world data, including programming instructions for:
-
representing each of a succession of populations of mathematical expressions as a set of arrays of indexes, wherein each array represents a mathematical expression and one or more indexes point to representations of an expression element selected from a group of elements consisting of operators and operands;
for each kth population of the succession of populations;
generating a program that embodies the mathematical expressions represented by each array in the population;
deriving a measure of fitness of each generated program selecting one or more high fitness arrays based on the measure of fitness for carry over to a (k+1)th generation;
performing a cross over operation between one or more of the high fitness arrays carried over to the (k+1)th generation. - View Dependent Claims (25, 26, 27)
-
-
28. A computer readable medium containing programming instructions for determining a mathematical expression that describes a set of real world data, including programming instructions for:
-
recursively generating a succession of generations of a population of sequences of symbols wherein each sequence of symbols represents a mathematical expression;
for each generation determining if each sequence of symbols can be translated into a viable program by;
for each sequence of symbols initializing a required length variable;
starting with a first symbol of each sequence, and for a succession of kth symbols within each sequence;
increasing the required length variable by a number of symbols necessitated by syntax rules to form a viable operation with the kth symbol;
in the case that the required length variable after incrementing is equal to k reporting valid syntax; and
in the case that the required length variable when added to k exceeds a maximum sequence length reporting invalid syntax. - View Dependent Claims (29, 30)
-
-
31. A computer readable medium having stored there on a data structure comprising:
an objected oriented programming language class including;
a codon name instance variable;
a codon type instance variable for storing a first number that is indicative of a number of codons that are to be associated with a codon specified in the codon name instance variable;
a codon value instance variable for storing a value associated with the codon specified in the codon name instance variable;
a codon cost instance variable for storing a second number that is indicative of a degree to which an operator or operand specified by the codon specified in the codon name instance variable increase complexity of a mathematical expression; and
a codon index instance variable for storing an index associated with each instance of the object-oriented programming language class.
Specification