Heuristic method of classification
First Claim
1. A computer implemented method of constructing a model configured to classify biological samples as being of one of at least a first state or a second state different than the first state, comprising:
providing a plurality of data strings, each data string being derived from a biological sample known to be of the first state or the second state;
using a genetic algorithm to select a first set of variables that identify data in each of the plurality of data strings;
calculating a sample vector for each member of the set of data strings using the first set of variables;
finding a location in a first vector space of each of at least two data clusters that best fit the sample vectors calculated using the first set of variables;
determining a variability for the at least two data clusters that best fit the sample vectors calculated using the first set of variables;
determining whether the variability of the at least two data clusters that best fit the sample vectors calculated using the first set of variables is within an acceptable tolerance;
if it is determined that the variability of the at least two data clusters that best fit the sample vectors calculated using the first set of variables is within the acceptable tolerance, providing the locations in the first vector space of the at least two data clusters that best fit the sample vectors calculated using the first set of variables; and
if it is determined that the variability of the at least two data clusters that best fit the sample vectors calculated using the first set of variables is not within the acceptable tolerance, using the genetic algorithm to select a second set of variables different than the first set of variables, calculating a sample vector for each member of the set of data strings using the second set of variables, finding a location in a second vector space of each of at least two data clusters that best fit the sample vectors calculated using the second set of variables, determining a variability for the at least two data clusters that best fit the sample vectors calculated using the second set of variables, determining whether the variability for the at least two data clusters that best fit the sample vectors calculated using the second set of variables is within the acceptable tolerance, and if it is determined that the variability of the at least two data clusters that best fit the sample vectors calculated using the second set of variables is within the acceptable tolerance, providing the locations in the second vector space of the at least two data clusters that best fit the sample vectors calculated using the second set of variables.
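The loop recited in claim 1 (select variables, calculate sample vectors, locate clusters, test variability, and reselect if the tolerance is not met) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the patent: the function names, the use of one centroid per known state as the "best fit" cluster location, the mean squared distance as the variability measure, and the simple truncation-selection genetic algorithm with its parameter values. The claim recites a first and a second set of variables; the sketch generalizes this to repeated generations.

```python
import random

def sample_vector(data_string, variables):
    """Values of one data string at the positions named by the variable set."""
    return [data_string[i] for i in variables]

def fit_clusters(vectors, states):
    """Assumed stand-in for cluster fitting: one centroid per known state."""
    centroids = {}
    for state in set(states):
        members = [v for v, s in zip(vectors, states) if s == state]
        centroids[state] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def variability(vectors, states, centroids):
    """Assumed measure: mean squared distance of each vector to its centroid."""
    total = 0.0
    for v, s in zip(vectors, states):
        total += sum((a - b) ** 2 for a, b in zip(v, centroids[s]))
    return total / len(vectors)

def evaluate(data_strings, states, variables):
    """Return (variability, cluster locations) for one candidate variable set."""
    vectors = [sample_vector(d, variables) for d in data_strings]
    centroids = fit_clusters(vectors, states)
    return variability(vectors, states, centroids), centroids

def build_model(data_strings, states, n_vars=2, tolerance=0.01,
                pop_size=20, generations=200, seed=0):
    """Search for a variable set whose clusters fall within the tolerance."""
    rng = random.Random(seed)
    length = len(data_strings[0])
    population = [tuple(sorted(rng.sample(range(length), n_vars)))
                  for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population,
                        key=lambda vs: evaluate(data_strings, states, vs)[0])
        spread, centroids = evaluate(data_strings, states, scored[0])
        if spread <= tolerance:
            # Within tolerance: provide the cluster locations.
            return scored[0], centroids
        # Not within tolerance: let the genetic algorithm pick new sets.
        parents = scored[:pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            genes = set(a[:n_vars // 2] + b[n_vars // 2:])  # crossover
            if rng.random() < 0.3:
                genes.discard(rng.choice(sorted(genes)))    # mutation
            while len(genes) < n_vars:
                genes.add(rng.randrange(length))            # refill to n_vars
            children.append(tuple(sorted(genes)))
        population = parents + children
    return None, None  # no acceptable variable set found
```

A within-cluster spread on its own does not reward separation between the two states, so a practical fitness measure would likely need an additional term; the sketch leaves that out to stay close to the wording of the claim.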
Abstract
The invention concerns heuristic algorithms for the classification of Objects. A first learning algorithm comprises a genetic algorithm that is used to abstract a data stream associated with each Object and a pattern recognition algorithm that is used to classify the Objects and measure the fitness of the chromosomes of the genetic algorithm. The learning algorithm is applied to a training data set. The learning algorithm generates a classifying algorithm, which is used to classify or categorize unknown Objects. The invention is useful in the areas of classifying texts and medical samples, predicting the behavior of one financial market based on price changes in others and in monitoring the state of complex process facilities to detect impending failures.
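As a rough illustration of the last step described in the abstract, classifying an unknown Object with the generated classifying algorithm, the following Python sketch assigns a data stream to the state whose cluster location is nearest. The function name `classify`, the nearest-location rule, and the example values are assumptions made for illustration; the abstract does not specify how the classifying algorithm decides.

```python
import math

def classify(data_stream, variables, cluster_locations):
    """Assign a data stream to the state with the nearest cluster location.

    variables: positions in the data stream chosen during training (assumed).
    cluster_locations: mapping of state -> location in the vector space.
    """
    vector = [data_stream[i] for i in variables]
    return min(cluster_locations,
               key=lambda state: math.dist(vector, cluster_locations[state]))

# Hypothetical trained model: two selected positions and two cluster locations.
variables = (1, 4)
locations = {"first": [0.11, 0.11], "second": [0.89, 0.89]}
unknown = [0.5, 0.15, 0.5, 0.5, 0.08, 0.5]
state = classify(unknown, variables, locations)
```

A nearest-location rule always returns one of the known states; a scheme that can also report "none of the known states" would need a distance cutoff, which this sketch omits.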
10 Claims
Specification