Heuristic method of classification

US 7,096,206 B2
Filed: 06/19/2001
Issued: 08/22/2006
Est. Priority Date: 06/19/2000
Status: Expired due to Term

4 Assignments

0 Petitions

Abstract

The invention concerns heuristic algorithms for the classification of Objects. A first learning algorithm comprises a genetic algorithm that is used to abstract a data stream associated with each Object and a pattern recognition algorithm that is used to classify the Objects and measure the fitness of the chromosomes of the genetic algorithm. The learning algorithm is applied to a training data set. The learning algorithm generates a classifying algorithm, which is used to classify or categorize unknown Objects. The invention is useful in the areas of classifying texts and medical samples, predicting the behavior of one financial market based on price changes in others and in monitoring the state of complex process facilities to detect impending failures.
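The pairing described in the abstract, a genetic algorithm whose chromosomes abstract each data stream and a pattern recognition step that scores those chromosomes, can be illustrated with a minimal sketch. Everything named here is hypothetical: `sample_vector`, `fitness`, and `evolve` are illustrative names, a chromosome is assumed to be a tuple of indices into the data string, and leave-one-out nearest-neighbour accuracy is swapped in as a simple stand-in for the pattern recognition algorithm. The population size, crossover, and mutation scheme are likewise assumptions, not details taken from the patent.

```python
import random


def sample_vector(data_string, chromosome):
    """Abstract a data string by keeping only the values at the chosen indices."""
    return [data_string[i] for i in chromosome]


def fitness(chromosome, data_strings, labels):
    """Score a chromosome by leave-one-out nearest-neighbour accuracy.

    This is a stand-in for the pattern recognition step (an assumption).
    """
    vectors = [sample_vector(s, chromosome) for s in data_strings]
    correct = 0
    for i, v in enumerate(vectors):
        others = [j for j in range(len(vectors)) if j != i]
        nearest = min(
            others,
            key=lambda j: sum((a - b) ** 2 for a, b in zip(v, vectors[j])),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(vectors)


def evolve(data_strings, labels, n_vars=2, pop_size=12, generations=15,
           mutation_rate=0.2, seed=0):
    """Evolve index sets; the fitter half survives unchanged each generation."""
    rng = random.Random(seed)
    length = len(data_strings[0])
    population = [
        tuple(sorted(rng.sample(range(length), n_vars))) for _ in range(pop_size)
    ]
    for _ in range(generations):
        ranked = sorted(
            population,
            key=lambda c: fitness(c, data_strings, labels),
            reverse=True,
        )
        parents = ranked[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: draw the child's indices from the union of both parents.
            genes = set(rng.sample(sorted(set(a) | set(b)), n_vars))
            if rng.random() < mutation_rate:
                # Mutation: swap one index for another that is not already present.
                genes.remove(rng.choice(sorted(genes)))
                genes.add(rng.choice([i for i in range(length) if i not in genes]))
            children.append(tuple(sorted(genes)))
        population = parents + children
    return max(population, key=lambda c: fitness(c, data_strings, labels))
```

Calling `evolve(data_strings, labels)` on a labeled training set returns the best index set found. Because the fitter half of each generation is carried over unchanged, the best fitness in the population never decreases from one generation to the next.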

10 Claims

1. A computer implemented method of constructing a model configured to classify biological samples as being of one of at least a first state or a second state different than the first state, comprising:
- providing a plurality of data strings, each data string being derived from a biological sample known to be of the first state or the second state;
- using a genetic algorithm to select a first set of variables that identify data in each of the plurality of data strings;
- calculating a sample vector for each member of the set of data strings using the first set of variables;
- finding a location in a first vector space of each of at least two data clusters that best fit the sample vectors calculated using the first set of variables;
- determining a variability for the at least two data clusters that best fit the sample vectors calculated using the first set of variables;
- determining whether the variability of the at least two data clusters that best fit the sample vectors calculated using the first set of variables is within an acceptable tolerance;
- if it is determined that the variability of the at least two data clusters that best fit the sample vectors calculated using the first set of variables is within the acceptable tolerance, providing the locations in the first vector space of the at least two data clusters that best fit the sample vectors calculated using the first set of variables; and
- if it is determined that the variability of the at least two data clusters that best fit the sample vectors calculated using the first set of variables is not within the acceptable tolerance,
  - using the genetic algorithm to select a second set of variables different than the first set of variables,
  - calculating a sample vector for each member of the set of data strings using the second set of variables,
  - finding a location in a second vector space of each of at least two data clusters that best fit the sample vectors calculated using the second set of variables,
  - determining a variability for the at least two data clusters that best fit the sample vectors calculated using the second set of variables,
  - determining whether the variability for the at least two data clusters that best fit the sample vectors calculated using the second set of variables is within the acceptable tolerance, and
  - if it is determined that the variability of the at least two data clusters that best fit the sample vectors calculated using the second set of variables is within the acceptable tolerance, providing the locations in the second vector space of the at least two data clusters that best fit the sample vectors calculated using the second set of variables.
2. The computer implemented method of claim 1, wherein the variability of the at least two data clusters that best fit the sample vectors calculated using the first set of variables is the variance of the at least two data clusters that best fit the sample vectors calculated using the first set of variables.
3. The computer implemented method of claim 1, wherein if it is determined that the variability of the at least two data clusters that best fit the sample vectors calculated using the second set of variables is not within the acceptable tolerance, selecting a third set of variables different than the first set of variables and different than the second set of variables.
4. The computer implemented method of claim 1, wherein each data string is derived from a biological sample via a bio-assay technique.
5. The computer implemented method of claim 1, wherein the acceptable tolerance is input by a user.
6. The computer implemented method of claim 1, wherein the finding a location in a first vector space of each of the at least two data clusters that best fit the sample vectors calculated using the first set of variables includes determining for each sample vector a proximity of the sample vector to a preexisting centroid in the first vector space.
7. The computer implemented method of claim 6, further comprising:
- determining if the distance of each sample vector from the closest preexisting centroid is within a predetermined threshold distance, if the distance exceeds the threshold difference, defining a new centroid based on the location of the sample vector in the first vector space, and if the distance is less than the threshold difference, assigning the sample vector to a cluster associated with the preexisting centroid.
8. The computer implemented method of claim 7, wherein the assigning the sample vector includes adjusting the location of the preexisting centroid closer to the location of the sample vector.
9. The computer implemented method of claim 1, wherein each of the at least two data clusters that best fit the sample vectors calculated using the first set of variables includes a centroid and a decision hyper-radius.
10. A model configured to classify biological samples constructed using the method of claim 1.
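The clustering behaviour recited in claims 6 through 9, together with the variance of claim 2 and the tolerance test of claim 1, can be sketched roughly as follows. This is an illustration under stated assumptions rather than the patented implementation: the function names, the Euclidean metric, the `learning_rate` used to nudge a centroid, the choice to treat a distance exactly equal to the threshold as "within", and the use of the largest per-cluster variance as the overall variability are all hypothetical choices made for the example.

```python
import math


def cluster(sample_vectors, threshold, learning_rate=0.1):
    """Assign each vector to its nearest centroid or start a new centroid."""
    centroids, members = [], []
    for v in sample_vectors:
        if centroids:
            k = min(range(len(centroids)),
                    key=lambda i: math.dist(v, centroids[i]))
            # Assumption: a distance equal to the threshold counts as "within".
            if math.dist(v, centroids[k]) <= threshold:
                # Move the existing centroid a little closer to the new member.
                centroids[k] = [c + learning_rate * (x - c)
                                for c, x in zip(centroids[k], v)]
                members[k].append(v)
                continue
        # No centroid yet, or the nearest one is too far: define a new centroid.
        centroids.append(list(v))
        members.append([v])
    return centroids, members


def cluster_variance(vectors):
    """Mean squared distance of a cluster's members from their mean."""
    n = len(vectors)
    mean = [sum(col) / n for col in zip(*vectors)]
    return sum(math.dist(v, mean) ** 2 for v in vectors) / n


def variability(members):
    """Assumed summary statistic: the largest per-cluster variance."""
    return max(cluster_variance(m) for m in members)


def within_tolerance(members, tolerance):
    """Tolerance test that decides whether the current variable set is kept."""
    return variability(members) <= tolerance


def classify(vector, centroids, radius):
    """Return the index of the nearest centroid, or None if outside the radius."""
    k = min(range(len(centroids)),
            key=lambda i: math.dist(vector, centroids[i]))
    return k if math.dist(vector, centroids[k]) <= radius else None
```

In a full training loop, a failed `within_tolerance` check would send control back to the genetic algorithm to select a different set of variables, and the sample vectors would then be recomputed and clustered again.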

Current Assignee: Aspira Women's Health, Inc.
Original Assignee: Correlogic Systems, Inc. (Aspira Women's Health, Inc.)
Inventors: Hitt, Ben
Primary Examiner(s): Hirl, Joseph P.
Application Number: US09/883,196
Publication Number: US 20020046198A1
Time in Patent Office: 1,890 Days
Field of Search: 706/12, 706/13, 706/932
US Class Current: 706/12
CPC Class Codes:
- G06F 18/21   Design or setup of recognit...
- G06F 18/211   Selection of the most signi...
- G06F 18/2433   Single-class perspective, e...
- G06N 3/08   Learning methods
- Y10S 706/90   Fuzzy logic
- Y10S 706/932   Mathematics, science, or en...
