Heuristic method of classification

US 7,240,038 B2
Filed: 11/15/2005
Issued: 07/03/2007
Est. Priority Date: 06/19/2000
Status: Expired due to Fees

2 Assignments

0 Petitions

Abstract

The invention concerns heuristic algorithms for the classification of Objects. A first learning algorithm comprises a genetic algorithm that is used to abstract a data stream associated with each Object and a pattern recognition algorithm that is used to classify the Objects and measure the fitness of the chromosomes of the genetic algorithm. The learning algorithm is applied to a training data set. The learning algorithm generates a classifying algorithm, which is used to classify or categorize unknown Objects. The invention is useful in the areas of classifying texts and medical samples, predicting the behavior of one financial market based on price changes in others and in monitoring the state of complex process facilities to detect impending failures.
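
The pairing described in the abstract, a genetic algorithm whose chromosomes pick data elements and a pattern recognizer that scores each chromosome, can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names (`select`, `fitness`, `evolve`), the nearest-state-mean scorer standing in for the pattern recognition algorithm, and the population size, mutation rate, and crossover scheme are all assumptions chosen for brevity.

```python
import random

def select(data_string, chromosome):
    """Abstract a data string down to the elements a chromosome points at."""
    return [data_string[i] for i in chromosome]

def fitness(chromosome, data_strings, states):
    """Stand-in pattern recognizer (assumption): nearest-state-mean accuracy
    measured on the training set itself."""
    vectors = [select(d, chromosome) for d in data_strings]
    means = {}
    for state in sorted(set(states)):
        members = [v for v, s in zip(vectors, states) if s == state]
        means[state] = [sum(col) / len(members) for col in zip(*members)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    hits = sum(
        min(means, key=lambda st: sq_dist(v, means[st])) == s
        for v, s in zip(vectors, states)
    )
    return hits / len(states)

def evolve(data_strings, states, n_genes=2, pop_size=20, generations=30, seed=0):
    """Tiny genetic algorithm: keep the fitter half, refill by crossover and mutation."""
    rng = random.Random(seed)
    length = len(data_strings[0])

    def score(chromosome):
        return fitness(chromosome, data_strings, states)

    population = [tuple(rng.sample(range(length), n_genes)) for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(population, key=score, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes) if n_genes > 1 else 0
            child = list(a[:cut] + b[cut:])          # one-point crossover
            if rng.random() < 0.2:                   # hypothetical mutation rate
                child[rng.randrange(n_genes)] = rng.randrange(length)
            children.append(tuple(child))
        population = parents + children
    return max(population, key=score)
```

Calling `evolve(data_strings, states)` on a list of equal-length numeric data strings and their known states returns the best chromosome found, that is, the tuple of element positions that served the stand-in classifier best. Scoring on the training set alone is a simplification; a held-out test set would normally be used to check the resulting classifying algorithm.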

83 Citations

32 Claims

1. A method for creating a model for classifying a biological sample as being of a first state or a second state different than the first state, comprising:
- obtaining a data string derived from each biological sample of a set known to be of the first state and a set known to be of the second state;
  
  selecting data elements from each data string using an evolutionary algorithm;
  
  determining the locations of a first set of vectors and a second set of vectors in a vector space, each vector of the first set of vectors corresponding to data elements derived from a biological sample known to be of the first state, each vector of the second set of vectors corresponding to data elements derived from a biological sample known to be of the second state; and
  
  identifying a model acceptable for classifying biological samples containing at least one cluster disposed within the vector space, the cluster containing at least one of the vectors of the first set of vectors and being associated with the first state for purposes of classifying a biological sample.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The method of claim 1, wherein the evolutionary algorithm is a genetic algorithm.
  - 3. The method of claim 1, wherein the biological sample is selected from the group of biological samples consisting of serum, plasma, and biopsy specimen.
  - 4. The method of claim 1, wherein identifying a model uses a pattern recognition algorithm.
  - 5. The method of claim 1, wherein the data strings are of a type are selected from the group consisting of: (a) mass spectrometry data, (b) hybridization data, (c) gene expression data, and (d) microarray data.
  - 6. The method of claim 1, wherein the acceptability of the model for classifying biological samples is based on the homogeneity of the cluster.
  - 7. A software product having a model constructed using the method of claim 1.
  - 8. A model constructed using the method of claim 1.
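
Claims 1 and 6 describe a model made of clusters in a vector space, each associated with a state, and judged by cluster homogeneity. The sketch below shows one way such a model could be labeled and used, under stated assumptions: homogeneity is taken here to be the fraction of a cluster's training vectors that share its majority state, and an unknown sample is assigned the state of its nearest cluster. The names `homogeneity`, `label_clusters`, and `classify` are hypothetical, and the claims do not fix either definition.

```python
import math
from collections import Counter

def homogeneity(cluster_states):
    """Assumed measure: share of a cluster's vectors that carry its majority state."""
    counts = Counter(cluster_states)
    return counts.most_common(1)[0][1] / len(cluster_states)

def label_clusters(assignments, states):
    """Map each cluster id to (majority state, homogeneity) from training data."""
    by_cluster = {}
    for cluster_id, state in zip(assignments, states):
        by_cluster.setdefault(cluster_id, []).append(state)
    return {
        cluster_id: (Counter(members).most_common(1)[0][0], homogeneity(members))
        for cluster_id, members in by_cluster.items()
    }

def classify(vector, centroids, cluster_labels):
    """Assign an unknown vector the state of the nearest cluster centroid."""
    nearest = min(centroids, key=lambda cid: math.dist(vector, centroids[cid]))
    return cluster_labels[nearest][0]
```

Here `assignments` is a list of cluster ids, one per training vector, and `centroids` is a dict from cluster id to coordinates; both would come from whatever clustering step precedes this one.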

9. A method of creating a classifying pattern for objects using a plurality of data strings, each data string associated with one of a plurality of objects to be classified, comprising:
- selecting a set of data elements from each data string using a learning algorithm;
  
  classifying the set of data elements using a classifying algorithm; and
  
  repeating the selecting and classifying with a different set of data elements selected from each data string until a classifying pattern is created that is acceptable to classify the objects.
- Dependent claims (10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28):
  - 10. The method of claim 9, wherein the learning algorithm is an evolutionary algorithm.
  - 11. The method of claim 9, wherein the learning algorithm is a genetic algorithm.
  - 12. The method of claim 9, wherein the data strings are produced by a high-throughput assay.
  - 13. The method of claim 9, wherein the data strings are of a type are selected from the group consisting of: (a) mass spectrometry data, (b) hybridization data, (c) gene expression data, (d) microarray data, (e) financial data, (f) stock market data, (g) text, (h) currency exchange rates, and (i) processing plant control status values.
  - 14. The method of claim 9, wherein the objects are known to be of a first state or a second state and the model classifies objects by state.
  - 15. The method of claim 9, wherein the objects are biological samples.
  - 16. The method of claim 15, wherein classification of a sample provides information about a state selected from the group consisting of medical diagnosis and pathology.
  - 17. The method of claim 15, wherein the sample is selected from the group of biological samples consisting of: serum, plasma, and biopsy specimen.
  - 18. The method of claim 9, wherein the classifying algorithm is an adaptive pattern recognition algorithm.
  - 19. The method of claim 18, wherein the pattern recognition algorithm creates a cluster map having a plurality of clusters associated with the set of data points.
  - 20. The method of claim 19, wherein the acceptability of a grouping as a model to classify the objects is based on the homogeneity of the clusters in the cluster map.
  - 21. The method of claim 20, wherein the model is the best lead cluster map.
  - 22. The method of claim 19, wherein the grouping is acceptable as a model to classify the objects if a homogeneity of the cluster map is within a predetermined tolerance.
  - 23. The method of claim 19, wherein the cluster map is created by calculating a vector for each set of data points; and mapping the vectors into a vector space.
  - 24. The method of claim 23, further comprising: determining if a distance of at least one of the vectors from a closest preexisting centroid is within a predetermined threshold distance.
  - 25. The method of claim 24, further comprising: assigning the vector to a cluster associated with the preexisting centroid if the distance is within the predetermined threshold distance, and the assigning the vector includes adjusting the location of the preexisting centroid closer to the location of the vector.
  - 26. The method of claim 24, wherein if the distance exceeds the predetermined threshold difference, defining a new centroid based on the location of the vector in the vector space, and if the distance is less than the threshold difference, assigning the vector to a cluster associated with the preexisting centroid.
  - 27. A software product having a model constructed using the method of claim 9.
  - 28. A model constructed using the method of claim 9.
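
The centroid rule recited in claims 23 to 26 (assign a vector to the closest preexisting centroid when it lies within a threshold distance and pull that centroid toward it, otherwise define a new centroid at the vector) can be sketched in a few lines. The function name `adaptive_cluster` and the `learning_rate` parameter are assumptions: the claims say only that the centroid moves closer to the vector, not by how much, and they leave the exactly-equal case open, which this sketch treats as "within".

```python
import math

def adaptive_cluster(vectors, threshold, learning_rate=0.5):
    """Single-pass threshold clustering.

    Returns (centroids, assignments), where assignments[i] is the index of
    the centroid that vectors[i] was assigned to.
    """
    centroids = []
    assignments = []
    for v in vectors:
        if centroids:
            # Find the closest preexisting centroid.
            idx = min(range(len(centroids)), key=lambda i: math.dist(v, centroids[i]))
            if math.dist(v, centroids[idx]) <= threshold:
                # Within the threshold: join the cluster and move its
                # centroid part of the way toward the vector.
                centroids[idx] = [
                    c + learning_rate * (x - c) for c, x in zip(centroids[idx], v)
                ]
                assignments.append(idx)
                continue
        # No centroid is close enough: the vector seeds a new cluster.
        centroids.append(list(v))
        assignments.append(len(centroids) - 1)
    return centroids, assignments
```

With `threshold=2.0` and the vectors `(0, 0)`, `(1, 0)`, `(10, 10)`, `(11, 10)`, the first two vectors share one cluster and the last two share another. Because vectors are processed in order, a different presentation order can yield a different cluster map.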

29. A method of constructing a model configured to classify objects as being of one of at least a first state and a second state different than the first state, comprising:
- receiving a plurality of data strings, each data string being derived from an object known to be of the first state or the second state;
  
  selecting a first set of variables that correspond with data in each of the plurality of data strings;
  
  calculating a vector for each of the plurality of data strings using the first set of variables;
  
  finding a location in a first vector space of each of at least two data clusters that best fit the vectors calculated using the first set of variables;
  
  providing the locations in the first vector space of the at least two data clusters;
  
  determining a variability for the at least two data clusters that best fit the vectors calculated using the first set of variables;
  
  determining whether the variability of the at least two data clusters that best fit the vectors calculated using the first set of variables is within an acceptable tolerance;
  
  if it is determined that the variability of the at least two data clusters that best fit the vectors calculated using the first set of variables is not within the acceptable tolerance,
  - using an evolutionary algorithm to select a second set of variables different than the first set of variables,
  - calculating a vector for each of the plurality of data strings using the second set of variables,
  - finding a location in a second vector space of each of at least two data clusters that best fit the vectors calculated using the second set of variables,
  - determining a variability for the at least two data clusters that best fit the vectors calculated using the second set of variables,
  - determining whether the variability for the at least two data clusters that best fit the vectors calculated using the second set of variables is within the acceptable tolerance, and
  - if it is determined that the variability of the at least two data clusters that best fit the vectors calculated using the second set of variables is within the acceptable tolerance, providing the locations in the second vector space of the at least two data clusters that best fit the vectors calculated using the second set of variables.
- Dependent claims (30, 31, 32):
  - 30. A model configured to classify objects as being of one of at least a first state and a second state different than the first state constructed using the method of claim 29.
  - 31. A software product having a model constructed using the method of claim 29.
  - 32. The method of claim 29, wherein the data strings are of a type are selected from the group consisting of: (a) mass spectrometry data, (b) hybridization data, (c) gene expression data, (d) microarray data, (e) financial data, (f) stock market data, (g) text, (h) currency exchange rates, and (i) processing plant control status values.
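
The control flow of claim 29 (pick variables, compute vectors, locate clusters, measure variability, and fall back to an evolutionary step when the variability is out of tolerance) can be illustrated with a very small loop. Everything specific here is an assumption made for the sketch: one cluster per known state placed at the mean of its vectors stands in for the "best fit" clusters, variability is the mean distance of vectors to their own state's cluster, and the evolutionary algorithm is reduced to a (1+1) scheme that mutates one variable and keeps the mutant when it is no worse. The names `fit_clusters`, `variability`, `evaluate`, and `build_model` are hypothetical.

```python
import math
import random

def fit_clusters(vectors, states):
    """Assumption: one cluster per known state, located at the mean of its vectors."""
    clusters = {}
    for state in sorted(set(states)):
        members = [v for v, s in zip(vectors, states) if s == state]
        clusters[state] = [sum(col) / len(members) for col in zip(*members)]
    return clusters

def variability(vectors, states, clusters):
    """Mean distance from each vector to the cluster location of its own state."""
    return sum(math.dist(v, clusters[s]) for v, s in zip(vectors, states)) / len(vectors)

def evaluate(variables, data_strings, states):
    """Compute vectors for a variable set, then the clusters and their variability."""
    vectors = [[d[i] for i in variables] for d in data_strings]
    clusters = fit_clusters(vectors, states)
    return variability(vectors, states, clusters), clusters

def build_model(data_strings, states, n_vars, tolerance, max_rounds=200, seed=0):
    """Return (variables, cluster locations) once variability is within tolerance,
    or None if no acceptable variable set is found in max_rounds."""
    rng = random.Random(seed)
    length = len(data_strings[0])
    variables = rng.sample(range(length), n_vars)        # first set of variables
    spread, clusters = evaluate(variables, data_strings, states)
    for _ in range(max_rounds):
        if spread <= tolerance:
            break
        candidate = list(variables)                      # next set of variables
        candidate[rng.randrange(n_vars)] = rng.randrange(length)
        cand_spread, cand_clusters = evaluate(candidate, data_strings, states)
        if cand_spread <= spread:                        # keep the mutant if no worse
            variables, spread, clusters = candidate, cand_spread, cand_clusters
    return (variables, clusters) if spread <= tolerance else None
```

The sketch checks only the quantity the claim recites, the variability of the clusters. It does not check that clusters for different states are well separated, which a practical fitness measure would also need to consider.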

Current Assignee: Vermillion, Inc.
Original Assignee: Correlogic Systems, Inc. (Aspira Women's Health, Inc.)
Inventors: Hitt, Ben
Primary Examiner(s): Hirl; Joseph P
Application Number: US11/273,432
Publication Number: US 20060112041A1
Time in Patent Office: 595 Days
Field of Search: 706/900, 706/14, 706/12, 600/316
US Class Current: 706/12
CPC Class Codes

G06F 18/21   Design or setup of recognit...

G06F 18/211   Selection of the most signi...

G06F 18/2433   Single-class perspective, e...

G06N 3/08   Learning methods

Y10S 706/90   Fuzzy logic

Y10S 706/932   Mathematics, science, or en...
