Methods of identifying patterns in biological systems and uses thereof

US 7,117,188 B2
Filed: 01/24/2002
Issued: 10/03/2006
Est. Priority Date: 05/01/1998
Status: Expired due to Term

Abstract

The methods, systems and devices of the present invention comprise use of Support Vector Machines and RFE (Recursive Feature Elimination) for the identification of patterns that are useful for medical diagnosis, prognosis and treatment. SVM-RFE can be used with varied data sets.

23 Claims

1. A computer-implemented method for identifying patterns in data, the method comprising:
- (a) inputting into at least one support vector machine of a plurality of support vector machines a training set having known outcomes, the at least one support vector machine comprising a decision function having a plurality of weights, each having a weight value, wherein the training set comprises features corresponding to the data and wherein each feature has a corresponding weight;
- (b) optimizing the plurality of weights so that classifier error is minimized;
- (c) computing ranking criteria using the optimized plurality of weights;
- (d) eliminating at least one feature corresponding to the smallest ranking criterion;
- (e) repeating steps (a) through (d) for a plurality of iterations until a subset of features of pre-determined size remains; and
- (f) inputting into the at least one support vector machine a live set of data wherein the features within the live set are selected according to the subset of features.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12):
  - 2. The method of claim 1, wherein the at least one support vector machine is a soft margin support vector machine.
  - 3. The method of claim 1, wherein the ranking criterion corresponding to a feature is calculated by squaring the optimized weight for the corresponding feature.
  - 4. The method of claim 1, wherein the decision function is a quadratic function.
  - 5. The method of claim 1, wherein step (d) comprises eliminating a plurality of features corresponding to the smallest ranking criteria in a single iteration of steps (a) through (d).
  - 6. The method of claim 1, wherein step (d) comprises eliminating a plurality of features corresponding to the smallest ranking criteria in at least the first iteration of steps (a) through (d) and in later iterations, eliminating one feature for each iteration.
  - 7. The method of claim 1, wherein step (d) comprises eliminating a plurality of features corresponding to the smallest ranking criteria so that the number of features is reduced by a factor of two for each iteration.
  - 8. The method of claim 1, wherein the training set and the live set each comprise gene expression data obtained from DNA micro-arrays.
  - 9. The method of claim 1, further comprising pre-processing the training set and the live set so that the features are comparably scaled.
  - 10. The method of claim 1, wherein step (e) further comprises using a new support vector machine for each iteration.
  - 11. The method of claim 1, further comprising the steps of:
    - pre-processing the training data set using unsupervised clustering to generate a plurality of data clusters;
    - selecting a cluster center from each of a plurality of data clusters;
    - using the cluster centers to perform steps (b) to (e).
  - 12. The method of claim 1, further comprising, after step (e), post-processing the optimum subset of features to generate a plurality of clusters, wherein each feature in the optimum subset of features is a cluster center.
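
Steps (a) through (f) of claim 1, together with the squared-weight ranking criterion of claim 3 and the new classifier per iteration of claim 10, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the trainer is a plain batch subgradient-descent fit of a linear soft-margin SVM, and the names `train_linear_svm`, `svm_rfe`, `classify`, the hyperparameters `lam`, `lr`, `epochs`, and the toy data are all hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit a linear soft-margin SVM by batch subgradient descent on the
    L2-regularised hinge loss (an assumed stand-in for the optimiser)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Only samples inside the margin contribute to the hinge subgradient.
            if yi * (dot(w, xi) + b) < 1.0:
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y, n_keep):
    """Recursive feature elimination: returns (surviving feature indices, w, b)."""
    surviving = list(range(len(X[0])))
    while True:
        X_sub = [[row[j] for j in surviving] for row in X]
        w, b = train_linear_svm(X_sub, y)          # steps (a)-(b), fresh SVM each pass
        if len(surviving) <= n_keep:               # step (e) stopping condition
            return surviving, w, b
        criteria = [wj * wj for wj in w]           # step (c): squared weights
        worst = min(range(len(criteria)), key=criteria.__getitem__)
        del surviving[worst]                       # step (d): drop smallest criterion


def classify(x_live, surviving, w, b):
    """Step (f): keep only the selected features of a live sample, then classify."""
    x_sub = [x_live[j] for j in surviving]
    return 1 if dot(w, x_sub) + b >= 0 else -1


# Toy data (hypothetical): feature 0 carries the label, feature 1 is a weaker
# copy of it, and feature 2 is constant, so its weight never moves from zero.
X_train = [[2.0, 0.5, 0.0], [4.0, 1.0, 0.0], [-2.0, -0.5, 0.0], [-4.0, -1.0, 0.0]]
y_train = [1, 1, -1, -1]
selected, weights, bias = svm_rfe(X_train, y_train, n_keep=1)
```

Here `classify([3.0, -1.0, 5.0], selected, weights, bias)` looks only at the surviving feature of the live sample. One feature is removed per iteration; the chunked variants of claims 5 to 7 would change only the elimination step.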

13. A computer-implemented method for identifying determinative genes for use in diagnosis, prognosis or treatment of a disease, the method comprising:
- (a) inputting into a support vector machine a training data set of gene expression data having known outcomes with respect to the disease, the support vector machine comprising a decision function having a plurality of weights, each having a weight value, wherein the training set comprises features corresponding to the gene expression data and each feature has a corresponding weight;
- (b) optimizing the plurality of weights so that classifier error is minimized;
- (c) computing ranking criteria using the optimized plurality of weights;
- (d) eliminating at least one feature corresponding to the smallest ranking criterion;
- (e) repeating steps (a) through (d) for a plurality of iterations until an optimum subset of features remains; and
- (f) inputting into the support vector machine a live data set of gene expression data wherein the features within the live data set are selected according to the optimum subset of features.
- Dependent claims (14, 15, 16, 17, 18):
  - 14. The method of claim 13, wherein step (d) comprises eliminating a plurality of features corresponding to the smallest ranking criteria in a single iteration of steps (a) through (d).
  - 15. The method of claim 13, wherein step (d) comprises eliminating a plurality of features corresponding to the smallest ranking criteria in at least the first iteration of steps (a) through (d) and in later iterations, eliminating one feature for each iteration.
  - 16. The method of claim 13, wherein step (d) comprises eliminating a plurality of features corresponding to the smallest ranking criteria so that the number of features is reduced by a factor of two for each iteration.
  - 17. The method of claim 13, wherein step (e) further comprises using a new support vector machine for each iteration.
  - 18. The method of claim 13, further comprising pre-processing the training set to decrease skew in the data distribution.
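
Claim 18 calls for pre-processing that decreases skew but does not say how. One common choice for gene expression intensities is a log transform; the sketch below assumes that choice purely for illustration. The function names, the `floor` guard against non-positive values, and the sample values are hypothetical.

```python
import math


def skewness(values):
    """Population skewness: third central moment divided by variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return 0.0 if m2 == 0 else m3 / m2 ** 1.5


def log_transform(values, floor=1e-6):
    """Natural log of each value; values below `floor` are clamped to it
    first (an assumed guard so that zeros do not break the logarithm)."""
    return [math.log(max(v, floor)) for v in values]


# Hypothetical intensities for one gene across samples: a long right tail.
raw = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
logged = log_transform(raw)

print("skew before:", skewness(raw))
print("skew after: ", skewness(logged))
```

The raw values are strongly right-skewed, while their logarithms are evenly spaced and therefore close to symmetric. The same transform would have to be applied to the live data set before step (f).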

19. A computer-implemented method for identifying patterns in biological data, the method comprising:
- (a) inputting into at least some of a plurality of support vector machines a training data set, wherein the training data set comprises features corresponding to the biological data and each feature has a corresponding weight, and wherein each support vector machine comprises a decision function having a plurality of weights;
- (b) optimizing the plurality of weights so that classification confidence is optimized;
- (c) computing ranking criteria using the optimized plurality of weights;
- (d) eliminating at least one feature corresponding to the smallest ranking criteria;
- (e) repeating steps (a) through (d) for a plurality of iterations until an optimum subset of features remains; and
- (f) inputting into the plurality of support vector machines a live set of biological data wherein the features within the live set are selected according to the optimum subset of features.
- Dependent claims (20, 21, 22, 23):
  - 20. The method of claim 19, wherein step (e) further comprises using a new support vector machine for each iteration.
  - 21. The method of claim 19, wherein step (d) comprises eliminating a plurality of features corresponding to the smallest ranking criteria in a single iteration of steps (a) through (d).
  - 22. The method of claim 19, wherein step (d) comprises eliminating a plurality of features corresponding to the smallest ranking criteria in at least the first iteration of steps (a) through (d) and in later iterations, eliminating one feature for each iteration.
  - 23. The method of claim 19, wherein step (d) comprises eliminating a plurality of features corresponding to the smallest ranking criteria so that the number of features is reduced by a factor of two for each iteration.
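
The dependent claims describe three elimination schedules for step (d): one feature per iteration, halving per iteration (claims 7, 16, 23), and a chunked start followed by single eliminations (claims 6, 15, 22). The sketch below only computes how many features remain after each iteration under each schedule. The function name, the strategy labels, the `switch_at` threshold, and the use of floor division for odd counts are assumptions made for this example; the claims do not specify them.

```python
def elimination_schedule(n_features, n_keep, strategy="one", switch_at=None):
    """Return the number of features remaining after each RFE iteration.

    strategy:
      "one"    - remove one feature per iteration
      "halve"  - halve the feature count per iteration (floor division)
      "hybrid" - halve while more than `switch_at` features remain,
                 then remove one feature per iteration
    """
    if strategy not in ("one", "halve", "hybrid"):
        raise ValueError("unknown strategy: %r" % (strategy,))
    if strategy == "hybrid" and switch_at is None:
        raise ValueError("hybrid strategy needs switch_at")
    if n_keep < 1:
        raise ValueError("n_keep must be at least 1")

    counts = []
    remaining = n_features
    while remaining > n_keep:
        chunked = strategy == "halve" or (
            strategy == "hybrid" and remaining > switch_at
        )
        if chunked:
            # Never overshoot the requested subset size.
            remaining = max(remaining // 2, n_keep)
        else:
            remaining -= 1
        counts.append(remaining)
    return counts


print(elimination_schedule(16, 1, "halve"))
print(elimination_schedule(16, 1, "hybrid", switch_at=4))
print(elimination_schedule(5, 2, "one"))
```

Halving needs far fewer SVM trainings when the starting feature count is large, while single eliminations give a finer ranking among the last surviving features; the hybrid schedule trades between the two.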

Current Assignee: Curtis Anderson, Health Discovery Corporation, James Roberts, Joe Mckenzie, Jules B. Paderewski, Julian N. Stern, Memorial Health Systems Incorporated, Timothy P. O'Hayer
Original Assignee: Health Discovery Corporation
Inventors: Guyon, Isabelle; Weston, Jason Aaron Edward
Primary Examiner(s): DAVIS, GEORGE B
Application Number: US10/057,849
Publication Number: US 20030172043A1
Time in Patent Office: 1,713 Days
Field of Search: 706/20, 706/19
US Class Current: 706/20
CPC Class Codes:
- G06F 18/2115 by evaluating different sub...
- G06F 18/2411 based on the proximity to a...
- G06N 20/00 Machine learning
- G06N 20/10 using kernel methods, e.g. ...
- G16B 25/00 ICT specially adapted for h...
- G16B 25/10 Gene or protein expression ...
- G16B 40/00 ICT specially adapted for b...
- G16B 40/20 Supervised data analysis
- G16B 40/30 Unsupervised data analysis
- Y02A 90/10 Information and communicati...
