Enhancing biological knowledge discovery using multiples support vector machines

US 6,760,715 B1
Filed: 08/07/2000
Issued: 07/06/2004
Est. Priority Date: 05/01/1998
Status: Expired due to Term

Abstract

Multiple support vector machines are used to extract useful information from vast quantities of biological data. The method includes pre-processing of training data and test data to add dimensionality or to identify missing or erroneous data points. The training data is used to train the learning machine after which the success of the training is tested using the test data. The test output is post-processed to determine whether the knowledge discovered from the pre-processed test data set is desirable and to identify which of the multiple support vector machines provides the optimal solution. After the training has been confirmed, live biological data can be pre-processed then input into the identified support vector machine that provides the optimal solution for extraction of useful information.

22 Claims

1. A method for enhancing knowledge discovered from biological data using multiple support vector machines comprising:
- pre-processing a training data set to add meaning to each of a plurality of training data points;
  
  training each of a plurality of support vector machines using the pre-processed training data set, each support vector machine comprising a different kernel;
  
  pre-processing a test data set to add meaning to each of a plurality of test data points;
  
  testing each of the plurality of trained support vector machines using the preprocessed test data set; and
  
  in response to receiving each of the test outputs from each of the plurality of trained support vector machines, comparing each of the test outputs with each other to determine which if any of the test output is an optimal solution.
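
The steps of claim 1 can be illustrated with a small, self-contained sketch. It is not the patented implementation: the trainer below uses a kernel Adatron-style update as a simple stand-in for a full support vector machine solver, the pre-processing steps are omitted, and every name (`train_svm`, `select_best`, `KERNELS`) and the XOR-style toy data are assumptions made for illustration only.

```python
import math

def linear_kernel(a, b):
    return sum(x * y for x, y in zip(a, b))

def poly_kernel(a, b, degree=2):
    return (1.0 + linear_kernel(a, b)) ** degree

def rbf_kernel(a, b, gamma=1.0):
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def train_svm(X, y, kernel, C=10.0, lr=0.1, epochs=200):
    """Train a bias-free kernel classifier with a kernel Adatron-style
    update (a simplified stand-in for a real SVM solver)."""
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(epochs):
        for i in range(n):
            margin = y[i] * sum(alpha[j] * y[j] * K[j][i] for j in range(n))
            # Move alpha_i toward margin 1, clipped to the box [0, C].
            alpha[i] = min(C, max(0.0, alpha[i] + lr * (1.0 - margin)))

    def predict(x):
        score = sum(a * t * kernel(xi, x) for a, t, xi in zip(alpha, y, X))
        return 1 if score >= 0 else -1

    return predict

def error_rate(predict, X, y):
    return sum(predict(x) != t for x, t in zip(X, y)) / len(X)

def select_best(kernels, X_train, y_train, X_test, y_test):
    """Train one machine per kernel, test each, and compare test errors."""
    results = {}
    for name, kernel in kernels.items():
        machine = train_svm(X_train, y_train, kernel)
        results[name] = error_rate(machine, X_test, y_test)
    best = min(results, key=results.get)
    return best, results

# Toy XOR-style data (illustrative only, not biological data).
X_TRAIN = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
Y_TRAIN = [-1, -1, 1, 1]
X_TEST = [(0.5, 0.5), (-0.5, -0.5), (0.5, -0.5), (-0.5, 0.5)]
Y_TEST = [-1, -1, 1, 1]
KERNELS = {"linear": linear_kernel, "poly2": poly_kernel, "rbf": rbf_kernel}

if __name__ == "__main__":
    best, results = select_best(KERNELS, X_TRAIN, Y_TRAIN, X_TEST, Y_TEST)
    print(best, results)
```

On this toy problem the linear kernel cannot separate the classes while the non-linear kernels can, which is the kind of difference the comparison step is meant to expose.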
2. The method of claim 1, wherein each training data point comprises a vector having one or more coordinates; and
3. The method of claim 2, wherein cleaning the training data point comprises deleting, repairing or replacing the data point.
4. The method of claim 1, wherein each training data point comprises a vector having one or more original coordinates;
- and wherein pre-processing the training data set to add meaning to each training data point comprises adding dimensionality to each training data point by adding one or more new coordinates to the vector.
5. The method of claim 4, wherein the one or more new coordinates added to the vector are derived by applying a transformation to one or more of the original coordinates.
6. The method of claim 5, wherein the transformation is based on expert knowledge.
7. The method of claim 5, wherein the transformation is computationally derived.
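
As a rough illustration of claims 4 and 5, the sketch below appends new coordinates to each vector, each derived from the original coordinates by a transformation. The function name `add_dimensions` and the two example transformations are hypothetical; the claims do not specify which transformations are used.

```python
import math

def add_dimensions(points, transforms):
    """Return new vectors: the original coordinates followed by one new
    coordinate per transform, each computed from the original coordinates."""
    return [list(p) + [t(p) for t in transforms] for p in points]

# Hypothetical transformations (assumptions for illustration only).
def ratio(v):
    return v[0] / v[1]      # ratio of the first two original coordinates

def log_first(v):
    return math.log(v[0])   # log of the first original coordinate

augmented = add_dimensions([[2.0, 4.0], [9.0, 3.0]], [ratio, log_first])
# Each 2-coordinate vector now has 4 coordinates; the originals are unchanged.
```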
8. The method of claim 1, wherein the training data set comprises a continuous variable;
- and wherein the transformation comprises optimally categorizing the continuous variable of the training data set.
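
The claim does not define what makes a categorization optimal. One plausible reading, assumed here purely for illustration, is to pick the cutoff that minimizes the size-weighted class entropy of the two resulting bins. The names `entropy` and `best_cutoff` are hypothetical.

```python
import math

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def best_cutoff(values, labels):
    """Return the midpoint cutoff that minimizes the weighted entropy of
    the two bins, or None if all values are identical."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_score, best_cut = float("inf"), None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no cutoff can fall between equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            best_score, best_cut = score, (lo + hi) / 2
    return best_cut
```

For example, `best_cutoff([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])` places the cutoff midway between 3 and 10, because that split leaves both bins pure.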
9. The method of claim 1, wherein comparing each of the test outputs with each other comprises:
- post-processing each of the test outputs by interpreting each of the test outputs into a common format;
  
  comparing each of the post-processed test outputs with each other to determine which of the test outputs represents a lowest global minimum error.
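
A minimal sketch of the post-processing and comparison in claim 9, assuming each machine reports its test output in a different raw form (signed scores, probabilities, or text labels). The output kinds, the `+1`/`-1` common format, and all names are assumptions for illustration; the claim only requires that outputs be interpreted into a common format and compared for lowest error.

```python
def to_common_format(raw_outputs, kind):
    """Interpret one machine's raw test output as a list of +1/-1 labels."""
    if kind == "score":          # signed decision values
        return [1 if s >= 0 else -1 for s in raw_outputs]
    if kind == "probability":    # probability of the positive class
        return [1 if p >= 0.5 else -1 for p in raw_outputs]
    if kind == "label":          # text labels
        return [1 if label == "positive" else -1 for label in raw_outputs]
    raise ValueError(f"unknown output kind: {kind}")

def error_rate(predicted, truth):
    return sum(p != t for p, t in zip(predicted, truth)) / len(truth)

def lowest_error(test_outputs, truth):
    """test_outputs maps machine name -> (raw_outputs, kind)."""
    errors = {
        name: error_rate(to_common_format(raw, kind), truth)
        for name, (raw, kind) in test_outputs.items()
    }
    best = min(errors, key=errors.get)
    return best, errors

# Illustrative outputs from three hypothetical machines.
TRUTH = [1, -1, 1, -1]
OUTPUTS = {
    "svm_a": ([0.7, -0.2, -0.1, -1.3], "score"),
    "svm_b": ([0.9, 0.2, 0.6, 0.4], "probability"),
    "svm_c": (["positive", "positive", "positive", "negative"], "label"),
}
best, errors = lowest_error(OUTPUTS, TRUTH)
```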
10. The method of claim 1, wherein the knowledge to be discovered from the data relates to a regression or density estimation;
- wherein each support vector machine produces a training output comprising a continuous variable; and
  
  wherein the method further comprises the step of post-processing each of the training outputs by optimally categorizing the training output to derive cutoff points in the continuous variable.
11. The method of claim 1, further comprising the steps of:
- in response to comparing each of the test outputs with each other, determining that none of the test outputs is the optimal solution;
  
  adjusting the different kernels of one or more of the plurality of support vector machines; and
  
  in response to adjusting the selection of the different kernels, retraining and retesting each of the plurality of support vector machines.
12. The method of claim 11, wherein adjusting the different kernels is performed based on prior performance or historical data and is dependent on the nature of the knowledge to be discovered from the data or the nature of the data.
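
The adjust, retrain, and retest loop of claims 11 and 12 can be sketched as plain control flow. Everything below is hypothetical: `evaluate` stands in for retraining and retesting one machine, `adjust` stands in for whatever kernel adjustment rule is chosen, and the demo functions involve no real training.

```python
def retrain_until_acceptable(kernel_params, evaluate, adjust,
                             max_error, max_rounds=5):
    """kernel_params: machine name -> kernel parameter.
    evaluate(name, param) -> test error after retraining and retesting.
    adjust(name, param, error) -> adjusted kernel parameter.
    Returns (best name, its parameter, per-round errors), or
    (None, None, history) if no machine reaches max_error in time."""
    history = []
    for _ in range(max_rounds):
        errors = {n: evaluate(n, p) for n, p in kernel_params.items()}
        history.append(errors)
        best = min(errors, key=errors.get)
        if errors[best] <= max_error:
            return best, kernel_params[best], history
        # No output is acceptable: adjust every kernel and go around again.
        kernel_params = {n: adjust(n, p, errors[n])
                         for n, p in kernel_params.items()}
    return None, None, history

# Stand-ins for illustration: the "error" is just the distance of a
# kernel width from 1.0 (capped at 1.0), and adjusting halves the width.
def demo_evaluate(name, gamma):
    return min(1.0, abs(gamma - 1.0))

def demo_adjust(name, gamma, error):
    return gamma / 2.0

chosen, param, history = retrain_until_acceptable(
    {"svm_1": 8.0, "svm_2": 16.0}, demo_evaluate, demo_adjust, max_error=0.1)
```

Keeping the per-round errors in `history` is one way to support the "prior performance or historical data" basis that claim 12 mentions.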
13. The method of claim 1, further comprising the steps of:
- in response to comparing each of the test outputs with each other, determining that a selected one of the test outputs is the optimal solution, the selected one of the test outputs produced by a selected one of the plurality of trained support vector machines comprising a selected kernel;
  
  collecting a live biological data set;
  
  pre-processing the live biological data set to add meaning to each of a plurality of live biological data points;
  
  inputting the pre-processed live data set into the selected trained support vector machine comprising the selected kernel; and
  
  receiving the live output of the selected trained support vector machine.
14. The method of claim 13, further comprising the step of post-processing the live output by interpreting the live output into a computationally derived alphanumerical classifier.
15. The method of claim 1, further comprising the steps of:
- in response to comparing each of the test outputs with each other, determining that a selected one of the test outputs is the optimal solution, the selected one of the test outputs produced by a selected one of the plurality of trained support vector machines comprising a selected kernel;
  
  collecting a live biological data set;
  
  pre-processing the live biological data set to add meaning to each of a plurality of live biological data points;
  
  configuring two or more of the plurality of support vector machines for parallel processing based on the selected kernel;
  
  inputting the pre-processed live data set into the support vector machines configured for parallel processing; and
  
  receiving the live output of the trained support vector machine.

16. A method for diagnosing disease comprising identifying patterns within a biological data set using multiple support vector machines, the method comprising:
- pre-processing a training data set to add meaning to each of a plurality of training data points;
  
  training each of a plurality of support vector machines using the pre-processed training data set, each support vector machine comprising a different kernel;
  
  pre-processing a test data set to add meaning to each of a plurality of test data points;
  
  testing each of the plurality of trained support vector machines using the preprocessed test data set; and
  
  in response to receiving each of the test outputs from each of the plurality of trained support vector machines, comparing each of the test outputs with each other to determine which if any of the test output is an optimal solution.
17. The method of claim 16, wherein the disease is cancer.
18. The method of claim 17, wherein the cancer is colon cancer.
19. The method of claim 17, wherein the cancer is breast cancer.
20. The method of claim 16, wherein the knowledge discovered from the test data set comprises genes associated with the disease.

21. A method of treating a disease, comprising administering agents in an effective amount to interfere with or enhance the activity of genes or gene products identified by multiple learning machines, wherein the gene or gene products are identified by:
- training and testing multiple support vector machines using pre-processed training and testing data comprising gene or gene product data known to be relevant to the disease to select a support vector machine that produces a test output comprising an optimal solution;
  
  pre-processing a live data set comprising gene or gene product data relevant to the disease to add meaning to each of a plurality of live data points; and
  
  processing the pre-processed live data set using the selected support vector machine.

22. A diagnostic device, comprising genetic probes that specifically hybridize to genes identified as being associated with a disease by multiple learning machines, wherein the genes are identified by:
- training and testing multiple support vector machines using pre-processed training and testing data comprising gene data known to be relevant to the disease to select a support vector machine that produces a test output comprising an optimal solution;
  
  pre-processing a live data set comprising gene data relevant to the disease to add meaning to each of a plurality of live data points; and
  
  processing the pre-processed live data set using the selected support vector machine.

Current Assignee: Curtis Anderson, Health Discovery Corporation, James Roberts, Joe Mckenzie, Jules B. Paderewski, Julian N. Stern, Memorial Health Systems Incorporated, Timothy P. O'Hayer
Original Assignee: Barnhill Technologies, LLC
Inventors: Barnhill, Stephen; Guyon, Isabelle; Weston, Jason
Primary Examiner(s): Davis, George B.
Assistant Examiner(s): Booker, Kelvin
Application Number: US09/633,616
Time in Patent Office: 1,429 Days
Field of Search: 706/16, 706/20, 706/924, 706/12, 706/14, 706/45, 706/25
US Class Current: 706/16

CPC Class Codes
- G06F 18/214   Generating training pattern...
- G06F 18/2411   based on the proximity to a...
- G06N 20/00   Machine learning
- G16B 25/00   ICT specially adapted for h...
- G16B 40/00   ICT specially adapted for b...
- G16B 40/20   Supervised data analysis
- G16B 40/30   Unsupervised data analysis
