Selection of features predictive of biological conditions using protein mass spectrographic data

US 7,676,442 B2
Filed: 10/30/2007
Issued: 03/09/2010
Est. Priority Date: 05/01/1998
Status: Expired due to Fees

First Claim

1. A method for identification distinguishing between different biological conditions using protein expression data contained in a plurality of mass spectra generated from mass spectrographic measurement of a plurality of samples from subjects having the different biological conditions, the method comprising:

downloading the plurality of mass spectra into a computer system comprising a processor and a storage device, wherein the processor is programmed to perform the steps of;

aligning the plurality of spectra, comprising;

selecting a first spectrum of the plurality of spectra as a baseline example;

sliding each spectral peak of a second spectrum of the plurality of spectra one at a time along a plurality of peaks within the baseline example;

constructing a similarity measure for comparing pairs of spectra, wherein the similarity measure includes a scoring function for obtaining a similarity score between each spectral peak of the second spectrum and the peaks within the baseline example, the similarity score being examined according to the relationship $S(x_i - x_0) = \lVert x_i - x_0 \rVert_2^2$, where $x_i$ and $x_0$ are feature vectors corresponding to peaks of an $i$-th spectrum and the baseline spectrum, respectively;

offsetting the second spectrum relative to the baseline example according to the similarity score achieved for the second spectrum;

repeating the step of aligning the spectra for at least one additional spectrum to create a set of aligned spectra;

applying a feature selection algorithm to the set of aligned spectra to select a subset of spectral peaks that discriminate between the different biological conditions, wherein the feature selection algorithm is selected from SVM-recursive feature elimination and l₀-norm minimization; and

training at least one support vector machine to discriminate between the plurality of different sample classes using the selected subset of spectral peaks, wherein the at least one support vector machine comprises a kernel;

processing the plurality of spectra using the at least one support vector machine;

generating a listing for display on a graphical display of at least one predictive feature within the plurality of spectra for distinguishing between the different biological conditions.
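
The alignment steps of claim 1 can be illustrated with a minimal sketch. It assumes each spectrum is already binned into an equal-length list of intensities, and it slides the whole second spectrum against the baseline rather than individual peaks, which is a simplification of the claimed step. The names `squared_l2`, `shift`, `best_offset`, `align_all` and the `max_shift` parameter are hypothetical and do not come from the patent.

```python
def squared_l2(a, b):
    """Squared Euclidean distance ||a - b||_2^2 between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def shift(spectrum, offset, fill=0.0):
    """Shift a spectrum by `offset` bins (positive = right), padding with `fill`."""
    n = len(spectrum)
    if offset >= 0:
        return [fill] * min(offset, n) + spectrum[: max(n - offset, 0)]
    return spectrum[-offset:] + [fill] * min(-offset, n)


def best_offset(baseline, spectrum, max_shift=5):
    """Offset in [-max_shift, max_shift] with the lowest squared-L2 score.

    Ties are broken in favour of the smallest absolute shift.
    """
    return min(
        range(-max_shift, max_shift + 1),
        key=lambda k: (squared_l2(shift(spectrum, k), baseline), abs(k)),
    )


def align_all(spectra, max_shift=5):
    """Use the first spectrum as the baseline and offset every other one to it."""
    baseline = spectra[0]
    aligned = [list(baseline)]
    for s in spectra[1:]:
        aligned.append(shift(s, best_offset(baseline, s, max_shift)))
    return aligned


# Synthetic toy intensities (not real data): the second spectrum is the
# baseline displaced one bin to the left.
baseline = [0.0, 0.0, 5.0, 1.0, 0.0, 0.0, 3.0, 0.0]
second = [0.0, 5.0, 1.0, 0.0, 0.0, 3.0, 0.0, 0.0]
aligned = align_all([baseline, second], max_shift=3)
```

In this sketch a lower score means a better match, so the offset that minimises the squared distance is the one applied to the second spectrum.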

3 Assignments

0 Petitions

Abstract

Support vector machines are used to classify data contained within a structured dataset such as a plurality of signals generated by a spectral analyzer. The signals are pre-processed to ensure alignment of peaks across the spectra. Similarity measures are constructed to provide a basis for comparison of pairs of samples of the signal. A support vector machine is trained to discriminate between different classes of the samples and to identify the most predictive features within the spectra. In a preferred embodiment, feature selection is performed to reduce the number of features that must be considered.
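
As a rough illustration of the feature selection and training described above, the following sketch runs recursive feature elimination around a linear SVM. It is written under several assumptions: a linear kernel, a plain subgradient-descent trainer on the regularised hinge loss, the common SVM-RFE ranking criterion of squared weight, and synthetic toy data. The function names and the hyperparameters `lam`, `eta` and `epochs` are hypothetical; none of this is taken from the patent's specification.

```python
def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=50):
    """Subgradient descent on the L2-regularised hinge loss; returns (w, b)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1.0 - eta * lam) for wj in w]  # shrink (regulariser)
            if margin < 1.0:  # sample violates the margin: hinge-loss step
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b


def svm_rfe(X, y, n_keep):
    """Repeatedly retrain and drop the feature with the smallest w_j^2."""
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        Xa = [[row[j] for j in active] for row in X]
        w, _ = train_linear_svm(Xa, y)
        worst = min(range(len(active)), key=lambda k: w[k] ** 2)
        del active[worst]
    return active


def predict(w, b, x):
    """Class label (+1 or -1) from the sign of the linear decision function."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Synthetic toy peaks: only column 1 carries the class signal.
X = [[0.1, 2.0, 0.0], [-0.1, 2.0, 0.0], [0.1, -2.0, 0.0], [-0.1, -2.0, 0.0]]
y = [1, 1, -1, -1]

selected = svm_rfe(X, y, n_keep=1)
w, b = train_linear_svm([[row[j] for j in selected] for row in X], y)
```

The surviving indices in `selected` play the role of the subset of discriminating spectral peaks, and the final model is trained on those columns only.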

47 Citations

8 Claims

- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The method of claim 1, wherein the plurality of samples are serum samples and the different biological conditions comprise normal, benign and prostate cancer.
  - 3. The method of claim 1, further comprising, before or after the step of aligning, normalizing each of the plurality of spectra.
  - 4. The method of claim 1, further comprising, before or after the step of aligning, smoothing the plurality of spectra by averaging over a window.
  - 5. The method of claim 1, further comprising, before or after the step of aligning, extracting pre-determined peaks from the plurality of spectra.
  - 6. The method of claim 1, wherein the scoring of similarity is made noise invariant by discarding a lower portion of the plurality of spectra.
  - 7. The method of claim 1, wherein the scoring of similarity is made noise invariant by using repeated measurements of the protein samples.
  - 8. The method of claim 1, further comprising:
    - inputting into the processor live spectral data from a live subject suspected of having one of the different biological conditions;
      
      aligning the live spectral data;
      
      using the subset of aligned spectral peaks within the aligned live spectral data, processing the live spectral data using the at least one trained support vector machine to identify which of the different biological conditions the live subject has; and
      
      generating a report to a graphical display device indicating the identified biological condition.
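
The pre-processing named in claims 3 and 4 (normalizing, and smoothing by averaging over a window) can be sketched as follows. The claims do not say which normalization is used, so scaling each spectrum to unit total intensity is an assumption here, as are the function names and the default `window` size.

```python
def normalize(spectrum):
    """Scale intensities to sum to 1; an all-zero spectrum is returned unchanged."""
    total = sum(spectrum)
    return [v / total for v in spectrum] if total else list(spectrum)


def smooth(spectrum, window=3):
    """Centered moving average; the window is truncated at both edges."""
    half = window // 2
    out = []
    for i in range(len(spectrum)):
        lo, hi = max(0, i - half), min(len(spectrum), i + half + 1)
        out.append(sum(spectrum[lo:hi]) / (hi - lo))
    return out
```

For example, `smooth([0.0, 3.0, 0.0])` spreads the single spike across its neighbours, and `normalize([1.0, 1.0, 2.0])` returns fractions of the total intensity.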

Current Assignee
Curtis Anderson, Health Discovery Corporation, James Roberts, Joe Mckenzie, Jules B. Paderewski, Julian N. Stern, Memorial Health Systems Incorporated, Timothy P. O'Hayer
Original Assignee
Health Discovery Corporation
Inventors
Chapelle, Olivier; Weston, Jason Aaron Edward; Elisseeff, Andre; Ben-Hur, Asa
Primary Examiner(s)
Vincent; David R
Assistant Examiner(s)
Wong; Lut

Application Number
US11/929,169
Publication Number
US 20080097940A1
Time in Patent Office
861 Days
Field of Search
706/12, 706/45
US Class Current
706/45
CPC Class Codes
G06F 18/21355   nonlinear criteria, e.g. em...
G06F 18/22   Matching criteria, e.g. pro...
G06F 18/2411   based on the proximity to a...
G06V 10/761   Proximity, similarity or di...
G06V 10/7715   Feature extraction, e.g. by...
