Method for detecting discriminatory data patterns in multiple sets of data and diagnosing disease
First Claim
1. A method for analyzing biological samples for the classification of multiple sets of mass spectrometer data derived from the biological samples, each set of mass spectrometer data comprising a plurality of candidate markers and each candidate marker being described uniquely by one or more coordinates, the method comprising:
- for at least a portion of the biological samples, perform the steps of;
ionizing the biological sample to produce ions; and
detecting the ions by use of a mass spectrometer, to produce mass spectrometer data;
the method further comprising the steps of;
selecting a training set of mass spectrometer data, the training set of mass spectrometer data comprising a plurality of sets of mass spectrometer data from two or more groups of mass spectrometer data representing known conditions selected from the multiple sets of mass spectrometer data;
performing a point-wise test on the training set of mass spectrometer data to calculate a plurality of test statistic values for the candidate markers;
determining a threshold test statistic value using a multiple-test correction method based on the size of the training set of mass spectrometer data and a selected significance level;
selecting those candidate markers having a test statistic value with an absolute value that exceeds the threshold;
selecting a subset of markers from the candidate markers using a best k-subset discriminant method to discriminate among the two or more groups; and
classifying a testing set of mass spectrometer data comprising a plurality of sets of mass spectrometer data into the two or more groups using the subset of markers; and
outputting a result of the classifying step to a computer readable medium, wherein at least one candidate marker has a test statistic value with an absolute value that exceeds the threshold test statistic value.
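The subset-selection and classification steps recited above can be illustrated with a minimal sketch. The claim does not say which discriminant rule is used, so this example assumes a nearest-centroid rule scored by training accuracy. The function names (`fit`, `predict`, `accuracy`, `best_k_subset`) and the dictionary-of-groups data layout are hypothetical and are not taken from the patent.

```python
from itertools import combinations

def fit(groups, subset):
    """Per-group centroid restricted to the selected marker coordinates."""
    return {
        label: [sum(s[i] for s in samples) / len(samples) for i in subset]
        for label, samples in groups.items()
    }

def predict(sample, subset, centroids):
    """Return the label whose centroid is nearest (squared Euclidean distance)."""
    def dist2(center):
        return sum((sample[i] - c) ** 2 for i, c in zip(subset, center))
    return min(centroids, key=lambda label: dist2(centroids[label]))

def accuracy(groups, subset, centroids):
    """Fraction of samples assigned to their own group."""
    total = sum(len(samples) for samples in groups.values())
    hits = sum(
        predict(x, subset, centroids) == label
        for label, samples in groups.items()
        for x in samples
    )
    return hits / total

def best_k_subset(train, candidates, k):
    """Exhaustively score every k-subset of the candidate markers.

    Assumption: the score is training accuracy of the nearest-centroid rule;
    ties go to the first subset in lexicographic order.
    """
    best = max(
        combinations(candidates, k),
        key=lambda sub: accuracy(train, sub, fit(train, sub)),
    )
    return best, fit(train, best)

# Toy data: only coordinate 1 separates the two groups.
train = {
    "affected":   [[1.0, 9.0, 2.0], [3.0, 9.2, 1.0], [2.0, 8.8, 3.0]],
    "unaffected": [[1.1, 5.0, 2.1], [2.9, 5.1, 0.9], [2.1, 4.9, 3.1]],
}
subset, centroids = best_k_subset(train, candidates=[0, 1, 2], k=1)
label = predict([2.0, 9.1, 2.0], subset, centroids)
```

Exhaustive search grows combinatorially with the number of candidates, which is one reason the candidate list is thresholded before this step.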
Abstract
A comprehensive analysis procedure for analyzing and comparing multiple sets of data to detect hidden discriminatory data patterns. The inventive procedure identifies a best subset of markers for optimal discrimination between two or more sets of data. A point-wise test on two or more sets of data is performed to calculate test statistic values and to generate a statgram, a two- or higher-dimensional map of the test statistic values along the range of data. A threshold is then determined for isolating critical regions of the statgram at each significance level to provide candidate markers. A subset of markers from the candidate markers is then selected to discriminate among the sets of data. The two or more sets of data are classified using the subset of markers.
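As a rough illustration of the point-wise test and thresholding described in the abstract, the sketch below computes one test statistic per coordinate (a one-dimensional statgram) and keeps the coordinates whose absolute statistic exceeds a corrected threshold. Several choices here are assumptions rather than details from the patent: the statistic is a Welch two-sample t statistic, the multiple-test correction is Bonferroni, and the critical value uses a normal approximation, which ignores the training-set size that the claims tie the threshold to. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def pointwise_t(group_a, group_b):
    """Welch t statistic at each coordinate.

    Each group is a list of spectra; all spectra have the same length and
    each group needs at least two spectra.
    """
    stats = []
    for a_vals, b_vals in zip(zip(*group_a), zip(*group_b)):
        se = sqrt(variance(a_vals) / len(a_vals) + variance(b_vals) / len(b_vals))
        stats.append((mean(a_vals) - mean(b_vals)) / se if se > 0 else 0.0)
    return stats

def bonferroni_threshold(n_tests, alpha=0.05):
    """Two-sided critical value after Bonferroni correction.

    Assumption: normal approximation to the null distribution.
    """
    return NormalDist().inv_cdf(1 - alpha / (2 * n_tests))

def candidate_markers(stats, threshold):
    """Indices whose absolute test statistic exceeds the threshold."""
    return [i for i, t in enumerate(stats) if abs(t) > threshold]

# Toy spectra with three coordinates; only coordinate 1 differs between groups.
affected = [[1.0, 5.0, 2.0], [1.2, 5.1, 2.1], [0.9, 4.9, 1.9], [1.1, 5.0, 2.0]]
unaffected = [[1.0, 9.0, 2.0], [1.1, 9.2, 2.05], [1.0, 8.9, 1.95], [1.05, 9.1, 2.0]]

statgram = pointwise_t(affected, unaffected)
threshold = bonferroni_threshold(n_tests=len(statgram), alpha=0.05)
candidates = candidate_markers(statgram, threshold)
```

With small training sets, a t-distribution quantile with the appropriate degrees of freedom would give a larger, more conservative threshold than the normal approximation used here.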
12 Citations
26 Claims
1. A method for analyzing biological samples for the classification of multiple sets of mass spectrometer data derived from the biological samples, each set of mass spectrometer data comprising a plurality of candidate markers and each candidate marker being described uniquely by one or more coordinates, the method comprising:
for at least a portion of the biological samples, perform the steps of;
ionizing the biological sample to produce ions; and
detecting the ions by use of a mass spectrometer, to produce mass spectrometer data;
the method further comprising the steps of;
selecting a training set of mass spectrometer data, the training set of mass spectrometer data comprising a plurality of sets of mass spectrometer data from two or more groups of mass spectrometer data representing known conditions selected from the multiple sets of mass spectrometer data;
performing a point-wise test on the training set of mass spectrometer data to calculate a plurality of test statistic values for the candidate markers;
determining a threshold test statistic value using a multiple-test correction method based on the size of the training set of mass spectrometer data and a selected significance level;
selecting those candidate markers having a test statistic value with an absolute value that exceeds the threshold;
selecting a subset of markers from the candidate markers using a best k-subset discriminant method to discriminate among the two or more groups; and
classifying a testing set of mass spectrometer data comprising a plurality of sets of mass spectrometer data into the two or more groups using the subset of markers; and
outputting a result of the classifying step to a computer readable medium, wherein at least one candidate marker has a test statistic value with an absolute value that exceeds the threshold test statistic value.
Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 23, 24, 25)
12. A method of detecting cancer from mass spectrometer data of biological samples, comprising:
for at least a portion of the biological samples, perform the steps of;
ionizing the biological sample to produce ions; and
detecting the ions by use of a mass spectrometer, to produce mass spectrometer data;
the method further comprising the steps of;
normalizing and smoothing the mass spectrometer data, to produce standardized and smoothed data;
randomly sampling data that has been standardized and smoothed to divide the standardized and smoothed data into a training set of mass spectrometer data and a testing set of mass spectrometer data, each of the training set and the testing set comprising random samples from subjects affected and unaffected by the disease;
performing a point-wise test on the training set of mass spectrometer data to determine test statistic values indicative of the difference between corresponding mass spectrometer data values of the samples of the affected and the unaffected subjects;
determining a threshold test statistic value using a multiple-test correction method based on the size of the training set of mass spectrometer data and a selected significance level;
selecting candidate markers having mass spectrometer data values, the mass spectrometer data values having a test statistic value with an absolute value that exceeds the threshold;
selecting a subset of markers from the candidate markers using a best k-subset discriminant method to discriminate between the affected and the unaffected samples of the training set; and
classifying the testing set of mass spectrometer data as representing affected or unaffected samples using the subset of markers; and
outputting the result of the classifying step to a computer readable medium, wherein at least one candidate marker has a test statistic value with an absolute value that exceeds the threshold test statistic value.
Dependent Claims (13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 26)
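The preprocessing steps that claim 12 adds (normalizing, smoothing, and randomly dividing the data into training and testing sets) might look like the following sketch. The claim does not name specific techniques, so the choices here are assumptions: normalization divides each spectrum by its total intensity, smoothing is a centered moving average, and the split is done separately within each group so that both sets contain affected and unaffected samples. The function names and the `train_fraction` and `seed` parameters are hypothetical.

```python
import random

def normalize(spectrum):
    """Scale intensities so they sum to 1 (total-intensity normalization, an assumption)."""
    total = sum(spectrum)
    return [x / total for x in spectrum] if total else list(spectrum)

def smooth(spectrum, window=3):
    """Centered moving average; the window shrinks at the two ends."""
    half = window // 2
    out = []
    for i in range(len(spectrum)):
        chunk = spectrum[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def split_by_group(groups, train_fraction=0.5, seed=0):
    """Randomly split each group into training and testing samples.

    Assumes every group has at least two samples, so that both the training
    set and the testing set receive samples from every group.
    """
    rng = random.Random(seed)
    train, test = {}, {}
    for label, samples in groups.items():
        shuffled = samples[:]
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        cut = max(1, min(len(shuffled) - 1, cut))  # keep both sides non-empty
        train[label], test[label] = shuffled[:cut], shuffled[cut:]
    return train, test

# Toy data: four raw spectra per group, three coordinates each.
raw = {
    "affected":   [[1, 9, 2], [3, 9, 1], [2, 8, 3], [2, 9, 2]],
    "unaffected": [[1, 5, 2], [3, 5, 1], [2, 4, 3], [2, 5, 2]],
}
prepared = {
    label: [smooth(normalize(s)) for s in samples]
    for label, samples in raw.items()
}
train, test = split_by_group(prepared, train_fraction=0.5, seed=0)
```

Fixing the seed makes the random split reproducible, which helps when comparing marker subsets across runs.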
Specification