Method and system for detecting discriminatory data patterns in multiple sets of data
Abstract
A comprehensive analysis procedure for analyzing and comparing multiple sets of data to detect hidden discriminatory data patterns. The inventive procedure identifies a best subset of markers for optimal discrimination between two or more sets of data. A point-wise test on two or more sets of data is performed to calculate test statistic values and to generate a statgram, a two- or higher-dimensional map of the test statistic values along the range of data. A threshold is then determined for isolating critical regions of the statgram at each significance level to provide candidate markers. A subset of markers from the candidate markers is then selected to discriminate among the sets of data. The two or more sets of data are classified using the subset of markers.
49 Citations
38 Claims
1. A method for detecting discriminatory data patterns in multiple sets of data, each unit of data being described uniquely by one or more coordinates, comprising:
performing a point-wise test on a training set of data to calculate a plurality of test statistic values for corresponding units of data from two or more groups of data representing known conditions;
determining a threshold test statistic value based on a selected significance level;
selecting those units of data having a test statistic value with an absolute value that exceeds the threshold, the selected units of data comprising candidate marker elements;
selecting a subset of marker elements from the candidate marker elements to discriminate among the two or more groups; and
classifying a testing set of data into the two or more groups using the subset of marker elements.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 31, 32, 33, 34, 35, 36, 37, 38
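The last two steps of claim 1, selecting a subset of marker elements and classifying a testing set, could be sketched as follows. The claim does not say how the subset is chosen or which classifier is used. This sketch assumes greedy forward selection scored by training accuracy and a nearest-centroid classifier, and every function name in it is hypothetical.

```python
from statistics import mean


def centroids(samples, labels, markers):
    """Mean value of each marker coordinate within each labeled group."""
    result = {}
    for group in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == group]
        result[group] = [mean(r[m] for r in rows) for m in markers]
    return result


def classify(sample, cents, markers):
    """Assign a sample to the group whose centroid is nearest (squared Euclidean)."""
    def dist(group):
        return sum((sample[m] - c) ** 2 for m, c in zip(markers, cents[group]))
    return min(sorted(cents), key=dist)


def accuracy(samples, labels, markers):
    """Fraction of training samples assigned to their own group."""
    cents = centroids(samples, labels, markers)
    hits = sum(classify(s, cents, markers) == l for s, l in zip(samples, labels))
    return hits / len(samples)


def select_subset(samples, labels, candidates):
    """Greedy forward selection (an assumed strategy, not taken from the claim).

    Repeatedly add the candidate that raises training accuracy the most and
    stop as soon as no remaining candidate improves it.
    """
    chosen, best, remaining = [], 0.0, list(candidates)
    while remaining:
        scored = [(accuracy(samples, labels, chosen + [m]), m) for m in remaining]
        score, pick = max(scored, key=lambda pair: pair[0])
        if score <= best:
            break
        chosen.append(pick)
        remaining.remove(pick)
        best = score
    return chosen


# Made-up training data with known conditions "A" and "B".
train = [[1.0, 5.0, 2.0], [1.2, 5.1, 2.5], [0.9, 7.0, 1.8], [1.1, 7.2, 2.2]]
train_labels = ["A", "A", "B", "B"]

subset = select_subset(train, train_labels, candidates=[1, 2])
cents = centroids(train, train_labels, subset)
predicted = [classify(s, cents, subset) for s in [[1.0, 6.8, 2.1], [1.0, 5.3, 2.0]]]
```

Scoring on the training set keeps the sketch short. Cross-validation within the training set would be a natural substitute where overfitting is a concern.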
13. A method of detecting a disease from data of biological samples, comprising:
randomly sampling data that has been standardized and smoothed to divide the data into a training set and a testing set, each set comprising random samples from subjects affected and unaffected by the disease;
performing a point-wise test on the training set of data to determine test statistic values indicative of the difference between corresponding data values of the samples of the affected and the unaffected subjects;
determining a threshold test statistic value based on a selected significance level;
selecting those data values having a test statistic value with an absolute value that exceeds the threshold, the selected units of data comprising candidate marker elements;
selecting a subset of marker elements from the candidate marker elements to discriminate between the affected and the unaffected samples of the training set; and
classifying the testing set of data as representing affected or unaffected samples using the subset of marker elements.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22, 23
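The first step of claim 13, dividing standardized and smoothed data into training and testing sets that each contain affected and unaffected samples, might look like the sketch below. The claim does not specify the standardization, the smoothing filter, their order, or the split ratio. This sketch assumes per-sample z-scoring, a centered moving average, and a per-group random split, and all names in it are hypothetical.

```python
import random
from statistics import mean, pstdev


def standardize(signal):
    """Z-score one sample (an assumed form of standardization)."""
    mu, sigma = mean(signal), pstdev(signal)
    return [(x - mu) / sigma if sigma > 0 else 0.0 for x in signal]


def smooth(signal, window=3):
    """Centered moving average, with a shorter window at the edges (assumed filter)."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(mean(signal[lo:hi]))
    return out


def stratified_split(samples, labels, train_fraction=0.5, seed=0):
    """Randomly split each group separately so both sets contain every group.

    Requires at least two samples per group; otherwise the testing set
    would receive no sample from that group.
    """
    rng = random.Random(seed)
    train, test = [], []
    for group in sorted(set(labels)):
        members = [(s, l) for s, l in zip(samples, labels) if l == group]
        rng.shuffle(members)
        cut = max(1, min(len(members) - 1, round(len(members) * train_fraction)))
        train += members[:cut]
        test += members[cut:]
    return train, test


# Made-up raw data: four affected and four unaffected samples.
raw = [[float(i + j) for j in range(5)] for i in range(8)]
status = ["affected"] * 4 + ["unaffected"] * 4

prepared = [smooth(standardize(s)) for s in raw]
train_set, test_set = stratified_split(prepared, status, train_fraction=0.5, seed=0)
```

Splitting each group on its own is one simple way to make sure both sets contain affected and unaffected samples. A plain random shuffle of the pooled data would not guarantee that for small groups.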
24. A system for detecting discriminatory data patterns in multiple sets of data, each unit of data being described uniquely by one or more coordinates, comprising:
a point-wise test module for performing a point-wise test on a training set of data to calculate a plurality of test statistic values for corresponding units of data from two or more groups of data representing known conditions;
a threshold module for determining a threshold test statistic value based on a selected significance level and selecting those units of data having a test statistic value with an absolute value that exceeds the threshold, the selected units of data comprising candidate marker elements;
a marker selection module for selecting a subset of marker elements from the candidate marker elements to discriminate among the two or more groups; and
a classification module for classifying a testing set of data into the two or more groups using the subset of marker elements.
Dependent claims: 25, 26, 27
28. A system for detecting a disease from data of biological samples, comprising:
a sampling module for randomly sampling data that has been standardized and smoothed to divide the data into a training set and a testing set, each set comprising random samples from subjects affected and unaffected by the disease;
a point-wise test module for performing a point-wise test on the training set of data to determine test statistic values indicative of the difference between corresponding data values of the samples of the affected and the unaffected subjects;
a threshold module for determining a threshold test statistic value based on a selected significance level and for selecting those data values having a test statistic value with an absolute value that exceeds the threshold, the selected units of data comprising candidate marker elements;
a marker selection module for selecting a subset of marker elements from the candidate marker elements to discriminate between the affected and the unaffected samples of the training set; and
a classification module for classifying the testing set of data as representing affected or unaffected samples using the subset of marker elements.
Dependent claims: 29
30. A computer readable medium comprising code for detecting discriminatory data patterns in multiple sets of data, each unit of data being described uniquely by one or more coordinates, the code comprising instructions for:
performing a point-wise test on a training set of data to calculate a plurality of test statistic values for corresponding units of data from two or more groups of data representing known conditions;
determining a threshold test statistic value based on a selected significance level and selecting those units of data having a test statistic value with an absolute value that exceeds the threshold, the selected units of data comprising candidate marker elements;
selecting a subset of marker elements from the candidate marker elements to discriminate among the two or more groups; and
classifying a testing set of data into the two or more groups using the subset of marker elements.
Specification