Methods for efficiently mining broad data sets for biological markers
Abstract
A biological marker identification method identifies biological markers within broad sets of biological data containing many more measurements than observations. For example, the data can contain thousands of measurements on each blood sample obtained from fewer than 100 subjects, each of which falls into one of a set of clinical classes or is associated with a value of a continuous clinical response variable. At least one biomarker, containing a small subset of measurements, is found that is capable of predicting a clinical endpoint. The biomarker can be used for, e.g., diagnosing disease or assessing response to a drug. First, the set of measurements is reduced to a smaller set of candidate measurements by eliminating measurements that either cannot distinguish among classes or are redundant. Biomarker subsets are then selected from the remaining set of measurements, either by an exhaustive search or by a heuristic method that finds good but not necessarily globally optimal biomarkers.
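The two steps described above (reduce the measurements, then search the survivors for small predictive subsets) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: it assumes two clinical classes labeled 0 and 1 with at least two observations each, uses Welch's t-statistic as a stand-in for the significance filter, and scores subsets with leave-one-out accuracy of a nearest-centroid classifier. The function names `welch_t`, `reduce_measurements`, `loo_accuracy`, and `exhaustive_search` are hypothetical, and the redundancy-elimination part of the reduction step is not shown.

```python
import itertools
import math
import statistics


def welch_t(a, b):
    """Welch's t-statistic for two samples of at least two values each."""
    diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0.0:
        # Degenerate case: no within-class spread at all.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def reduce_measurements(X, y, m):
    """Step 1 (assumed filter): keep the m measurement indices with the largest |t|.

    X is a list of p observations, each a list of n measurements;
    y is a list of p class labels, each 0 or 1.
    """
    scored = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        scored.append((abs(welch_t(a, b)), j))
    scored.sort(reverse=True)
    return sorted(j for _, j in scored[:m])


def loo_accuracy(X, y, subset):
    """Assumed score: leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i, (held_out, truth) in enumerate(zip(X, y)):
        best_label, best_dist = None, math.inf
        for label in sorted(set(y)):
            rows = [X[r] for r in range(len(X)) if r != i and y[r] == label]
            centroid = [statistics.mean([row[j] for row in rows]) for j in subset]
            dist = sum((held_out[j] - c) ** 2 for j, c in zip(subset, centroid))
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == truth
    return correct / len(X)


def exhaustive_search(X, y, candidates, k):
    """Step 2: score every subset of at most k candidates and rank them."""
    ranked = []
    for size in range(1, k + 1):
        for subset in itertools.combinations(candidates, size):
            ranked.append((loo_accuracy(X, y, subset), subset))
    # Higher accuracy first; among ties, prefer smaller subsets.
    ranked.sort(key=lambda item: (-item[0], len(item[1]), item[1]))
    return ranked
```

Calling `reduce_measurements(X, y, m)` and passing its result to `exhaustive_search(X, y, candidates, k)` returns every candidate subset with its score, best first, so more than one marker can be read off the top of the list. The exhaustive step evaluates every subset of size up to k, which is only practical because the first step has already cut n down to a small m.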
117 Citations
42 Claims
1. A method for identifying biological markers in a set of n biological measurements for each of p observations, wherein n>p and each observation is associated with a clinical endpoint, each biological marker comprising at most k measurements, wherein k<p, said method comprising:
a) reducing said set of n measurements to a set of m candidate measurements, wherein said reducing comprises performing a differential significance analysis; and
b) selecting at least two biological markers from said set of m candidate measurements, wherein values of each biological marker predict said clinical endpoints.
(Dependent claims: 12, 19, 23)
2-11. (canceled)
13-18. (canceled)
20-22. (canceled)
24. A method for identifying a biological marker in a set of n biological measurements for each of p observations, wherein n>p and each observation is associated with a clinical endpoint, each biological marker comprising at most k measurements, wherein k<p, said method comprising:
a) reducing said set of n measurements to a set of m candidate measurements; and
b) using simulated annealing, selecting a biological marker from said set of m candidate measurements, wherein values of said biological marker predict said clinical endpoints.
(Dependent claims: 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38)
39-40. (canceled)
41. A program storage device accessible by a processor, tangibly embodying a program of instructions executable by said processor to perform method steps for a biological marker identification method, wherein said method identifies a biological marker in a set of n biological measurements for each of p observations, wherein n>p and each observation is associated with a clinical endpoint, each biological marker comprising at most k measurements, wherein k<p, said method steps comprising:
a) reducing said set of n measurements to a set of m candidate measurements; and
b) using simulated annealing, selecting a biological marker from said set of m candidate measurements, wherein values of said biological marker predict said clinical endpoints.
42. (canceled)
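Claims 24 and 41 recite selecting a marker by simulated annealing without fixing the details. The sketch below shows one way such a search over subsets of at most k candidate measurements might look. Everything specific in it is an assumption made for illustration: the function names `propose` and `simulated_annealing`, the add/remove/swap move set, the geometric cooling schedule, the default parameter values, and the caller-supplied `score` function (higher is better) are not taken from the claims.

```python
import math
import random


def propose(subset, candidates, k, rng):
    """Random neighbor: add, remove, or swap one measurement (size stays in 1..k)."""
    members = sorted(subset)
    outside = [c for c in candidates if c not in subset]
    moves = []
    if len(members) < k and outside:
        moves.append("add")
    if len(members) > 1:
        moves.append("remove")
    if outside:
        moves.append("swap")
    if not moves:
        return subset
    move = rng.choice(moves)
    new = set(subset)
    if move in ("remove", "swap"):
        new.remove(rng.choice(members))
    if move in ("add", "swap"):
        new.add(rng.choice(outside))
    return frozenset(new)


def simulated_annealing(candidates, k, score, steps=2000,
                        t_start=1.0, t_end=0.01, seed=0):
    """Search for a subset of at most k candidates that maximizes score(subset)."""
    rng = random.Random(seed)
    candidates = list(candidates)
    current = frozenset(rng.sample(candidates, min(k, len(candidates))))
    current_score = score(current)
    best, best_score = current, current_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        temperature = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        proposal = propose(current, candidates, k, rng)
        proposal_score = score(proposal)
        delta = proposal_score - current_score
        # Accept improvements always; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = proposal, proposal_score
            if current_score > best_score:
                best, best_score = current, current_score
    return sorted(best), best_score
```

In use, `score` would be some measure of how well the subset's values predict the clinical endpoints, for example a cross-validated accuracy. Unlike an exhaustive search, this heuristic evaluates only `steps` subsets, so it scales to larger m and k at the cost of possibly returning a good but not globally optimal marker.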
Specification