Data analysis methods for locating entities of interest within large, multivariable datasets
First Claim
1. A method for locating a subset within an experimental set of biological data points that is of most interest for further analysis, the method comprising:
a) obtaining a set of data points associated with a biological phenomenon of interest, wherein the set of data points comprises a baseline and one or more experimental groups;
b) designating a trend of interest in the set of data points that is associated with the biological phenomenon of interest, wherein the trend indicates a relationship between the data points and an independent variable,
c) developing a mathematical model of the trend, the mathematical model being a function of the independent variable and wherein the mathematical model models the trend with respect to the independent variable;
d) testing each data point in the set for adherence to the mathematical model, wherein the data points adhering to the model are identified as being members of the subset of most interest for further analysis; and
e) providing identification of the members of the subset in a user-readable format, wherein all of the steps b), c), d), and e) are performed on a suitably-programmed computer.
Abstract
The present invention provides data analysis methods for the rapid location of subsets of large, multivariable biological datasets that are of most interest for further analysis, for the investigation of molecular modes of action of biological phenomena of interest, and for the identification of sets of data points that best distinguish between experimental groups in larger datasets as putative biomarkers. While existing methods for analyzing large biological datasets generally provide too much information to the user, or not enough, the methods of the present invention entail taking user input on what kinds of trends are of interest and then finding results that match the designated trend. In such manner, the methods of the invention allow a user to quickly pinpoint the subset of data of most interest without a concomitant loss of a large percentage of relevant information, as is typical with standard methods. The methods of the invention allow for identification of molecular entities that are involved in a biological phenomenon of interest, entities that may have otherwise gone undiscovered in a large, multivariable dataset.
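The trend-matching idea described in the abstract can be sketched in a few lines of Python. This is an illustration only, not the implementation disclosed in the patent: the entity names, the dose values, the choice of Pearson correlation as the adherence test, and the 0.95 threshold are all assumptions made for the example.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def find_adherent(profiles, independent, model, threshold=0.95):
    """Return the names of entities whose profile tracks model(independent).

    profiles    : dict mapping entity name -> measurements, one per level
    independent : values of the independent variable (e.g. dose or time)
    model       : function of the independent variable describing the trend
    threshold   : minimum correlation counted as adherence (assumed value)
    """
    template = [model(x) for x in independent]
    return sorted(
        name for name, values in profiles.items()
        if pearson(template, values) >= threshold
    )

# Hypothetical data: dose 0.0 plays the role of the baseline group.
doses = [0.0, 1.0, 2.0, 4.0]
profiles = {
    "metabolite_A": [1.0, 2.1, 2.9, 5.2],  # rises with dose
    "metabolite_B": [3.0, 3.1, 2.9, 3.0],  # essentially flat
    "metabolite_C": [5.0, 3.9, 3.1, 1.0],  # falls with dose
}

# Designated trend of interest: response increases linearly with dose.
hits = find_adherent(profiles, doses, lambda dose: dose)
print(hits)
```

Passing a different model function, for example `lambda dose: -dose` for a dose-dependent decrease, selects a different subset without changing the rest of the code, which mirrors the user-designated trend in the claims.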
6 Citations
17 Claims

1. A method for locating a subset within an experimental set of biological data points that is of most interest for further analysis, the method comprising:

a) obtaining a set of data points associated with a biological phenomenon of interest, wherein the set of data points comprises a baseline and one or more experimental groups;
b) designating a trend of interest in the set of data points that is associated with the biological phenomenon of interest, wherein the trend indicates a relationship between the data points and an independent variable,
c) developing a mathematical model of the trend, the mathematical model being a function of the independent variable and wherein the mathematical model models the trend with respect to the independent variable;
d) testing each data point in the set for adherence to the mathematical model, wherein the data points adhering to the model are identified as being members of the subset of most interest for further analysis; and
e) providing identification of the members of the subset in a user-readable format, wherein all of the steps b), c), d), and e) are performed on a suitably-programmed computer.

Dependent claims: 2, 3, 4, 5, 6, 12, 13, 14


7. A method for investigating the molecular mode of action of a biological phenomenon of interest, the method comprising:

a) obtaining a set of data points associated with a biological phenomenon of interest using biochemical profiling, gene expression profiling, or protein expression profiling, wherein the set of data points comprises a baseline and one or more experimental groups;
b) designating a trend of interest in the set of data points that is associated with the biological phenomenon of interest, wherein the trend indicates a relationship between the data points and an independent variable;
c) developing a mathematical model of the trend, the mathematical model being a function of the independent variable and wherein the mathematical model models the trend with respect to the independent variable;
d) testing each data point in the set for adherence to the mathematical model;
e) identifying, from a plurality of possible metabolic pathways, one or more metabolic pathways to which the data points that adhere to the model belong, wherein the mode of action of the phenomenon of interest affects the identified metabolic pathways; and
f) providing identification of the identified metabolic pathways in a user-readable format, wherein all of the steps b), c), d), e), and f) are performed on a suitably-programmed computer.

Dependent claims: 8, 9, 10, 11, 15, 16, 17
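Step e) of claim 7 maps the adhering data points onto metabolic pathways. A minimal sketch of one way this could be done is shown below, assuming a lookup table from entity to pathway is available. The table contents, the names `pathway_X` and `pathway_Y`, and the simple hit-count cutoff `min_hits` are hypothetical; the claim does not specify how pathways are scored.

```python
from collections import Counter

def pathways_for_hits(adherent, membership, min_hits=2):
    """Rank pathways by how many adherent entities belong to them.

    adherent   : names of entities that adhered to the trend model
    membership : dict mapping entity name -> list of pathway names
    min_hits   : minimum number of adherent members to report a pathway
                 (an assumed cutoff, not taken from the claims)
    """
    counts = Counter(
        pathway
        for name in adherent
        for pathway in membership.get(name, ())
    )
    # Sort by descending hit count, then alphabetically for stable output.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(pathway, n) for pathway, n in ordered if n >= min_hits]

# Hypothetical entity-to-pathway annotation, for illustration only.
membership = {
    "metabolite_A": ["pathway_X"],
    "metabolite_B": ["pathway_X"],
    "metabolite_C": ["pathway_X", "pathway_Y"],
    "metabolite_D": ["pathway_Y"],
}
adherent = ["metabolite_A", "metabolite_B", "metabolite_C"]

ranked = pathways_for_hits(adherent, membership)
print(ranked)
```

A plain count favors large pathways; a statistical enrichment test could replace it, but that would be a further assumption beyond what the claim text states.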

1 Specification