Defining biological states and related genes, proteins and patterns
Abstract
Disclosed are a variety of methods and computer systems for use in the analysis of gene and protein expression data. Also disclosed are methods for the definition of the cellular state of cells and tissues from multidimensional physiological data such as those obtained from gene expression measurements with DNA microarrays. A variety of classification methods can be applied to expression data to achieve this goal. Demonstrated is the application of several statistical tools including Wilks' lambda ratio of within-group to total variance, Fisher Discriminant Analysis, and the misclassification error rate to the identification of discriminating genes and the overall classification of expression data. Examples from several different cases demonstrate the ability of the method to produce well-separated groups in the projection space representing distinct physiological states. The method can be augmented and is useful in disease diagnosis, drug screening and bioprocessing applications.
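As a rough illustration of the Fisher Discriminant Analysis named in the abstract, the sketch below computes a two-class Fisher direction for two genes and projects each sample onto it. The expression values, the function names, and the restriction to two genes and two classes are assumptions made for illustration; none of them come from the disclosure.

```python
# Minimal two-class, two-gene Fisher discriminant sketch (hypothetical data).

def mean_vec(rows):
    """Column-wise mean of a list of (gene_1, gene_2) samples."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def scatter_2x2(rows, mu):
    """Sum of outer products of the centered samples (within-class scatter)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class_a, class_b):
    """Return w = Sw^-1 (mu_a - mu_b), the Fisher projection direction."""
    mu_a, mu_b = mean_vec(class_a), mean_vec(class_b)
    sa, sb = scatter_2x2(class_a, mu_a), scatter_2x2(class_b, mu_b)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_a[0] - mu_b[0], mu_a[1] - mu_b[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def project(sample, w):
    """Project one sample onto the discriminant direction."""
    return sample[0] * w[0] + sample[1] * w[1]

# Hypothetical expression levels for two genes in two cellular states.
state_a = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4), (0.8, 2.1)]
state_b = [(3.0, 3.5), (3.4, 3.1), (2.9, 3.9), (3.3, 3.6)]

w = fisher_direction(state_a, state_b)
proj_a = [project(s, w) for s in state_a]
proj_b = [project(s, w) for s in state_b]
print("separated:", min(proj_a) > max(proj_b))
```

With more than two genes the same idea applies, but the matrix inverse would normally be delegated to a linear algebra library.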
73 Claims
-
1. A method for use in the analysis of gene or protein expression information comprising,
(a) accessing gene or protein expression data comprising expression levels of G genes or proteins in S samples, where the S samples may be classified into C classes representing cellular states;
(b) determining a measure of the variability of expression levels of each gene or protein in the data as a whole; and
(c) determining a measure of the variability of expression levels of each gene or protein within each class of sample.
Dependent claims: 2-35, 37-43.
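Steps (b) and (c) of claim 1 can be sketched as follows. The claim does not fix a particular measure of variability; the sum of squared deviations from the mean is assumed here, and the gene names, class labels, and values are hypothetical.

```python
# Total and within-class variability for each gene (assumed measure:
# sum of squared deviations from the mean).

def sum_sq(values):
    """Sum of squared deviations of the values from their mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def variability(expr, labels):
    """Return (total variability, {class: within-class variability}).

    expr   -- expression levels of one gene across the S samples
    labels -- class label of each of the S samples
    """
    total = sum_sq(expr)
    by_class = {}
    for value, cls in zip(expr, labels):
        by_class.setdefault(cls, []).append(value)
    within = {cls: sum_sq(vals) for cls, vals in by_class.items()}
    return total, within

# Hypothetical data: G = 2 genes, S = 6 samples, C = 2 classes.
labels = ["c1", "c1", "c1", "c2", "c2", "c2"]
genes = {
    "gene_A": [1.0, 1.2, 0.8, 3.0, 3.2, 2.8],  # differs between classes
    "gene_B": [2.0, 1.0, 3.0, 2.0, 1.0, 3.0],  # same pattern in both classes
}

results = {name: variability(expr, labels) for name, expr in genes.items()}
for name, (total, within) in results.items():
    print(name, "total:", round(total, 3), "within:", within)
```

For gene_A the within-class values are small relative to the total, while for gene_B they add up to the total, which is the contrast the later claims build on.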
-
-
36. A method for identifying a gene or protein, the expression of which is related to a cellular state or a change in cellular state comprising,
(a) accessing gene or protein expression data comprising expression levels of G genes or proteins in S samples, where the S samples may be classified into C classes representing cellular states;
(b) determining a measure of the variability of expression levels of each gene or protein in the data as a whole;
(c) determining a measure of the variability of expression levels of each gene or protein within each class of sample; and
(d) identifying a gene or protein which is related to a cellular state or a change in cellular state by identifying a gene or protein for which the measure of variability determined in (c) is less than the measure of variability determined in (b) with a 90% degree of confidence.
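One way to attach a confidence level to the comparison in step (d) is a permutation test, sketched below. This is a swapped-in technique chosen because it needs only the standard library; the claim does not say which statistical test is used. The ratio of pooled within-class to total sum of squares, the 0.10 cut-off (standing in for 90% confidence), and all data values are assumptions.

```python
import random

def sum_sq(values):
    """Sum of squared deviations of the values from their mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def within_total_ratio(expr, labels):
    """Pooled within-class sum of squares divided by the total sum of squares."""
    total = sum_sq(expr)
    groups = {}
    for value, cls in zip(expr, labels):
        groups.setdefault(cls, []).append(value)
    within = sum(sum_sq(g) for g in groups.values())
    return within / total if total > 0 else 1.0

def permutation_p_value(expr, labels, n_perm=2000, seed=0):
    """Fraction of label shufflings whose ratio is at least as small as observed."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = within_total_ratio(expr, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if within_total_ratio(expr, shuffled) <= observed + 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical data: 8 samples in 2 classes.
labels = ["c1"] * 4 + ["c2"] * 4
gene_a = [1.0, 1.2, 0.8, 1.1, 3.0, 3.2, 2.8, 3.1]  # class-dependent
gene_b = [2.0, 1.0, 3.0, 2.5, 2.0, 1.0, 3.0, 2.5]  # class-independent

p_a = permutation_p_value(gene_a, labels)
p_b = permutation_p_value(gene_b, labels)
selected = [name for name, p in (("gene_a", p_a), ("gene_b", p_b)) if p <= 0.10]
print(selected)
```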
-
-
44. A method for identifying a gene or protein expression pattern that is useful for discriminating between samples of two or more cellular states comprising,
(a) accessing gene or protein expression data comprising expression levels of G genes or proteins in S samples, where the S samples may be classified into C classes representing cellular states;
(b) determining a measure of the variability of expression levels of each gene or protein in the data as a whole;
(c) determining a measure of the variability of expression levels of each gene or protein within each class of sample;
(d) generating for each gene or protein a comparison of the measure of variability determined in (b) to the measures of variability determined in (c); and
(e) selecting from among the genes or proteins of (d) a set of genes or proteins and corresponding expression levels that discriminate between two or more classes of sample with a misclassification rate less than 40%, wherein the set of genes or proteins and corresponding expression levels is a pattern that is useful for discriminating between samples of two or more cellular states.
Dependent claims: 45, 47, 50-53.
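The misclassification-rate criterion in step (e) could be estimated as in the sketch below. The claim does not specify a classifier or an error estimate; a nearest-centroid classifier with leave-one-out cross-validation is assumed here, and the samples, gene indices, and candidate gene sets are hypothetical.

```python
# Leave-one-out misclassification rate of a nearest-centroid classifier,
# used to keep candidate gene sets whose rate is below 40%.

def centroid(rows):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def loo_misclassification(samples, labels):
    """Hold out each sample in turn and classify it by the nearest class centroid."""
    errors = 0
    for i, held_out in enumerate(samples):
        by_class = {}
        for j, (s, c) in enumerate(zip(samples, labels)):
            if j != i:
                by_class.setdefault(c, []).append(s)
        centroids = {c: centroid(rows) for c, rows in by_class.items()}
        predicted = min(centroids, key=lambda c: sq_dist(held_out, centroids[c]))
        if predicted != labels[i]:
            errors += 1
    return errors / len(samples)

def restrict(samples, gene_indices):
    """Keep only the expression levels of the chosen genes."""
    return [tuple(s[g] for g in gene_indices) for s in samples]

# Hypothetical data: 6 samples, 2 genes (index 0 and index 1), 2 classes.
samples = [(1.0, 2.0), (1.2, 3.0), (0.9, 2.4),
           (3.0, 2.1), (3.1, 3.1), (2.8, 2.6)]
labels = ["A", "A", "A", "B", "B", "B"]

candidates = [[0], [1]]
rates = {tuple(c): loo_misclassification(restrict(samples, c), labels)
         for c in candidates}
selected = [c for c in candidates if rates[tuple(c)] < 0.40]
print(rates, selected)
```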
-
-
46. A computer product for use in analyzing gene or protein expression data, the product disposed on a computer readable medium, and comprising instructions for causing a processor to:
-
(a) determine a measure of the variability of expression levels of a gene or protein in gene or protein expression data comprising expression levels of G genes or proteins in S samples, where the S samples may be classified into C classes representing cellular states; and
(b) determine a measure of the variability of expression levels of the gene or protein within each class of sample in the data.
-
-
48. A system comprising a processor and instructions for causing a processor to:
-
(a) determine a measure of the variability of expression levels of each gene or protein in gene or protein expression data comprising expression levels of G genes or proteins in S samples, where the S samples may be classified into C classes representing cellular states; and
(b) determine a measure of the variability of expression levels of each gene or protein within each class of sample in the data.
-
-
49. A method for use in modifying the production of a metabolite in a cell comprising:
-
(a) accessing data comprising a representation of the expression levels of G genes or proteins in S samples, wherein the S samples may be classified into C classes representing biological states, and wherein at least two of the biological states differ in the level of the metabolite that is produced; and
(b) identifying a discriminating gene or protein, the expression levels of which are discriminatory in defining a biological state of higher metabolite production from a biological state of lower metabolite production.
-
-
54. A method for use in modifying the production of a polyhydroxyalkanoate in a cell comprising altering the genetic makeup of the cell so as to cause the cell to have a modified expression of a gene represented by an index number selected from the group consisting of:
sll0008, sll0010, sll0039, sll0322, sll0361, sll0373, sll0374, sll0379, sll0385, sll0396, sll0459, sll0469, sll0477, sll0486, sll0550, sll0558, sll0703, sll0873, sll1317, sll1376, sll1473, sll1504, sll1514, sll1611, sll1623, sll1630, sll1632, sll1702, sll1820 and slr1822, or an orthologue of any of the preceding.
Dependent claims: 55-58, 60-62.
-
59. A bacterium comprising a recombinant nucleic acid construct comprising a coding sequence of a gene represented by an index number selected from the group consisting of:
sll0008, sll0010, sll0039, sll0322, sll0361, sll0373, sll0374, sll0379, sll0385, sll0396, sll0459, sll0469, sll0477, sll0486, sll0550, sll0558, sll0703, sll0873, sll1317, sll1376, sll1473, sll1504, sll1514, sll1611, sll1623, sll1630, sll1632, sll1702, sll1820 and slr1822, or an orthologue of any of the preceding.
-
63. A method for determining whether a sample contains a hyperproliferative cell comprising:
-
a) determining a level of gene expression of at least one gene in a sample, wherein the at least one gene is selected from the group consisting of Neuromedin U;
Aldehyde dehydrogenase 9 (Human gamma-aminobutyraldehyde dehydrogenase E3 isozyme);
Fibroblast growth factor 8;
Human epidermal growth factor receptor (HER3);
Translocase of outer mitochondrial membrane 34;
KIAA0089;
Monoamine oxidase B;
Zinc finger protein 273;
clone 1D2;
Aldehyde dehydrogenase 10 (fatty aldehyde dehydrogenase);
Carboxylesterase 2 (intestine, liver);
Gro2 oncogene;
Diazepam binding inhibitor;
Cadherin 17;
TAL1 (SCL) interrupting locus;
Crystallin alpha B;
5T4 oncofetal trophoblast glycoprotein;
Deoxyribonuclease I-like 3;
Heat-shock protein 90-kDa;
Smg GDS-associated protein;
Cytochrome c oxidase subunit Vb (coxVb);
Wilm Tumor-Related Protein;
TYRO3 protein tyrosine kinase;
FAT tumor suppressor;
Creatine kinase, mitochondrial 1;
Transcription factor 20;
MHC class I polypeptide related sequence A;
KIAA0018 gene product 1;
Lectin galactoside-binding, soluble, 7 (galectin 7);
Tenascin-R (restrictin, janusin);
CD1A antigen, a polypeptide;
Beta-Hexosaminidase, Alpha Polypeptide, Abnormal Splice Mutation;
clone 1A7;
KIAA0172 gene;
Myxovirus (influenza) resistance 2, homolog of murine;
Lysophospholipase like;
Interleukin-8 receptor type B, splice variant IL8RB9;
keratin 4; and
Runt-related transcription factor, and wherein the level of gene expression of the at least one gene allows classification of an oral keratinocyte as hyperproliferative or non-hyperproliferative with a misclassification rate of 40% or lower;
b) comparing the level of gene expression of said at least one gene to a first control level of gene expression of said at least one gene as measured in a hyperproliferative cell; and
c) comparing the level of gene expression of the at least one gene to a second control level of gene expression of said at least one gene as measured in a non-hyperproliferative cell;
wherein a sample contains a hyperproliferative cell if the level of gene expression of the at least one gene is more mathematically similar to the first control level of gene expression than to the second control level of gene expression.
Dependent claims: 64-67, 70, 71.
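The comparison in steps b) and c) can be sketched as a nearest-control decision. The claim leaves "more mathematically similar" open; Euclidean distance over the measured genes is assumed here, and the gene key and all expression levels are hypothetical.

```python
import math

def distance(profile, control):
    """Euclidean distance between two {gene: expression level} profiles."""
    return math.sqrt(sum((profile[g] - control[g]) ** 2 for g in profile))

def contains_hyperproliferative(sample, hyper_control, normal_control):
    """True when the sample is closer to the hyperproliferative control."""
    return distance(sample, hyper_control) < distance(sample, normal_control)

# Hypothetical control levels for a single gene from the claimed group.
hyper_control = {"gene_x": 8.0}    # measured in a hyperproliferative cell
normal_control = {"gene_x": 2.0}   # measured in a non-hyperproliferative cell

sample_1 = {"gene_x": 6.5}
sample_2 = {"gene_x": 3.0}

result_1 = contains_hyperproliferative(sample_1, hyper_control, normal_control)
result_2 = contains_hyperproliferative(sample_2, hyper_control, normal_control)
print(result_1, result_2)
```

Because the profiles are dictionaries, the same function works unchanged when more than one gene is measured.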
-
-
68. A method for determining whether a sample contains a hyperproliferative cell comprising:
-
a) determining a level of gene expression of at least two genes in a sample, wherein said at least two genes are selected from the group consisting of Neuromedin U;
Aldehyde dehydrogenase 9 (Human gamma-aminobutyraldehyde dehydrogenase E3 isozyme);
Fibroblast growth factor 8;
Human epidermal growth factor receptor (HER3);
Translocase of outer mitochondrial membrane 34;
KIAA0089;
Monoamine oxidase B;
Urokinase plasminogen activator;
Zinc finger protein 273;
clone 1D2;
Aldehyde dehydrogenase 10 (fatty aldehyde dehydrogenase);
Carboxylesterase 2 (intestine, liver);
Gro2 oncogene;
Diazepam binding inhibitor;
Cadherin 17;
TAL1 (SCL) interrupting locus;
Crystallin alpha B;
5T4 oncofetal trophoblast glycoprotein;
Deoxyribonuclease I-like 3;
Heat-shock protein 90-kDa;
Smg GDS-associated protein;
Cytochrome c oxidase subunit Vb (coxVb);
Wilm Tumor-Related Protein;
TYRO3 protein tyrosine kinase;
FAT tumor suppressor;
Creatine kinase, mitochondrial 1;
Ferritin, light polypeptide;
Transcription factor 20;
MHC class I polypeptide related sequence A;
KIAA0018 gene product 1;
Lectin galactoside-binding, soluble, 7 (galectin 7);
Tenascin-R (restrictin, janusin);
CD1A antigen, a polypeptide;
Cytochrome P4502C9 subfamily IIC (mephenytoin 4-hydroxylase), polypeptide 9;
Phospholipase A2, group VII;
Beta-Hexosaminidase, Alpha Polypeptide, Abnormal Splice Mutation;
clone 1A7;
KIAA0172 gene;
Interleukin 8 receptor, beta;
Myxovirus (influenza) resistance 2, homolog of murine;
Lysophospholipase like;
Interleukin-8 receptor type B, splice variant IL8RB9;
keratin 4;
Runt-related transcription factor; and
Cathepsin L; and
wherein the level of gene expression of said at least two genes allows classification of an oral keratinocyte as hyperproliferative or non-hyperproliferative with a misclassification rate of 40% (30%, 20%, 15%, 10%) or lower;
b) comparing the level of gene expression of said at least two genes to a first control level of gene expression of said at least two genes as measured in a hyperproliferative cell; and
c) comparing the level of gene expression of the at least two genes to a second control level of gene expression of said at least two genes as measured in a non-hyperproliferative cell;
wherein a sample contains a hyperproliferative cell if the level of gene expression of the at least one gene is more mathematically similar to the first control level of gene expression than to the second control level of gene expression.
-
-
69. A method for classifying a leukemia sample comprising:
-
a) determining a level of gene expression of at least one gene in a sample, wherein said at least one gene is selected from the group consisting of U05259, M89957, M84371, D88270, X58529, M28170, M31523, M11722, J03473, X03934, U23852, X00437, M23323, X59871, X76223, D00749, L05148, U14603, M37271, M26692, M12886, J05243, X69398, U67171, X04145, L10373, U16954, J04132, M28826, HG4128, X87241, U50743, M13792, L47738, X95735, X17042, M23197, M84526, L09209, U46499, M27891, M16038, M63138, M55150, M22960, M62762, X61587, and U50136, and wherein the level of gene expression of said at least one gene allows classification of a leukemia as AML, B-ALL or T-ALL with a misclassification rate of 40% or lower;
b) comparing the level of gene expression of said at least one gene to a first control level of gene expression of said at least one gene as measured in an AML cell;
c) comparing the level of gene expression of said at least one gene to a second control level of gene expression of said at least one gene as measured in a B-ALL cell; and
d) comparing the level of gene expression of said at least one gene to a third level of gene expression of said at least one gene as measured in a T-ALL cell;
wherein the leukemia is classified as AML, B-ALL or T-ALL depending on whether the level of gene expression of the at least one gene is more mathematically similar to the first control level of gene expression, the second control level of gene expression, or the third control level of gene expression.
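For a multi-gene profile, the three-way comparison in steps b) through d) might use a correlation-based similarity instead of a distance. The sketch below assumes Pearson correlation as the similarity measure; the claim does not name one. The four-gene profiles and their values are made up and do not correspond to the listed accession numbers.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sx * sy)

def classify(sample, controls):
    """Return the class whose control profile correlates best with the sample."""
    return max(controls, key=lambda name: pearson(sample, controls[name]))

# Hypothetical control profiles over the same four genes.
controls = {
    "AML":   [9.0, 2.0, 1.0, 5.0],
    "B-ALL": [2.0, 8.0, 1.5, 4.0],
    "T-ALL": [1.0, 2.5, 9.0, 3.0],
}

class_1 = classify([1.5, 7.0, 2.0, 4.5], controls)
class_2 = classify([8.0, 1.0, 2.0, 4.0], controls)
print(class_1, class_2)
```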
-
-
72. A method for classifying a leukemia sample comprising:
-
a) determining a level of gene expression of at least one gene in a sample, wherein said at least one gene is selected from the group consisting of M89957, M84371, D88270, X58529, M28170, M11722, J03473, X03934, U23852, X00437, M23323, X59871, X76223, D00749, L05148, U14603, M37271, M26692, M12886, J05243, X69398, U67171, X04145, L10373, U16954, J04132, M28826, HG4128, X87241, U50743, L09209, U46499, M22960, and X61587, and wherein the level of gene expression of said at least one gene allows classification of a leukemia as AML or ALL with a misclassification rate of 40% or lower;
b) comparing the level of gene expression of said at least one gene to a first control level of gene expression of said at least one gene as measured in an AML cell;
c) comparing the level of gene expression of said at least one gene to a second control level of gene expression of said at least one gene as measured in an ALL cell; and
wherein the leukemia is classified as AML or ALL depending on whether the level of gene expression of the at least one gene is more mathematically similar to the first control level of gene expression or the second control level of gene expression.
-
-
73. A method for identifying a candidate therapeutic agent for the treatment of a hyperproliferative disorder comprising:
-
(a) contacting a hyperproliferative cell with a test therapeutic agent;
(b) determining a level of gene expression of a gene in the cell, wherein said gene is selected from the group consisting of Neuromedin U;
Aldehyde dehydrogenase 9 (Human gamma-aminobutyraldehyde dehydrogenase E3 isozyme);
Fibroblast growth factor 8;
Human epidermal growth factor receptor (HER3);
Translocase of outer mitochondrial membrane 34;
KIAA0089;
Monoamine oxidase B;
Urokinase plasminogen activator;
Zinc finger protein 273;
clone 1D2;
Aldehyde dehydrogenase 10 (fatty aldehyde dehydrogenase);
Carboxylesterase 2 (intestine, liver);
Gro2 oncogene;
Diazepam binding inhibitor;
Cadherin 17;
TAL1 (SCL) interrupting locus;
Crystallin alpha B;
5T4 oncofetal trophoblast glycoprotein;
Deoxyribonuclease I-like 3;
Heat-shock protein 90-kDa;
Smg GDS-associated protein;
Cytochrome c oxidase subunit Vb (coxVb);
Wilm Tumor-Related Protein;
TYRO3 protein tyrosine kinase;
FAT tumor suppressor;
Creatine kinase, mitochondrial 1;
Ferritin, light polypeptide;
Transcription factor 20;
MHC class I polypeptide related sequence A;
KIAA0018 gene product 1;
Lectin galactoside-binding, soluble, 7 (galectin 7);
Tenascin-R (restrictin, janusin);
CD1A antigen, a polypeptide;
Cytochrome P4502C9 subfamily IIC (mephenytoin 4-hydroxylase), polypeptide 9;
Phospholipase A2, group VII;
Beta-Hexosaminidase, Alpha Polypeptide, Abnormal Splice Mutation;
clone 1A7;
KIAA0172 gene;
Interleukin 8 receptor, beta;
Myxovirus (influenza) resistance 2, homolog of murine;
Lysophospholipase like;
Interleukin-8 receptor type B, splice variant IL8RB9;
keratin 4;
Runt-related transcription factor; and
Cathepsin L; and
wherein the level of gene expression of said gene allows classification of an oral keratinocyte as hyperproliferative or non-hyperproliferative with a misclassification rate of 40% or lower; and
(c) determining whether the expression level of said gene is more mathematically similar to that of a hyperproliferative cell or a non-hyperproliferative cell, wherein a test therapeutic agent that causes the expression level of the gene in the hyperproliferative cell to more closely resemble the expression level of the gene in a non-hyperproliferative cell is a candidate therapeutic agent.
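The screening logic of steps (a) through (c) can be sketched for a panel of test agents. Absolute difference in expression level is assumed as the similarity measure, and the agent names, control levels, and post-treatment levels are hypothetical.

```python
def is_candidate(treated_level, hyper_control, normal_control):
    """True when the treated cell's expression is closer to the normal control."""
    return abs(treated_level - normal_control) < abs(treated_level - hyper_control)

def screen(treated_levels, hyper_control, normal_control):
    """Return the agents whose treated expression level resembles the normal control."""
    return [agent for agent, level in treated_levels.items()
            if is_candidate(level, hyper_control, normal_control)]

# Hypothetical expression of one gene after contacting hyperproliferative
# cells with each test agent.
treated_levels = {"agent_1": 7.8, "agent_2": 3.1, "agent_3": 5.5}
hyper_control = 8.0    # untreated hyperproliferative cell
normal_control = 2.0   # non-hyperproliferative cell

candidates = screen(treated_levels, hyper_control, normal_control)
print(candidates)
```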
-
Specification