Necessary and sufficient reagent sets for chemogenomic analysis

US 20060035250A1
Filed: 06/10/2005
Published: 02/16/2006
Est. Priority Date: 06/10/2004
Status: Abandoned Application

Abstract

The present invention discloses methods of data analysis directed to diagnostic development, and in particular to the development of signatures for classifying chemogenomic data. The invention provides methods for identifying and functionally characterizing a “necessary” set of information-rich variables. The invention also discloses methods for identifying a plurality of “sufficient” classifiers. The necessary set of variables may be incorporated into a single diagnostic device to provide simultaneous confirmation of a classification measurement with a plurality of independent classifiers. In the field of biological diagnostics, the invention may be used to provide a plurality of short lists of genes, referred to as “signatures,” that are “sufficient” to carry out specific classification tasks such as predicting the activity and side effects of a compound in vivo.

29 Claims

1. A method for determining the necessary set of variables for a classification question, said method comprising:
- a. deriving a first linear classifier comprising a first set of variables from a full multivariate dataset, wherein said first linear classifier is capable of answering the classification question with a log odds ratio greater than or equal to a first selected threshold value;
  
  b. removing said first set of variables from the full dataset thereby resulting in a partially depleted dataset;
  
  c. deriving a second linear classifier comprising a second set of variables from the partially depleted dataset, wherein the second linear classifier is capable of answering a classification question with a log odds ratio greater than or equal to a second selected threshold value;
  
  d. removing the variables of the second linear classifier from the partially depleted dataset;
  
  e. repeating steps c and d until the second linear classifier generated is not capable of performing with a log odds ratio greater than or equal to the first selected threshold value;
  
  wherein the combined set of variables from the derived linear classifiers constitutes the necessary set, and the remaining variables in the multivariate dataset constitute the depleted set for answering the classification question.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 20):
- - 2. The method of claim 1, further comprising:
    - g. repeating steps c and d until the second linear classifier generated is not capable of performing with a log odds ratio greater than or equal to a second selected threshold value.
  - 3. The method of claim 2, wherein the first and second selected threshold values are equal.
  - 4. The method of claim 2, wherein the second selected threshold value is less than the first selected threshold value.
  - 5. The method of claim 1, wherein the linear classifiers are generated with an algorithm selected from the group consisting of SPLP, SPLR and SPMPM.
  - 6. The method of claim 1, wherein the multivariate dataset comprises data from polynucleotide array experiments.
  - 7. The method of claim 6, wherein the polynucleotide array experiment comprises compound-treated samples.
  - 8. A set of necessary variables for answering a classification question made according to claim 1.
  - 9. The set of variables of claim 8 wherein the variables are genes.
  - 10. The set of variables of claim 9 wherein the number of genes is 400 or fewer.
  - 11. The set of variables of claim 9 wherein the number of genes is 100 or fewer.
  - 12. An array comprising a set of polynucleotides each representing a gene in the necessary set of claim 8.
  - 13. An array comprising a set of receptors each capable of binding a protein encoded by a gene in the necessary set of claim 8.
  - 14. A subset of genes useful for answering a chemogenomic classification question comprising a percentage of genes randomly selected from a necessary set made according to claim 1, wherein the addition of the genes to the depleted set for the classification question increases the average log odds ratio of the linear classifiers generated by the depleted set.
  - 15. The subset of claim 14, wherein the classification question is selected from those listed in Table 2.
  - 16. The subset of claim 14, wherein the classification question is monoamine re-uptake (SERT) inhibitor and the necessary set consists of the 311 genes listed in Table 5.
  - 17. The subset of claim 16, wherein the randomly selected percentage of genes from the necessary set is 15% and the average log odds ratio is increased to greater than or equal to 3.0.
  - 18. The subset of claim 16, wherein the randomly selected percentage of genes from the necessary set is 26% and the average log odds ratio is increased to greater than or equal to 4.0.
  - 20. The method of claim 1, further comprising:
    - after step d repeating the steps of (i) deriving a linear classifier; and
      
      (ii) removing each additional linear classifier's set of genes from the partially depleted dataset;
      
      until the partially depleted dataset is not capable of generating a linear classifier with a log odds ratio greater than or equal to the second selected threshold value.
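
The iterative procedure of claims 1, 2 and 20 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation. `fit_sparse_linear` is a hypothetical stand-in for SPLP, SPLR or SPMPM that simply keeps the `k` variables with the largest class-mean difference. The log odds ratio is computed from the confusion matrix with a 0.5 continuity correction, which is an assumed definition because the claims do not give one. Performance is scored on the training samples rather than by cross-validation, a single threshold is used for every round, and the toy data are invented.

```python
import math

def log_odds_ratio(y_true, y_pred):
    """ln of the odds ratio; the 0.5 continuity correction is an assumption."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return math.log(((tp + 0.5) * (tn + 0.5)) / ((fp + 0.5) * (fn + 0.5)))

def fit_sparse_linear(X, y, variables, k):
    """Hypothetical stand-in for SPLP/SPLR/SPMPM: keep the k variables with the
    largest class-mean difference and use those differences as weights."""
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    mean = lambda rows, j: sum(r[j] for r in rows) / len(rows)
    diffs = {j: mean(pos, j) - mean(neg, j) for j in variables}
    chosen = sorted(variables, key=lambda j: abs(diffs[j]), reverse=True)[:k]
    weights = {j: diffs[j] for j in chosen}
    # Put the decision boundary midway between the two class centroids.
    bias = -sum(w * (mean(pos, j) + mean(neg, j)) / 2 for j, w in weights.items())
    return weights, bias

def predict(X, weights, bias):
    return [1 if sum(w * row[j] for j, w in weights.items()) + bias > 0 else 0
            for row in X]

def strip_necessary_set(X, y, threshold, k=2):
    """Derive a classifier, remove its variables, and repeat until the
    classifier derived from what is left falls below the threshold."""
    remaining = list(range(len(X[0])))
    necessary, classifiers = [], []
    while remaining:
        weights, bias = fit_sparse_linear(X, y, remaining, k)
        lor = log_odds_ratio(y, predict(X, weights, bias))
        if lor < threshold:
            break  # the depleted set can no longer answer the question
        classifiers.append((weights, lor))
        necessary.extend(weights)  # variables of this sufficient classifier
        remaining = [j for j in remaining if j not in weights]
    return necessary, remaining, classifiers

# Invented toy data: columns 0-3 separate the classes, columns 4-5 do not.
X = [
    [2.0, 1.9, 1.5, 1.4, 0.0, 1.0],
    [2.1, 2.0, 1.6, 1.5, 1.0, 0.0],
    [1.9, 2.1, 1.4, 1.6, 0.0, 1.0],
    [2.0, 2.0, 1.5, 1.5, 1.0, 0.0],
    [0.0, 0.1, 0.0, 0.1, 0.0, 1.0],
    [0.1, 0.0, 0.1, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
    [0.1, 0.1, 0.1, 0.1, 1.0, 0.0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

necessary, depleted, classifiers = strip_necessary_set(X, y, threshold=4.0)
print(sorted(necessary), depleted, len(classifiers))
```

Each entry in `classifiers` corresponds to one "sufficient" signature in the sense of the abstract, and the union of their variables is the "necessary" set. A separate, lower threshold for later rounds (claim 4) would only require passing a second value into the loop.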

19. A method for preparing a reagent set comprising:
- a. deriving a first linear classifier comprising a first set of genes from a full dataset, wherein said first linear classifier is capable of answering a classification question with a log odds ratio greater than or equal to a first selected threshold value;
  
  b. removing said first set of genes from the full dataset thereby resulting in a partially depleted chemogenomic dataset;
  
  c. deriving a second linear classifier comprising a second set of genes from the partially depleted dataset, wherein the second linear classifier is capable of answering a classification question with a log odds ratio greater than or equal to a second selected threshold value;
  
  d. removing said second set of genes from the partially depleted dataset;
  
  e. preparing a plurality of isolated polynucleotides or polypeptides, wherein each polynucleotide or polypeptide is capable of detecting at least one gene of said first and second sets of genes.

21. A reagent set for answering a classification question comprising a set of polynucleotides or polypeptides representing a plurality of genes, wherein the addition of a random selection of at least 10% of said plurality of genes to the depleted set for the classification question increases the average log odds ratio of the linear classifiers generated by the depleted set by at least 20%.
- Dependent claims (22, 23, 24, 25, 26, 27, 28):
- - 22. The reagent set of claim 21, wherein the random selection is of at least 25% of said plurality of genes and the average log odds ratio of the linear classifiers generated by the depleted set is increased by at least 50%.
  - 23. The reagent set of claim 21, wherein the classification question relates to the effect of an in vivo compound treatment on gene expression.
  - 24. The reagent set of claim 21, wherein the classification question is selected from those listed in Table 2.
  - 25. The reagent set of claim 21, wherein the number of genes is 400 or fewer.
  - 26. The reagent set of claim 21, wherein the number of genes is 200 or fewer.
  - 27. An array comprising a set of polynucleotides capable of specifically binding to the reagent set of claim 21.
  - 28. A diagnostic device comprising the reagent set of claim 21.
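
The test recited in claims 14, 21 and 22 (adding a random fraction of the necessary genes back to the depleted set and measuring the change in average log odds ratio) might be sketched as below. Everything here is illustrative: `revival_gain` and `toy_average_lor` are hypothetical names, and the scoring callback is an invented stand-in for whatever procedure derives classifiers from a gene list and averages their log odds ratios.

```python
import math
import random

def revival_gain(necessary, depleted, average_lor, fraction=0.10, seed=0):
    """Add a random `fraction` of the necessary set to the depleted set and
    report the relative change in the average log odds ratio.

    `average_lor` is a caller-supplied function (an assumption of this sketch)
    that maps a list of variables to an average log odds ratio.
    """
    rng = random.Random(seed)  # seeded so the random selection is repeatable
    n_pick = max(1, math.ceil(fraction * len(necessary)))
    picked = rng.sample(list(necessary), n_pick)
    baseline = average_lor(list(depleted))
    if baseline <= 0:
        raise ValueError("relative gain is undefined for a non-positive baseline")
    revived = average_lor(list(depleted) + picked)
    return picked, baseline, revived, (revived - baseline) / baseline

# Invented toy scorer: every necessary gene present adds 0.25 to a baseline of 1.0.
toy_necessary = list(range(20))
toy_depleted = list(range(20, 50))

def toy_average_lor(variables):
    return 1.0 + 0.25 * sum(1 for v in variables if v in set(toy_necessary))

picked, baseline, revived, gain = revival_gain(
    toy_necessary, toy_depleted, toy_average_lor, fraction=0.10
)
print(len(picked), baseline, revived, gain)
```

Comparing `gain` against 0.20 (claim 21) or, with `fraction=0.25`, against 0.50 (claim 22) gives the pass or fail check. In practice the selection would presumably be repeated over many random draws and averaged, which this sketch leaves out.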

29. A method of classifying experimental data comprising:
- a. providing at least two non-overlapping sufficient sets of variables useful for answering a classification question;
  
  b. querying the experimental data with one of the at least two non-overlapping sufficient sets of variables;
  
  c. querying the experimental data with another of the at least two non-overlapping sufficient sets of variables;
  
  wherein the classification of the data is determined based on the answers to the queries generated by the at least two non-overlapping sets of variables.
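
Claim 29 can be illustrated with a short Python sketch. The claim says only that the classification is "based on the answers" of the queries, so the unanimity rule used here is an assumption, as are the gene names, weights and the representation of a signature as a `(weights, bias)` pair.

```python
def query(sample, signature):
    """Answer the classification question with one linear signature."""
    weights, bias = signature
    return sum(w * sample[name] for name, w in weights.items()) + bias > 0

def classify_with_signatures(sample, signatures):
    """Query the sample with each signature and combine the answers.

    Assumed combination rule: report a class only when all signatures agree.
    """
    seen = set()
    for weights, _ in signatures:
        overlap = seen & set(weights)
        if overlap:
            raise ValueError(f"signatures share variables: {sorted(overlap)}")
        seen |= set(weights)
    answers = [query(sample, s) for s in signatures]
    if all(answers):
        return "positive", answers
    if not any(answers):
        return "negative", answers
    return "indeterminate", answers

# Hypothetical signatures over invented gene names.
signature_a = ({"gene_A": 1.0, "gene_B": -0.5}, -0.2)
signature_b = ({"gene_C": 0.8}, -0.4)

sample = {"gene_A": 1.0, "gene_B": 0.4, "gene_C": 1.0}
print(classify_with_signatures(sample, [signature_a, signature_b]))
```

Because the signatures share no variables, agreement between them is an independent confirmation of the result, which is the use the abstract describes for a single diagnostic device carrying the whole necessary set.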

Current Assignee: US Department of Health and Human Services (Government of the United States of America)
Original Assignee: US Department of Health and Human Services (Government of the United States of America)
Inventors: Natsoulis, Georges

Application Number: US11/149,612
Publication Number: US 20060035250A1
Field of Search

US Class Current: 435/6

CPC Class Codes:
C12Q 1/6876   Nucleic acid products used ...
C12Q 2600/136   Screening for pharmacologic...
C12Q 2600/158   Expression markers
G16B 25/00   ICT specially adapted for h...
G16B 25/10   Gene or protein expression ...
G16B 40/00   ICT specially adapted for b...
G16B 40/10   Signal processing, e.g. fro...
