GENOMIC CLASSIFICATION OF COLORECTAL CANCER BASED ON PATTERNS OF GENE COPY NUMBER ALTERATIONS

US 20100145894A1
Filed: 10/28/2009
Published: 06/10/2010
Est. Priority Date: 10/31/2008
Status: Active Grant

Assignments: 2
Petitions: 0

Abstract

The invention is directed to methods and kits that allow for classification of colorectal cancer cells according to genomic profiles, and methods of diagnosing, predicting clinical outcomes, and stratifying patient populations for clinical testing and treatment using the same.

Citations: 38

23 Claims

1. A method for obtaining a database of colorectal cancer genomic subgroups, the method comprising the steps of:

(a) obtaining a plurality of m samples comprising at least one CRC cell, wherein the samples comprise cell lines or tumors;

(b) acquiring a data set comprising copy number alteration information from at least one locus from each chromosome from each sample obtained in step (a);

(c) identifying in the data set samples contaminated by normal cells and eliminating the contaminated samples from the data set, wherein the identifying and eliminating comprises:

(1) applying a machine learning algorithm tuned to parameters that represent the differences between tumor and normal samples to the data;

(2) assigning a probability score for normal cell contamination to each sample as determined by the machine learning algorithm;

(3) eliminating data from the data set for each sample scoring 50% or greater probability of containing normal cells;

(d) estimating a number of subgroups, r, in the data set by applying an unsupervised clustering algorithm using Pearson linear dissimilarity algorithm to the data set;

(e) assigning each sample in the data set to at least one cluster using a modified genomic Non-negative Matrix Factorization (gNMF) algorithm, wherein the modified gNMF algorithm comprises:

(1) calculating divergence of the algorithm after every 100 steps of multiplicative updating using formula (11);
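
Step (c) fixes only the 50% cutoff; the machine learning algorithm and its parameters are not specified. The Python sketch below is an illustration under stated assumptions, not the patented classifier. It scores each sample with a hypothetical logistic model whose single feature, `flat_fraction`, is the share of probes with a log2 copy-number ratio near zero (admixed normal cells compress a tumor profile toward that baseline). `WEIGHT`, `BIAS`, and the ±0.1 tolerance are made-up placeholders for whatever a trained model would learn.

```python
import math
from typing import Dict, List

# Hypothetical parameters standing in for a trained tumor-vs-normal model.
WEIGHT, BIAS = 12.0, -9.0


def flat_fraction(log2_ratios: List[float], tol: float = 0.1) -> float:
    """Share of probes whose log2 ratio lies within +/-tol of 0 (diploid-like)."""
    near_zero = sum(1 for x in log2_ratios if abs(x) <= tol)
    return near_zero / len(log2_ratios)


def contamination_probability(log2_ratios: List[float]) -> float:
    """Step (c)(2): logistic probability that the sample contains normal cells."""
    z = WEIGHT * flat_fraction(log2_ratios) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def drop_contaminated(
    samples: Dict[str, List[float]], cutoff: float = 0.5
) -> Dict[str, List[float]]:
    """Step (c)(3): keep only samples scoring below the 50% cutoff."""
    return {
        name: profile
        for name, profile in samples.items()
        if contamination_probability(profile) < cutoff
    }


if __name__ == "__main__":
    data = {
        "tumor_like": [0.8, -0.7, 0.05, 1.1],  # strong gains and losses
        "flat": [0.02, -0.03, 0.0, 0.05],      # compressed toward baseline
    }
    print(sorted(drop_contaminated(data)))  # ['tumor_like']
```

A sample scoring exactly 0.5 is dropped, matching the "50% or greater" wording of step (c)(3).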

3. The method of claim 1 or 2, wherein the unsupervised clustering algorithm is a hierarchical clustering.

4. The method of claim 1 or 2, wherein Cophenetic correlation is used to provide a final number of clusters from the data set.

5. The method of claim 1 or 2, wherein Bayesian information criterion is used to provide a final number of clusters from the data set.

6. The method of claim 1 or 2, wherein Cophenetic correlation and Bayesian information criterion are used to provide a final number of clusters from the data set.

7. The method of claim 1 or 2, wherein the plurality of samples, m, comprises a first, second, and third cell line, wherein the first cell line is selected from the group consisting of HCT-8, LS 174T, SK-CO-1, SW48, DLD-1, HCT-15, HCT116, LoVo, CL-34, CL-40, C170, and LS180;

the second cell line is selected from the group consisting of Caco-2, LS1034, LS411N, LS513, NCI-H498, NCI-H747, SW1116, SW1417, SW837, HT-29, SW620, CL-11, CL-14, Colo-678, and SW-480; and

the third cell line is selected from the group consisting of Colo 320DM, NCI-H508, NCI-H716, SW1463, SW403, SW948, Colo 205, and Colo-206F.

8. The method of claim 1 or 2, wherein the plurality of samples, m, consists of HCT-8, LS 174T, SK-CO-1, SW48, DLD-1, HCT-15, HCT116, LoVo, CL-34, CL-40, C170, LS180, Caco-2, LS1034, LS411N, LS513, NCI-H498, NCI-H747, SW1116, SW1417, SW837, HT-29, SW620, CL-11, CL-14, Colo-678, SW-480, Colo 320DM, NCI-H508, NCI-H716, SW1463, SW403, SW948, Colo 205, and Colo-206F cell lines.
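
Step (d) and claims 3, 4, and 6 name hierarchical clustering on a Pearson dissimilarity and the cophenetic correlation, but not the linkage rule or where the tree is cut. The sketch below assumes average linkage, takes the dissimilarity as 1 - r, and treats `cut` as a hypothetical threshold: r is estimated as the number of clusters left after every merge at or below `cut`, and the cophenetic correlation of the resulting dendrogram is reported. In the NMF literature the cophenetic correlation is often computed on a consensus matrix from repeated factorizations instead; which matrix the patent uses is not stated in this excerpt.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def pearson(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles: List[List[float]]):
    """Agglomerative clustering on 1 - Pearson r with average linkage.

    Returns pairwise dissimilarities, the cophenetic distance of every sample
    pair (height at which the pair first shares a cluster), and merge heights.
    """
    n = len(profiles)
    dist: Dict[Pair, float] = {
        (i, j): 1.0 - pearson(profiles[i], profiles[j])
        for i, j in combinations(range(n), 2)
    }
    clusters: Dict[int, List[int]] = {i: [i] for i in range(n)}
    coph: Dict[Pair, float] = {}
    heights: List[float] = []
    next_id = n
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            pairs = [(min(p, q), max(p, q)) for p in clusters[a] for q in clusters[b]]
            d = sum(dist[pq] for pq in pairs) / len(pairs)  # average linkage
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        for p in clusters[a]:
            for q in clusters[b]:
                coph[(min(p, q), max(p, q))] = d
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        heights.append(d)
        next_id += 1
    return dist, coph, heights


def estimate_r(heights: List[float], n: int, cut: float) -> int:
    """Clusters remaining after performing every merge at height <= cut."""
    return n - sum(1 for h in heights if h <= cut)


def cophenetic_correlation(dist: Dict[Pair, float], coph: Dict[Pair, float]) -> float:
    keys = sorted(dist)
    return pearson([dist[k] for k in keys], [coph[k] for k in keys])


if __name__ == "__main__":
    profiles = [
        [1.0, 2.0, 3.0, 4.0],
        [1.1, 2.1, 2.9, 4.2],
        [4.0, 3.0, 2.0, 1.0],
        [4.2, 2.9, 2.1, 0.9],
    ]
    dist, coph, heights = average_linkage(profiles)
    print(estimate_r(heights, len(profiles), cut=0.5))  # 2
    print(cophenetic_correlation(dist, coph))
```

Average linkage never produces inversions, so counting merge heights at or below `cut` is the same as cutting the dendrogram at that height.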

2. A method of classifying a CRC tumor or cell line, comprising:

(a) providing a database, developed through a method comprising:

(i) obtaining a plurality of m samples comprising at least one CRC tumor or cell line;

(ii) acquiring a first data set comprising copy number alteration information from at least one locus from each chromosome from each sample obtained in step (i);

(iii) identifying in the first data set samples contaminated by normal cells and eliminating the contaminated samples from the first data set, wherein the identifying and eliminating comprises:

(1) applying a machine learning algorithm tuned to parameters that represent the differences between tumor and normal samples to the data;

(2) assigning a probability score for normal cell contamination to each sample as determined by the machine learning algorithm;

(3) eliminating data from the first data set for each sample scoring 50% or greater probability of containing normal cells;

(iv) estimating a number of subgroups, r, in the data set by applying an unsupervised clustering algorithm using Pearson linear dissimilarity algorithm to the data set;

(v) assigning each sample in the data set to at least one cluster using a modified genomic Non-negative Matrix Factorization (gNMF) algorithm, wherein the modified gNMF algorithm comprises:

(1) calculating divergence of the algorithm after every 100 steps of multiplicative updating using formula (11);
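
Step (v) recites multiplicative updating with the divergence evaluated every 100 steps, but formula (11) and the stopping rule are not reproduced in this excerpt. The sketch below therefore assumes the standard Lee-Seung multiplicative updates for the generalized Kullback-Leibler divergence D(V || WH) = sum of [V log(V / WH) - V + WH], with loci as rows and samples as columns of a non-negative copy-number matrix V (NMF needs non-negative input, so V is assumed to hold copy numbers rather than log ratios). It stops when a 100-step check improves the divergence by less than a hypothetical relative tolerance `tol`, then assigns each sample to the factor with the largest coefficient in H after rescaling by the column sums of W. This is a plain KL-NMF, not the patent's modified gNMF.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]
EPS = 1e-12


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def kl_divergence(v: Matrix, wh: Matrix) -> float:
    """Generalized KL divergence D(V || WH), with 0 * log(0) taken as 0."""
    total = 0.0
    for v_row, wh_row in zip(v, wh):
        for x, y in zip(v_row, wh_row):
            y = max(y, EPS)
            total += (x * math.log(x / y) if x > 0 else 0.0) - x + y
    return total


def nmf_kl(v: Matrix, r: int, max_iter: int = 5000, check_every: int = 100,
           tol: float = 1e-6, seed: int = 0) -> Tuple[Matrix, Matrix]:
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(r)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(r)]
    prev = None
    for step in range(1, max_iter + 1):
        wh = matmul(w, h)
        for a in range(r):  # H <- H * (W^T (V / WH)) / (W^T 1)
            w_sum = sum(w[i][a] for i in range(n))
            for j in range(m):
                num = sum(w[i][a] * v[i][j] / (wh[i][j] + EPS) for i in range(n))
                h[a][j] *= num / (w_sum + EPS)
        wh = matmul(w, h)
        for a in range(r):  # W <- W * ((V / WH) H^T) / (1 H^T)
            h_sum = sum(h[a])
            for i in range(n):
                num = sum(h[a][j] * v[i][j] / (wh[i][j] + EPS) for j in range(m))
                w[i][a] *= num / (h_sum + EPS)
        if step % check_every == 0:  # divergence checked every check_every steps
            d = kl_divergence(v, matmul(w, h))
            if prev is not None and prev - d < tol * max(prev, 1.0):
                break
            prev = d
    return w, h


def assign_clusters(w: Matrix, h: Matrix) -> List[int]:
    """Sample j goes to the factor with the largest W-scaled coefficient."""
    scale = [sum(row[a] for row in w) for a in range(len(h))]
    return [max(range(len(h)), key=lambda a: h[a][j] * scale[a])
            for j in range(len(h[0]))]


if __name__ == "__main__":
    v = [[3.0, 2.0, 0.0, 0.0],
         [3.0, 2.0, 0.0, 0.0],
         [0.0, 0.0, 2.0, 3.0],
         [0.0, 0.0, 2.0, 3.0]]
    w, h = nmf_kl(v, r=2)
    print(assign_clusters(w, h))
```

Rescaling by the column sums of W removes the scale ambiguity between W and H, so the argmax does not depend on how the factorization happened to distribute magnitude between the two factors.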

9. A method of classifying a therapeutic intervention for arresting or killing colorectal cancer (CRC) cells, comprising:

(a) from a panel of CRC cells classified according to genomic subgroups, selecting at least one CRC cell line from each subgroup, wherein the panel is assembled from a method comprising:

(i) obtaining a plurality of m samples comprising at least one CRC tumor or cell line;

(ii) acquiring a first data set comprising copy number alteration information from at least one locus from each chromosome from each sample obtained in step (i);

(iii) identifying in the first data set samples contaminated by normal cells and eliminating the contaminated samples from the first data set, wherein the identifying and eliminating comprises:

(1) applying a machine learning algorithm tuned to parameters that represent the differences between tumor and normal samples to the data;

(2) assigning a probability score for normal cell contamination to each sample as determined by the machine learning algorithm;

(3) eliminating data from the first data set for each sample scoring 50% or greater probability of containing normal cells;

(iv) estimating a number of subgroups, r, in the data set by applying an unsupervised clustering algorithm using Pearson linear dissimilarity algorithm to the data set;

(v) assigning each sample in the data set to at least one cluster using a modified genomic Non-negative Matrix Factorization (gNMF) algorithm, wherein the modified gNMF algorithm comprises:

(1) calculating divergence of the algorithm after every 100 steps of multiplicative updating using formula (11);
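
Step (a) of claim 9 selects at least one cell line from each genomic subgroup but does not say which one. The sketch below assumes the line with the highest membership score in its subgroup is chosen (for example, a W-scaled H coefficient from a factorization). The cell-line names and scores in the usage are placeholders, not subgroup assignments from the patent.

```python
from typing import Dict, Tuple


def representatives(assignments: Dict[str, Tuple[int, float]]) -> Dict[int, str]:
    """Pick, for every subgroup, the cell line with the highest membership score.

    `assignments` maps a cell-line name to (subgroup index, membership score).
    """
    best: Dict[int, Tuple[str, float]] = {}
    for line, (subgroup, score) in assignments.items():
        if subgroup not in best or score > best[subgroup][1]:
            best[subgroup] = (line, score)
    return {subgroup: line for subgroup, (line, _) in sorted(best.items())}


if __name__ == "__main__":
    toy = {  # hypothetical names and scores
        "line_A": (0, 0.91),
        "line_B": (0, 0.64),
        "line_C": (1, 0.80),
        "line_D": (2, 0.72),
    }
    print(representatives(toy))  # {0: 'line_A', 1: 'line_C', 2: 'line_D'}
```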

10. The method of claim 9, wherein the unsupervised clustering algorithm is a hierarchical clustering.

11. The method of claim 9, wherein Cophenetic correlation is used to provide a final number of clusters from the data set.

12. The method of claim 9, wherein Bayesian information criterion is used to provide a final number of clusters from the data set.

13. The method of claim 9, wherein Cophenetic correlation and Bayesian information criterion are used to provide a final number of clusters from the data set.

14. The method of claim 9, wherein the CRC cells are from a cell line.

15. The method of claim 9, wherein the plurality of samples, m, comprises a first, second, and third cell line, wherein the first cell line is selected from the group consisting of HCT-8, LS 174T, SK-CO-1, SW48, DLD-1, HCT-15, HCT116, LoVo, CL-34, CL-40, C170, and LS180;

the second cell line is selected from the group consisting of Caco-2, LS1034, LS411N, LS513, NCI-H498, NCI-H747, SW1116, SW1417, SW837, HT-29, SW620, CL-11, CL-14, Colo-678, and SW-480; and

the third cell line is selected from the group consisting of Colo 320DM, NCI-H508, NCI-H716, SW1463, SW403, SW948, Colo 205, and Colo-206F.

16. The method of claim 9, wherein the plurality of samples, m, consists of HCT-8, LS 174T, SK-CO-1, SW48, DLD-1, HCT-15, HCT116, LoVo, CL-34, CL-40, C170, LS180, Caco-2, LS1034, LS411N, LS513, NCI-H498, NCI-H747, SW1116, SW1417, SW837, HT-29, SW620, CL-11, CL-14, Colo-678, SW-480, Colo 320DM, NCI-H508, NCI-H716, SW1463, SW403, SW948, Colo 205, and Colo-206F cell lines.

17. The method of claim 9, wherein the therapeutic intervention comprises at least one selected from the group consisting of radiation therapy and chemotherapy.

18. The method of claim 17, wherein the therapeutic intervention is chemotherapy, and the chemotherapy comprises administering at least one pharmaceutical composition comprising an active agent selected from the group consisting of fluorouracil, capecitabine, leucovorin, and oxaliplatin.

19. The method of claim 18, wherein the chemotherapy comprises administering two or more active agents.
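
Claims 5, 6, 12, and 13 use the Bayesian information criterion to settle the final number of clusters without stating the likelihood. Because the generalized KL divergence corresponds to a Poisson model of the data, one reasonable assumption, sketched below, is a Poisson log-likelihood for V given WH, with k = r(n + m) free parameters for an n x m matrix and N = nm observations; the rank with the smallest BIC = k ln N - 2 ln L is kept. The log-likelihood values in the usage are illustrative numbers, not results.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def poisson_log_likelihood(v: Matrix, wh: Matrix) -> float:
    """Sum of log Poisson(v | wh) over all entries; lgamma allows non-integer v."""
    total = 0.0
    for v_row, wh_row in zip(v, wh):
        for x, y in zip(v_row, wh_row):
            total += (x * math.log(y) if x > 0 else 0.0) - y - math.lgamma(x + 1.0)
    return total


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def choose_rank(log_likelihoods: Dict[int, float], n: int, m: int) -> int:
    """Return the rank r whose fit has the smallest BIC, with k = r * (n + m)."""
    scores = {r: bic(ll, r * (n + m), n * m) for r, ll in log_likelihoods.items()}
    return min(scores, key=scores.get)


if __name__ == "__main__":
    # Illustrative log-likelihoods for ranks 1 to 3 on a 50-locus x 10-sample matrix.
    print(choose_rank({1: -1500.0, 2: -1200.0, 3: -1180.0}, n=50, m=10))  # 2
```

In practice the log-likelihood for each candidate r would come from a factorization of V at that rank; here it is passed in directly to keep the criterion itself in view.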

20. A method of assembling a probe panel for classifying a CRC cell from a sample, comprising:

(a) assembling a database, comprising:

(i) obtaining a plurality of m samples comprising at least one CRC tumor or cell line;

(ii) acquiring a first data set comprising copy number alteration information from at least one locus from each chromosome from each sample obtained in step (i);

(iii) identifying in the first data set samples contaminated by normal cells and eliminating the contaminated samples from the first data set, wherein the identifying and eliminating comprises:

(1) applying a machine learning algorithm tuned to parameters that represent the differences between tumor and normal samples to the data;

(2) assigning a probability score for normal cell contamination to each sample as determined by the machine learning algorithm;

(3) eliminating data from the first data set for each sample scoring 50% or greater probability of containing normal cells;

(iv) estimating a number of subgroups, r, in the data set by applying an unsupervised clustering algorithm using Pearson linear dissimilarity algorithm to the data set;

(v) assigning each sample in the data set to at least one cluster using a modified genomic Non-negative Matrix Factorization (gNMF) algorithm, wherein the modified gNMF algorithm comprises:

(1) calculating divergence of the algorithm after every 100 steps of multiplicative updating using formula (11);
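
Step (ii), shared by every independent claim, requires copy number information from at least one locus on each chromosome. A simple pre-check for that requirement is sketched below. It assumes loci arrive as (chromosome, position) pairs and, as an assumption, treats chromosomes 1 to 22 and X as required, since a Y chromosome is absent from female-derived samples.

```python
from typing import Iterable, List, Sequence, Tuple

# Assumed requirement: autosomes 1-22 plus X; add "Y" if male samples must be covered.
REQUIRED = tuple(str(c) for c in range(1, 23)) + ("X",)


def missing_chromosomes(
    loci: Iterable[Tuple[str, int]], required: Sequence[str] = REQUIRED
) -> List[str]:
    """Return the required chromosomes that have no measured locus."""
    covered = {chrom for chrom, _position in loci}
    return [c for c in required if c not in covered]


if __name__ == "__main__":
    loci = [(str(c), 1_000_000) for c in range(1, 23) if c != 7] + [("X", 500_000)]
    print(missing_chromosomes(loci))  # ['7']
```

A sample whose result is not an empty list would not satisfy step (ii) as written.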

21. A kit comprising the probe panel of claim 20.

22. The kit of claim 21, wherein each probe is a FISH probe.
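
Claim 20's text is cut off before the step that turns the database into a probe panel, so the selection rule is not stated here. Purely as an illustration, the sketch below assumes each genomic subgroup is represented by the loci whose weight in the factorization's basis matrix W is most specific to it (its weight in that subgroup minus its largest weight in any other subgroup). Those loci would then be the targets for probes such as the FISH probes of claim 22. `select_probe_loci`, `per_subgroup`, and the locus names are hypothetical.

```python
from typing import Dict, List


def specificity(row: List[float], subgroup: int) -> float:
    """Weight of a locus in one subgroup minus its largest weight elsewhere."""
    others = (x for b, x in enumerate(row) if b != subgroup)
    return row[subgroup] - max(others, default=0.0)


def select_probe_loci(
    w: List[List[float]], locus_names: List[str], per_subgroup: int = 2
) -> Dict[int, List[str]]:
    """Top `per_subgroup` most subgroup-specific loci for every column of W."""
    panel: Dict[int, List[str]] = {}
    for a in range(len(w[0])):
        ranked = sorted(range(len(w)), key=lambda i: specificity(w[i], a), reverse=True)
        panel[a] = [locus_names[i] for i in ranked[:per_subgroup]]
    return panel


if __name__ == "__main__":
    w = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.7], [0.2, 0.9], [0.5, 0.5]]
    names = ["locus_1", "locus_2", "locus_3", "locus_4", "locus_5"]
    print(select_probe_loci(w, names))
    # {0: ['locus_1', 'locus_2'], 1: ['locus_4', 'locus_3']}
```

Ranking by specificity rather than raw weight keeps loci that are altered in every subgroup (such as `locus_5` above) out of the panel, since they do not help tell subgroups apart.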

23. A kit for classifying a CRC tumor sample or a cell line, comprising:

(a) instructions to assemble a database, comprising instructions for:

(i) obtaining a plurality of m samples comprising at least one CRC tumor or cell line;

(ii) acquiring a first data set comprising copy number alteration information from at least one locus from each chromosome from each sample obtained in step (i);

(iii) identifying in the first data set samples contaminated by normal cells and eliminating the contaminated samples from the first data set, wherein the identifying and eliminating comprises:

(1) applying a machine learning algorithm tuned to parameters that represent the differences between tumor and normal samples to the data;

(2) assigning a probability score for normal cell contamination to each sample as determined by the machine learning algorithm;

(3) eliminating data from the first data set for each sample scoring 50% or greater probability of containing normal cells;

(iv) estimating a number of subgroups, r, in the data set by applying an unsupervised clustering algorithm using Pearson linear dissimilarity algorithm to the data set;

(v) assigning each sample in the data set to at least one cluster using a modified genomic Non-negative Matrix Factorization (gNMF) algorithm, wherein the modified gNMF algorithm comprises:

(1) calculating divergence of the algorithm after every 100 steps of multiplicative updating using formula (11);
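
Claim 23, like claim 2, is cut off before the step that places a new tumor or cell line into a subgroup. One common way to do this with a stored NMF basis, assumed here rather than taken from the patent, is to hold W fixed, fit a coefficient vector h for the new profile with the same KL multiplicative update used for H, and assign the sample to the factor with the largest coefficient after scaling by the column sums of W. `project_sample` and `classify` are hypothetical names.

```python
from typing import List

Matrix = List[List[float]]
EPS = 1e-12


def project_sample(w: Matrix, v_new: List[float], iters: int = 500) -> List[float]:
    """Fit h >= 0 so that W h approximates v_new, holding W fixed (KL updates)."""
    n, r = len(w), len(w[0])
    col_sums = [sum(w[i][a] for i in range(n)) for a in range(r)]
    h = [1.0] * r
    for _ in range(iters):
        wh = [sum(w[i][a] * h[a] for a in range(r)) for i in range(n)]
        h = [
            h[a] * sum(w[i][a] * v_new[i] / (wh[i] + EPS) for i in range(n))
            / (col_sums[a] + EPS)
            for a in range(r)
        ]
    return h


def classify(w: Matrix, v_new: List[float]) -> int:
    """Index of the subgroup whose W-scaled coefficient is largest."""
    h = project_sample(w, v_new)
    col_sums = [sum(row[a] for row in w) for a in range(len(h))]
    scores = [h[a] * col_sums[a] for a in range(len(h))]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    w = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # toy basis, 2 subgroups
    print(classify(w, [3.0, 3.0, 0.0, 0.0]))  # 0
    print(classify(w, [0.0, 1.0, 4.0, 4.0]))  # 1
```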

Current Assignee: Abbvie Incorporated
Original Assignee: Abbott Laboratories Incorporated
Inventors: Zhang, Ke; Lu, Xin; Semizarov, Dimitri; Lesniewski, Rick R.

Granted Patent: US 8,498,822 B2
US Class Current: 706/12

CPC Class Codes

C12Q 1/6886   for cancer immunoassay for ...

C12Q 2600/106   Pharmacogenomics, i.e. gene...

C12Q 2600/112   Disease subtyping, staging ...

C12Q 2600/156   Polymorphic or mutational m...

G16B 20/00   ICT specially adapted for f...

G16B 20/10   Ploidy or copy number detec...

G16B 20/20   Allele or variant detection...

G16B 40/00   ICT specially adapted for b...

G16B 40/30   Unsupervised data analysis

G16B 5/00   ICT specially adapted for m...

G16B 5/20   Probabilistic models

G16B 50/00   ICT programming tools or da...

G16B 50/30   Data warehousing; Computing...

G16H 70/60   relating to pathologies
