GENOMIC CLASSIFICATION OF COLORECTAL CANCER BASED ON PATTERNS OF GENE COPY NUMBER ALTERATIONS
Abstract
The invention is directed to methods and kits that allow for classification of colorectal cancer cells according to genomic profiles, and methods of diagnosing, predicting clinical outcomes, and stratifying patient populations for clinical testing and treatment using the same.
23 Claims
1. A method for obtaining a database of colorectal cancer genomic subgroups, the method comprising the steps of:
(a) obtaining a plurality of m samples comprising at least one CRC cell, wherein the samples comprise cell lines or tumors;
(b) acquiring a data set comprising copy number alteration information from at least one locus from each chromosome from each sample obtained in step (a);
(c) identifying in the data set samples contaminated by normal cells and eliminating the contaminated samples from the data set, wherein the identifying and eliminating comprises:
(1) applying a machine learning algorithm tuned to parameters that represent the differences between tumor and normal samples to the data;
(2) assigning a probability score for normal cell contamination to each sample as determined by the machine learning algorithm;
(3) eliminating data from the data set for each sample scoring 50% or greater probability of containing normal cells;
(d) estimating a number of subgroups, r, in the data set by applying an unsupervised clustering algorithm using Pearson linear dissimilarity algorithm to the data set;
(e) assigning each sample in the data set to at least one cluster using a modified genomic Non-negative Matrix Factorization (gNMF) algorithm, wherein the modified gNMF algorithm comprises:
(1) calculating divergence of the algorithm after every 100 steps of multiplicative updating using formula (11);
Dependent claims: 3, 4, 5, 6, 7, 8
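
The contamination filter recited in the claims (assign each sample a probability of containing normal cells, then eliminate every sample scoring 50% or greater) can be illustrated with a minimal sketch. The claims do not name the machine learning algorithm, so the probabilities below are simply supplied as an input array; the function name `drop_contaminated`, the loci-by-samples matrix layout, and the toy values are assumptions made for illustration only.

```python
import numpy as np


def drop_contaminated(data, p_normal, threshold=0.5):
    """Remove samples whose normal-cell probability is at or above threshold.

    data:     (n_loci, m_samples) copy number matrix (layout is an assumption).
    p_normal: length-m array of contamination probabilities, produced by
              whatever classifier is used upstream (not specified here).
    Returns the filtered matrix and the boolean mask of retained samples.
    """
    data = np.asarray(data, dtype=float)
    p_normal = np.asarray(p_normal, dtype=float)
    if data.shape[1] != p_normal.shape[0]:
        raise ValueError("need one probability per sample column")
    keep = p_normal < threshold  # a score of exactly 50% is eliminated
    return data[:, keep], keep


# Toy example: 3 loci, 4 samples (values are made up).
cn = np.array([[0.1, 0.9, -0.4, 0.0],
               [0.3, 1.2, -0.5, 0.1],
               [-0.2, 0.8, -0.6, 0.0]])
scores = np.array([0.10, 0.49, 0.50, 0.93])

filtered, kept = drop_contaminated(cn, scores)
print(filtered.shape, kept)
```

The strict `<` comparison mirrors the "50% or greater" wording: a sample scoring exactly 0.50 is removed.
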
2. A method of classifying a CRC tumor or cell line, comprising:
(a) providing a database, developed through a method comprising:
(i) obtaining a plurality of m samples comprising at least one CRC tumor or cell line;
(ii) acquiring a first data set comprising copy number alteration information from at least one locus from each chromosome from each sample obtained in step (i);
(iii) identifying in the first data set samples contaminated by normal cells and eliminating the contaminated samples from the first data set, wherein the identifying and eliminating comprises:
(1) applying a machine learning algorithm tuned to parameters that represent the differences between tumor and normal samples to the data;
(2) assigning a probability score for normal cell contamination to each sample as determined by the machine learning algorithm;
(3) eliminating data from the first data set for each sample scoring 50% or greater probability of containing normal cells;
(iv) estimating a number of subgroups, r, in the data set by applying an unsupervised clustering algorithm using Pearson linear dissimilarity algorithm to the data set;
(v) assigning each sample in the data set to at least one cluster using a modified genomic Non-negative Matrix Factorization (gNMF) algorithm, wherein the modified gNMF algorithm comprises:
(1) calculating divergence of the algorithm after every 100 steps of multiplicative updating using formula (11);
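
The subgroup-estimation step (unsupervised clustering with a Pearson linear dissimilarity) might look like the sketch below. This section does not define the dissimilarity, so the code assumes the common form (1 - r) / 2, where r is the Pearson correlation between two samples. The use of average-linkage hierarchical clustering, the helper name `pearson_dissimilarity`, and the synthetic data are likewise assumptions, not details taken from the claims.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def pearson_dissimilarity(data):
    """Pairwise (1 - r) / 2 between sample columns (assumed definition).

    0 means perfectly correlated profiles, 1 means perfectly anti-correlated.
    """
    r = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    d = np.clip((1.0 - r) / 2.0, 0.0, 1.0)
    d = (d + d.T) / 2.0          # remove floating-point asymmetry
    np.fill_diagonal(d, 0.0)     # a sample has zero dissimilarity to itself
    return d


# Synthetic data: 200 loci, two groups of three samples each.
rng = np.random.default_rng(0)
pattern_a = rng.normal(size=200)
pattern_b = rng.normal(size=200)
cn = np.column_stack(
    [pattern_a + 0.1 * rng.normal(size=200) for _ in range(3)]
    + [pattern_b + 0.1 * rng.normal(size=200) for _ in range(3)]
)

d = pearson_dissimilarity(cn)
tree = linkage(squareform(d, checks=False), method="average")

# Large gaps between the final merge heights hint at a plausible r.
print("last merge heights:", tree[-3:, 2])

# Cut the tree at a candidate r to inspect the resulting grouping.
labels = fcluster(tree, t=2, criterion="maxclust")
print("labels for r=2:", labels)
```

Reading candidate values of r off the merge heights is only one heuristic; the claims state that r is estimated from the clustering but do not, in this section, say how.
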
9. A method of classifying a therapeutic intervention for arresting or killing colorectal cancer (CRC) cells, comprising:
(a) from a panel of CRC cells classified according to genomic subgroups, selecting at least one CRC cell line from each subgroup, wherein the panel is assembled from a method comprising:
(i) obtaining a plurality of m samples comprising at least one CRC tumor or cell line;
(ii) acquiring a first data set comprising copy number alteration information from at least one locus from each chromosome from each sample obtained in step (i);
(iii) identifying in the first data set samples contaminated by normal cells and eliminating the contaminated samples from the first data set, wherein the identifying and eliminating comprises:
(1) applying a machine learning algorithm tuned to parameters that represent the differences between tumor and normal samples to the data;
(2) assigning a probability score for normal cell contamination to each sample as determined by the machine learning algorithm;
(3) eliminating data from the first data set for each sample scoring 50% or greater probability of containing normal cells;
(iv) estimating a number of subgroups, r, in the data set by applying an unsupervised clustering algorithm using Pearson linear dissimilarity algorithm to the data set;
(v) assigning each sample in the data set to at least one cluster using a modified genomic Non-negative Matrix Factorization (gNMF) algorithm, wherein the modified gNMF algorithm comprises:
(1) calculating divergence of the algorithm after every 100 steps of multiplicative updating using formula (11);
Dependent claims: 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
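
The cluster-assignment step (NMF with multiplicative updating and a divergence check after every 100 steps) can be sketched as follows. Formula (11) is not reproduced in this section, so the code assumes the generalized Kullback-Leibler divergence that is commonly paired with multiplicative NMF updates. The stopping tolerance `rel_tol`, the step cap `max_steps`, the assignment rule in `assign_clusters`, and the non-negative toy matrix are all hypothetical choices for illustration; this is not the patented "modified gNMF" itself.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero and log(0)


def kl_divergence(V, WH):
    """Generalized KL divergence D(V || WH); assumed stand-in for formula (11)."""
    return float(np.sum(V * np.log((V + EPS) / (WH + EPS)) - V + WH))


def nmf_kl(V, r, max_steps=2000, check_every=100, rel_tol=1e-5, seed=0):
    """Factor a non-negative V (n_loci x m_samples) as W @ H with rank r."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.uniform(0.1, 1.0, size=(n, r))
    H = rng.uniform(0.1, 1.0, size=(r, m))
    history = []
    for step in range(1, max_steps + 1):
        # Multiplicative updates for the KL objective.
        WH = W @ H + EPS
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + EPS)
        WH = W @ H + EPS
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + EPS)
        # Evaluate the divergence only every `check_every` steps.
        if step % check_every == 0:
            history.append(kl_divergence(V, W @ H))
            if len(history) > 1:
                prev, cur = history[-2], history[-1]
                if prev - cur <= rel_tol * max(prev, EPS):
                    break  # hypothetical stopping rule
    return W, H, history


def assign_clusters(W, H):
    """Assign each sample to the factor with the largest rescaled weight."""
    return (H * W.sum(axis=0)[:, None]).argmax(axis=0)


# Toy non-negative data: 40 loci, two groups of three samples.
rng = np.random.default_rng(1)
block_a = np.r_[np.full(20, 2.0), np.zeros(20)]
block_b = np.r_[np.zeros(20), np.full(20, 2.0)]
V = np.column_stack([block_a] * 3 + [block_b] * 3)
V = V + rng.uniform(0.0, 0.05, size=V.shape)

W, H, history = nmf_kl(V, r=2)
labels = assign_clusters(W, H)
print("divergence checks:", history)
print("cluster labels:", labels)
```

Checking the divergence only at intervals keeps the cost of the objective evaluation small relative to the updates. Rescaling H by the column sums of W before taking the argmax removes the arbitrary scale shared between the two factors.
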
20. A method of assembling a probe panel for classifying a CRC cell from a sample, comprising:
(a) assembling a database, comprising:
(i) obtaining a plurality of m samples comprising at least one CRC tumor or cell line;
(ii) acquiring a first data set comprising copy number alteration information from at least one locus from each chromosome from each sample obtained in step (i);
(iii) identifying in the first data set samples contaminated by normal cells and eliminating the contaminated samples from the first data set, wherein the identifying and eliminating comprises:
(1) applying a machine learning algorithm tuned to parameters that represent the differences between tumor and normal samples to the data;
(2) assigning a probability score for normal cell contamination to each sample as determined by the machine learning algorithm;
(3) eliminating data from the first data set for each sample scoring 50% or greater probability of containing normal cells;
(iv) estimating a number of subgroups, r, in the data set by applying an unsupervised clustering algorithm using Pearson linear dissimilarity algorithm to the data set;
(v) assigning each sample in the data set to at least one cluster using a modified genomic Non-negative Matrix Factorization (gNMF) algorithm, wherein the modified gNMF algorithm comprises:
(1) calculating divergence of the algorithm after every 100 steps of multiplicative updating using formula (11);
Dependent claims: 21, 22
23. A kit for classifying a CRC tumor sample or a cell line, comprising:
(a) instructions to assemble a database, comprising instructions for:
(i) obtaining a plurality of m samples comprising at least one CRC tumor or cell line;
(ii) acquiring a first data set comprising copy number alteration information from at least one locus from each chromosome from each sample obtained in step (i);
(iii) identifying in the first data set samples contaminated by normal cells and eliminating the contaminated samples from the first data set, wherein the identifying and eliminating comprises:
(1) applying a machine learning algorithm tuned to parameters that represent the differences between tumor and normal samples to the data;
(2) assigning a probability score for normal cell contamination to each sample as determined by the machine learning algorithm;
(3) eliminating data from the first data set for each sample scoring 50% or greater probability of containing normal cells;
(iv) estimating a number of subgroups, r, in the data set by applying an unsupervised clustering algorithm using Pearson linear dissimilarity algorithm to the data set;
(v) assigning each sample in the data set to at least one cluster using a modified genomic Non-negative Matrix Factorization (gNMF) algorithm, wherein the modified gNMF algorithm comprises:
(1) calculating divergence of the algorithm after every 100 steps of multiplicative updating using formula (11);
Specification