METHODS FOR ASSEMBLING PANELS OF CANCER CELL LINES FOR USE IN TESTING THE EFFICACY OF ONE OR MORE PHARMACEUTICAL COMPOSITIONS
2 Assignments
0 Petitions
Abstract
The present invention relates to algorithms for use in defining genomic subgroups of tumors and cancer cell lines. The present invention also relates to methods for assembling panels of tumors and cancer cell lines according to genomic subgroups for use in testing the efficacy of one or more pharmaceutical compounds in the treatment of subjects suffering from at least one cancer.
29 Citations
8 Claims
1. An algorithm for use in clustering tumors and cell lines to define genomic subgroups, the method comprising the steps of:
(a) obtaining a plurality of m samples comprising at least one tumor or cancer cell line;
(b) acquiring a data set comprising copy number alteration information from at least one locus from each chromosome from each sample obtained in step (a);
(c) identifying, in the data set, copy number alteration information obtained from samples contaminated by normal cells and eliminating the contaminated samples from the data set, wherein the identifying and eliminating comprises:
(1) applying to the data a machine learning algorithm tuned to parameters that represent the differences between tumor and normal samples;
(2) assigning a probability score for normal cell contamination to each sample as determined by the machine learning algorithm;
(3) eliminating data from the data set for each sample scoring 50% or greater probability of containing normal cells;
(d) estimating a number of subgroups, r, in the data set by applying an unsupervised clustering algorithm using a Pearson linear dissimilarity algorithm to the data set;
(e) assigning each sample in the data set to at least one cluster using a modified genomic non-negative matrix factorization (gNMF) algorithm, wherein the modified gNMF algorithm comprises:
(1) calculating the divergence of the algorithm after every 100 steps of multiplicative updating using formula (1);
(Dependent claims: 2, 3, 4, 5)
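Steps (c)(3) and (d) of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The machine learning classifier of step (c)(1) is not specified in the claim, so its per-sample contamination probabilities are taken as a given input (`p_normal`). The claim also does not say how r is read off the Pearson-dissimilarity clustering, so the sketch assumes average-linkage hierarchical clustering on 1 minus the Pearson correlation, cut at a hypothetical threshold (`cut`). The function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def drop_contaminated(cn, p_normal, threshold=0.5):
    """Step (c)(3): drop samples whose normal-contamination probability is >= threshold.

    cn is an (n_loci, m_samples) copy-number matrix; p_normal holds one
    probability per sample from an upstream classifier (assumed, not shown).
    """
    keep = np.asarray(p_normal) < threshold
    return cn[:, keep], keep


def estimate_r(cn, cut=0.5):
    """Step (d) sketch: average-linkage clustering on Pearson dissimilarity (1 - r).

    `cut` is a hypothetical dissimilarity threshold; the number of clusters
    it yields is returned as a candidate value for r, together with the labels.
    """
    d = 1.0 - np.corrcoef(cn, rowvar=False)  # m x m sample-by-sample dissimilarity
    d = (d + d.T) / 2.0                      # enforce exact symmetry
    np.fill_diagonal(d, 0.0)                 # self-dissimilarity is zero
    z = linkage(squareform(d, checks=False), method="average")
    labels = fcluster(z, t=cut, criterion="distance")
    return int(labels.max()), labels


# Usage on synthetic data: two groups of three samples, each group sharing a profile.
rng = np.random.default_rng(0)
base_a, base_b = rng.normal(size=200), rng.normal(size=200)
cn = np.column_stack(
    [base_a + 0.1 * rng.normal(size=200) for _ in range(3)]
    + [base_b + 0.1 * rng.normal(size=200) for _ in range(3)]
)
p_normal = [0.1, 0.2, 0.7, 0.05, 0.3, 0.5]  # the 0.7 and 0.5 samples are dropped
filtered, keep = drop_contaminated(cn, p_normal)
r, labels = estimate_r(filtered)
```

Note that a sample scoring exactly 0.5 is removed, matching the "50% or greater" wording of step (c)(3).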
6. A method for assembling panels of tumor and cancer cell lines according to genomic subgroups, the method comprising the steps of:
(a) obtaining a plurality of m samples comprising at least one tumor or cancer cell line;
(b) acquiring a data set comprising copy number alteration information from at least one locus from each chromosome from each sample obtained in step (a);
(c) identifying, in the data set, copy number alteration information obtained from samples contaminated by normal cells and eliminating the contaminated samples from the data set, wherein the identifying and eliminating comprises:
(1) applying to the data a machine learning algorithm tuned to parameters that represent the differences between tumor and normal samples;
(2) assigning a probability score for normal cell contamination to each sample as determined by the machine learning algorithm;
(3) eliminating data from the data set for each sample scoring 50% or greater probability of containing normal cells;
(d) estimating a number of subgroups, r, in the data set by applying unsupervised clustering using a Pearson linear dissimilarity algorithm to the data set;
(e) assigning each sample in the data set to at least one cluster using a modified genomic non-negative matrix factorization (gNMF) algorithm, wherein the modified gNMF algorithm comprises:
(1) calculating the divergence of the algorithm after every 100 steps of multiplicative updating using formula (1);
(Dependent claims: 7, 8)
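Step (e), followed by grouping the clustered samples into panels, might look like the sketch below. Formula (1) is not reproduced in the claim text, so the sketch makes several assumptions. It uses the generalized Kullback-Leibler divergence that accompanies Lee and Seung's multiplicative updates. It expects a non-negative input matrix, for example copy-number values on a linear rather than log scale. It applies a hypothetical relative-change stopping rule (`tol`). It assigns each sample to the component with the largest coefficient in H, which yields one cluster per sample even though the claim allows "at least one". The names `gnmf`, `assign_clusters`, and `assemble_panels` are hypothetical.

```python
import numpy as np

EPS = 1e-10  # guards against division by zero and log(0)


def kl_divergence(v, wh):
    """Generalized Kullback-Leibler divergence D(V || WH) (assumed stand-in for formula (1))."""
    return float(np.sum(v * np.log((v + EPS) / (wh + EPS)) - v + wh))


def gnmf(v, r, max_iter=10_000, check_every=100, tol=1e-6, seed=0):
    """Factor non-negative V (n_loci x m_samples) as W @ H with r components.

    Uses multiplicative KL updates; the divergence is evaluated only every
    `check_every` steps, and iteration stops once its change between two
    checks falls below `tol` (relative to the previous value, or absolute if that is below 1).
    """
    rng = np.random.default_rng(seed)
    n, m = v.shape
    w = rng.random((n, r)) + EPS
    h = rng.random((r, m)) + EPS
    prev = None
    for step in range(1, max_iter + 1):
        h *= (w.T @ (v / (w @ h + EPS))) / (w.sum(axis=0)[:, None] + EPS)
        w *= ((v / (w @ h + EPS)) @ h.T) / (h.sum(axis=1)[None, :] + EPS)
        if step % check_every == 0:
            div = kl_divergence(v, w @ h)
            if prev is not None and abs(prev - div) <= tol * max(prev, 1.0):
                break
            prev = div
    return w, h


def assign_clusters(h):
    """Assign each sample (column of H) to its largest-coefficient component."""
    return np.argmax(h, axis=0)


def assemble_panels(sample_ids, labels):
    """Group sample identifiers into one panel per cluster label."""
    panels = {}
    for sid, lab in zip(sample_ids, labels):
        panels.setdefault(int(lab), []).append(sid)
    return panels


# Usage on synthetic data: two non-overlapping copy-number patterns.
a = np.r_[np.ones(25), np.zeros(25)]
b = np.r_[np.zeros(25), np.ones(25)]
v = np.column_stack([a * s for s in (1.0, 1.5, 2.0)] + [b * s for s in (1.0, 1.5, 2.0)])
w, h = gnmf(v, r=2)
labels = assign_clusters(h)
panels = assemble_panels(["S1", "S2", "S3", "S4", "S5", "S6"], labels)
```

Evaluating the divergence only at fixed intervals, as the claim recites for every 100 steps, avoids computing the full objective on every iteration while still detecting convergence.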
Specification