Systems and methods for generating biomarker signatures with integrated dual ensemble and generalized simulated annealing techniques
1 Assignment
0 Petitions
Abstract
Described herein are systems and methods for classifying a data set using an ensemble classification technique. Classifiers are iteratively generated by applying machine learning techniques to a training data set, and training class sets are generated by classifying the elements in the training data set according to those classifiers. An objective value is computed for each training class set, and the objective values associated with different classifiers are compared; once a desired number of iterations has been reached, a final training class set is output.
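The ensemble step described in the abstract, in which several classification methods each vote and the resulting predicted labels are scored against the known labels, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the text does not say how votes are combined or which objective is used, so simple majority voting and the Matthews correlation coefficient (for two classes labelled 0 and 1) are assumptions, and the threshold rules standing in for classification methods are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from math import sqrt
from typing import Callable, Sequence

# Assumption: a "classification method" is any function mapping one sample to a label.
Method = Callable[[Sequence[float]], int]


def vote(methods: Sequence[Method], samples: Sequence[Sequence[float]]) -> list[int]:
    """Label each sample by majority vote over the methods (assumed voting rule).

    Ties go to the label that was voted first, because Counter.most_common
    keeps first-encountered order among equal counts.
    """
    return [Counter(m(x) for m in methods).most_common(1)[0][0] for x in samples]


def mcc(predicted: Sequence[int], known: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary 0/1 labels (assumed objective).

    Returns 0.0 when a marginal count is zero and the coefficient is undefined.
    """
    pairs = list(zip(predicted, known))
    tp = sum(p == 1 and k == 1 for p, k in pairs)
    tn = sum(p == 0 and k == 0 for p, k in pairs)
    fp = sum(p == 1 and k == 0 for p, k in pairs)
    fn = sum(p == 0 and k == 1 for p, k in pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical classification methods: simple thresholds on two features.
methods = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[1] > 0.5),
    lambda x: int(x[0] + x[1] > 1.0),
]
samples = [[0.9, 0.8], [0.1, 0.2], [0.7, 0.1], [0.2, 0.9]]
known = [1, 0, 1, 0]

predicted = vote(methods, samples)  # [1, 0, 0, 1]
print(predicted, mcc(predicted, known))
```

Here the third and fourth samples are misclassified, so the objective value is 0.0; predicting every label correctly would score 1.0.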
17 Citations
20 Claims
1. A computer-implemented method of classifying a data set into two or more classes executed by a processor, comprising:
(a) receiving a training data set associated with the data set and having a set of known labels, wherein the data set comprises gene set data, and each gene set data corresponds to one of a plurality of biological state classes, and wherein the labels identify the biological state classes of the gene set data;
(b) generating a first classifier for the training data set by applying a first machine learning technique to the training data set, wherein the first machine learning technique identifies a first set of classification methods, wherein each classification method votes on the training data set;
(c) classifying elements in the training data set according to the first classifier to obtain a first set of predicted labels for the training data set;
(d) computing a first objective value from the first set of predicted labels and the set of known labels;
(e) for each of a plurality of iterations, performing the following steps (i)-(v):
(i) generating a second classifier for the training data set by applying a second machine learning technique to the training data set, wherein the second machine learning technique identifies a second set of classification methods that is different from the first set of classification methods by at least one classification method, wherein each classification method votes on the training data set;
(ii) classifying the elements in the training data set according to the second classifier to obtain a second set of predicted labels for the training data set;
(iii) computing a second objective value from the second set of predicted labels and the set of known labels;
(iv) comparing the first objective value to the second objective value to determine whether the second classifier outperforms the first classifier; and
(v) replacing the first set of predicted labels with the second set of predicted labels and replacing the first objective value with the second objective value when the second classifier outperforms the first classifier, and returning to step (i); and
(f) when a desired number of iterations has been reached, outputting the first set of predicted labels.
Dependent claims: 2 through 18.
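The iterative part of claim 1, steps (e)(i) through (f), amounts to a local search over subsets of voting classification methods. The Python sketch below is one possible reading under explicit assumptions: the pool of methods, the starting subset (all methods), the neighbour move (add or drop exactly one method, which satisfies the "different ... by at least one classification method" condition), majority voting, and accuracy as the objective value are illustrative choices not taken from the patent, and the names `search`, `neighbour`, and `predict` are hypothetical.

```python
from __future__ import annotations

import random
from collections import Counter
from typing import Callable, Sequence

Method = Callable[[Sequence[float]], int]


def accuracy(predicted: Sequence[int], known: Sequence[int]) -> float:
    """Fraction of matching labels (assumed objective; higher is better)."""
    return sum(p == k for p, k in zip(predicted, known)) / len(known)


def predict(pool: Sequence[Method], subset: frozenset[int], samples) -> list[int]:
    """Majority vote of the methods in `subset`, visited in index order."""
    members = [pool[i] for i in sorted(subset)]
    return [Counter(m(x) for m in members).most_common(1)[0][0] for x in samples]


def neighbour(subset: frozenset[int], pool_size: int, rng: random.Random) -> frozenset[int]:
    """Return a subset differing from `subset` by exactly one method (step (i)).

    Adds an unused method or drops a current one, never leaving the subset
    empty. Assumes pool_size >= 2.
    """
    unused = [i for i in range(pool_size) if i not in subset]
    if unused and (len(subset) == 1 or rng.random() < 0.5):
        return subset | {rng.choice(unused)}
    return subset - {rng.choice(sorted(subset))}


def search(pool, samples, known, iterations=50, objective=accuracy, seed=0):
    rng = random.Random(seed)
    current = frozenset(range(len(pool)))             # (b) assumed start: every method
    labels = predict(pool, current, samples)          # (c) first predicted labels
    value = objective(labels, known)                  # (d) first objective value
    for _ in range(iterations):                       # (e) fixed iteration budget
        candidate = neighbour(current, len(pool), rng)  # (i)
        new_labels = predict(pool, candidate, samples)  # (ii)
        new_value = objective(new_labels, known)        # (iii)
        if new_value > value:                           # (iv)
            # (v) keep the better labels and value; moving `current` as well is
            # an assumption, since the claim only names labels and objective value.
            current, labels, value = candidate, new_labels, new_value
    return labels, value                              # (f)


# Hypothetical pool: one informative threshold rule and two constant voters.
pool = [lambda x: int(x[0] > 0.5), lambda x: 0, lambda x: 0]
samples = [[0.9], [0.1], [0.8], [0.2]]
known = [1, 0, 1, 0]
labels, value = search(pool, samples, known)
print(labels, value)
```

With all three methods voting, the two constant voters outvote the threshold rule and accuracy is 0.5; dropping either constant voter lets the threshold rule win ties, which the search accepts as an improvement. Because step (v) as reproduced here accepts only candidates that outperform the current one, this sketch is a greedy hill climb; it does not implement the generalized simulated annealing named in the title.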
19. A computer program product comprising computer-readable instructions that, when executed in a computerized system comprising at least one processor, cause the processor to carry out a method, comprising:
(a) receiving a training data set associated with a data set and having a set of known labels, wherein the data set comprises gene set data, and each gene set data corresponds to one of a plurality of biological state classes, and wherein the labels identify the biological state classes of the gene set data;
(b) generating a first classifier for the training data set by applying a first machine learning technique to the training data set, wherein the first machine learning technique identifies a first set of classification methods, wherein each classification method votes on the training data set;
(c) classifying elements in the training data set according to the first classifier to obtain a first set of predicted labels for the training data set;
(d) computing a first objective value from the first set of predicted labels and the set of known labels;
(e) for each of a plurality of iterations, performing the following steps (i)-(v):
(i) generating a second classifier for the training data set by applying a second machine learning technique to the training data set, wherein the second machine learning technique identifies a second set of classification methods that is different from the first set of classification methods by at least one classification method, wherein each classification method votes on the training data set;
(ii) classifying the elements in the training data set according to the second classifier to obtain a second set of predicted labels for the training data set;
(iii) computing a second objective value from the second set of predicted labels and the set of known labels;
(iv) comparing the first objective value to the second objective value to determine whether the second classifier outperforms the first classifier; and
(v) replacing the first set of predicted labels with the second set of predicted labels and replacing the first objective value with the second objective value when the second classifier outperforms the first classifier, and returning to step (i); and
(f) when a desired number of iterations has been reached, outputting the first set of predicted labels.
20. A computerized system comprising a processing device configured with non-transitory computer-readable instructions that, when executed, cause the processing device to carry out a method comprising:
(a) receiving a training data set associated with a data set and having a set of known labels, wherein the data set comprises gene set data, and each gene set data corresponds to one of a plurality of biological state classes, and wherein the labels identify the biological state classes of the gene set data;
(b) generating a first classifier for the training data set by applying a first machine learning technique to the training data set, wherein the first machine learning technique identifies a first set of classification methods, wherein each classification method votes on the training data set;
(c) classifying elements in the training data set according to the first classifier to obtain a first set of predicted labels for the training data set;
(d) computing a first objective value from the first set of predicted labels and the set of known labels;
(e) for each of a plurality of iterations, performing the following steps (i)-(v):
(i) generating a second classifier for the training data set by applying a second machine learning technique to the training data set, wherein the second machine learning technique identifies a second set of classification methods that is different from the first set of classification methods by at least one classification method, wherein each classification method votes on the training data set;
(ii) classifying the elements in the training data set according to the second classifier to obtain a second set of predicted labels for the training data set;
(iii) computing a second objective value from the second set of predicted labels and the set of known labels;
(iv) comparing the first objective value to the second objective value to determine whether the second classifier outperforms the first classifier; and
(v) replacing the first set of predicted labels with the second set of predicted labels and replacing the first objective value with the second objective value when the second classifier outperforms the first classifier, and returning to step (i); and
(f) when a desired number of iterations has been reached, outputting the first set of predicted labels.
Specification