Method for training a learning-capable system
Abstract
The invention is directed to a method for training at least one learning-capable system comprising the steps of providing a predetermined training data set corresponding to a predetermined number of subjects comprising a predetermined input data set and a predetermined outcome data set, augmenting the input data set and/or the outcome data set, and training each learning-capable system using the augmented input data set and/or the augmented outcome data set.
19 Claims
-
1. A method for training at least one artificial learning-capable system comprising the steps of:
-
providing a predetermined training data set comprising a predetermined input data set and a predetermined outcome data set corresponding to input data for each of a respective predetermined number of subjects,
observing survival data relating to patient survival of J subjects,
recording covariates denoted xg(j) at a reference time t=0 relating to events that have not occurred for each subject in any order,
recording special covariates denoted zp(j) relating to treatments received by each subject,
assuming each subject represents a random sample drawn from a large pool of subjects with identical covariates x, z,
defining the conditional probability S(t|x,z) for surviving to time t given x, z,
estimating the p-th propensity score φp corresponding to the probability for subject j to have treatment zp=1,
categorizing the propensity scores into a number Np of categories, designated as strata, and
augmenting the input data set and/or the outcome data set by the propensity scores and/or the stratum categorization, and
training each artificial learning-capable system using the augmented input data set and/or the augmented outcome data set that was augmented according to the augmenting step, through the use of a computing device.
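The estimating, categorizing, and augmenting steps of claim 1 can be illustrated with a minimal sketch. It assumes a single binary treatment, a logistic-regression propensity model fitted by gradient descent, and equal-width strata on [0, 1]. None of these choices is prescribed by the claim, and all names (`fit_propensity`, `stratify`, `augment`, `N_P`) and the toy data are hypothetical.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def fit_propensity(xs, zs, lr=0.1, epochs=2000):
    """Fit P(z=1 | x) by logistic regression with batch gradient descent.

    Assumption: logistic regression is only one possible propensity model.
    """
    w, b, n = [0.0] * len(xs[0]), 0.0, len(xs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, z in zip(xs, zs):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - z
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def stratify(scores, n_strata):
    """Assign each score to one of n_strata equal-width bins on [0, 1]."""
    return [min(int(s * n_strata), n_strata - 1) for s in scores]

def augment(xs, scores, strata):
    """Append the propensity score and the stratum index to each input row."""
    return [list(x) + [s, k] for x, s, k in zip(xs, scores, strata)]

# Toy data: one covariate per subject and a binary treatment indicator.
xs = [[0.1], [0.4], [0.5], [0.9], [1.2], [1.5]]
zs = [0, 0, 1, 0, 1, 1]
N_P = 3  # hypothetical number of strata

w, b = fit_propensity(xs, zs)
scores = [sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) for x in xs]
strata = stratify(scores, N_P)
augmented = augment(xs, scores, strata)
```

The rows of `augmented` would then serve as the augmented input data set passed to whatever learning-capable system is being trained.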
where k denotes the k-th outcome and the hazard is decomposed as
$$h_k(t \mid X, \varphi_1, \varphi_2, \ldots, \varphi_P) = \exp\left[\sum_{l=1}^{L} B_l(t)\,\bigl(NN_{kl}(X) - OP_{kl}(\varphi_1, \varphi_2, \ldots, \varphi_P)\bigr)\right],$$
wherein B_l(t) are suitable functions comprising the time dependence.
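As an illustration of the hazard decomposition, the following sketch evaluates the exponent for one outcome k. The basis functions, the stand-in values for the network outputs NN_kl(X), and the operating points OP_kl are hypothetical placeholders chosen only to make the arithmetic visible.

```python
import math

def hazard(t, nn_k, op_k, basis):
    """Evaluate exp( sum_l B_l(t) * (NN_kl(X) - OP_kl(phi)) ) for one outcome k.

    nn_k  : values NN_kl(X), one per basis function l
    op_k  : values OP_kl(phi_1, ..., phi_P), one per basis function l
    basis : callables B_l(t)
    """
    if not (len(nn_k) == len(op_k) == len(basis)):
        raise ValueError("nn_k, op_k and basis must have the same length L")
    exponent = sum(B(t) * (nn - op) for B, nn, op in zip(basis, nn_k, op_k))
    return math.exp(exponent)

# Hypothetical choice of L = 2 basis functions: constant and linear in time.
basis = [lambda t: 1.0, lambda t: t]
nn_k = [0.5, -0.1]  # stand-ins for the learning system's outputs
op_k = [0.2, 0.0]   # stand-ins for the operating point parameters

h = hazard(2.0, nn_k, op_k, basis)
```

When the outputs NN_kl(X) equal the operating points OP_kl, the exponent is zero and the hazard reduces to 1 for all t.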
-
-
3. The method according to claim 2, wherein the operating point parameters are optimized such that the median of all output data of users assigned to each stratum vanishes.
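One way to read the condition in claim 3 is that, within each stratum, the operating point equals the median of the uncorrected outputs, so that the corrected outputs have median zero. The sketch below uses this closed-form shortcut in place of an optimization procedure, which is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import median

def operating_points(outputs, strata):
    """Return, per stratum, the median of the outputs assigned to it."""
    by_stratum = defaultdict(list)
    for y, s in zip(outputs, strata):
        by_stratum[s].append(y)
    return {s: median(ys) for s, ys in by_stratum.items()}

def center(outputs, strata, op):
    """Subtract each stratum's operating point from its outputs."""
    return [y - op[s] for y, s in zip(outputs, strata)]

# Toy outputs for five subjects in two strata.
outputs = [1.0, 2.0, 4.0, 10.0, 14.0]
strata = [0, 0, 0, 1, 1]

op = operating_points(outputs, strata)
centered = center(outputs, strata, op)
```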
-
4. The method according to claim 1, wherein the augmenting step comprises the step of:
generating a plurality of augmented training data sets by augmenting the input data set using a predetermined statistical model.
-
5. The method according to claim 4, wherein the training step comprises the steps of:
-
training each of at least two said artificial learning-capable systems using a subset of the plurality of augmented training data sets, constructing scores for each outcome for each said trained artificial learning-capable system, and determining characteristics of distributions of the scores for each subject.
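The last step of claim 5, determining characteristics of the score distributions for each subject, might be sketched as follows. The layout of one score list per trained system and the chosen summary statistics (mean, standard deviation, range) are assumptions made for illustration.

```python
from statistics import mean, pstdev

def score_distribution(scores_by_system):
    """Summarize, per subject, the scores produced by several trained systems.

    scores_by_system[i][j] is the score that system i assigns to subject j.
    """
    summaries = []
    for subject_scores in zip(*scores_by_system):
        summaries.append({
            "mean": mean(subject_scores),
            "std": pstdev(subject_scores),
            "min": min(subject_scores),
            "max": max(subject_scores),
        })
    return summaries

# Two hypothetical trained systems scoring the same two subjects.
scores_by_system = [
    [0.2, 0.8],
    [0.4, 0.6],
]
summaries = score_distribution(scores_by_system)
```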
-
-
6. The method according to claim 5, wherein the input data set is augmented using a generalized Markov chain Monte-Carlo method.
-
7. The method according to claim 1, wherein the augmenting step comprises the steps of:
-
providing a further artificial learning-capable system and a further predetermined training data set comprising a further predetermined input data set and a further predetermined outcome data set for each of a respective further predetermined number of subjects, training the further learning-capable system using the further predetermined training data set, and augmenting the input data set by at least one additional input variable taken from the further predetermined input data set, further predetermined outcome data set and/or internal output data obtained from the trained further artificial learning-capable system.
-
-
8. The method according to claim 7, wherein the additional input variables comprise all further input data and all further outcome data of a subset of subjects of the further training data set.
-
9. The method according to claim 1, wherein the outcome data of the training data set is time-dependent and the augmenting step comprises pre-transforming a time variable of the training data set in such a way that an associated hazard rate with respect to a predetermined outcome is a predetermined function of the time variable.
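A possible illustration of such a time pre-transformation maps each time t to an estimate of the cumulative hazard at t. On the transformed scale the hazard is then approximately the constant function 1. The sketch assumes the Nelson-Aalen estimator and a constant target hazard. The claim names neither, and the function names and toy data are hypothetical.

```python
def nelson_aalen(times, events):
    """Return (event_time, cumulative_hazard) steps from right-censored data.

    events[i] is 1 if the outcome was observed at times[i], 0 if censored.
    """
    distinct = sorted({t for t, e in zip(times, events) if e})
    cumulative, steps = 0.0, []
    for u in distinct:
        at_risk = sum(1 for t in times if t >= u)
        observed = sum(1 for t, e in zip(times, events) if e and t == u)
        cumulative += observed / at_risk
        steps.append((u, cumulative))
    return steps

def transform_time(t, steps):
    """Map t to the estimated cumulative hazard at t (a step function)."""
    value = 0.0
    for u, cumulative in steps:
        if u > t:
            break
        value = cumulative
    return value

# Toy follow-up times with a censoring indicator.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]

steps = nelson_aalen(times, events)
transformed = [transform_time(t, steps) for t in times]
```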
-
10. The method according to claim 1 wherein input data of a subject is applied to the trained artificial learning-capable system to generate an outcome of the artificial learning-capable system, and the method further comprises correcting the outcome with respect to a predetermined reference subject.
-
11. The method according to claim 6, wherein input data of a subject is applied to at least two artificial learning-capable systems to generate output data of the artificial learning-capable systems, wherein applying input data comprises the steps of:
-
presenting the input data of the subject to each of the artificial learning-capable systems and constructing a score for the output data obtained from the artificial learning-capable systems.
-
-
12. The method according to claim 1, further comprising creating a composite training data set for use in training the artificial learning-capable system, wherein said creating comprises the steps of:
-
providing an aggregated evidence data set, disaggregating the aggregated evidence data set to obtain a disaggregated training data set based on virtual subjects, and merging the disaggregated training data set with a further training data set to produce the predetermined training data set.
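The disaggregating and merging steps can be pictured with a small sketch. It assumes the aggregated evidence is a list of group summaries (treatment flag, group size, event count) and expands each into one record per virtual subject. The record layout and the `virtual` marker field are hypothetical.

```python
def disaggregate(aggregated):
    """Expand group-level counts into one record per virtual subject."""
    virtual = []
    for group in aggregated:
        for i in range(group["n"]):
            virtual.append({
                "treatment": group["treatment"],
                # The first `events` virtual subjects carry the event.
                "event": 1 if i < group["events"] else 0,
                "virtual": 1,
            })
    return virtual

def merge(virtual, further):
    """Combine virtual subjects with a further training data set."""
    return virtual + [dict(record, virtual=0) for record in further]

# Hypothetical aggregated evidence: two groups reported only as counts.
aggregated = [
    {"treatment": 1, "n": 4, "events": 1},
    {"treatment": 0, "n": 3, "events": 2},
]
# Hypothetical further training data set of individual records.
further = [
    {"treatment": 1, "event": 0},
    {"treatment": 0, "event": 1},
]

virtual_subjects = disaggregate(aggregated)
training_set = merge(virtual_subjects, further)
```

Keeping a marker such as `virtual` lets later processing distinguish virtual subjects from the records of the further data set.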
-
-
13. The method according to claim 12, wherein the merging step comprises the step of choosing a real training data set based on real subjects as the further training data set.
-
14. The method according to claim 12, wherein the disaggregation step comprises the step of assigning at least a value of one auxiliary variable to each virtual subject of the disaggregated training data set according to predetermined criteria.
-
15. The method according to claim 1, wherein the predetermined training data set is provided by:
-
providing an aggregated evidence data set, disaggregating the aggregated evidence data set to obtain a disaggregated training data set based on virtual subjects, and merging the disaggregated training data set with a further training data set to produce the predetermined training data set.
-
-
16. A computer program product directly loadable into the internal memory of a digital computer, comprising software code portions for performing the steps of the method of claim 1, when said product is run on a computer.
-
17. A computer program product stored on a medium readable by a computer, comprising computer readable program means for causing a computer to perform the steps of the method of claim 1, when said product is run on a computer.
-
18. The method according to claim 4, wherein the input data set is augmented using a generalized Markov chain Monte-Carlo method.
-
19. The method according to claim 13, wherein the disaggregation step comprises the step of assigning at least a value of one auxiliary variable to each virtual subject of the disaggregated training data set.
Specification