Method for training a learning-capable system

US 7,801,839 B2
Filed: 07/03/2003
Issued: 09/21/2010
Est. Priority Date: 07/04/2002
Status: Active Grant

Abstract

The invention is directed to a method for training at least one learning-capable system comprising the steps of providing a predetermined training data set corresponding to a predetermined number of subjects comprising a predetermined input data set and a predetermined outcome data set, augmenting the input data set and/or the outcome data set, and training each learning-capable system using the augmented input data set and/or the augmented outcome data set.

19 Claims

1. A method for training at least one artificial learning-capable system comprising the steps of:
- providing a predetermined training data set comprising a predetermined input data set and a predetermined outcome data set corresponding to input data for each of a respective predetermined number of subjects,
- observing survival data relating to patient survival of J subjects,
- recording covariates denoted x_g(j) at a reference time t=0 relating to events that have not occurred for each subject in any order,
- recording special covariates denoted z_p(j) relating to treatments received by each subject,
- assuming each subject represents a random sample drawn from a large pool of subjects with identical covariates x, z,
- defining the conditional probability S(t|x,z) for surviving to time t given x, z,
- estimating the p-th propensity score φ_p corresponding to the probability for subject j to have treatment z_p=1,
- categorizing the propensity scores into a number N_p of categories, designated as strata, and
- augmenting the input data set and/or the outcome data set by the propensity scores and/or the stratum categorization, and
- training each artificial learning-capable system using the augmented input data set and/or the augmented outcome data set that was augmented according to the augmenting step, through the use of a computing device.
2. The method according to claim 1, wherein the training step comprises
- optimizing operating point parameters within each stratum,
- determining the operating point corrections OP_kl(φ_1, φ_2, . . . , φ_P) for shifting the output of the learning capable system NN_kt(X) with X={x,z}, provided by the learning capable system, given the propensity scores φ_1, φ_2, . . . , φ_P, considering a hazard model λ_k(t|X)=λ_k0(t)h_k(t|X, φ_1, φ_2, . . . , φ_P),
3. The method according to claim 2, wherein the operating point parameters are optimized such that the median of all output data of users assigned to each stratum vanishes.
4. The method according to claim 1, wherein the augmenting step comprises the step of:
- generating a plurality of augmented training data sets by augmenting the input data set using a predetermined statistical model.
5. The method according to claim 4, wherein the training step comprises the steps of:
- training each of at least two said artificial learning-capable systems using a subset of the plurality of augmented training data sets,
- constructing scores for each outcome for each said trained artificial learning-capable system, and
- determining characteristics of distributions of the scores for each subject.
6. The method according to claim 5, wherein the input data set is augmented using a generalized Markov chain Monte-Carlo method.
7. The method according to claim 1, wherein the augmenting step comprises the steps of:
- providing a further artificial learning capable-system and a further predetermined training data set comprising a further predetermined input data set and a further predetermined outcome data set for each of a respective further predetermined number of subjects,
- training the further learning-capable system using the further predetermined training data set, and
- augmenting the input data set by at least one additional input variable taken from the further predetermined input data set, further predetermined outcome data set and/or internal output data obtained from the trained further artificial learning-capable system.
8. The method according to claim 7, wherein the additional input variables comprise all further input data and all further outcome data of a subset of subjects of the further training data set.
9. The method according to claim 1, wherein the outcome data of the training data set is time-dependent and the augmenting step comprises pre-transforming a time variable of the training data set in such a way that an associated hazard rate with respect to a predetermined outcome is a predetermined function of the time variable.
10. The method according to claim 1 wherein input data of a subject is applied to the trained artificial learning-capable system to generate an outcome of the artificial learning-capable system, and the method further comprises correcting the outcome with respect to a predetermined reference subject.
11. The method according to claim 6, wherein input data of a subject is applied to at least two artificial learning-capable systems to generate output data of the artificial learning-capable systems, wherein applying input data comprises the steps of:
- presenting the input data of the subject to each of the artificial learning-capable systems and
- constructing a score for the output data obtained from the artificial learning-capable systems.
12. The method according to claim 1, further comprising creating a composite training data set for use in training the artificial learning-capable system, wherein said creating comprises the steps of:
- providing an aggregated evidence data set,
- disaggregating the aggregated evidence data set to obtain a disaggregated training data set based on virtual subjects, and
- merging the disaggregated training data set with a further training data set to produce the predetermined training data set.
13. The method according to claim 12, wherein the merging step comprises the step of choosing a real training data set based on real subjects as the further training data set.
14. The method according to claim 12, wherein the disaggregation step comprises the step of assigning at least a value of one auxiliary variable to each virtual subject of the disaggregated training data set according to predetermined criteria.
15. The method according to claim 1, wherein the predetermined training data set is provided by:
- providing an aggregated evidence data set,
- disaggregating the aggregated evidence data set to obtain a disaggregated training data set based on virtual subjects, and
- merging the disaggregated training data set with a further training data set to produce the predetermined training data set.
16. A computer program product directly loadable into the internal memory of a digital computer, comprising software code portions for performing the steps of the method of claim 1, when said product is run on a computer.
17. A computer program product stored on a medium readable by a computer, comprising computer readable program means for causing a computer to perform the steps of the method of claim 1, when said product is run on a computer.
18. The method according to claim 4, wherein the input data set is augmented using a generalized Markov chain Monte-Carlo method.
19. The method according to claim 13, wherein the disaggregation step comprises the step of assigning at least a value of one auxiliary variable to each virtual subject of the disaggregated training data set.
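
The propensity-score steps of claim 1 and the median condition of claim 3 can be illustrated with a minimal sketch. Nothing below is taken from the patent specification: the claims do not say how the propensity score is estimated or how strata boundaries are chosen, so the logistic regression, the equal-frequency strata, the single treatment (P=1), and all function names (`propensity_scores`, `stratify`, `augment`, `center_by_stratum`) are assumptions made for illustration only.

```python
import math
import random
import statistics


def _sigmoid(v):
    # Clamp the argument so math.exp cannot overflow.
    v = max(-30.0, min(30.0, v))
    return 1.0 / (1.0 + math.exp(-v))


def propensity_scores(X, z, lr=0.1, epochs=500):
    """Estimate phi(j) = P(z=1 | x(j)) with a logistic regression fitted by
    batch gradient descent (an assumed estimator, not one named in the claims)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, zi in zip(X, z):
            err = _sigmoid(sum(wk * xk for wk, xk in zip(w, xi)) + b) - zi
            for k in range(d):
                grad_w[k] += err * xi[k]
            grad_b += err
        w = [wk - lr * g / n for wk, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return [_sigmoid(sum(wk * xk for wk, xk in zip(w, xi)) + b) for xi in X]


def stratify(scores, n_strata):
    """Categorize scores into n_strata equal-frequency strata (0-based labels).
    Equal-frequency binning is an assumption; the claims only require N_p categories."""
    order = sorted(range(len(scores)), key=lambda j: scores[j])
    strata = [0] * len(scores)
    for rank, j in enumerate(order):
        strata[j] = rank * n_strata // len(scores)
    return strata


def augment(X, z, n_strata=5):
    """Return one row per subject: covariates x, treatment z, score phi, stratum."""
    phi = propensity_scores(X, z)
    strata = stratify(phi, n_strata)
    return [list(xi) + [zi, p, s] for xi, zi, p, s in zip(X, z, phi, strata)]


def center_by_stratum(outputs, strata):
    """Shift model outputs so that their median is zero within each stratum,
    one simple reading of the condition in claim 3."""
    medians = {
        s: statistics.median([o for o, t in zip(outputs, strata) if t == s])
        for s in set(strata)
    }
    return [o - medians[s] for o, s in zip(outputs, strata)]


if __name__ == "__main__":
    random.seed(0)
    # Synthetic subjects: two covariates; treatment is more likely when x[0] is large.
    X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
    z = [1 if random.random() < _sigmoid(1.5 * x[0]) else 0 for x in X]
    rows = augment(X, z, n_strata=5)
    print(rows[0])
```

The augmented rows would then be passed to whatever learning-capable system is being trained; with several treatments, one score and one stratum column would be appended per treatment z_p.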

Current Assignee: Nadia Harbeck, Ronald E. Kates
Original Assignee: Nadia Harbeck, Ronald E. Kates
Inventors: Harbeck, Nadia; Kates, Ronald E.
Primary Examiner(s): Sparks, Donald
Assistant Examiner(s): Fernandez Rivas, Omar F
Application Number: US10/520,409
Publication Number: US 20060248031A1
Time in Patent Office: 2,637 Days
Field of Search: 706/12, 706/15-21, 706/25, 706/924, 600/300, 600/301, 600/408
US Class Current: 706/21
CPC Class Codes: G16H 50/20 for computer-aided diagnosi...
