Machine learning method

US 6,917,926 B2
Filed: 06/15/2001
Issued: 07/12/2005
Est. Priority Date: 06/15/2001
Status: Expired due to Fees


Abstract

A method for using machine learning to solve problems having either a “positive” result (the event occurred) or a “negative” result (the event did not occur), in which the probability of a positive result is very low and the consequences of the positive result are significant. Training data is obtained and a subset of that data is distilled for application to a machine learning system. The training data includes some records corresponding to the positive result, some nearest neighbors from the records corresponding to the negative result, and some other records corresponding to the negative result. The machine learning system uses a co-evolution approach to obtain a rule set for predicting results after a number of cycles. The machine learning system uses a fitness function derived for use with the type of problem, such as a fitness function based on the sensitivity and positive predictive value of the rules. The rules are validated using the entire set of training data.
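The abstract mentions a fitness function based on sensitivity and positive predictive value but does not give a formula. The following is a minimal sketch of one such function, assuming the two measures are combined by multiplication; the function names and the product form are illustrative assumptions, not taken from the patent.

```python
def confusion_counts(predicted, actual):
    """Count true/false positives and negatives for boolean predictions."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    return tp, fp, fn, tn


def fitness(predicted, actual):
    """Hypothetical fitness: sensitivity multiplied by positive predictive value.

    The product is an assumed way to combine the two measures; a ratio with a
    zero denominator is scored as 0.0.
    """
    tp, fp, fn, _ = confusion_counts(predicted, actual)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity * ppv
```

Under this assumed form, a rule set that never predicts the rare positive result scores zero, however many negatives it classifies correctly.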

26 Claims

1. A computer-executable method for using machine learning to predict an outcome, the method comprising:
- defining a first outcome associated with a first range of medical costs at least as great as a cost threshold;
  
  defining a second outcome associated with a second range of medical costs less than the cost threshold, wherein the second outcome is more likely than the first outcome; and
  
  processing training data with a machine learning system, wherein said training data is a subset of a data set and is recorded in a computer-readable medium, and wherein the act of processing the training data includes;
  
  selecting a first subset of the training data, the first subset corresponding to the first outcome;
  
  selecting a second subset of the training data, the second subset corresponding to the second outcome and consisting of a set of nearby neighbors to the first outcome; and
  
  selecting a third subset of the training data, the third subset corresponding to the second outcome, wherein the third subset does not consist of nearby neighbors to the first outcome; and
  
  using a plurality of software-based, computer-executable machine learners to develop from the first, second and third subsets one or more sets of computer-executable rules usable to predict the first outcome or the second outcome.
- Dependent claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26):
- - 2. The method of claim 1, wherein the act of selecting the third subset includes randomly selecting a subset of the training data corresponding to the second outcome.
  - 3. The method of claim 1, wherein the training data includes records having an associated medical cost and a plurality of feature variables.
  - 4. The method of claim 3, further comprising the act of identifying a nearby neighbors by using medical cost values.
  - 5. The method of claim 4, wherein the act of selecting the second subset includes randomly selecting a subset of the identified set of nearby neighbors as the second subset.
  - 6. The method of claim 4, wherein the act of selecting the second subset includes selecting as the second subset all of the identified set of nearby neighbors.
  - 7. The method of claim 4, further comprising the act of identifying a set of nearby neighbors using values of the plurality of feature variables for the training data.
  - 8. The method of claim 1, further comprising the act of validating the one or more sets of rules using the data set.
  - 9. The method of claim 8, wherein the act of validating the one or more sets of rules includes obtaining one or more accuracy measures for the rules using a portion of the data set.
  - 10. The method of claim 8, wherein the act of validating the one or more sets of rules includes obtaining one or more accuracy measures for the rules using the entire data set.
  - 11. The method of claim 10, wherein the act of validating the one or more sets of rules further includes obtaining the one or more accuracy measures for the training.
  - 12. The method of claim 10, wherein the act of obtaining one or more accuracy measures includes obtaining measures of a positive predictive value, a negative predictive value, a sensitivity, and a selectivity of the rules.
  - 13. The method of claim 1, wherein the act of using a plurality of software-based, computer-executable machine learners includes developing a set of interim rules using the plurality of software-based, computer-executable machine learners, evaluating the set of interim rules, and developing a revised set of interim rules using the results of the evaluating step.
  - 14. The method of claim 13, wherein the act of evaluating the set of interim rules includes applying a user-selectable fitness function.
  - 15. The method of claim 13, wherein the act of evaluating the set of interim rules includes applying a fitness function based on one or more of a sensitivity, a positive predictive value, and a correlation coefficient of the interim rules.
  - 24. The method of claim 1, wherein the plurality of software-based, computer-executable machine learners executes a neural network machine learning process.
  - 25. The method of claim 1, wherein the plurality of software-based, computer-executable machine learners executes a decision tree machine learning process.
  - 26. The method of claim 1, wherein the first, second and third subsets each include approximately equal amounts of data.
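The three-subset selection recited in claim 1, with the random selection of claim 2, can be sketched as follows. This is an illustrative sketch only: the record layout (a dict with a `cost` key plus numeric feature keys), the squared Euclidean distance over feature variables, and the function name `select_training_subsets` are assumptions. The claims do not specify a distance measure.

```python
import random


def select_training_subsets(records, cost_threshold, feature_keys,
                            n_neighbors, n_random, seed=0):
    """Split records into positives, nearby-neighbor negatives, and other negatives."""
    # First subset: records at or above the cost threshold (the first outcome).
    positives = [r for r in records if r["cost"] >= cost_threshold]
    negatives = [r for r in records if r["cost"] < cost_threshold]
    if not positives:
        raise ValueError("no records reach the cost threshold")

    def distance_to_positives(rec):
        # Assumed measure: squared Euclidean distance to the closest positive.
        return min(sum((rec[k] - p[k]) ** 2 for k in feature_keys)
                   for p in positives)

    ranked = sorted(negatives, key=distance_to_positives)
    # Second subset: the negatives closest to the positives.
    neighbors = ranked[:n_neighbors]
    # Third subset: a random draw from the remaining negatives.
    remaining = ranked[n_neighbors:]
    rng = random.Random(seed)
    others = rng.sample(remaining, min(n_random, len(remaining)))
    return positives, neighbors, others
```

Choosing `n_neighbors` and `n_random` close to the number of positives would give the roughly equal subset sizes described in claim 26.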

16. A computer-executable method for using machine learning to predict results comprising the act of:
- processing a representation of a subset of a data set with a machine learning system, the representation comprising;
  
  first data corresponding to a first outcome, wherein the first outcome is associated with a range of medical costs at least as great as a predetermined threshold amount;
  
  second data corresponding to a second outcome, wherein the second outcome is associated with a range of medical costs lower than the predetermined threshold amount, wherein the second data consists of a set of nearby neighbors to the first outcome, and wherein the second outcome is less likely than the first outcome; and
  
  third data corresponding to the second outcome, wherein the third data is different than the second data;
  
  repeating for a plurality of cycles;
  
  using a plurality of software-based, computer-executable machine learners to develop a set of computer executable rules from the processed representation of the subset of the data set;
  
  evaluating the set of computer-executable rules using a user-selectable fitness function; and
  
  modifying the machine learning methods executed by a plurality of software-based, computer-executable machine learners by using the results of the evaluating act; and
  
  presenting a final set of computer-executable rules usable to predict the first outcome or the second outcome.
- Dependent claims (17, 18, 19, 20):
- - 17. The method of claim 16, wherein the act of evaluating a set of rules includes using a user-selectable fitness function based on one or more of:
    - a number of true positives, a number of true negatives, a number of false positives, and a number of false negatives that the set of rules obtains from the subset of the data set.
  - 18. The method of claim 16, wherein the act of evaluating a set of rules includes using a user-selectable fitness function based on a sensitivity and a positive predictive value of the rules.
  - 19. The method of claim 16, wherein the act of evaluating a set of rules includes using a user-selectable fitness function based on a sensitivity, a positive predictive value, and a correlation coefficient of the rules.
  - 20. The method of claim 16, further comprising, in at least one of the plurality of cycles, developing one or more new representations of the data for use by the plurality of software-based, computer-executable machine learners in a subsequent cycle.
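The repeated develop, evaluate, and modify cycle of claim 16 can be illustrated with a toy loop. The sketch below is a deliberately simple stand-in, not the co-evolution approach of the patent: each "learner" is a single numeric threshold, the fitness is an assumed product of sensitivity and positive predictive value, and the modification step re-seeds all learners near the best threshold found so far. All names are hypothetical.

```python
import random


def make_rule(threshold):
    """A one-feature rule: predict the positive outcome when x >= threshold."""
    return lambda x: x >= threshold


def fitness(rule, data):
    """Assumed fitness: sensitivity times positive predictive value."""
    tp = sum(1 for x, y in data if rule(x) and y)
    fp = sum(1 for x, y in data if rule(x) and not y)
    fn = sum(1 for x, y in data if not rule(x) and y)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity * ppv


def run_cycles(data, thresholds, cycles=20, step=1.0, seed=0):
    """Repeat develop / evaluate / modify and return the best threshold and score."""
    rng = random.Random(seed)
    best = max(thresholds, key=lambda t: fitness(make_rule(t), data))
    for _ in range(cycles):
        # Develop and evaluate: score the rule each learner currently proposes.
        scored = [(fitness(make_rule(t), data), t) for t in thresholds]
        top_score, top_threshold = max(scored)
        if top_score > fitness(make_rule(best), data):
            best = top_threshold
        # Modify: re-seed every learner near the best threshold so far.
        thresholds = [best + rng.uniform(-step, step) for _ in thresholds]
    return best, fitness(make_rule(best), data)
```

Because the best threshold is replaced only by a strictly better one, the returned score never falls below the best score among the starting thresholds.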

21. A computer-executable method for using machine learning to predict a positive or a negative outcome, where the positive outcome is less likely than the negative outcome, the method comprising:
- defining a positive outcome associated with a range of medical costs equal to or greater than a cost threshold;
  
  defining a negative outcome associated with a range of medical costs less than the cost threshold; and
  
  processing training data with a machine learning system, wherein said training data is a subset of a data set and is recorded in a computer-readable medium, and wherein the act of processing the training data includes;
  
  selecting a first subset of the training data, the first subset corresponding to the positive outcome;
  
  selecting a second subset of the training data, the second subset corresponding to the negative outcome and consisting of a set of nearest neighbors to the positive outcome;
  
  selecting a third subset of the training data, the third subset corresponding to the negative outcome, wherein the third subset does not consist of nearest neighbors to the positive outcome; and
  
  using a plurality of software-based, computer-executable machine learners to develop from the first, second and third subsets of the training data one or more sets of computer-executable rules usable to predict either the positive outcome or the negative outcome.
- Dependent claims (22, 23):
- - 22. The method of claim 21, wherein the act of using a plurality of software-based, computer-executable machine learners to develop one or more sets of rules includes applying a user-selectable fitness function to develop the one or more sets of rules.
  - 23. The method of claim 21, wherein the negative outcome is at least thirty times more likely than the positive outcome.
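A worked example may help show what the 30-to-1 ratio of claim 23 does to the accuracy measures listed in claim 12. The sketch below computes those measures from confusion counts. It assumes that "selectivity" corresponds to what is commonly called specificity, which is an interpretation rather than something stated in the claims; the function name is hypothetical, and undefined ratios are reported as `None`.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Return the claim 12 measures, plus raw accuracy, from confusion counts."""
    def ratio(numerator, denominator):
        # A ratio with a zero denominator is undefined; report it as None.
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "selectivity": ratio(tn, tn + fp),  # assumed to mean specificity
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
    }


# One positive record among 31 (a 30:1 ratio), scored against a rule set
# that always predicts the negative outcome.
always_negative = accuracy_measures(tp=0, fp=0, fn=1, tn=30)
```

In this example the always-negative rule set is correct on 30 of 31 records, yet its sensitivity is zero and its positive predictive value is undefined.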

Current Assignee: Staywell Health Management LLC (Merck & Co., Inc.)
Original Assignee: Medical Scientists, Inc. (Merck & Co., Inc.)
Inventors: Chen, Hung-Han; Hunter, Lawrence; Snow, Kristin Kendall; Poteat, Harry Towsley
Primary Examiner(s): Knight, Anthony
Assistant Examiner(s): Holmes, Michael B.
Application Number: US09/882,502
Publication Number: US 20030018595A1
Time in Patent Office: 1,488 Days
Field of Search: 706/12, 706/21, 706/15, 706/16, 706/25
US Class Current: 706/12
CPC Class Codes:
G06F 18/211 Selection of the most signi...
G06N 20/00 Machine learning
