SYSTEMS AND METHODS FOR ANALYZING DATA TO PREDICT MEDICAL OUTCOMES

US 20080120267A1
Filed: 01/24/2008
Published: 05/22/2008
Est. Priority Date: 06/15/2001
Status: Abandoned Application

First Claim

1. A computer-executable method for analyzing data to predict medical outcomes, the method comprising:

receiving data associating feature variables comprising demographic data of a plurality of patients with outcome variables corresponding to medical conditions of the plurality of patients, wherein the data comprises a first data set associated with a first outcome and a second data set associated with a second outcome, the second outcome being substantially less likely than the first outcome;

identifying within the first data set a third data set that consists essentially of nearby neighbors to the second data set; and

processing the first, second and third data sets to generate at least one set of computer-executable rules for predicting the likelihood of the first outcome or the second outcome.
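As a rough illustration of the neighbor-identification step, the following Python sketch selects, for each record in the second (rare-outcome) data set, the `k` closest records of the first data set by Euclidean distance over numeric feature variables. Euclidean distance is the measure named in claim 16. The function name `nearby_neighbors`, the parameter `k`, and the sample values are hypothetical, because the claim does not say how many neighbors are kept or how features are scaled.

```python
import math
from typing import List, Sequence, Set


def nearby_neighbors(
    first: Sequence[Sequence[float]],
    second: Sequence[Sequence[float]],
    k: int = 3,
) -> List[int]:
    """Return sorted indices of records in `first` (common outcome) that are
    among the k Euclidean-nearest records to at least one record in `second`
    (rare outcome). `k` is a hypothetical tuning parameter."""
    selected: Set[int] = set()
    for rare_record in second:
        # Rank every common-outcome record by its distance to this rare record.
        ranked = sorted(range(len(first)), key=lambda i: math.dist(first[i], rare_record))
        selected.update(ranked[:k])
    return sorted(selected)


# Hypothetical, pre-scaled demographic features (e.g., age and income).
common = [[0.1, 0.2], [0.9, 0.8], [0.5, 0.5], [0.85, 0.9], [0.2, 0.1]]
rare = [[0.88, 0.85]]
print(nearby_neighbors(common, rare, k=2))  # [1, 3]
```

Keeping a fixed `k` per rare record is only one possible reading of "nearby"; a fixed distance cutoff is an equally plausible alternative under the claim language.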

Abstract

A method for using machine learning to solve problems having either a “positive” result (the event occurred) or a “negative” result (the event did not occur), in which the probability of a positive result is very low and the consequences of the positive result are significant. Training data is obtained and a subset of that data is distilled for application to a machine learning system. The training data includes some records corresponding to the positive result, some nearest neighbors from the records corresponding to the negative result, and some other records corresponding to the negative result. The machine learning system uses a co-evolution approach to obtain a rule set for predicting results after a number of cycles. The machine learning system uses a fitness function derived for use with the type of problem, such as a fitness function based on the sensitivity and positive predictive value of the rules. The rules are validated using the entire set of training data.
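The fitness function and final validation step described in the abstract can be sketched in Python as follows. The abstract only says the fitness is "based on" sensitivity and positive predictive value, so combining them by harmonic mean is an assumption, as is reading the "correlation coefficient" of claim 11 as the Matthews correlation coefficient. Rules are modeled as hypothetical Python predicates over a record, and the field names `age`, `prior_admissions`, and `rare` are illustrative.

```python
import math
from typing import Callable, Dict, Mapping, Sequence

Confusion = Dict[str, int]


def confusion(predicted: Sequence[bool], actual: Sequence[bool]) -> Confusion:
    """Tally the 2x2 confusion matrix, treating the rare outcome as 'positive'."""
    pairs = list(zip(predicted, actual))
    return {
        "tp": sum(1 for p, a in pairs if p and a),
        "fp": sum(1 for p, a in pairs if p and not a),
        "fn": sum(1 for p, a in pairs if not p and a),
        "tn": sum(1 for p, a in pairs if not p and not a),
    }


def _ratio(num: float, den: float) -> float:
    return num / den if den else 0.0


def sensitivity(c: Confusion) -> float:
    return _ratio(c["tp"], c["tp"] + c["fn"])


def ppv(c: Confusion) -> float:
    return _ratio(c["tp"], c["tp"] + c["fp"])


def accuracy(c: Confusion) -> float:
    return _ratio(c["tp"] + c["tn"], sum(c.values()))


def mcc(c: Confusion) -> float:
    """Matthews correlation coefficient (assumed reading of 'correlation coefficient')."""
    den = math.sqrt((c["tp"] + c["fp"]) * (c["tp"] + c["fn"])
                    * (c["tn"] + c["fp"]) * (c["tn"] + c["fn"]))
    return _ratio(c["tp"] * c["tn"] - c["fp"] * c["fn"], den)


def sens_ppv(c: Confusion) -> float:
    """Hypothetical fitness: harmonic mean of sensitivity and PPV."""
    s, p = sensitivity(c), ppv(c)
    return _ratio(2 * s * p, s + p)


# A user-selectable table of fitness functions (keys are illustrative).
FITNESS: Dict[str, Callable[[Confusion], float]] = {
    "sens_ppv": sens_ppv, "accuracy": accuracy, "mcc": mcc,
}


def validate(rule: Callable[[Mapping[str, object]], bool],
             records: Sequence[Mapping[str, object]],
             label: str, fitness: str = "sens_ppv") -> float:
    """Score a rule against the entire training set, not only the distilled subset."""
    c = confusion([rule(r) for r in records], [bool(r[label]) for r in records])
    return FITNESS[fitness](c)


def high_risk(r: Mapping[str, object]) -> bool:
    # Hypothetical rule: predict the rare outcome for older, frequently admitted patients.
    return r["age"] >= 65 and r["prior_admissions"] >= 2


records = [
    {"age": 70, "prior_admissions": 3, "rare": True},
    {"age": 80, "prior_admissions": 2, "rare": True},
    {"age": 45, "prior_admissions": 4, "rare": True},
    {"age": 68, "prior_admissions": 2, "rare": False},
    {"age": 30, "prior_admissions": 0, "rare": False},
    {"age": 55, "prior_admissions": 1, "rare": False},
]
print(round(validate(high_risk, records, "rare"), 3))         # 0.667
print(round(validate(high_risk, records, "rare", "mcc"), 3))  # 0.333
```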

43 Citations

20 Claims

1. A computer-executable method for analyzing data to predict medical outcomes, the method comprising:

receiving data associating feature variables comprising demographic data of a plurality of patients with outcome variables corresponding to medical conditions of the plurality of patients, wherein the data comprises a first data set associated with a first outcome and a second data set associated with a second outcome, the second outcome being substantially less likely than the first outcome;

identifying within the first data set a third data set that consists essentially of nearby neighbors to the second data set; and

processing the first, second and third data sets to generate at least one set of computer-executable rules for predicting the likelihood of the first outcome or the second outcome.

2. The method of claim 1, additionally comprising identifying the nearby neighbors in the first data set based on a proximity of data in the first data set to the second outcome.

3. The method of claim 1, additionally comprising identifying the nearby neighbors in the first data set based on a proximity of data in the first data set to the feature variables of data in the second data set.

4. The method of claim 1, additionally comprising validating the at least one set of computer-executable rules using substantially all the training data.

5. The method of claim 4, wherein said validating includes obtaining at least one accuracy measure for the at least one set of computer-executable rules.

6. The method of claim 5, wherein obtaining the at least one accuracy measure includes obtaining at least one of a positive predictive value and a sensitivity of the at least one set of computer-executable rules.

7. The method of claim 1, wherein the first outcome is at least thirty times more likely than the second outcome.

8. The method of claim 1, wherein said processing comprises using a plurality of software-based, computer-executable machine learners to generate the at least one set of computer-executable rules.

9. The method of claim 8, wherein said processing further comprises:

developing a set of interim rules using the plurality of software-based, computer-executable machine learners;

evaluating the set of interim rules; and

developing a revised set of interim rules based on said act of evaluating.

10. The method of claim 9, wherein said evaluating the set of interim rules comprises applying a user-selectable fitness function.

11. The method of claim 10, wherein the user-selectable fitness function is based on at least two of an accuracy, a sensitivity, a positive predictive value, and a correlation coefficient of the interim rules.

12. The method of claim 1, wherein the first outcome is associated with a first range of medical costs less than a cost threshold, and wherein the second outcome is associated with a second range of medical costs at least as great as the cost threshold.
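Claims 7 and 12 can be made concrete with a short Python sketch that labels records by a cost threshold and checks how imbalanced the two outcomes are. The threshold value and the cost figures below are hypothetical; the claims name neither.

```python
from typing import List, Sequence, Tuple


def split_by_cost(costs: Sequence[float], threshold: float) -> Tuple[List[int], List[int]]:
    """Partition record indices as in claim 12: costs below the threshold map to
    the first outcome, costs at or above it map to the second (rare) outcome."""
    first = [i for i, c in enumerate(costs) if c < threshold]
    second = [i for i, c in enumerate(costs) if c >= threshold]
    return first, second


def imbalance_ratio(first: Sequence[int], second: Sequence[int]) -> float:
    """How many times more common the first outcome is than the second."""
    return len(first) / len(second) if second else float("inf")


# Hypothetical annual costs: 62 low-cost patients and 2 high-cost patients.
costs = [1_000.0] * 62 + [80_000.0, 120_000.0]
first, second = split_by_cost(costs, threshold=50_000.0)
ratio = imbalance_ratio(first, second)
print(len(first), len(second), ratio)  # 62 2 31.0
print(ratio >= 30)                     # True: at least thirty times, as in claim 7
```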

13. A system for using machine learning to predict a medical outcome, the system comprising:

medical data associating feature variables comprising demographic data with outcome variables, wherein the medical data comprises a first data set associated with a first outcome and a second data set associated with a second outcome substantially less likely than the first outcome;

a processing module configured to identify a first subset of the first data set consisting essentially of non-nearby neighbors to the second data set, a second subset within the first data set consisting essentially of nearby neighbors to the second data set, and a third subset of the second data set; and

a plurality of machine learners configured to develop from the first, second and third subsets at least one set of computer-executable rules usable to predict the first outcome or the second outcome.

14. The system of claim 13, wherein the third subset is approximately one half the size of a combination of the first and second subsets.

15. The system of claim 14, wherein the first subset comprises a random sampling of the non-nearby neighbors of the first data set to the second data set.

16. The system of claim 13, wherein said processing module is configured to analyze Euclidean distances between feature variables of medical data in the first data set and feature variables of medical data in the second data set to identify the nearby neighbors to the second data set.

17. The system of claim 13, wherein said nearby neighbors consist essentially of medical data with outcome variables having values within a defined distance from an outcome variable threshold.

18. The system of claim 17, wherein said nearby neighbors consist essentially of medical data with outcome variables having values outside of a second defined distance from the outcome variable threshold.
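One way to read claims 14, 15, 17, and 18 together is sketched below in Python. Common-outcome records whose outcome value lies in a band near the threshold become the nearby neighbors, a random sample of the remaining common-outcome records fills out the set, and the rare-outcome records end up roughly half the size of the combined common-outcome subsets. Using every rare record as the third subset, the parameter names `outer` and `inner`, and the seeded sampler are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple


def build_training_subsets(
    outcomes: Sequence[float],
    threshold: float,
    outer: float,
    inner: float,
    seed: int = 0,
) -> Tuple[List[int], List[int], List[int]]:
    """Split record indices into the three subsets of claim 13 (a sketch).

    second subset: common-outcome records whose outcome value satisfies
                   inner < |value - threshold| <= outer   (claims 17 and 18)
    first subset:  random sample of the remaining common-outcome records (claim 15)
    third subset:  all rare-outcome records, value >= threshold (an assumption)
    """
    common = [i for i, v in enumerate(outcomes) if v < threshold]
    third = [i for i, v in enumerate(outcomes) if v >= threshold]
    second = [i for i in common if inner < abs(outcomes[i] - threshold) <= outer]
    nearby = set(second)
    non_nearby = [i for i in common if i not in nearby]
    # Aim for len(third) ~= (len(first) + len(second)) / 2, as in claim 14.
    wanted = max(0, 2 * len(third) - len(second))
    rng = random.Random(seed)
    first = sorted(rng.sample(non_nearby, min(wanted, len(non_nearby))))
    return first, second, third


# Hypothetical outcome values (e.g., annual cost in $1,000s) with threshold 100.
outcomes = [10, 20, 72, 80, 97, 40, 50, 60, 150, 200]
first, second, third = build_training_subsets(outcomes, threshold=100.0, outer=30.0, inner=5.0)
print(second, third)  # [2, 3] [8, 9]
print(len(first))     # 2, drawn at random from indices 0, 1, 4, 5, 6, 7
```

Under this reading, record 4 (value 97) sits inside the `inner` distance, so it is not a nearby neighbor and falls into the non-nearby pool. If the band already holds more than twice as many records as the rare set, the random sample is empty and the one-half ratio of claim 14 is not reached.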

19. A computer system for using machine learning to predict an outcome associated with a medical condition, the computer system comprising:

means for storing data associating feature variables comprising demographic data of a plurality of patients with outcome variables corresponding to medical conditions of the plurality of patients, wherein the data comprises a first data set associated with a first outcome and a second data set associated with a second outcome, the second outcome being substantially less likely than the first outcome;

means for identifying within the first data set a third data set that consists essentially of nearby neighbors to the second data set; and

means for processing the first, second and third data sets to generate at least one set of computer-executable rules for predicting the likelihood of the first outcome or the second outcome.

20. The computer system of claim 19, wherein said means for processing comprises a plurality of software-based, computer-executable machine learners.
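Claims 8, 9, and 20 describe several machine learners that develop interim rules, evaluate them, and revise them, and the abstract calls this a co-evolution approach without detailing it. The Python sketch below is a deliberately simple stand-in, not the patented method. A handful of hill-climbing learners each mutate a one-feature threshold rule and keep a revision only if its fitness does not drop, and after every cycle the weakest learner adopts the current best rule. The rule form, the mutation scheme, and the harmonic-mean fitness are all assumptions.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical rule form: predict the rare outcome if x[feature] >= threshold.
Rule = Tuple[int, float]


def fitness(rule: Rule, X: Sequence[Sequence[float]], y: Sequence[bool]) -> float:
    """Harmonic mean of sensitivity and PPV, i.e. 2TP / (2TP + FP + FN) (assumed)."""
    f, t = rule
    pred = [x[f] >= t for x in X]
    tp = sum(1 for p, a in zip(pred, y) if p and a)
    fp = sum(1 for p, a in zip(pred, y) if p and not a)
    fn = sum(1 for p, a in zip(pred, y) if a and not p)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def mutate(rule: Rule, n_features: int, rng: random.Random) -> Rule:
    """Propose a revised interim rule: occasionally switch feature, then nudge the threshold."""
    f, t = rule
    if rng.random() < 0.2:
        f = rng.randrange(n_features)
    return f, t + rng.gauss(0.0, 0.1)


def evolve(X: Sequence[Sequence[float]], y: Sequence[bool],
           n_learners: int = 4, cycles: int = 50, seed: int = 0) -> Tuple[Rule, float]:
    rng = random.Random(seed)
    n = len(X[0])
    # Each learner starts from its own random interim rule.
    learners: List[Rule] = [(rng.randrange(n), rng.random()) for _ in range(n_learners)]
    for _ in range(cycles):
        for i, rule in enumerate(learners):
            candidate = mutate(rule, n, rng)
            # Evaluate, and keep the revision only if it scores at least as well.
            if fitness(candidate, X, y) >= fitness(rule, X, y):
                learners[i] = candidate
        # Share knowledge: the weakest learner adopts the current best rule.
        best = max(learners, key=lambda r: fitness(r, X, y))
        worst = min(range(n_learners), key=lambda i: fitness(learners[i], X, y))
        learners[worst] = best
    best = max(learners, key=lambda r: fitness(r, X, y))
    return best, fitness(best, X, y)


# Hypothetical scaled features; the rare outcome tracks the second feature.
X = [[0.1, 0.2], [0.4, 0.9], [0.3, 0.8], [0.9, 0.1],
     [0.2, 0.3], [0.5, 0.95], [0.7, 0.25], [0.6, 0.15]]
y = [x[1] >= 0.7 for x in X]
print(evolve(X, y))
```

Because a learner never accepts a worse rule and sharing only overwrites the weakest learner, the best fitness in the population cannot decrease from one cycle to the next.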

Current Assignee
Medical Scientists, Inc. (Merck & Co., Inc.)
Original Assignee
Medical Scientists, Inc. (Merck & Co., Inc.)
Inventors
Chen, Hung-Han, Hunter, Lawrence, Poteat, Harry, Snow, Kristin

Application Number

US12/019,405
Publication Number

US 20080120267A1
US Class Current

706/61
CPC Class Codes

G06F 18/211 Selection of the most signi...

G06N 20/00 Machine learning
