Estimating the accuracy of molecular property models and predictions
Abstract
Embodiments of the invention provide methods for evaluating the accuracy of a molecular properties model (or of predictions generated using a molecular properties model). The accuracy of a molecular properties model may be evaluated using three general approaches: (i) by using the same data set both to train the model and to estimate the accuracy of the model, (ii) by using distinct data sets to train and subsequently test a model, and (iii) by using multiple models (or sets of predictions).
33 Claims
1. A method for estimating an accuracy of a molecular properties model comprising:
- selecting a dataset, wherein the dataset includes at least one molecule description in a form appropriate for the molecular properties model and a value for a molecular property;
- providing the dataset to the molecular properties model to obtain a prediction for each molecule represented by a molecule description in the dataset; and
- estimating a confidence interval or bound on the accuracy of the molecular properties model in generating a prediction for a test molecule, based on the obtained predictions, relative to a selected measure of performance, wherein the molecular properties model generates predictions related to a property of interest selected from at least one of a physiological activity, pharmacokinetic property, pharmacodynamic property, physiological or pharmacological activity, toxicity or selectivity;
  a chemical property including reactivity, binding affinity, pKa, or a property of a specific atom or bond in a molecule;
  or a physical property including melting point, solubility, a membrane permeability, or a force-field parameter.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
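As an illustration only, the estimating step of claim 1 can be sketched for the simplest case: a classification model scored by fraction correct. The sketch below assumes the dataset molecules were not used to train the model and can be treated as independent samples. The choice of a Wilson score interval and a Hoeffding lower bound, and all function names, are assumptions made for this example and are not taken from the claims.

```python
import math


def accuracy(known_values, predictions):
    """Fraction of molecules whose predicted label matches the known value."""
    if not known_values or len(known_values) != len(predictions):
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(k == p for k, p in zip(known_values, predictions))
    return hits / len(known_values)


def wilson_interval(n_correct, n_total, z=1.96):
    """Two-sided Wilson score interval for the true accuracy.

    z=1.96 corresponds to roughly 95% confidence.
    """
    p = n_correct / n_total
    denom = 1.0 + z * z / n_total
    center = (p + z * z / (2.0 * n_total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n_total
                         + z * z / (4.0 * n_total * n_total)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def hoeffding_lower_bound(n_correct, n_total, delta=0.05):
    """Lower bound on the true accuracy that holds with probability >= 1 - delta."""
    p = n_correct / n_total
    return max(0.0, p - math.sqrt(math.log(1.0 / delta) / (2.0 * n_total)))


if __name__ == "__main__":
    # Hypothetical labels for five molecules (1 = active, 0 = inactive).
    known = [1, 0, 1, 1, 0]
    predicted = [1, 0, 0, 1, 0]
    n_ok = sum(k == p for k, p in zip(known, predicted))
    print(accuracy(known, predicted))
    print(wilson_interval(n_ok, len(known)))
    print(hoeffding_lower_bound(n_ok, len(known)))
```

The interval form answers "within what range does the accuracy lie", while the bound form answers "how low could the accuracy be"; both shrink as the dataset grows. A different measure of performance (for example, mean squared error for a regression model) would need a different estimator.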
11. A method for estimating the accuracy of a first molecular properties model trained using a first training dataset, comprising:
- generating a plurality of molecular properties models by repeating:
  (i) modifying the first training dataset to generate a modified training dataset;
  (ii) generating a second molecular properties model corresponding to the modified training dataset by performing a selected machine learning algorithm using the modified dataset;
  (iii) modifying the first training dataset to provide a test dataset to the second molecular properties model; and
  (iv) obtaining predictions for molecules, each represented by a molecule description, included in the test dataset;
  (v) estimating the accuracy of the second molecular properties model based on the predictions, relative to a selected measure of performance; and
- estimating a confidence interval or bound on the accuracy of the first molecular properties model in generating a prediction for a test molecule, relative to the selected measure of performance, using the estimates of the accuracy of the plurality of molecular properties models.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23
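One way to read the repeated steps of claim 11 is as a resampling loop. The sketch below is a minimal illustration under stated assumptions, not the claimed method itself: the "modification" of the training dataset is assumed to be bootstrap sampling with replacement, the test dataset is assumed to be the molecules left out of each sample, the learner is a hypothetical 1-nearest-neighbour stand-in, and the interval is a simple percentile interval over the per-round accuracies. All names are invented for the example.

```python
import random


def train_1nn(training_rows):
    """Hypothetical stand-in learner: 1-nearest-neighbour on descriptor vectors.

    Each row is (descriptor_tuple, label). Returns a predict(descriptor) function.
    """
    def predict(descriptor):
        nearest = min(
            training_rows,
            key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], descriptor)),
        )
        return nearest[1]
    return predict


def resampled_accuracy_interval(dataset, train_fn, n_rounds=200, alpha=0.05, seed=0):
    """Percentile interval over accuracies of models trained on resampled data."""
    rng = random.Random(seed)
    n = len(dataset)
    estimates = []
    for _ in range(n_rounds):
        # (i) modified training dataset: sample n rows with replacement
        picked = [rng.randrange(n) for _ in range(n)]
        picked_set = set(picked)
        # (iii) test dataset: rows not drawn in this round
        test_rows = [dataset[i] for i in range(n) if i not in picked_set]
        if not test_rows:
            continue  # nothing left to test on in this round
        # (ii) second model trained on the modified dataset
        model = train_fn([dataset[i] for i in picked])
        # (iv) predictions for the test molecules
        predictions = [model(descriptor) for descriptor, _ in test_rows]
        # (v) accuracy of this round's model (fraction correct)
        hits = sum(p == label for p, (_, label) in zip(predictions, test_rows))
        estimates.append(hits / len(test_rows))
    if not estimates:
        raise ValueError("no round produced a non-empty test dataset")
    estimates.sort()
    last = len(estimates) - 1
    lower = estimates[int((alpha / 2.0) * last)]
    upper = estimates[int((1.0 - alpha / 2.0) * last)]
    return lower, upper, estimates


if __name__ == "__main__":
    # Hypothetical two-descriptor molecules with an activity label.
    toy = [((float(i), 0.0), "inactive") for i in range(20)]
    toy += [((100.0 + i, 0.0), "active") for i in range(20)]
    low, high, _ = resampled_accuracy_interval(toy, train_1nn)
    print(low, high)
```

The spread of the per-round accuracies is used here as a proxy for the uncertainty in the accuracy of the first model, which is trained on the full dataset. Other resampling schemes (for example, cross-validation folds) would fit the same loop by changing how the training and test rows are chosen.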
24. A computer-readable medium containing a program which, when executed by a processor, performs operations comprising:
- receiving a dataset, wherein the dataset includes at least one molecule description in a form appropriate for the molecular properties model and a value for a molecular property;
- providing the dataset to the molecular properties model to obtain a prediction for each molecule represented by a molecule description in the dataset; and
- estimating a confidence interval or bound on the accuracy of the molecular properties model in generating a prediction for a test molecule, based on the obtained predictions, relative to a selected measure of performance, wherein the molecular properties model generates predictions related to a property of interest selected from at least one of a physiological activity, pharmacokinetic property, pharmacodynamic property, physiological or pharmacological activity, toxicity or selectivity;
  a chemical property including reactivity, binding affinity, pKa, or a property of a specific atom or bond in a molecule;
  or a physical property including melting point, solubility, a membrane permeability, or a force-field parameter.
Dependent claims: 25, 26, 27, 28, 29
30. A computer-readable medium containing a program which, when executed by a processor, performs operations for estimating the accuracy of a first molecular properties model trained using a first training dataset comprising:
- generating a plurality of molecular properties models by repeating:
  (i) modifying the first training dataset to generate a modified training dataset;
  (ii) generating a second molecular properties model corresponding to the modified training dataset by performing a selected machine learning algorithm using the modified dataset;
  (iii) modifying the first training dataset to provide a test dataset to the second molecular properties model; and
  (iv) obtaining predictions for molecules, each represented by a molecule description, included in the test dataset;
  (v) estimating the accuracy of the second molecular properties model based on the predictions, relative to a selected measure of performance; and
- estimating a confidence interval or bound on the accuracy of the first molecular properties model in generating a prediction for a test molecule, relative to the selected measure of performance, using the estimates of the accuracy of the plurality of molecular properties models.
Dependent claims: 31, 32, 33
Specification