Methods for molecular property modeling using virtual data
Abstract
Embodiments of the invention provide methods, systems, and articles of manufacture for modeling molecular properties based on information obtained from sources other than direct empirical measurements of the properties. Embodiments of the invention use “virtual data” related to molecular properties to train a molecular properties model. Virtual data about a molecule may include real-valued data (e.g., measurement values falling along a continuous range) or a positive or negative assertion about whether a molecule exhibits a property of interest. Virtual data may be generated using a variety of techniques and may be further characterized by confidence in the accuracy of the virtual data. In addition to virtual data, embodiments of the invention may use “virtual molecules” paired with “virtual data” to train a molecular properties model. The virtual molecules may themselves be generated in a variety of ways.
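As an illustration only, the kinds of virtual data named in the abstract (a real-valued datum or a positive/negative assertion, each optionally characterized by a confidence) could be held in a record like the one sketched below. The class name, the field names, the use of a SMILES-like string to identify a molecule, and the 0-to-1 confidence scale are all assumptions made for this sketch; none of them come from the patent text.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class VirtualDatum:
    """One item of virtual data about a molecule (hypothetical layout)."""

    molecule: str                # assumption: a SMILES-like identifier
    value: Union[float, bool]    # real-valued datum, or a yes/no assertion
    confidence: float            # assumption: 0.0 (no trust) to 1.0 (full trust)
    is_virtual_molecule: bool = False  # True if the molecule itself was generated

    def __post_init__(self):
        # Reject confidences outside the assumed [0, 1] scale.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")


# A real-valued datum and a negative assertion, both with illustrative values.
real_valued = VirtualDatum("CCO", value=-0.3, confidence=0.6)
assertion = VirtualDatum("CCCC", value=False, confidence=0.9, is_virtual_molecule=True)
```

Keeping the confidence next to each value would let a later training step weight or filter the data, although the abstract does not say how confidence is used.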
39 Claims
1. A method for generating a set of training data used to train a molecular properties model, comprising:
selecting virtual molecules, wherein the virtual molecules are generated using a software application configured to generate representations of physically possible molecules;
assigning the virtual molecules a value for a property of interest being modeled, wherein the property of interest comprises an empirically measurable property, and wherein at least one virtual molecule is assigned an assumed value for the property of interest; and
forming the set of training data from the selected virtual molecules and assigned values for the property of interest.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
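The three steps of claim 1 can be sketched in a few lines. This is a toy under stated assumptions, not the patented implementation: the generator below merely enumerates linear alkanes and 1-alkanols as SMILES strings to stand in for "a software application configured to generate representations of physically possible molecules", and the rule for assumed values (every molecule without a measurement receives one default value) is a hypothetical choice, since the claim does not say how assumed values are chosen. All function names and the sample measurement are illustrative.

```python
from itertools import product


def generate_virtual_molecules(max_carbons):
    """Toy generator: linear alkanes and 1-alkanols as SMILES strings."""
    for n, suffix in product(range(1, max_carbons + 1), ("", "O")):
        yield "C" * n + suffix


def form_training_set(molecules, measured, assumed_value):
    """Pair each molecule with a measured value if one exists, else the assumed value.

    Returns (molecule, value, provenance) triples so that assumed values
    remain distinguishable from measured ones.
    """
    training = []
    for mol in molecules:
        if mol in measured:
            training.append((mol, measured[mol], "measured"))
        else:
            training.append((mol, assumed_value, "assumed"))
    return training


# Hypothetical boolean property of interest; only one molecule has a measurement.
training_set = form_training_set(
    generate_virtual_molecules(3),
    measured={"CCO": True},
    assumed_value=False,
)
```

Recording the provenance of each value is a design choice of this sketch; it makes it possible to treat assumed values differently from measured ones later on.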
12. A method of generating training data used to train a molecular properties model, the method comprising:
selecting virtual molecules, wherein the virtual molecules are generated using a software application configured to generate representations of physically possible molecules;
assigning the virtual molecules a value for a property of interest being modeled, wherein the property of interest comprises an empirically measurable property, and wherein at least one virtual molecule is assigned an assumed value for the property of interest; and
forming the set of training data from the selected virtual molecules and assigned values for the property of interest;
generating a representation of the molecules included in the set of training data in a form appropriate for a second software application, wherein the second software application is configured to perform a machine learning algorithm using the set of training data; and
providing the set of training data to the second software application, performing the machine learning algorithm, thereby generating the molecular properties model;
selecting a test molecule;
generating a representation of the test molecule appropriate for the molecular properties model; and
providing the representation of the test molecule to the molecular properties model; and
generating a prediction about the property of interest for the test molecule.
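The later steps of claim 12 (building a representation suitable for a learning algorithm, running the algorithm to obtain a model, and querying the model with a test molecule) might look like the sketch below. Everything specific here is an assumption: the atom-count representation, the nearest-centroid classifier standing in for "a machine learning algorithm", and the training labels, which are illustrative values for a hypothetical boolean property rather than real measurements.

```python
def featurize(smiles):
    """Hypothetical representation: (carbon count, oxygen count) of a SMILES string."""
    return (smiles.count("C"), smiles.count("O"))


def train_nearest_centroid(training_set):
    """Train on (smiles, label) pairs and return a predict(smiles) function."""
    sums, counts = {}, {}
    for smiles, label in training_set:
        x = featurize(smiles)
        s = sums.setdefault(label, [0.0] * len(x))
        for i, xi in enumerate(x):
            s[i] += xi
        counts[label] = counts.get(label, 0) + 1
    # One centroid per label: the mean feature vector of that label's molecules.
    centroids = {lab: tuple(v / counts[lab] for v in s) for lab, s in sums.items()}

    def predict(smiles):
        x = featurize(smiles)
        # Return the label whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )

    return predict


# Illustrative labels only; in the claimed method at least one would be an assumed value.
training_set = [
    ("CO", True), ("CCO", True), ("CCCO", True),
    ("C", False), ("CC", False), ("CCC", False),
]
model = train_nearest_centroid(training_set)
prediction = model("CCCCO")  # prediction for a test molecule absent from the training set
```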
13. A method for generating a set of training data used to train a molecular properties model, comprising:
selecting molecules;
assigning the molecules a value for the property of interest being modeled, wherein the property of interest comprises an empirically measurable property, and wherein at least one molecule is assigned an assumed value for the property of interest; and
forming the set of training data from the selected molecules and assigned values for the property of interest.
Dependent claims: 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27
28. A computer-readable medium containing an executable component that, when executed by a processor, performs operations comprising:
selecting virtual molecules, wherein the virtual molecules are generated using a software application configured to generate representations of physically possible molecules;
assigning the molecules a value for the property of interest being modeled, wherein the property of interest comprises an empirically measurable property, and wherein at least one virtual molecule is assigned an assumed value for the property of interest; and
forming the set of training data from the selected virtual molecules and assigned values for the property of interest.
Dependent claims: 29, 30, 31, 32, 33, 34
35. A computer-readable medium containing an executable component that, when executed by a processor, performs operations comprising:
selecting molecules;
assigning the molecules a value for the property of interest being modeled, wherein the property of interest comprises an empirically measurable property, and wherein at least one molecule is assigned an assumed value for the property of interest; and
forming the set of training data from the selected molecules and assigned values for the property of interest.
Dependent claims: 36, 37, 38
39. A method for evaluating a prediction about a molecule, generated by a molecular properties model, comprising:
receiving the prediction for a test molecule generated by the molecular properties model, wherein the molecular properties model is trained using a set of training data, and wherein the training data comprises:
molecules generated using a first software application configured to generate representations of physically possible molecules; and
a value for a property of interest assigned to each molecule, wherein at least one molecule is assigned an assumed value for the property of interest; and
determining the accuracy of the prediction for the test molecule by carrying out experimentation using physically existing samples of the test molecule.
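For a real-valued property, the accuracy determination in claim 39 could be recorded along the lines of the sketch below, which compares a model prediction with the mean of replicate experimental measurements. The function name, the use of a mean over replicates, the absolute-error metric, and the tolerance threshold are assumptions of this sketch; the claim only requires that accuracy be determined by experimentation on physical samples.

```python
from statistics import mean


def evaluate_prediction(predicted, replicate_measurements, tolerance):
    """Compare a predicted value with the mean of experimental replicates.

    All arguments are hypothetical inputs: `predicted` comes from the model,
    `replicate_measurements` from laboratory assays of the test molecule,
    and `tolerance` is the largest absolute error counted as agreement.
    """
    observed = mean(replicate_measurements)
    abs_error = abs(predicted - observed)
    return {
        "observed": observed,
        "abs_error": abs_error,
        "within_tolerance": abs_error <= tolerance,
    }


# Illustrative numbers only, not real assay results.
report = evaluate_prediction(predicted=2.0, replicate_measurements=[1.5, 2.5], tolerance=0.25)
```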
Specification