System and method for conformationally-flexible molecular identification
Abstract
A reference storage process populates a data structure so that it contains all of the molecular structures and/or rigid substructures in the database, classified according to attributes of tuples. In a preferred embodiment, the tuples are derived from sites (e.g. atomic sites) of the molecular structures, and the attributes can be derived from geometric (and other) information related to the tuples. The attributes define indices in the data structure that are associated with invariant vector information (e.g. information about rotatable bond(s) expressed in skewed local coordinate frames created from tuples). These representations are invariant with respect to rotation and translation of the molecular structures and/or rotation of substructures about attached rotatable bond(s). Accordingly, the invariant vector information is classified in the data structure, together with the respective tuple attributes, at locations determined by the index derived from the respective tuple. A matching process creates one or more tuples, skewed local reference frames, and indices (called test frame tuple indices) for the structure (or substructures) of a test molecule, using the same technique that was used to populate the data structure. The matching process uses each test frame tuple index to access the invariant vector information and tallies the frequency of matches in order to determine which molecules or substructures in the database are structurally similar to the test molecule. This identification can be achieved even when the database contains conformationally flexible molecules.
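The invariance stated in the abstract can be checked numerically. The Python sketch below is an illustration under stated assumptions, not the patented implementation. It assumes that a three-site frame tuple (s0, s1, s2) defines a skewed frame anchored at s0. The first two axes are the sides s1 - s0 and s2 - s0, which are generally not orthogonal. The third axis is their cross product, which is an assumed choice. A point's coordinates in that frame are found by Cramer's rule. They stay the same when the sites and the point undergo the same rotation and translation. The helper names (`skewed_coords`, `rigid_motion`) are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def det3(c0: Vec, c1: Vec, c2: Vec) -> float:
    # Determinant of the 3x3 matrix whose columns are c0, c1, c2 (scalar triple product).
    return dot(c0, cross(c1, c2))


def skewed_coords(s0: Vec, s1: Vec, s2: Vec, p: Vec) -> Vec:
    """Return (a, b, c) with p - s0 = a*e1 + b*e2 + c*e3.

    Assumed axes: e1 = s1 - s0, e2 = s2 - s0, e3 = e1 x e2.
    """
    e1, e2 = sub(s1, s0), sub(s2, s0)
    e3 = cross(e1, e2)  # assumed third axis
    d = det3(e1, e2, e3)  # equals |e1 x e2|^2; zero only when the sites are collinear
    if abs(d) < 1e-12:
        raise ValueError("frame tuple sites are collinear")
    q = sub(p, s0)
    # Cramer's rule: replace one column at a time with q.
    return (det3(q, e2, e3) / d, det3(e1, q, e3) / d, det3(e1, e2, q) / d)


def rigid_motion(p: Vec, theta: float, t: Vec) -> Vec:
    """Rotate p about the z axis by theta radians, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1], p[2] + t[2])


sites = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.4, 1.3, 0.2)]
point = (0.7, 0.9, 1.1)
before = skewed_coords(*sites, point)
moved = [rigid_motion(x, 0.8, (2.0, -1.0, 3.0)) for x in sites + [point]]
after = skewed_coords(*moved)
print(all(abs(x - y) < 1e-9 for x, y in zip(before, after)))  # True
```

A reflection reverses the cross product, so the third coordinate changes sign for mirror images. The invariance shown here holds for proper rotations combined with translations.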
60 Claims
1. A method for storing a representation of one or more reference molecules in a memory in a computer system, the method executed on a computer system and comprising the steps of:
a. identifying one or more rigid substructures of the reference molecule, each of the rigid substructures having one or more atomic sites, each of the atomic sites being connected to zero or more atomic sites in the rigid substructure with a non-rotatable bond, each rigid substructure having a global position and a global orientation in a global coordinate frame;
b. defining a vector with a magnitude and a direction and with a fixed position and orientation with respect to the rigid substructure;
c. selecting a set of three or more sites, the set of sites forming a frame tuple, at least one of the sites being non-colinear with the remaining sites, the sites being in a fixed position with respect to the rigid substructure, and the frame tuple defining a three-dimensional skewed local coordinate frame;
d. selecting one or more of the frame tuples and generating a frame tuple field with information associated with each of the selected frame tuples; and
e. storing a record in a data structure, the data structure having a plurality of records, each record containing the frame tuple field and a vector field, the vector field containing vector information relating to the vector as well as information about the identities of the molecule and rigid substructure having the sites forming the frame tuple.
Dependent claims: 2-36
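Step a of claim 1 groups atomic sites that are joined by non-rotatable bonds. The sketch below assumes that each bond already carries a rotatable or non-rotatable flag. How bonds are classified is not specified here. Under that assumption, a union-find pass can merge atoms across non-rotatable bonds only. Each resulting component is a rigid substructure, and an atom with no non-rotatable bonds forms a substructure of its own (the "zero or more" case). The function name and the bond tuple format are hypothetical.

```python
def rigid_substructures(n_atoms: int, bonds: list[tuple[int, int, bool]]) -> list[list[int]]:
    """Partition atoms 0..n_atoms-1 into rigid substructures.

    bonds holds (i, j, rotatable) triples (hypothetical format). The rotatable
    flag is an input here; deciding which bonds rotate is outside this sketch.
    """
    parent = list(range(n_atoms))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Merge only across non-rotatable bonds; rotatable bonds separate substructures.
    for i, j, rotatable in bonds:
        if not rotatable:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri

    groups: dict[int, list[int]] = {}
    for atom in range(n_atoms):
        groups.setdefault(find(atom), []).append(atom)
    return sorted(groups.values())


# Chain 0-1-2-3 whose middle bond (1-2) is rotatable, plus an unbonded atom 4.
print(rigid_substructures(5, [(0, 1, False), (1, 2, True), (2, 3, False)]))
# [[0, 1], [2, 3], [4]]
```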
37. A method for storing a representation of one or more reference molecules in the memory of a computer system, the method executed on a computer system and comprising the steps of:
a. identifying one or more rigid substructures of the reference molecule, each of the rigid substructures having one or more atomic sites, each of the atomic sites being connected to zero or more atomic sites with a non-rotatable bond, and each rigid substructure having a global position and a global orientation in a global coordinate frame;
b. defining a vector with a vector magnitude and a vector direction, the vector being fixed in position and orientation with respect to the rigid substructure;
c. selecting a set of three or more sites, the sites being in a fixed position with respect to the rigid substructure and any set of sites being a frame tuple that defines a skewed local coordinate frame, the skewed local coordinate frame having two or more sides, with an angle between one or more pairs of the sides;
d. selecting one or more frame tuples and generating one or more indices from information about each of the selected frame tuples; and
e. storing a record in a data structure stored in the memory, the data structure having a plurality of records, and each of the records containing vector information associated with one of the indices and accessible by using the index.
Dependent claims: 38-44
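Claim 37 generates indices from frame tuple information, such as the sides of the skewed frame and the angle between them. It also stores vector information so that the information can be retrieved by index. The minimal Python sketch below rests on two assumptions. First, the index is built from the binned lengths of the sides s0->s1 and s0->s2 plus the binned angle between them, and the bin widths are arbitrary illustrative values. Second, a dictionary of lists stands in for the record data structure. `frame_tuple_index`, `ReferenceTable`, and the layout of the stored entries are hypothetical. The vector information is treated as an opaque value that the caller supplies.

```python
import math
from collections import defaultdict

Vec = tuple[float, float, float]


def frame_tuple_index(s0: Vec, s1: Vec, s2: Vec,
                      length_bin: float = 0.5, angle_bin_deg: float = 10.0) -> tuple[int, int, int]:
    """Hypothetical index: binned side lengths s0->s1, s0->s2 and the binned angle between them."""
    e1 = [b - a for a, b in zip(s0, s1)]
    e2 = [b - a for a, b in zip(s0, s2)]
    l1, l2 = math.hypot(*e1), math.hypot(*e2)  # sites are assumed distinct
    cos_angle = sum(x * y for x, y in zip(e1, e2)) / (l1 * l2)
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))  # clamp rounding error
    return (int(l1 // length_bin), int(l2 // length_bin), int(angle // angle_bin_deg))


class ReferenceTable:
    """Stand-in for the record data structure: index -> list of entries."""

    def __init__(self) -> None:
        self.records = defaultdict(list)

    def store(self, s0: Vec, s1: Vec, s2: Vec, vector_info, molecule_id, substructure_id):
        key = frame_tuple_index(s0, s1, s2)
        self.records[key].append((vector_info, molecule_id, substructure_id))
        return key

    def lookup(self, s0: Vec, s1: Vec, s2: Vec) -> list:
        # .get avoids creating empty records for indices that were never stored.
        return self.records.get(frame_tuple_index(s0, s1, s2), [])


table = ReferenceTable()
sites = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (0.6, 1.0, 0.0)]
print(table.store(*sites, vector_info=(0.3, 0.4, 0.5), molecule_id="mol-1", substructure_id=0))
# (2, 2, 5)
```

Rotation and translation leave side lengths and the included angle unchanged. A rigidly moved copy of the same tuple therefore maps to the same key, and this is what lets a test molecule reach the reference records. Values that lie close to a bin boundary can land in a neighboring bin, so a practical scheme may also need to probe adjacent keys.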
45. A method for determining the identity of one or more reference molecules that are structurally similar to a test molecule, the method executed on a computer system and comprising the steps of:
a. identifying one or more rigid test substructures of the test molecule, each of the rigid test substructures having one or more atomic sites, each of the atomic sites being connected to zero or more atomic sites in the rigid test substructure with a non-rotatable bond, each rigid test substructure having a certain position and a certain orientation in a three-dimensional global reference frame;
b. selecting a set of three or more test sites, the set of test sites being a test frame tuple, at least one of the test sites being non-colinear with the remaining test sites, the test sites being in a fixed position with respect to the rigid test substructure, and each of the test frame tuples defining a three-dimensional skewed local test coordinate frame;
c. selecting one or more of the test frame tuples and generating a test frame tuple index from information associated with the selected test frame tuple;
d. using the test frame tuple index, accessing one or more records in a data structure stored in the memory, the data structure having a plurality of records, each of the records containing a reference frame tuple field and a reference vector information field, the reference frame tuple field having a reference frame tuple index generated from a reference frame tuple defined by three or more reference sites on a rigid reference substructure of one of the reference molecules, the reference vector information field having one or more entries, each entry containing reference vector information about the reference vector having a magnitude and direction and a fixed position and a fixed orientation with respect to one or more rigid reference substructures, each entry further having reference frame tuple information about the reference frame tuple, reference molecule identity information, and reference frame rigid substructure information;
e. for each entry in each record with a reference frame tuple index matching the test frame tuple index, constructing a test vector for each reference vector in the skewed local test coordinate frame in order to place the test vector in the global coordinate frame; and
f. for each entry in each record with a reference frame tuple index matching the test frame tuple index, producing a voting record in a voting data structure, the voting record containing the reference molecule identity information, the rigid reference substructure identity information, a test vector position field with a position value, and a test vector orientation field with an orientation value, the position value matching the test vector position, the orientation value matching the test vector orientation, the molecule identity information matching the reference molecule identity, and the rigid reference substructure information field matching the reference substructure identity.
Dependent claims: 46-55
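Steps e and f of claim 45 rebuild each matching reference vector in the global frame from the test frame tuple. They then record a vote keyed by molecule, substructure, position, and orientation. The sketch below makes three illustrative assumptions. Each stored entry holds the skewed-frame coordinates of the vector's tail and head. The frame axes are s1 - s0, s2 - s0, and their cross product. Position and unit direction are binned by rounding, with arbitrary step sizes. A `Counter` plays the role of the voting data structure. `from_skewed`, `cast_votes`, and the entry layout are hypothetical names.

```python
import math
from collections import Counter

Vec = tuple[float, float, float]


def from_skewed(s0: Vec, s1: Vec, s2: Vec, coords: Vec) -> Vec:
    """Map skewed-frame coordinates (a, b, c) to s0 + a*e1 + b*e2 + c*e3 (assumed axes)."""
    e1 = [q - p for p, q in zip(s0, s1)]
    e2 = [q - p for p, q in zip(s0, s2)]
    e3 = [e1[1] * e2[2] - e1[2] * e2[1],
          e1[2] * e2[0] - e1[0] * e2[2],
          e1[0] * e2[1] - e1[1] * e2[0]]
    a, b, c = coords
    return tuple(s0[k] + a * e1[k] + b * e2[k] + c * e3[k] for k in range(3))


def quantize(v, step: float) -> tuple[int, ...]:
    # Bin each component by rounding to the nearest multiple of step.
    return tuple(round(x / step) for x in v)


def cast_votes(test_sites, entries, pos_step: float = 0.5, dir_step: float = 0.2) -> Counter:
    """entries: (tail_coords, head_coords, molecule_id, substructure_id) from matching records.

    Vectors are assumed to be nonzero (tail != head).
    """
    votes = Counter()
    for tail_c, head_c, mol_id, sub_id in entries:
        tail = from_skewed(*test_sites, tail_c)
        head = from_skewed(*test_sites, head_c)
        d = [h - t for t, h in zip(tail, head)]
        norm = math.sqrt(sum(x * x for x in d))
        direction = [x / norm for x in d]
        votes[(mol_id, sub_id, quantize(tail, pos_step), quantize(direction, dir_step))] += 1
    return votes


frame_a = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
frame_b = ((1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 1.0, 0.0))
# Entries retrieved by two different test frame tuple indices (illustrative values).
total = cast_votes(frame_a, [((1.0, 2.0, 3.0), (1.0, 2.0, 4.0), "m1", "r0"),
                             ((5.0, 5.0, 5.0), (5.0, 5.0, 6.0), "m2", "r3")])
total += cast_votes(frame_b, [((0.0, 2.0, 3.0), (0.0, 2.0, 4.0), "m1", "r0")])
print(total.most_common(1))  # [(('m1', 'r0', (2, 4, 6), (0, 0, 5)), 2)]
```

Two different test frame tuples on the same rigid substructure place the same vector at the same global position and orientation, so their votes accumulate in one bin. Because binning is done by rounding, values near a bin edge can split across neighboring bins. The step sizes are placeholders.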
56. A computer system for storing a representation of one or more reference molecules in a memory in the computer system and for comparing one or more of the reference molecules to a test molecule, comprising:
a. a database stored in the memory, the database having a representation of one or more rigid substructures of each of the reference molecules, each of the rigid substructures having one or more atomic sites, each of the atomic sites being connected to zero or more atomic sites in the rigid substructure with a non-rotatable bond, each rigid substructure having a global position and a global orientation in a global coordinate frame;
b. a set of three or more sites, the set of sites being in a selected rigid substructure, the set of sites forming a frame tuple, at least one of the sites being non-colinear with the remaining sites, the sites being in a fixed position with respect to the selected rigid substructure, and the frame tuple defining a three-dimensional skewed local coordinate frame; and
c. a data structure having a plurality of records, each record containing a frame tuple field and a vector field, the vector field containing vector information relating to each of one or more vectors as well as information about the identities of one or more of the molecules and one or more of the rigid substructures, each of the vectors having a magnitude and a direction, and a fixed position and orientation with respect to the selected rigid substructure, and the selected rigid substructure being one of the rigid substructures.
Dependent claims: 57-59
60. A computer system for storing a representation of one or more reference molecules in a memory in the computer system and for comparing one or more of the reference molecules to a test molecule, comprising:
a. a database means stored in the memory, the database means for representing one or more rigid substructure means of each of the reference molecules, each of the rigid substructure means having one or more atomic site means, each of the atomic site means being connected to zero or more atomic site means in the rigid substructure means with a non-rotatable bond, each rigid substructure means having a global position and a global orientation in a global coordinate frame;
b. a set of three or more site means, the set of site means being in a selected rigid substructure means, the set of site means forming a frame tuple means, at least one of the site means being non-colinear with the remaining site means, the site means being in a fixed position with respect to the selected rigid substructure means, and the frame tuple means defining a three-dimensional skewed local coordinate frame means; and
c. a data structure means for storing a plurality of record means, each record means containing a frame tuple field and a vector field, the vector field containing vector information relating to each of two or more vectors as well as information about the identities of one or more of the molecules and one or more of the rigid substructure means, each of the vectors having a magnitude and a direction with a fixed position and orientation with respect to the selected rigid substructure means, and the selected rigid substructure means being one of the rigid substructure means.
Specification