Field-based similarity search system and method
Abstract
A similarity search method includes generating a feature database which stores data pertaining to a candidate molecule, as executed by a processor of a computer, the database including a hash table having entries which are generated based on a set of descriptors generated from conformations of fragment graphs of the candidate molecule, the fragment graphs including plural fragment nodes connected by rotatable bond edges, a specific conformation of the fragment node including a fragment of the candidate molecule, and two neighboring fragments connected by a rotatable bond at a specific dihedral angle including a fragment pair, and a context-adapted descriptor-to-key mapping which maps the set of descriptors to a set of feature keys including indices that label grid cells in discriminant space.
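The abstract does not say how the discriminant space is built or how the mapping is "context-adapted". The Python sketch below assumes a fixed linear projection matrix (for example one obtained from a discriminant analysis) and a uniform grid cell width; `descriptor_to_key`, `FeatureDatabase` and the example projection are hypothetical names and values used only for illustration. It shows how quantizing a projected descriptor into grid-cell indices yields a hashable feature key, and how a hash table keyed on those indices records which candidate scoop produced each key.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Key = Tuple[int, ...]  # indices of one grid cell in the (assumed) discriminant space


def descriptor_to_key(descriptor: Sequence[float],
                      projection: Sequence[Sequence[float]],
                      cell_width: float) -> Key:
    """Project a descriptor with a hypothetical linear map and return the
    indices of the grid cell the projected point falls in."""
    projected = [sum(w * x for w, x in zip(row, descriptor)) for row in projection]
    # math.floor (not int) so negative coordinates land in the correct cell
    return tuple(math.floor(p / cell_width) for p in projected)


class FeatureDatabase:
    """Hash table from feature key to the (molecule id, scoop id) entries
    whose descriptors map to that key. Hypothetical illustration only."""

    def __init__(self, projection: Sequence[Sequence[float]], cell_width: float):
        self.projection = projection
        self.cell_width = cell_width
        self.table: Dict[Key, List[Tuple[str, int]]] = defaultdict(list)

    def add(self, molecule_id: str, scoop_id: int, descriptor: Sequence[float]) -> Key:
        key = descriptor_to_key(descriptor, self.projection, self.cell_width)
        self.table[key].append((molecule_id, scoop_id))
        return key

    def lookup(self, descriptor: Sequence[float]) -> List[Tuple[str, int]]:
        key = descriptor_to_key(descriptor, self.projection, self.cell_width)
        # .get does not insert empty lists into the defaultdict
        return list(self.table.get(key, []))


db = FeatureDatabase([[1.0, 0.0, 0.0], [0.0, 0.5, 0.5]], cell_width=1.0)
db.add("mol_A", 0, [0.2, 1.0, 1.0])   # projects to (0.2, 1.0), key (0, 1)
print(db.lookup([0.9, 1.4, 0.8]))     # projects near (0.9, 1.1), same key (0, 1)
```

Two descriptors that differ slightly still share a key as long as their projections fall in the same cell; a real system would also have to handle near-boundary cases, which this sketch ignores.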
14 Claims
1. A similarity search method, comprising:
generating a feature database which stores data pertaining to a candidate molecule, as executed by a processor of a computer, said database comprising a hash table having entries which are generated based on:
a set of descriptors generated from conformations of fragment graphs of said candidate molecule, said fragment graphs including plural fragment nodes connected by rotatable bond edges, a specific conformation of said fragment node comprising a fragment of said candidate molecule, and two neighboring fragments connected by a rotatable bond at a specific dihedral angle comprising a fragment pair; and
a context-adapted descriptor-to-key mapping which maps said set of descriptors to a set of feature keys comprising indices that label grid cells in discriminant space;
generating scoops, descriptors and keys for a query molecule;
identifying a match between a query molecule fragment pair feature and a candidate molecule fragment pair feature by comparing said keys of said query molecule to said keys in said feature database, a correspondence comprising a scoop for said candidate molecule stored in said feature database having a same key as a scoop for said query molecule;
using said match to align a fragment pair of said candidate molecule to said query molecule by overlaying internal coordinate axes of said scoops for said candidate and query molecules, said correspondence implying an alignment of said candidate molecule and query molecule fragment pairs;
determining the number of candidate molecule fragment pairs in the set, $C_{fp}$, given by $C_{fp} = (n-1)\,(C_{frag})^{2}\,C_{rbe}$, where $n$ is the number of fragments in said candidate molecule, $n-1$ is the number of rotatable bond edges connecting said $n$ fragments, $C_{frag}$ is the number of conformations in a fragment of said fragments, and $C_{rbe}$ is the number of steps in which said rotatable bond edges are sampled;
assembling the entirety of the fragment pairs of said candidate molecule to form an alignment thereof onto said query molecule;
retrieving a candidate molecule fragment pair having at least a predetermined number of matching features; and
displaying a result of said retrieving said candidate molecule fragment pair.
(Dependent claims 2-12 not reproduced.)
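The matching and retrieval steps of claim 1 amount to counting, for each candidate fragment pair, how many query keys it shares, then keeping the pairs that reach a threshold. The Python sketch below assumes the hash table maps each key to a list of `(molecule id, fragment pair index)` identifiers and that every shared key counts as one matching feature; `retrieve_fragment_pairs` and the identifier layout are hypothetical, and the claim itself does not specify a voting scheme.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Key = Tuple[int, ...]
PairId = Tuple[str, int]  # hypothetical: (candidate molecule id, fragment pair index)


def retrieve_fragment_pairs(query_keys: Iterable[Key],
                            table: Dict[Key, List[PairId]],
                            min_matches: int) -> List[Tuple[PairId, int]]:
    """Count key matches per candidate fragment pair and return the pairs
    with at least `min_matches` matching features."""
    votes: Counter = Counter()
    for key in query_keys:
        # Each table entry sharing the query key counts as one matching feature
        for pair_id in table.get(key, ()):
            votes[pair_id] += 1
    hits = [(pair_id, n) for pair_id, n in votes.items() if n >= min_matches]
    # Most matches first; ties broken by id so the output order is deterministic
    hits.sort(key=lambda item: (-item[1], item[0]))
    return hits


table = {
    (0, 1): [("mol_A", 0), ("mol_B", 2)],
    (3, 3): [("mol_A", 0)],
    (7, 7): [("mol_C", 1)],
}
print(retrieve_fragment_pairs([(0, 1), (3, 3), (9, 9)], table, min_matches=2))
```

Here only `("mol_A", 0)` shares two keys with the query, so it is the only pair retrieved at a threshold of 2.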
13. A method for finding alignments of a flexible molecule to a query molecule, said method comprising:
representing a conformation space of said flexible molecule;
generating an arbitrary description of a three-dimensional property field of said flexible molecule;
characterizing parts of said flexible molecule using said arbitrary description;
generating a feature database which stores data pertaining to said flexible molecule on a processor of a computer, said database comprising a hash table having entries which are generated based on:
a set of descriptors generated from conformations of fragment graphs of said flexible molecule, said fragment graphs including plural fragment nodes connected by rotatable bond edges, a specific conformation of said fragment node comprising a fragment of said flexible molecule, and two neighboring fragments connected by a rotatable bond at a specific dihedral angle comprising a fragment pair; and
a context-adapted descriptor-to-key mapping which maps said set of descriptors to a set of feature keys comprising indices that label grid cells in discriminant space;
representing a query molecule using said arbitrary description for a comparison with molecules represented by said conformation space by generating scoops, descriptors and keys for said query molecule;
identifying a match between a query molecule fragment pair feature and a flexible molecule fragment pair feature by comparing said keys of said query molecule to said keys in said feature database, a correspondence comprising a scoop for said flexible molecule stored in said feature database having a same key as a scoop for said query molecule;
aligning said parts of said flexible molecule to said query molecule by overlaying internal coordinate axes of said scoops for said flexible and query molecules, said correspondence implying an alignment of said flexible molecule and query molecule fragment pairs;
determining the number of said parts, that is, the number of flexible molecule fragment pairs in the set, $C_{fp}$, given by $C_{fp} = (n-1)\,(C_{frag})^{2}\,C_{rbe}$, where $n$ is the number of fragments in said flexible molecule, $n-1$ is the number of rotatable bond edges connecting said $n$ fragments, $C_{frag}$ is the number of conformations in a fragment of said fragments, and $C_{rbe}$ is the number of steps in which said rotatable bond edges are sampled;
assembling said parts of said flexible molecule to form an alignment thereof onto said query molecule;
identifying a flexible molecule which is similar to said query molecule based on a result of said aligning said parts of said flexible and query molecules; and
displaying a result of said assembling said parts.
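The count $C_{fp} = (n-1)\,(C_{frag})^{2}\,C_{rbe}$ follows from enumerating, for each of the $n-1$ rotatable bond edges, every conformation of each end fragment and every sampled dihedral angle. The Python sketch below makes that enumeration explicit under the formula's own assumptions (every fragment has the same number of conformations, every edge is sampled in the same number of steps) plus one of its own: dihedral angles are spaced uniformly over a full turn, which the claim does not state. `enumerate_fragment_pairs` and `fragment_pair_count` are hypothetical names.

```python
from itertools import product
from typing import Iterator, List, Tuple

FragmentPair = Tuple[int, int, int, int, float]  # (frag_i, conf_i, frag_j, conf_j, dihedral_deg)


def enumerate_fragment_pairs(edges: List[Tuple[int, int]],
                             n_conformations: int,
                             n_dihedral_steps: int) -> Iterator[FragmentPair]:
    """Yield one fragment pair per rotatable bond edge, per conformation of
    each end fragment, per sampled dihedral angle."""
    step = 360.0 / n_dihedral_steps  # assumption: uniform sampling over 360 degrees
    for i, j in edges:
        for ci, cj, k in product(range(n_conformations),
                                 range(n_conformations),
                                 range(n_dihedral_steps)):
            yield (i, ci, j, cj, k * step)


def fragment_pair_count(n_fragments: int, c_frag: int, c_rbe: int) -> int:
    """C_fp = (n - 1) * C_frag**2 * C_rbe for a fragment tree with n - 1 edges."""
    return (n_fragments - 1) * c_frag ** 2 * c_rbe


# A chain of 4 fragments has 3 rotatable bond edges
edges = [(0, 1), (1, 2), (2, 3)]
pairs = list(enumerate_fragment_pairs(edges, n_conformations=3, n_dihedral_steps=12))
print(len(pairs), fragment_pair_count(4, 3, 12))  # both equal 3 * 3**2 * 12 = 324
```

The enumeration and the closed form agree because the three nested loops per edge contribute exactly $C_{frag} \cdot C_{frag} \cdot C_{rbe}$ items each.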
14. A non-transitory computer-readable storage medium encoded with a computer program executable by a digital processing apparatus to perform a similarity search method, said method comprising:
generating a conformational space representation and an arbitrary description of a three-dimensional property field for a flexible molecule;
generating a feature database which stores data pertaining to said flexible molecule, said database comprising a hash table having entries which are generated based on:
a set of descriptors generated from conformations of fragment graphs of said flexible molecule, said fragment graphs including plural fragment nodes connected by rotatable bond edges, a specific conformation of said fragment node comprising a fragment of said flexible molecule, and two neighboring fragments connected by a rotatable bond at a specific dihedral angle comprising a fragment pair; and
a context-adapted descriptor-to-key mapping which maps said set of descriptors to a set of feature keys comprising indices that label grid cells in discriminant space;
representing a query molecule using said arbitrary description for a comparison with said conformational space representation by generating scoops, descriptors and keys for said query molecule;
identifying a match between a query molecule fragment pair feature and a flexible molecule fragment pair feature by comparing said keys of said query molecule to said keys in said feature database, a correspondence comprising a scoop for said flexible molecule stored in said feature database having a same key as a scoop for said query molecule;
aligning parts of said flexible molecule to said query molecule and assembling said parts to form an alignment onto said query molecule by overlaying internal coordinate axes of said scoops for said flexible and query molecules, said correspondence implying an alignment of said flexible molecule and query molecule fragment pairs;
determining the number of said parts, that is, the number of flexible molecule fragment pairs in the set, $C_{fp}$, given by $C_{fp} = (n-1)\,(C_{frag})^{2}\,C_{rbe}$, where $n$ is the number of fragments in said flexible molecule, $n-1$ is the number of rotatable bond edges connecting said $n$ fragments, $C_{frag}$ is the number of conformations in a fragment of said fragments, and $C_{rbe}$ is the number of steps in which said rotatable bond edges are sampled;
identifying a flexible molecule which is similar to said query molecule based on a result of said aligning; and
displaying a result of said aligning said parts.
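Overlaying the internal coordinate axes of two matched scoops fixes a rigid transform: a point expressed in the candidate scoop's local frame is re-expressed in the query scoop's frame. The Python sketch below assumes each scoop carries an origin and an orthonormal, right-handed set of axes stored as the columns of a 3x3 matrix; the claims do not say how those axes are derived, and `frame_overlay` and `transform` are hypothetical names. With candidate frame $(F_c, o_c)$ and query frame $(F_q, o_q)$, the transform is $R = F_q F_c^{\mathsf{T}}$ and $t = o_q - R\,o_c$.

```python
from typing import List, Sequence, Tuple

Mat = List[List[float]]  # 3x3; columns are a frame's x, y, z axes in world coordinates


def matmul(a: Mat, b: Mat) -> Mat:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(a: Mat) -> Mat:
    return [[a[j][i] for j in range(3)] for i in range(3)]


def apply(r: Mat, v: Sequence[float]) -> List[float]:
    return [sum(r[i][k] * v[k] for k in range(3)) for i in range(3)]


def frame_overlay(cand_origin: Sequence[float], cand_axes: Mat,
                  query_origin: Sequence[float], query_axes: Mat) -> Tuple[Mat, List[float]]:
    """Return (R, t) such that x -> R x + t carries the candidate scoop's frame
    onto the query scoop's frame (R = F_q F_c^T, t = o_q - R o_c)."""
    r = matmul(query_axes, transpose(cand_axes))  # transpose inverts an orthonormal frame
    rc = apply(r, cand_origin)
    t = [query_origin[i] - rc[i] for i in range(3)]
    return r, t


def transform(points: Sequence[Sequence[float]], r: Mat, t: Sequence[float]) -> List[List[float]]:
    """Apply the rigid transform to every candidate atom coordinate."""
    return [[a + b for a, b in zip(apply(r, p), t)] for p in points]


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rot_z90 = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # query axes: 90 degrees about z
r, t = frame_overlay([0.0, 0.0, 0.0], identity, [1.0, 2.0, 3.0], rot_z90)
print(transform([[1.0, 0.0, 0.0]], r, t))  # candidate x-axis tip lands on the query x-axis tip
```

Applying the same $(R, t)$ to all atoms of the candidate fragment pair places it in the query's coordinate system, which is the starting point for assembling the full alignment.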
Specification