Method, system, and computer program product for representing similarity/dissimilarity between chemical compounds
Abstract
A system, method, and computer program product for visualizing and interactively analyzing data relating to chemical compounds. A user selects a plurality of compounds to map, and also selects a method for evaluating similarity/dissimilarity between the selected compounds. A non-linear map is generated in accordance with the selected compounds and the selected method. The non-linear map has a point for each of the selected compounds, wherein a distance between any two points is representative of similarity/dissimilarity between the corresponding compounds. A portion of the non-linear map is then displayed. Users are enabled to interactively analyze compounds represented in the non-linear map.
170 Citations
42 Claims
1. A method for representing similarity/dissimilarity between compounds, comprising the steps of:
(1) generating an initial configuration of objects for display on a graphics device, the objects representing a plurality of compounds, each object having an initial set of coordinates;
(2) selecting a subset of compounds from the plurality of compounds;
(3) refining the coordinates of at least one object that represents a selected compound based on the coordinates of the at least one object, the coordinates of a second object, and a distance between the at least one and second objects so that a distance between the refined coordinates of the at least one object and the coordinates of the second object is more representative of the similarity/dissimilarity of the compounds;
(4) repeating steps (2) and (3) for additional subsets of compounds; and
(5) displaying at least some of the objects on the graphics device.
(a) performing principal component analysis of data representing the plurality of compounds; and
(b) generating an initial configuration of objects using the principal components that account for a significant portion of the variance in the data.
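Steps (a) and (b) above seed step (1) with a principal component projection in place of arbitrary coordinates. What follows is a minimal pure-Python sketch. It assumes each compound arrives as a numeric descriptor vector and that the leading components are found by power iteration with deflation. Neither choice comes from the claims, and `pca_initial_configuration` is a hypothetical name. The function also reports the fraction of total variance each component carries, which is one way to check that the retained components account for "a significant portion" of it.

```python
import math
import random


def _covariance(centered):
    """Sample covariance matrix of mean-centered rows."""
    n, dim = len(centered), len(centered[0])
    cov = [[0.0] * dim for _ in range(dim)]
    for row in centered:
        for a in range(dim):
            for b in range(dim):
                cov[a][b] += row[a] * row[b]
    return [[v / max(n - 1, 1) for v in r] for r in cov]


def _dominant_eigenpair(matrix, iterations=500, seed=0):
    """Largest eigenvalue/eigenvector of a symmetric PSD matrix by power iteration."""
    rng = random.Random(seed)
    dim = len(matrix)
    vec = [rng.random() + 0.1 for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in vec))
    vec = [v / norm for v in vec]
    for _ in range(iterations):
        nxt = [sum(matrix[a][b] * vec[b] for b in range(dim)) for a in range(dim)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break  # matrix is (numerically) zero; keep the current unit vector
        vec = [v / norm for v in nxt]
    # The Rayleigh quotient of the unit vector estimates the eigenvalue.
    value = sum(vec[a] * matrix[a][b] * vec[b] for a in range(dim) for b in range(dim))
    return value, vec


def pca_initial_configuration(descriptors, k=2):
    """Project descriptor rows onto their top-k principal components.

    Returns (coordinates, explained_fraction); explained_fraction[i] is the
    share of total variance carried by component i.
    """
    n, dim = len(descriptors), len(descriptors[0])
    means = [sum(row[j] for row in descriptors) / n for j in range(dim)]
    centered = [[row[j] - means[j] for j in range(dim)] for row in descriptors]
    cov = _covariance(centered)
    total = sum(cov[j][j] for j in range(dim)) or 1.0
    components, fractions = [], []
    for _ in range(k):
        value, vec = _dominant_eigenpair(cov)
        components.append(vec)
        fractions.append(value / total)
        # Deflate so that the next power iteration finds the next component.
        cov = [[cov[a][b] - value * vec[a] * vec[b] for b in range(dim)] for a in range(dim)]
    coords = [[sum(row[j] * comp[j] for j in range(dim)) for comp in components]
              for row in centered]
    return coords, fractions
```

A production version would use a linear-algebra library for the eigen-decomposition. The hand-written power iteration is there only to keep the sketch dependency-free.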
7. The method of claim 1, wherein step (2) comprises the step of selecting two compounds from the plurality of compounds.
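When each subset holds two compounds, as in claim 7, steps (2) to (4) of claim 1 can be read as a pairwise update loop. The sketch below is one such reading, modeled on the pairwise update used in stochastic proximity embedding. It assumes three things: a precomputed matrix of target dissimilarities, random pair selection, and a learning rate that decays linearly over cycles. `refine_pair`, `embed`, and all parameter names are hypothetical and do not come from the patent.

```python
import math
import random


def refine_pair(coords, i, j, target, rate, eps=1e-10):
    """Step (3) for a two-compound subset: move objects i and j along the line
    joining them so that their distance moves toward `target`.

    With rate=1 the new distance equals `target` (up to eps); smaller rates
    correct only part of the discrepancy.
    """
    xi, xj = coords[i], coords[j]
    d = math.dist(xi, xj)
    step = rate * 0.5 * (target - d) / (d + eps)
    delta = [step * (a - b) for a, b in zip(xi, xj)]
    coords[i] = [a + s for a, s in zip(xi, delta)]
    coords[j] = [b - s for b, s in zip(xj, delta)]


def embed(dissim, dim=2, cycles=50, steps_per_cycle=500, rate=1.0, seed=0):
    """Steps (1)-(4) with random pairs and a linearly decaying rate.

    `dissim[i][j]` is the target dissimilarity between compounds i and j.
    """
    rng = random.Random(seed)
    n = len(dissim)
    coords = [[rng.random() for _ in range(dim)] for _ in range(n)]  # step (1)
    for cycle in range(cycles):                                       # step (4)
        lam = rate * (1.0 - cycle / cycles)
        for _ in range(steps_per_cycle):
            i, j = rng.sample(range(n), 2)                            # step (2)
            refine_pair(coords, i, j, dissim[i][j], lam)              # step (3)
    return coords
```

Each update moves both points symmetrically about their midpoint. The decaying rate lets early cycles make large corrections, while later cycles settle the competing pair constraints.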
8. The method of claim 7, wherein step (2) comprises the step of selecting two compounds with a probability of selection that is proportional to the separation distance of the objects so that objects located closer to each other are selected more frequently than objects located farther from each other.
9. The method of claim 7, wherein step (2) comprises the step of selecting two compounds with a probability that is proportional to the identity of the compounds so that certain predefined compounds are sampled more frequently than other compounds.
10. The method of claim 7, wherein step (2) comprises the step of selecting the subset of compounds based on one or more densities of their objects.
11. The method of claim 10, wherein step (2) further comprises the step of selecting compounds in higher density areas more often than objects in lower density areas.
12. The method of claim 10, wherein step (2) further comprises the step of selecting two compounds with a probability of selection that is proportional to the local density of the objects so that objects in low-density areas are selected with a lower probability than objects in high-density areas.
13. The method of claim 10, wherein step (2) further comprises the step of selecting two compounds with a probability of selection that is proportional to the local density of the objects so that objects in low-density areas are selected with a higher probability than objects in high-density areas.
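Claims 10 to 13 tie selection to the local density of objects on the map. What follows is a minimal sketch. It assumes density is approximated by the number of neighbors inside a fixed radius, since the claims do not define a density estimate. `local_density`, `pick_pair_by_density`, and the `radius` parameter are hypothetical.

```python
import math
import random


def local_density(coords, radius):
    """Count, for each object, how many other objects lie within `radius`."""
    n = len(coords)
    return [sum(1 for j in range(n)
                if j != i and math.dist(coords[i], coords[j]) <= radius)
            for i in range(n)]


def pick_pair_by_density(coords, radius, rng, favor_dense=True):
    """Pick two distinct objects, weighting each by its local density.

    favor_dense=True : weight grows with density, so objects in sparse areas
                       are picked less often (claims 11-12).
    favor_dense=False: weight shrinks with density, so objects in sparse areas
                       are picked more often (claim 13).
    """
    dens = local_density(coords, radius)
    if favor_dense:
        weights = [d + 1e-6 for d in dens]      # small floor keeps every weight > 0
    else:
        weights = [1.0 / (d + 1.0) for d in dens]
    idx = range(len(coords))
    i = rng.choices(idx, weights=weights)[0]
    weights[i] = 0.0                            # the second pick must differ
    j = rng.choices(idx, weights=weights)[0]
    return i, j
```

Recomputing densities on every call costs O(n²). A practical run would cache the densities and refresh them only occasionally as the map changes.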
14. The method of claim 7, wherein step (2) comprises the step of selecting two compounds with a probability of selection that is proportional to the separation distance of the objects so that objects located closer to each other are sampled less frequently than objects located farther from each other.
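Claims 8 and 14 tie the selection probability to the separation of the two objects on the map. Claim 8 favors near pairs and claim 14 favors far pairs. The sketch below implements those two stated effects by weighting each pair by 1/d or by d. Both claims describe the probability as "proportional to the separation distance", so using the inverse weighting for claim 8 is an interpretation. `pick_pair_by_distance` is a hypothetical name.

```python
import itertools
import math
import random


def pick_pair_by_distance(coords, rng, prefer_close=True, eps=1e-9):
    """Choose one pair of objects with a probability tied to their map separation.

    prefer_close=True : weight 1/d, so nearby objects are chosen more often (claim 8).
    prefer_close=False: weight d,   so distant objects are chosen more often (claim 14).
    """
    pairs = list(itertools.combinations(range(len(coords)), 2))
    dists = [math.dist(coords[i], coords[j]) for i, j in pairs]
    if prefer_close:
        weights = [1.0 / (d + eps) for d in dists]
    else:
        weights = [d + eps for d in dists]
    return rng.choices(pairs, weights=weights)[0]
```

Enumerating all pairs is quadratic in the number of compounds, so this form suits small maps or a periodically refreshed candidate list.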
15. The method of claim 1, wherein step (2) comprises the step of selecting three compounds from the plurality of compounds.
16. The method of claim 1, wherein step (2) comprises the step of selecting the subset of compounds from the plurality of compounds at random.
17. The method of claim 1, wherein step (2) comprises the step of selecting the subset of compounds from the plurality of compounds using a semi-systematic procedure.
18. The method of claim 1, wherein step (2) comprises the step of selecting the subset of compounds from the plurality of compounds using a systematic procedure.
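Claims 16 to 18 distinguish random, semi-systematic, and systematic selection without defining any of them. The generators below are one plausible reading for two-compound subsets. Treating "semi-systematic" as "every object in turn, paired with a random partner" is an assumption in particular, and all function names are hypothetical.

```python
import itertools
import random
from typing import Iterator, Tuple

Pair = Tuple[int, int]


def random_pairs(n: int, rng: random.Random) -> Iterator[Pair]:
    """Claim 16 reading: every pair is an independent random draw."""
    while True:
        i, j = rng.sample(range(n), 2)
        yield i, j


def systematic_pairs(n: int) -> Iterator[Pair]:
    """Claim 18 reading: sweep all n(n-1)/2 pairs in a fixed order, then repeat."""
    while True:
        yield from itertools.combinations(range(n), 2)


def semi_systematic_pairs(n: int, rng: random.Random) -> Iterator[Pair]:
    """Claim 17 reading (assumed): visit every object in turn and pair it
    with a randomly chosen partner."""
    while True:
        for i in range(n):
            j = rng.randrange(n - 1)
            partner = j if j < i else j + 1   # skip i itself
            yield i, partner
```

All three are infinite generators, so the stopping rule of step (4) decides how many pairs are actually consumed.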
19. The method of claim 1, wherein step (3) comprises the step of:
(a) refining the coordinates of at least one object based on molecular properties of the compounds selected in step (2) and a predetermined method for evaluating molecular similarity/dissimilarity.
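Claim 19 leaves the similarity method open. Later dependent claims mention molecular properties modified by weighting factors and properties represented as a binary number. Two common measures illustrate those cases: a weighted Euclidean distance over property vectors, and the Tanimoto (Jaccard) distance over bit-string fingerprints. The choice of these particular measures is an assumption for illustration, and so are the function names.

```python
import math
from typing import Sequence


def weighted_euclidean(props_a: Sequence[float], props_b: Sequence[float],
                       weights: Sequence[float]) -> float:
    """Dissimilarity over several molecular properties scaled by weighting factors."""
    return math.sqrt(sum(w * (a - b) ** 2
                         for a, b, w in zip(props_a, props_b, weights)))


def tanimoto_distance(bits_a: int, bits_b: int) -> float:
    """1 - Tanimoto similarity of two binary fingerprints held as Python ints."""
    common = bin(bits_a & bits_b).count("1")
    total = bin(bits_a | bits_b).count("1")
    return 0.0 if total == 0 else 1.0 - common / total
```

Either function can supply the target distance against which a pair of objects is refined in step (3).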
20. The method of claim 19, wherein step (3)(a) comprises the step of refining the coordinates of at least one object based on the molecular shapes of the compounds selected in step (2).
21. The method of claim 19, wherein step (3)(a) comprises the step of refining the coordinates of at least one object based on electronic fields of the compounds selected in step (2).
22. The method of claim 19, wherein step (3)(a) comprises the step of refining the coordinates of at least one object based on steric fields of the compounds selected in step (2).
23. The method of claim 19, wherein step (3)(a) comprises the step of refining the coordinates of at least one object based on a user defined similarity/dissimilarity method.
24. The method of claim 19, wherein step (3)(a) comprises the step of refining the coordinates of at least one object based on a plurality of molecular properties modified by weighting factors.
25. The method of claim 19, wherein step (3)(a) comprises the step of refining the coordinates of at least one object based on a plurality of molecular properties that are represented by a binary number.
26. The method of claim 1, wherein step (3) comprises the step of refining the coordinates of at least one object based on a user's input regarding the similarity/dissimilarity of the compounds selected in step (2).
27. The method of claim 1, wherein step (4) comprises the step of repeating steps (2) and (3) a predefined number of times.
28. The method of claim 1, wherein step (4) comprises the steps of:
(a) selecting an error criterion; and
(b) repeating steps (2) and (3) until the error criterion reaches a predefined threshold.
29. The method of claim 1, wherein step (4) comprises the step of repeating steps (2) and (3) until a predefined time limit has been exceeded.
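Claims 27 to 29 name three ways to end the repetition in step (4): a predefined number of iterations, an error criterion reaching a threshold, and a time limit. The sketch below combines all three in one driver. Using normalized stress as the error criterion is an assumed choice, since the claims do not name one. `repeat_refinement`, `stress`, and `check_every` are hypothetical names.

```python
import math
import time
from typing import Callable, Optional, Sequence


def stress(coords: Sequence[Sequence[float]], dissim: Sequence[Sequence[float]]) -> float:
    """Normalized mismatch between map distances and target dissimilarities."""
    num = den = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            num += (d - dissim[i][j]) ** 2
            den += dissim[i][j] ** 2
    return math.sqrt(num / den) if den > 0 else 0.0


def repeat_refinement(step: Callable[[], None],
                      error: Callable[[], float],
                      max_iterations: Optional[int] = None,   # claim 27
                      threshold: Optional[float] = None,      # claim 28
                      time_limit: Optional[float] = None,     # claim 29, in seconds
                      check_every: int = 100) -> int:
    """Call `step` until one of the enabled stopping rules fires.

    Returns the number of refinement steps performed.
    """
    if max_iterations is None and threshold is None and time_limit is None:
        raise ValueError("enable at least one stopping rule")
    start = time.monotonic()
    done = 0
    while True:
        if max_iterations is not None and done >= max_iterations:
            return done
        if done % check_every == 0:
            if threshold is not None and error() <= threshold:
                return done
            if time_limit is not None and time.monotonic() - start >= time_limit:
                return done
        step()
        done += 1
```

The error and clock are checked only every `check_every` steps. This matters because a full stress evaluation is O(n²), whereas a single pairwise refinement step is cheap.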
30. The method of claim 1, further comprising the step of:
(6) receiving input from a user regarding which compounds to represent by objects.
31. The method of claim 30, further comprising the step of:
(7) receiving input from a user regarding which method to use for evaluating the similarity/dissimilarity of compounds.
32. The method of claim 31, further comprising the step of:
(8) receiving input from a user regarding a dimensionality to be used in creating a non-linear map.
33. The method of claim 32, further comprising the step of:
(9) generating the non-linear map based on the input received in steps (6) through (8).
34. The method of claim 33, further comprising the step of:
(10) receiving input from a user about a compound selected for further evaluation.
35. A method for representing similarity/dissimilarity between compounds, comprising the steps of:
(1) generating an initial configuration of objects for output, the objects representing a plurality of compounds, each object having an initial set of coordinates;
(2) selecting a subset of compounds from the plurality of compounds;
(3) refining the coordinates of at least one object that represents a selected compound based on the coordinates of the at least one object, the coordinates of a second object, and a distance between the at least one and second objects so that a distance between the refined coordinates of the at least one object and the coordinates of the second object is more representative of the similarity/dissimilarity of the compounds;
(4) repeating steps (2) and (3) for additional subsets of compounds; and
(5) outputting the coordinates of at least some of the objects.
(6) generating structure-activity correlations for compounds using the coordinates output in step (5).
40. The method of claim 35, further comprising the step of:
(6) generating structure-property correlations for compounds using the coordinates output in step (5).
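The structure-activity and structure-property correlation steps above use the output coordinates, but they do not name a model. One simple option, assumed here, is an ordinary least-squares fit of the measured activity (or property) against the map coordinates. `fit_activity` is a hypothetical name, and the hand-written solver exists only to keep the sketch dependency-free.

```python
from typing import List, Sequence


def fit_activity(coords: Sequence[Sequence[float]],
                 activity: Sequence[float]) -> List[float]:
    """Least-squares fit of activity ~ b0 + b1*x1 + ... + bk*xk on map coordinates.

    Solves the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination
    with partial pivoting and returns [b0, b1, ..., bk].
    """
    rows = [[1.0, *c] for c in coords]          # prepend an intercept column
    m = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(m)] for a in range(m)]
    xty = [sum(r[a] * y for r, y in zip(rows, activity)) for a in range(m)]
    aug = [xtx[a] + [xty[a]] for a in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda k: abs(aug[k][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("map coordinates are collinear; fit is underdetermined")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * p for x, p in zip(aug[r], aug[col])]
    return [aug[a][m] / aug[a][a] for a in range(m)]
```

A linear model on the map axes is the crudest possible correlation. It is useful mainly as a check on whether activity varies smoothly across the map before trying nonlinear models.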
41. A system for representing similarity/dissimilarity between compounds, comprising:
a module that generates an initial configuration of objects for display on a graphics device, the objects representing a plurality of compounds, each object having an initial set of coordinates;
a module that selects a subset of compounds from the plurality of compounds;
a module that refines the coordinates of at least one object that represents a selected compound based on the coordinates of the at least one object, the coordinates of a second object, and a distance between the at least one and second objects so that a distance between the refined coordinates of the at least one object and the coordinates of the second object is more representative of the similarity/dissimilarity of the compounds; and
a module that displays at least some of the objects on the graphics device.
42. A computer program product for representing similarity/dissimilarity between compounds, comprising a computer useable medium having computer program logic stored therein, wherein the computer program logic comprises:
means for enabling a computer to generate an initial configuration of objects for display on a graphics device, the objects representing a plurality of compounds, each object having an initial set of coordinates;
means for enabling a computer to select a subset of compounds from the plurality of compounds;
means for enabling a computer to refine the coordinates of at least one object that represents a selected compound based on the coordinates of the at least one object, the coordinates of a second object, and a distance between the at least one and second objects so that a distance between the refined coordinates of the at least one object and the coordinates of the second object is more representative of the similarity/dissimilarity of the compounds; and
means for enabling a computer to display at least some of the objects on the graphics device.
Specification