System and method for identifying similar molecules
Abstract
A vectorization process is employed in which chemical identifier strings are converted into respective vectors. These vectors may then be searched to identify molecules that are identical or similar to each other. The dimensions of the vector space can be defined by sequences of symbols that make up the chemical identifier strings. The International Chemical Identifier (InChI) string defined by the International Union of Pure and Applied Chemistry (IUPAC) is particularly well suited for these methods.
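As a rough illustration of the vectorization the abstract describes, the Python sketch below builds a vector space whose dimensions are symbol sequences found in a small set of InChI strings, maps each string onto that space, and ranks the stored vectors against a query. Several choices here are assumptions made for illustration and are not taken from the patent: the use of overlapping three-character sequences, raw counts as vector components, cosine similarity as the comparison measure, dropping the shared `InChI=1S/` prefix, and every function and variable name.

```python
import math
from collections import Counter

# Standard InChI strings for a few small molecules (illustrative data set).
MOLECULES = {
    "methanol": "InChI=1S/CH4O/c1-2/h2H,1H3",
    "ethanol": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
    "1-propanol": "InChI=1S/C3H8O/c1-2-3-4/h4H,2-3H2,1H3",
    "benzene": "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",
}

PREFIX = "InChI=1S/"  # shared by all strings above, so it carries no signal


def symbol_sequences(inchi, n=3):
    """Return the overlapping n-symbol sequences of an InChI string (assumed scheme)."""
    body = inchi[len(PREFIX):] if inchi.startswith(PREFIX) else inchi
    return [body[i:i + n] for i in range(len(body) - n + 1)]


def build_vector_space(inchis, n=3):
    """Map every sequence seen in the collection to one dimension index."""
    vocabulary = sorted({seq for s in inchis for seq in symbol_sequences(s, n)})
    return {seq: index for index, seq in enumerate(vocabulary)}


def vectorize(inchi, space, n=3):
    """Count how often each dimension's sequence occurs in the string."""
    vector = [0] * len(space)
    for seq, count in Counter(symbol_sequences(inchi, n)).items():
        index = space.get(seq)
        if index is not None:  # sequences outside the space are ignored
            vector[index] = count
    return vector


def cosine(a, b):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_similar(query_inchi, stored_vectors, space, n=3):
    """Return (similarity, name) pairs, most similar first."""
    query = vectorize(query_inchi, space, n)
    scored = [(cosine(query, vec), name) for name, vec in stored_vectors.items()]
    return sorted(scored, reverse=True)


space = build_vector_space(MOLECULES.values())
stored = {name: vectorize(s, space) for name, s in MOLECULES.items()}

for score, name in rank_similar(MOLECULES["ethanol"], stored, space):
    print(f"{name:12s} {score:.3f}")
```

Strings of different lengths all map to vectors of the same dimension, which is what allows them to be compared directly. A production system would likely use sparse vectors and an index rather than the dense lists and linear scan used here.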
30 Claims
1. A method, comprising:
converting chemical names to respective chemical identifier strings, the chemical identifier strings including symbols and having a common format, the chemical identifier strings including strings of differing numbers of symbols;
constructing respective vectors from the chemical identifier strings, the vectors having a common vector space that is based on information extracted from a plurality of the chemical identifier strings, each of the vectors being constructed by comparing information in its corresponding chemical identifier string with the vector space;
storing at least some of the vectors in at least one memory device; and
using a computer to search at least some of the stored vectors to identify certain chemical structures that are at least similar to each other.
Dependent claims: 2-13, 24-30.
14. At least one tangible, non-transitory computer-useable medium, said at least one medium having a computer-readable program, wherein the program upon being processed on a computer causes the computer to implement the steps of:
converting chemical names to chemical identifier strings, the chemical identifier strings including respective symbols and having a common format, the chemical identifier strings including strings of differing numbers of symbols;
constructing respective vectors from the chemical identifier strings, the vectors having a common vector space that is based on information extracted from a plurality of the chemical identifier strings, each of the vectors being constructed by comparing information in its corresponding chemical identifier string with the vector space;
storing at least some of the vectors in at least one memory device; and
searching at least some of the stored vectors to identify certain chemical structures that are at least similar to each other.
15. A method, comprising:
using a computer to extract chemical entities from different documents, the chemical entities including entities having different formats with respect to at least one of name and chemical identifier string;
representing the chemical entities as respective chemical identifier strings having a common format, the chemical identifier strings including strings of differing numbers of symbols;
constructing respective vectors from the commonly formatted chemical identifier strings, the vectors having a common vector space that is based on information extracted from a plurality of the chemical identifier strings, each of the vectors being constructed by comparing information in its corresponding chemical identifier string with the vector space;
storing at least some of the vectors in at least one memory device; and
searching at least some of the stored vectors.
Dependent claims: 16-20.
21. At least one tangible, non-transitory computer-useable medium, said at least one medium having a computer-readable program, wherein the program upon being processed on a computer causes the computer to implement the steps of:
extracting chemical entities from different documents, the chemical entities including entities having different formats with respect to at least one of name and chemical identifier string;
representing the chemical entities as respective chemical identifier strings having a common format, the chemical identifier strings including strings of differing numbers of symbols;
constructing respective vectors from the commonly formatted chemical identifier strings, the vectors having a common vector space that is based on information extracted from a plurality of the chemical identifier strings, each of the vectors being constructed by comparing information in its corresponding chemical identifier string with the vector space;
storing at least some of the vectors in at least one memory device; and
searching at least some of the stored vectors.
Dependent claims: 22, 23.
Specification