System and Method for Identifying Similar Molecules
Abstract
A vectorization process is employed in which chemical identifier strings are converted into respective vectors. These vectors may then be searched to identify molecules that are identical or similar to each other. The dimensions of the vector space can be defined by sequences of symbols that make up the chemical identifier strings. The International Chemical Identifier (InChI) string defined by the International Union of Pure and Applied Chemistry (IUPAC) is particularly well suited for these methods.
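As a rough illustration of the vectorization described above, the following sketch builds a vector space whose dimensions are the symbol sequences found in a set of identifier strings, then builds one vector per string. It assumes overlapping two-character sequences and simple occurrence counts; the function names, the sequence length, and the example InChI strings (methanol and ethanol) are illustrative choices, not details taken from the patent.

```python
from collections import Counter


def symbol_sequences(identifier: str, n: int = 2) -> list:
    """Return the overlapping runs of n symbols found in an identifier string."""
    return [identifier[i:i + n] for i in range(len(identifier) - n + 1)]


def build_vector_space(identifiers: list, n: int = 2) -> list:
    """One dimension per distinct symbol sequence seen in any of the strings."""
    dims = set()
    for identifier in identifiers:
        dims.update(symbol_sequences(identifier, n))
    return sorted(dims)


def vectorize(identifier: str, dims: list, n: int = 2) -> list:
    """Count how often each dimension's sequence occurs in the string."""
    counts = Counter(symbol_sequences(identifier, n))
    return [counts[d] for d in dims]


# Illustrative InChI strings (assumed examples: methanol and ethanol).
identifiers = [
    "InChI=1S/CH4O/c1-2/h2H,1H3",
    "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
]

dims = build_vector_space(identifiers)
vectors = {s: vectorize(s, dims) for s in identifiers}
```

Every vector has the same length as `dims`, so any two of them can be compared component by component regardless of how long the original strings were.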
42 Claims
1. A method, comprising:
constructing a vector space having dimensions determined by a plurality of chemical identifier strings, wherein the strings are determined by respective chemical compounds; and
constructing a vector for each of the strings, wherein each vector has the dimensions of the constructed vector space.
8. At least one tangible computer-useable medium, said at least one medium having a computer-readable program, wherein the program upon being processed on a computer causes the computer to implement the steps of:
constructing a vector space having dimensions determined by a plurality of chemical identifier strings, wherein the strings are determined by respective chemical compounds; and
constructing a vector for each of the strings, wherein each vector has the dimensions of the constructed vector space.
9. A method, comprising:
extracting sequences of symbols from each of a plurality of chemical identifier strings, wherein each string is associated with a chemical; and
defining a vector for each of the strings, the vectors having a common vector space that includes dimensions given by the extracted sequences.
19. At least one tangible computer-useable medium, said at least one medium having a computer-readable program, wherein the program upon being processed on a computer causes the computer to implement the steps of:
extracting sequences of symbols from each of a plurality of chemical identifier strings, wherein each string is associated with a chemical; and
defining a vector for each of the strings, the vectors having a common vector space that includes dimensions given by the extracted sequences.
20. A method, comprising:
converting chemical names to respective chemical identifier strings, the strings having a common format;
constructing respective vectors from the strings;
storing at least some of the vectors in at least one memory device; and
searching at least some of the stored vectors to identify certain chemical structures that are at least similar to each other.
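The four steps of this claim can be sketched end to end as follows. This is a minimal sketch under stated assumptions: the name-to-identifier table `NAME_TO_INCHI` is a hypothetical stand-in for whatever name-resolution step a real system would use, vectors are sparse counts of two-character sequences, the "memory device" is an in-process dictionary, and similarity is measured by cosine similarity. None of these choices is specified by the claim itself.

```python
import math
from collections import Counter

# Hypothetical name-to-identifier table (assumption for illustration only).
NAME_TO_INCHI = {
    "methanol": "InChI=1S/CH4O/c1-2/h2H,1H3",
    "ethanol": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
    "benzene": "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",
}


def to_vector(identifier: str, n: int = 2) -> Counter:
    """Sparse vector: counts of overlapping n-symbol sequences in the string."""
    return Counter(identifier[i:i + n] for i in range(len(identifier) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Convert names to identifier strings, build vectors, and store them in memory.
index = {name: to_vector(inchi) for name, inchi in NAME_TO_INCHI.items()}


def search(query_name: str, k: int = 2) -> list:
    """Return up to k (score, name) pairs most similar to the query, best first."""
    query = to_vector(NAME_TO_INCHI[query_name])
    scored = [(cosine(query, vec), name)
              for name, vec in index.items() if name != query_name]
    return sorted(scored, reverse=True)[:k]


print(search("ethanol"))
```

A score of 1.0 indicates that two strings have identical sequence counts; lower scores indicate progressively less overlap, which is what allows the search to return similar as well as identical structures.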
33. At least one tangible computer-useable medium, said at least one medium having a computer-readable program, wherein the program upon being processed on a computer causes the computer to implement the steps of:
converting chemical names to respective chemical identifier strings, the strings having a common format;
constructing respective vectors from the strings;
storing at least some of the vectors in at least one memory device; and
searching at least some of the stored vectors to identify certain chemical structures that are at least similar to each other.
34. A method, comprising:
extracting chemical entities from different documents, the chemical entities having different formats with respect to at least one of name and chemical identifier string;
representing the chemical entities as respective chemical identifier strings having a common format;
constructing respective vectors from the commonly formatted chemical identifier strings;
storing at least some of the vectors in at least one memory device; and
searching at least some of the stored vectors.
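The distinguishing step in this claim is bringing differently formatted entities into one common format before any vectors are built. The sketch below illustrates only that step, assuming that each extracted entity is either a trivial name or already an InChI string, and that a hypothetical `SYNONYMS` table maps names to identifiers. The document names, the table, and the helper `to_common_format` are all invented for illustration.

```python
from typing import Optional

# Hypothetical synonym table mapping lower-cased names to one identifier format.
SYNONYMS = {
    "ethanol": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
    "ethyl alcohol": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
    "methanol": "InChI=1S/CH4O/c1-2/h2H,1H3",
}


def to_common_format(entity: str) -> Optional[str]:
    """Map a name or identifier string to a single identifier format."""
    text = entity.strip()
    if text.startswith("InChI="):
        return text  # already an identifier string
    return SYNONYMS.get(text.lower())  # None if the name is unknown


# Entities as they might be extracted from two documents (made-up data).
extracted = {
    "doc_a": ["Ethyl alcohol", "methanol"],
    "doc_b": ["InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3", "unobtainium"],
}

# Group documents by the commonly formatted identifier they mention.
normalized = {}
for doc_id, entities in extracted.items():
    for entity in entities:
        identifier = to_common_format(entity)
        if identifier is not None:
            normalized.setdefault(identifier, set()).add(doc_id)
```

Once every entity is keyed by the same identifier format, the same compound mentioned under different names in different documents collapses to a single string, and that string is what would then be vectorized, stored, and searched.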
40. At least one tangible computer-useable medium, said at least one medium having a computer-readable program, wherein the program upon being processed on a computer causes the computer to implement the steps of:
extracting chemical entities from different documents, the chemical entities having different formats with respect to at least one of name and chemical identifier string;
representing the chemical entities as respective chemical identifier strings having a common format;
constructing respective vectors from the commonly formatted chemical identifier strings;
storing at least some of the vectors in at least one memory device; and
searching at least some of the stored vectors.
Specification