Text influenced molecular indexing system and computer-implemented and/or computer-assisted method for same
Abstract
An extension of the vector space model for computing chemical similarity using textual and chemical descriptors is described. The method uses a chemical and/or textual description of a molecule/chemical and decomposes a molecule/chemical descriptor matrix by a suitable technique, such as singular value decomposition, to create a low-dimensional representation of the original descriptor space. Similarities between a user probe and the textual and/or chemical descriptors are then computed and ranked.
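As a hedged illustration of the decomposition the abstract describes, the low-dimensional representation and one possible similarity measure can be written as follows. The truncation rank k, the fold-in formula for a probe vector q (a column of descriptor counts), and the use of cosine similarity are assumptions taken from common latent semantic indexing practice; the abstract itself does not fix them.

```latex
X \approx X_k = U_k \Sigma_k V_k^{T},
\qquad
\hat{q} = q^{T} U_k \Sigma_k^{-1},
\qquad
\operatorname{sim}(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}
```

Here the rows of U_k correspond to descriptors and the rows of V_k to text sources, so the folded-in probe \hat{q} lies in the same k-dimensional space as the rows of V_k and can be compared with them directly.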
58 Claims
-
1. A method of calculating similarity or substantial similarity between a first chemical descriptor and at least one other chemical descriptor in a matrix representing a plurality of chemical and textual descriptors, comprising the steps of:
-
(a) creating at least one chemical descriptor and at least one textual descriptor for each compound in a collection of compounds;
(b) preparing a descriptor matrix X, wherein the descriptor matrix comprises;
a plurality of columns, each column representing a text source containing textual and chemical descriptions, and;
a plurality of rows, each row comprising a descriptor associated with each respective text source, wherein the entries in the descriptor matrix indicate the number of times a descriptor occurs in each respective text source;
(c) performing a singular value decomposition (SVD) of the descriptor matrix to produce resultant matrices;
(d) using at least one of the resultant matrices to compute the similarity between the first chemical descriptor di and the at least one other chemical descriptor dj; and
(e) outputting at least a subset of the at least one other chemical descriptor ranked in order of similarity to the first chemical descriptor.
- View Dependent Claims (2, 3, 4, 5, 6, 7)
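Steps (a) through (e) of claim 1 can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the patented implementation: the toy corpus, the "frag:" descriptor names, the rank k = 2, the scaling of descriptor coordinates by the singular values, and the cosine measure are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical toy corpus: each text source lists textual tokens and
# chemical descriptor tokens (the "frag:" names are invented for illustration).
sources = {
    "doc1": ["kinase", "inhibitor", "frag:pyridine", "frag:amide", "frag:amide"],
    "doc2": ["kinase", "frag:pyridine", "frag:amide"],
    "doc3": ["antibiotic", "frag:lactam", "frag:thiazole"],
    "doc4": ["antibiotic", "resistance", "frag:lactam"],
}

def build_matrix(sources):
    """Step (b): rows are descriptors, columns are text sources, entries are counts."""
    descriptors = sorted({d for toks in sources.values() for d in toks})
    row = {d: i for i, d in enumerate(descriptors)}
    X = np.zeros((len(descriptors), len(sources)))
    for j, toks in enumerate(sources.values()):
        for d in toks:
            X[row[d], j] += 1
    return descriptors, X

def rank_descriptors(probe, descriptors, X, k=2):
    """Steps (c)-(e): SVD, then rank the other chemical descriptors by cosine."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    D = U[:, :k] * s[:k]                 # assumed descriptor coordinates U_k * Sigma_k
    norms = np.linalg.norm(D, axis=1)
    norms[norms == 0] = 1.0              # guard against all-zero rows
    D = D / norms[:, None]
    sims = D @ D[descriptors.index(probe)]
    order = np.argsort(-sims)            # most similar first
    return [(descriptors[i], float(sims[i])) for i in order
            if descriptors[i] != probe and descriptors[i].startswith("frag:")]

descriptors, X = build_matrix(sources)
ranking = rank_descriptors("frag:pyridine", descriptors, X)
```

In this toy corpus the pyridine and amide fragments occur in the same text sources, so the amide fragment is ranked first, while the lactam and thiazole fragments score near zero.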
-
-
8. A method of calculating similarity or substantial similarity between a first document Vi and at least one other document Vj in a matrix representing a plurality of chemical and textual descriptors, comprising the steps of:
-
(a) creating at least one chemical descriptor and at least one text descriptor for each compound in each document;
(b) preparing a descriptor matrix X, wherein the descriptor matrix comprises;
a plurality of columns, each column representing a text source containing textual and chemical descriptions, and;
a plurality of rows, each row comprising a descriptor associated with each respective text source, wherein the entries in the descriptor matrix indicate the number of times a descriptor occurs in each respective text source;
(c) performing a singular value decomposition (SVD) of the descriptor matrix to produce resultant matrices;
(d) using at least one of the resultant matrices to compute the similarity between the first document and the at least one other document; and
(e) outputting at least a subset of the at least one other document ranked in order of similarity to the first document.
- View Dependent Claims (9, 10, 11, 12, 13, 14, 16, 17, 18, 19, 20, 21)
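The document-to-document comparison of claim 8 can be illustrated in the same way, using the right singular vectors instead of the left ones. The count matrix, the document names, the rank k = 2, and the cosine measure below are hypothetical and chosen only to keep the sketch small.

```python
import numpy as np

# Hypothetical count matrix: rows are descriptors, columns are documents.
X = np.array([
    [2, 1, 0, 0],   # frag:amide
    [1, 1, 0, 0],   # frag:pyridine
    [1, 1, 0, 0],   # kinase
    [0, 0, 1, 1],   # frag:lactam
    [0, 0, 1, 1],   # antibiotic
    [0, 0, 0, 1],   # resistance
], dtype=float)
doc_names = ["doc1", "doc2", "doc3", "doc4"]

def rank_documents(first, X, names, k=2):
    """Rank the other documents by cosine similarity to `first` in the SVD space."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    docs = Vt[:k].T * s[:k]              # assumed document coordinates V_k * Sigma_k
    unit = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    sims = unit @ unit[names.index(first)]
    order = np.argsort(-sims)            # most similar first
    return [(names[i], float(sims[i])) for i in order if names[i] != first]

ranking = rank_documents("doc1", X, doc_names)
```

Here doc2 shares all of its descriptors with doc1 and is ranked first; doc3 and doc4 share none and score near zero.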
-
-
15. A method of calculating similarity or substantial similarity between a chemical descriptor dj and at least one document Vi in a matrix representing a plurality of chemical and textual descriptors, comprising the steps of:
-
(a) creating at least one chemical descriptor and at least one text descriptor for each compound in each document;
(b) preparing a descriptor matrix X, wherein the descriptor matrix comprises;
a plurality of columns, each column representing a text source containing textual and chemical descriptions, and;
a plurality of rows, each row comprising a descriptor associated with each respective text source, wherein the entries in the descriptor matrix indicate the number of times a descriptor occurs in each respective text source;
(c) performing a singular value decomposition (SVD) of the descriptor matrix to produce resultant matrices;
(d) using at least one of the resultant matrices to compute the similarity between at least one of the at least one document Vi and chemical descriptor dj; and
(e) outputting at least a subset of the at least one document ranked in order of similarity to the chemical descriptor.
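For the descriptor-to-document comparison of claim 15, one common latent semantic indexing convention scores a descriptor against a document by the dot product of their coordinates, each scaled by the square root of the singular values; this equals the corresponding entry of the rank-k reconstruction of X. The sketch below assumes that convention, and the matrix, names, and rank k = 2 are hypothetical.

```python
import numpy as np

# Hypothetical count matrix: rows are descriptors, columns are documents.
X = np.array([
    [2, 1, 0, 0],   # frag:amide
    [1, 1, 0, 0],   # frag:pyridine
    [1, 1, 0, 0],   # kinase
    [0, 0, 1, 1],   # frag:lactam
    [0, 0, 1, 1],   # antibiotic
    [0, 0, 0, 1],   # resistance
], dtype=float)
descriptors = ["frag:amide", "frag:pyridine", "kinase",
               "frag:lactam", "antibiotic", "resistance"]
documents = ["doc1", "doc2", "doc3", "doc4"]

def rank_documents_for_descriptor(descriptor, X, descriptors, documents, k=2):
    """Score every document against one descriptor in the rank-k space."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    root = np.sqrt(s[:k])
    desc_coords = U[:, :k] * root        # descriptor coordinates U_k * Sigma_k^(1/2)
    doc_coords = Vt[:k].T * root         # document coordinates V_k * Sigma_k^(1/2)
    scores = doc_coords @ desc_coords[descriptors.index(descriptor)]
    order = np.argsort(-scores)          # highest score first
    return [(documents[i], float(scores[i])) for i in order]

ranking = rank_documents_for_descriptor("frag:lactam", X, descriptors, documents)
```

In this toy matrix the lactam fragment occurs only in doc3 and doc4, so those two documents lead the ranking and the remaining documents score near zero.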
-
-
22. A method of calculating similarity or substantial similarity between a textual descriptor dj and at least one document Vi in a matrix representing a plurality of chemical and textual descriptors, comprising the steps of:
-
(a) creating at least one chemical descriptor and at least one textual descriptor for each compound in each document;
(b) preparing a descriptor matrix X, wherein the descriptor matrix comprises;
a plurality of columns, each column representing a text source containing textual and chemical descriptions, and;
a plurality of rows, each row comprising a descriptor associated with each respective text source, wherein the entries in the descriptor matrix indicate the number of times a descriptor occurs in each respective text source;
(c) performing a singular value decomposition (SVD) of the descriptor matrix to produce resultant matrices;
(d) using at least one of the resultant matrices to compute the similarity between at least one of the at least one document Vi and textual descriptor dj; and
(e) outputting at least a subset of the at least one document ranked in order of similarity to the chemical descriptor.
- View Dependent Claims (23, 24, 25, 26, 27, 28, 30, 31, 32, 33, 34, 35, 37, 38, 39, 40, 41, 42)
-
-
29. A computer readable medium including instructions being executable by a computer, the instructions instructing the computer to generate a searchable representation of chemical structures, the instructions comprising:
-
(a) creating at least one chemical descriptor and at least one text descriptor for each compound in a collection of compounds;
(b) preparing a descriptor matrix X, wherein the descriptor matrix comprises a plurality of columns, each column representing a text source containing textual and chemical descriptions, and;
a plurality of rows, each row comprising a descriptor associated with each respective text source, wherein the entries in the descriptor matrix indicate the number of times a descriptor occurs in each respective text source;
(c) performing singular value decomposition (SVD) of the descriptor matrix to produce resultant matrices;
(d) using at least one of the resultant matrices to compute the similarity between the first chemical descriptor di and the at least one other chemical descriptor dj; and
(e) outputting at least a subset of the at least one other chemical descriptor ranked in order of similarity to the first chemical descriptor.
-
-
36. A computer readable medium for calculating the similarity between a first text source and at least one other text source in a matrix comprising a plurality of chemical and textual descriptors, comprising the steps of:
-
(a) creating at least one chemical descriptor and at least one text descriptor for each compound in each text source;
(b) preparing a descriptor matrix X, wherein the descriptor matrix comprises;
a plurality of columns, each column representing a text source containing textual and chemical descriptions, and;
a plurality of rows, each row comprising a descriptor associated with each respective text source, wherein the entries in the descriptor matrix indicate the number of times a descriptor occurs in each respective text source;
(c) performing a singular value decomposition (SVD) of the descriptor matrix to produce resultant matrices;
(d) using at least one of the resultant matrices to compute the similarity between the first text source Vi and the at least one other text source Vj; and
(e) outputting at least a subset of the at least one other text source ranked in order of similarity to the first text source.
-
-
43. A computer readable medium for calculating the similarity between a chemical descriptor dj and at least one text source Vi in a matrix comprising a plurality of chemical and textual descriptors, comprising the steps of:
-
(a) creating at least one chemical descriptor and at least one text descriptor for each compound in each text source;
(b) preparing a descriptor matrix X, wherein the descriptor matrix comprises;
a plurality of columns, each column representing a text source containing textual and chemical descriptions, and;
a plurality of rows, each row comprising a descriptor associated with each respective text source, wherein the entries in the descriptor matrix indicate the number of times a descriptor occurs in a text source;
(c) performing a singular value decomposition (SVD) of the descriptor matrix to produce resultant matrices;
(d) using at least one of the resultant matrices to compute the similarity between at least one of the at least one text source Vi and chemical descriptor dj; and
(e) outputting at least a subset of the at least one text source ranked in order of similarity to the chemical descriptor.
- View Dependent Claims (44, 45, 46, 47, 48, 49)
-
-
50. A computer readable medium for calculating the similarity between a textual descriptor dj and at least one text source Vi in a matrix comprising a plurality of chemical and textual descriptors, comprising the steps of:
-
(a) creating at least one chemical descriptor and at least one textual descriptor for each compound in each text source;
(b) preparing a descriptor matrix X, wherein the descriptor matrix comprises a plurality of columns, each column representing a text source containing textual and chemical descriptions, and;
a plurality of rows, each row comprising a descriptor associated with each respective text source, wherein the entries in the descriptor matrix indicate the number of times a descriptor occurs in a text source;
(c) performing a singular value decomposition (SVD) of the descriptor matrix to produce resultant matrices;
(d) using at least one of the resultant matrices to compute the similarity between at least one of the at least one text source Vi and textual descriptor dj; and
(e) outputting at least a subset of the at least one text source ranked in order of similarity to the chemical descriptor.
- View Dependent Claims (51, 52, 53, 54, 55, 56)
-
-
57. A method of calculating similarity or substantial similarity between a first chemical descriptor and at least one other chemical descriptor in a matrix representing a plurality of chemical and textual descriptors, comprising the steps of:
-
(a) creating at least one chemical descriptor and at least one textual descriptor for each compound in a collection of compounds;
(b) preparing a descriptor matrix X, wherein the descriptor matrix comprises;
a text source containing textual and chemical descriptions, and;
a descriptor associated with each respective text source, wherein the entries in the descriptor matrix indicate the relevancy of a descriptor with respect to a text source;
(c) performing a singular value decomposition (SVD) of the descriptor matrix to produce resultant matrices;
(d) using at least one of the resultant matrices to compute the similarity between the first chemical descriptor di and the at least one other chemical descriptor dj; and
(e) outputting at least a subset of the at least one other chemical descriptor ranked in order of similarity to the first chemical descriptor.
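Claim 57 replaces raw occurrence counts with entries that indicate the "relevancy" of a descriptor with respect to a text source, without naming a weighting scheme in the claim. One possible reading, assumed here purely for illustration, is a tf-idf style weighting applied to the count matrix before the SVD; the function name and the toy counts are hypothetical.

```python
import numpy as np

def tfidf_weight(counts):
    """Scale each descriptor row by log(N / df), an assumed relevancy weighting.

    counts: descriptors x text sources matrix of occurrence counts.
    N is the number of text sources; df is the number of sources in which
    the descriptor occurs at least once.
    """
    counts = np.asarray(counts, dtype=float)
    n_sources = counts.shape[1]
    doc_freq = np.count_nonzero(counts, axis=1)
    idf = np.log(n_sources / np.maximum(doc_freq, 1))   # avoid division by zero
    return counts * idf[:, None]

# Hypothetical counts: 3 descriptors (rows) across 4 text sources (columns).
counts = [
    [1, 1, 1, 1],   # occurs everywhere, so it carries no discriminating weight
    [2, 0, 0, 0],   # occurs in a single source, so it is weighted most heavily
    [0, 1, 1, 0],
]
W = tfidf_weight(counts)
```

Under this assumed weighting, a descriptor that appears in every text source receives zero weight, while a descriptor confined to a single source is emphasized. The weighted matrix W would then take the place of X in step (c).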
-
-
58. A method of calculating similarity or substantial similarity between a first chemical descriptor and at least one other chemical descriptor in a matrix representing a plurality of chemical and textual descriptors, comprising the steps of:
-
(a) creating at least one chemical descriptor and at least one textual descriptor for each compound in a collection of compounds;
(b) preparing a descriptor matrix X, wherein the descriptor matrix comprises;
a text source containing textual and chemical descriptions, and;
a descriptor associated with each respective text source, wherein the entries in the descriptor matrix indicate the relevancy of a descriptor with respect to a text source;
(c) performing a decomposition operation on the descriptor matrix to produce resultant matrices;
(d) using at least one of the resultant matrices to compute the similarity between the first chemical descriptor di and the at least one other chemical descriptor dj; and
(e) outputting at least a subset of the at least one other chemical descriptor ranked in order of similarity to the first chemical descriptor.
-
Specification