Text influenced molecular indexing system and computer-implemented and/or computer-assisted method for same

US 6,332,138 B1
Filed: 07/24/2000
Issued: 12/18/2001
Est. Priority Date: 07/23/1999
Status: Expired due to Fees

Abstract

An extension of the vector space model for computing chemical similarity using textual and chemical descriptors is described. The method uses a chemical and/or textual description of a molecule/chemical and decomposes a molecule/chemical descriptor matrix by a suitable technique, such as singular value decomposition, to create a low-dimensional representation of the original descriptor space. Similarities between a user probe and the textual and/or chemical descriptors are then computed and ranked.

37 Citations

58 Claims

1. A method of calculating similarity or substantial similarity between a first chemical descriptor and at least one other chemical descriptor in a matrix representing a plurality of chemical and textual descriptors, comprising the steps of:
- (a) creating at least one chemical descriptor and at least one textual descriptor for each compound in a collection of compounds;
  
  (b) preparing a descriptor matrix X, wherein the descriptor matrix comprises;
  
  a plurality of columns, each column representing a text source containing textual and chemical descriptions, and;
  
  a plurality of rows, each row comprising a descriptor associated with each respective text source, wherein the entries in the descriptor matrix indicate the number of times a descriptor occurs in each respective text source;
  
  (c) performing a singular value decomposition (SVD) of the descriptor matrix to produce resultant matrices;
  
  (d) using at least one of the resultant matrices to compute the similarity between the first chemical descriptor d_i and the at least one other chemical descriptor d_j; and
  
  (e) outputting at least a subset of the at least one other chemical descriptor ranked in order of similarity to the first chemical descriptor.
2. The method as recited in claim 1, wherein said creating step includes generating atom pair and topological torsion descriptors from chemical connection tables of the collection of compounds.
3. The method as recited in claim 1, wherein said creating step includes creating an index of descriptors and an index of compounds in the collection.
4. The method as recited in claim 1 wherein said performing step comprises the step of:
5. The method as recited in claim 4 wherein said computing step comprises the step of computing the dot product between the i^th and j^th rows of the matrix PΣ.
6. The method as recited in claim 1 wherein the first chemical descriptor is initially an ad hoc query vector q, further comprising the step of:
- determining a matrix X_k, wherein X_k is the matrix of rank k which is equivalent to P_kΣ_kQ^T_k, and is the least squares closest to X; and
  
  projecting the ad hoc query vector onto X_k.
7. The method as recited in claim 6 wherein the ad hoc query vector q is defined as being equal to q^TPΣ^−1_k.
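
As a rough illustration of claims 1 and 5, the sketch below ranks descriptors by the dot product between rows of PΣ. The toy matrix, the descriptor labels, the choice k = 2, and the helper name `rank_descriptors` are assumptions made for illustration only; NumPy's `numpy.linalg.svd` stands in for whatever SVD routine an implementation might use.

```python
import numpy as np

# Hypothetical toy data: rows are descriptors, columns are text sources,
# entries are occurrence counts.
descriptors = ["C-N atom pair", "C-O atom pair", "kinase", "inhibitor"]
X = np.array([
    [2.0, 0.0, 1.0],
    [0.0, 3.0, 0.0],
    [1.0, 0.0, 2.0],
    [1.0, 1.0, 1.0],
])

# Thin SVD: X = P @ diag(sigma) @ Qt
P, sigma, Qt = np.linalg.svd(X, full_matrices=False)

k = 2                        # assumed number of retained dimensions
PS = P[:, :k] * sigma[:k]    # row i holds descriptor i in the reduced space


def rank_descriptors(i):
    """Rank all other descriptors by dot product with descriptor i."""
    scores = PS @ PS[i]
    order = [j for j in np.argsort(-scores) if j != i]
    return [(descriptors[j], float(scores[j])) for j in order]


ranking = rank_descriptors(0)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With all singular values retained, these dot products equal the entries of X X^T; truncating to k dimensions is what produces the low-dimensional comparison.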

8. A method of calculating similarity or substantial similarity between a first document V_i and at least one other document V_j in a matrix representing a plurality of chemical and textual descriptors, comprising the steps of:
- (a) creating at least one chemical descriptor and at least one text descriptor for each compound in each document;
  
  (b) preparing a descriptor matrix X, wherein the descriptor matrix comprises;
  
  a plurality of columns, each column representing a text source containing textual and chemical descriptions, and;
  
  a plurality of rows, each row comprising a descriptor associated with each respective text source, wherein the entries in the descriptor matrix indicate the number of times a descriptor occurs in each respective text source;
  
  (c) performing a singular value decomposition (SVD) of the descriptor matrix to produce resultant matrices;
  
  (d) using at least one of the resultant matrices to compute the similarity between the first document and the at least one other document; and
  
  (e) outputting at least a subset of the at least one other document ranked in order of similarity to the first document.
9. The method as recited in claim 8, wherein said creating step includes generating atom pair and topological torsion descriptors from chemical connection tables of the collection of compounds.
10. The method as recited in claim 8, wherein said creating step includes creating an index of descriptors and an index of compounds in the collection.
11. The method as recited in claim 8 wherein said performing step comprises the step of:
12. The method as recited in claim 11 wherein said computing step comprises the step of computing the dot product between the i^th and j^th rows of the matrix QΣ.
13. The method as recited in claim 8 wherein the first document is initially an ad hoc query vector q, further comprising the step of:
- determining a matrix X_k, wherein X_k is the matrix of rank k which is equivalent to P_kΣ_kQ^T_k, and is the least squares closest to X; and
  
  projecting the ad hoc query vector onto X_k.
14. The method as recited in claim 13 wherein the ad hoc query vector q is defined as being equal to q^TPΣ^−1_k.
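
The query projection recited in claims 13 and 14 can be sketched as follows. This is a minimal example under assumptions: the toy matrix, k = 2, and the helper names `fold_in` and `rank_documents` are hypothetical, and scoring the folded-in query against rows of QΣ is one plausible reading of claim 12 rather than a confirmed implementation.

```python
import numpy as np

# Hypothetical descriptor-by-text-source count matrix (4 descriptors, 3 sources).
X = np.array([
    [2.0, 0.0, 1.0],
    [0.0, 3.0, 0.0],
    [1.0, 0.0, 2.0],
    [1.0, 1.0, 1.0],
])

P, sigma, Qt = np.linalg.svd(X, full_matrices=False)
k = 2                                   # assumed number of retained dimensions
P_k, sigma_k, Q_k = P[:, :k], sigma[:k], Qt[:k].T


def fold_in(q):
    """Project a descriptor-count vector q into the reduced space: q^T P_k Sigma_k^-1."""
    return (q @ P_k) / sigma_k


def rank_documents(q):
    """Score each text source against the query using rows scaled by Sigma_k."""
    docs = Q_k * sigma_k                # rows of Q_k Sigma_k
    query = fold_in(q) * sigma_k        # treat the query like an extra row of Q_k
    scores = docs @ query
    return [(int(j), float(scores[j])) for j in np.argsort(-scores)]


# An ad hoc query mentioning descriptors 0 and 2 once each.
ranked = rank_documents(np.array([1.0, 0.0, 1.0, 0.0]))
print(ranked)
```

A useful sanity check on the projection is that folding in an existing column of X returns the corresponding row of Q_k.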

15. A method of calculating similarity or substantial similarity between a chemical descriptor d_j and at least one document V_i in a matrix representing a plurality of chemical and textual descriptors, comprising the steps of:
- (a) creating at least one chemical descriptor and at least one text descriptor for each compound in each document;
  
  (b) preparing a descriptor matrix X, wherein the descriptor matrix comprises;
  
  a plurality of columns, each column representing a text source containing textual and chemical descriptions, and;
  
  a plurality of rows, each row comprising a descriptor associated with each respective text source, wherein the entries in the descriptor matrix indicate the number of times a descriptor occurs in each respective text source;
  
  (c) performing a singular value decomposition (SVD) of the descriptor matrix to produce resultant matrices;
  
  (d) using at least one of the resultant matrices to compute the similarity between at least one of the at least one document V_i and chemical descriptor d_j; and
  
  (e) outputting at least a subset of the at least one document ranked in order of similarity to the chemical descriptor.
16. The method as recited in claim 15, wherein said creating step includes generating atom pair and topological torsion descriptors from chemical connection tables of the collection of compounds.
17. The method as recited in claim 15, wherein said creating step includes creating an index of descriptors and an index of compounds in the collection.
18. The method as recited in claim 15 wherein said performing step comprises the step of:
19. The method as recited in claim 18 wherein said computing step comprises the step of computing the dot product between the i^th row of the matrix PΣ and the j^th row of the matrix QΣ.
20. The method as recited in claim 15 wherein the chemical descriptor is initially an ad hoc query vector q, further comprising the step of:
- determining a matrix X_k, wherein X_k is the matrix of rank k which is equivalent to P_kΣ_kQ^T_k, and is the least squares closest to X; and
  
  projecting the ad hoc query vector onto X_k.
21. The method as recited in claim 20 wherein the ad hoc query vector q is defined as being equal to q^TPΣ^−1_k.
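
The descriptor-to-document comparison of claim 19 might look like the sketch below, which follows the claim wording literally by taking the dot product of a row of PΣ with each row of QΣ. The toy matrix, k = 2, and the variable names are assumptions for illustration.

```python
import numpy as np

# Hypothetical descriptor-by-text-source count matrix.
X = np.array([
    [2.0, 0.0, 1.0],
    [0.0, 3.0, 0.0],
    [1.0, 0.0, 2.0],
    [1.0, 1.0, 1.0],
])

P, sigma, Qt = np.linalg.svd(X, full_matrices=False)
k = 2                          # assumed number of retained dimensions
PS = P[:, :k] * sigma[:k]      # descriptor coordinates (rows of P Sigma)
QS = Qt[:k].T * sigma[:k]      # text-source coordinates (rows of Q Sigma)

i = 0                          # index of the probe descriptor
scores = QS @ PS[i]            # one score per text source
order = np.argsort(-scores)    # text sources, most similar first
print([(int(j), round(float(scores[j]), 3)) for j in order])
```

Note that the full product of PΣ with the transpose of QΣ is PΣ²Q^T, which weights each dimension by one more factor of Σ than the rank-k reconstruction P_kΣ_kQ^T_k does.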

22. A method of calculating similarity or substantial similarity between a textual descriptor d_j and at least one document V_i in a matrix representing a plurality of chemical and textual descriptors, comprising the steps of:
- (a) creating at least one chemical descriptor and at least one textual descriptor for each compound in each document;
  
  (b) preparing a descriptor matrix X, wherein the descriptor matrix comprises;
  
  a plurality of columns, each column representing a text source containing textual and chemical descriptions, and;
  
  a plurality of rows, each row comprising a descriptor associated with each respective text source, wherein the entries in the descriptor matrix indicate the number of times a descriptor occurs in each respective text source;
  
  (c) performing a singular value decomposition (SVD) of the descriptor matrix to produce resultant matrices;
  
  (d) using at least one of the resultant matrices to compute the similarity between at least one of the at least one document V_i and textual descriptor d_j; and
  
  (e) outputting at least a subset of the at least one document ranked in order of similarity to the chemical descriptor.
23. The method as recited in claim 22, wherein said creating step includes generating atom pair and topological torsion descriptors from chemical connection tables of the collection of compounds.
24. The method as recited in claim 22, wherein said creating step includes creating an index of descriptors and an index of compounds in the collection.
25. The method as recited in claim 22 wherein said performing step comprises the step of:
26. The method as recited in claim 25 wherein said computing step comprises the step of computing the dot product between the i^th row of the matrix PΣ and the j^th row of the matrix QΣ.
27. The method as recited in claim 22 wherein the textual descriptor d_j is initially an ad hoc query vector q, further comprising the step of:
- determining a matrix X_k, wherein X_k is the matrix of rank k which is equivalent to P_kΣ_kQ^T_k, and is the least squares closest to X; and
  
  projecting the ad hoc query vector onto X_k.
28. The method as recited in claim 27 wherein the ad hoc query vector q is defined as being equal to q^TPΣ^−1_k.
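
Step (b), preparing the descriptor matrix X, together with the descriptor index mentioned in claim 24, can be sketched in plain Python. The descriptor strings (for example `AP:C-N:3`), the sample text sources, and the function name `build_descriptor_matrix` are hypothetical; the claims do not specify how descriptors are encoded.

```python
from collections import Counter


def build_descriptor_matrix(sources):
    """Build a count matrix with one row per descriptor and one column per text source.

    `sources` is a list of descriptor lists, one list per text source, mixing
    textual descriptors (words) and chemical descriptors (assumed string codes).
    """
    counts = [Counter(source) for source in sources]
    index = sorted(set().union(*sources))          # descriptor index: fixes the row order
    matrix = [[c[d] for c in counts] for d in index]
    return index, matrix


# Two hypothetical text sources, already reduced to descriptor lists.
sources = [
    ["kinase", "inhibitor", "AP:C-N:3", "AP:C-N:3"],
    ["inhibitor", "AP:C-O:2"],
]
index, X = build_descriptor_matrix(sources)
for descriptor, row in zip(index, X):
    print(descriptor, row)
```

Each entry X[r][c] is the number of times descriptor `index[r]` occurs in text source c, which is the quantity the claims place in the matrix.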

29. A computer readable medium including instructions being executable by a computer, the instructions instructing the computer to generate a searchable representation of chemical structures, the instructions comprising:
- (a) creating at least one chemical descriptor and at least one text descriptor for each compound in a collection of compounds;
  
  (b) preparing a descriptor matrix X, wherein the descriptor matrix comprises a plurality of columns, each column representing a text source containing textual and chemical descriptions, and;
  
  a plurality of rows, each row comprising a descriptor associated with each respective text source, wherein the entries in the descriptor matrix indicate the number of times a descriptor occurs in each respective text source;
  
  (c) performing singular value decomposition (SVD) of the descriptor matrix to produce resultant matrices;
  
  (d) using at least one of the resultant matrices to compute the similarity between the first chemical descriptor d_i and the at least one other chemical descriptor d_j; and
  
  (e) outputting at least a subset of the at least one other chemical descriptor ranked in order of similarity to the first chemical descriptor.
30. The computer readable medium as recited in claim 29 wherein said creating step includes generating atom pair and topological torsion descriptors from chemical connection tables of the collection of compounds.
31. The computer readable medium as recited in claim 29 wherein said creating step includes creating an index of descriptors and an index of compounds in the collection.
32. The computer readable medium as recited in claim 29 wherein said performing step comprises the step of:
33. The computer readable medium as recited in claim 32 wherein said computing step comprises the step of computing the dot product between the i^th and j^th rows of the matrix PΣ.
34. The computer readable medium as recited in claim 29 wherein the first chemical descriptor is initially an ad hoc query vector q, further comprising the step of:
- determining a matrix X_k, wherein X_k is the matrix of rank k which is equivalent to P_kΣ_kQ^T_k, and is the least squares closest to X; and
  
  projecting the ad hoc query vector onto X_k.
35. The computer readable medium as recited in claim 34 wherein the ad hoc query vector q is defined as being equal to q^TPΣ^−1_k.
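
The rank-k matrix X_k = P_kΣ_kQ^T_k recited in claim 34 can be checked numerically. The sketch below is an illustration under assumptions (toy matrix, k = 2, hypothetical helper `truncate`); it verifies the standard property that the Frobenius-norm error of the truncated SVD equals the root of the sum of the squared discarded singular values, which is the sense in which X_k is "least squares closest" to X.

```python
import numpy as np

# Hypothetical descriptor-by-text-source count matrix.
X = np.array([
    [2.0, 0.0, 1.0],
    [0.0, 3.0, 0.0],
    [1.0, 0.0, 2.0],
    [1.0, 1.0, 1.0],
])

P, sigma, Qt = np.linalg.svd(X, full_matrices=False)


def truncate(k):
    """Return X_k = P_k Sigma_k Q_k^T, keeping the k largest singular values."""
    return (P[:, :k] * sigma[:k]) @ Qt[:k]


k = 2                                        # assumed number of retained dimensions
X_k = truncate(k)
err = np.linalg.norm(X - X_k)                # Frobenius norm of the residual
expected = np.sqrt(np.sum(sigma[k:] ** 2))   # energy in the discarded singular values
print(err, expected)
```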

36. A computer readable medium for calculating the similarity between a first text source and at least one other text source in a matrix comprising a plurality of chemical and textual descriptors, comprising the steps of:
- (a) creating at least one chemical descriptor and at least one text descriptor for each compound in each text source;
  
  (b) preparing a descriptor matrix X, wherein the descriptor matrix comprises;
  
  a plurality of columns, each column representing a text source containing textual and chemical descriptions, and;
  
  a plurality of rows, each row comprising a descriptor associated with each respective text source, wherein the entries in the descriptor matrix indicate the number of times a descriptor occurs in each respective text source;
  
  (c) performing a singular value decomposition (SVD) of the descriptor matrix to produce resultant matrices;
  
  (d) using at least one of the resultant matrices to compute the similarity between the first text source V_i and the at least one other text source V_j; and
  
  (e) outputting at least a subset of the at least one other text source ranked in order of similarity to the first text source.
37. The computer readable medium as recited in claim 36, wherein said creating step includes generating atom pair and topological torsion descriptors from chemical connection tables of the collection of compounds.
38. The computer readable medium as recited in claim 36, wherein said creating step includes creating an index of descriptors and an index of compounds in the collection.
39. The computer readable medium as recited in claim 36 wherein said performing step comprises the step of:
40. The computer readable medium as recited in claim 39 wherein said computing step comprises the step of computing the dot product between the i^th and j^th rows of the matrix QΣ.
41. The computer readable medium as recited in claim 36 wherein the first document is initially an ad hoc query vector q, further comprising the step of:
- determining a matrix X_k, wherein X_k is the matrix of rank k which is equivalent to P_kΣ_kQ^T_k, and is the least squares closest to X; and
  
  projecting the ad hoc query vector onto X_k.
42. The computer readable medium as recited in claim 41 wherein the ad hoc query vector q is defined as being equal to q^TPΣ^−1_k.
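
The atom pair descriptors that the dependent claims derive from chemical connection tables can be approximated as below. This is a simplified sketch, not the descriptor scheme used in the patent: atom typing here is just the element symbol plus the number of bonded heavy atoms, separation is counted in bonds, topological torsions are not covered, and the function names are hypothetical.

```python
from collections import Counter, deque


def bond_distances(adjacency, start):
    """Breadth-first search: number of bonds from `start` to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def atom_pair_counts(elements, bonds):
    """Count atom pairs of the form type-distance-type from a connection table."""
    adjacency = {i: [] for i in range(len(elements))}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    # Simplified atom type (an assumption): element plus heavy-atom neighbor count.
    types = [f"{el}{len(adjacency[i])}" for i, el in enumerate(elements)]
    counts = Counter()
    for i in range(len(elements)):
        for j, d in bond_distances(adjacency, i).items():
            if j > i:  # visit each unordered pair once
                first, second = sorted((types[i], types[j]))
                counts[f"{first}-{d}-{second}"] += 1
    return counts


# Heavy-atom skeleton of ethanol: C-C-O
ethanol = atom_pair_counts(["C", "C", "O"], [(0, 1), (1, 2)])
print(dict(ethanol))
```

Counts like these would become rows of the descriptor matrix alongside the textual descriptors.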

43. A computer readable medium for calculating the similarity between a chemical descriptor d_j and at least one text source V_i in a matrix comprising a plurality of chemical and textual descriptors, comprising the steps of:
- (a) creating at least one chemical descriptor and at least one text descriptor for each compound in each text source;
  
  (b) preparing a descriptor matrix X, wherein the descriptor matrix comprises;
  
  a plurality of columns, each column representing a text source containing textual and chemical descriptions, and;
  
  a plurality of rows, each row comprising a descriptor associated with each respective text source, wherein the entries in the descriptor matrix indicate the number of times a descriptor occurs in a text source;
  
  (c) performing a singular value decomposition (SVD) of the descriptor matrix to produce resultant matrices;
  
  (d) using at least one of the resultant matrices to compute the similarity between at least one of the at least one text source V_i and chemical descriptor d_j; and
  
  (e) outputting at least a subset of the at least one text source ranked in order of similarity to the chemical descriptor.
44. The computer readable medium as recited in claim 43, wherein said creating step includes generating atom pair and topological torsion descriptors from chemical connection tables of the collection of compounds.
45. The computer readable medium as recited in claim 43, wherein said creating step includes creating an index of descriptors and an index of compounds in the collection.
46. The computer readable medium as recited in claim 43 wherein said performing step comprises the step of:
47. The computer readable medium as recited in claim 46 wherein said computing step comprises the step of computing the dot product between the i^th row of the matrix PΣ and the j^th row of the matrix QΣ.
48. The computer readable medium as recited in claim 43 wherein the chemical descriptor is initially an ad hoc query vector q, further comprising the step of:
- determining a matrix X_k, wherein X_k is the matrix of rank k which is equivalent to P_kΣ_kQ^T_k, and is the least squares closest to X; and
  
  projecting the ad hoc query vector onto X_k.
49. The computer readable medium as recited in claim 48 wherein the ad hoc query vector q is defined as being equal to q^TPΣ^−1_k.

50. A computer readable medium for calculating the similarity between a textual descriptor d_j and at least one text source V_i in a matrix comprising a plurality of chemical and textual descriptors, comprising the steps of:
- (a) creating at least one chemical descriptor and at least one textual descriptor for each compound in each text source;
  
  (b) preparing a descriptor matrix X, wherein the descriptor matrix comprises a plurality of columns, each column representing a text source containing textual and chemical descriptions, and;
  
  a plurality of rows, each row comprising a descriptor associated with each respective text source, wherein the entries in the descriptor matrix indicate the number of times a descriptor occurs in a text source;
  
  (c) performing a singular value decomposition (SVD) of the descriptor matrix to produce resultant matrices;
  
  (d) using at least one of the resultant matrices to compute the similarity between at least one of the at least one text source V_i and textual descriptor d_j; and
  
  (e) outputting at least a subset of the at least one text source ranked in order of similarity to the chemical descriptor.
51. The computer readable medium as recited in claim 50, wherein said creating step includes generating atom pair and topological torsion descriptors from chemical connection tables of the collection of compounds.
52. The computer readable medium as recited in claim 50, wherein said creating step includes creating an index of descriptors and an index of compounds in the collection.
53. The computer readable medium as recited in claim 50 wherein said performing step comprises the step of:
54. The computer readable medium as recited in claim 53 wherein said computing step comprises the step of computing the dot product between the i^th row of the matrix PΣ and the j^th row of the matrix QΣ.
55. The computer readable medium as recited in claim 50 wherein the textual descriptor d_j is initially an ad hoc query vector q, further comprising the step of:
- determining a matrix X_k, wherein X_k is the matrix of rank k which is equivalent to P_kΣ_kQ^T_k, and is the least squares closest to X; and
  
  projecting the ad hoc query vector onto X_k.
56. The computer readable medium as recited in claim 55 wherein the ad hoc query vector q is defined as being equal to q^TPΣ^−1_k.

57. A method of calculating similarity or substantial similarity between a first chemical descriptor and at least one other chemical descriptor in a matrix representing a plurality of chemical and textual descriptors, comprising the steps of:
- (a) creating at least one chemical descriptor and at least one textual descriptor for each compound in a collection of compounds;
  
  (b) preparing a descriptor matrix X, wherein the descriptor matrix comprises;
  
  a text source containing textual and chemical descriptions, and;
  
  a descriptor associated with each respective text source, wherein the entries in the descriptor matrix indicate the relevancy of a descriptor with respect to a text source;
  
  (c) performing a singular value decomposition (SVD) of the descriptor matrix to produce resultant matrices;
  
  (d) using at least one of the resultant matrices to compute the similarity between the first chemical descriptor d_i and the at least one other chemical descriptor d_j; and
  
  (e) outputting at least a subset of the at least one other chemical descriptor ranked in order of similarity to the first chemical descriptor.

58. A method of calculating similarity or substantial similarity between a first chemical descriptor and at least one other chemical descriptor in a matrix representing a plurality of chemical and textual descriptors, comprising the steps of:
- (a) creating at least one chemical descriptor and at least one textual descriptor for each compound in a collection of compounds;
  
  (b) preparing a descriptor matrix X, wherein the descriptor matrix comprises;
  
  a text source containing textual and chemical descriptions, and;
  
  a descriptor associated with each respective text source, wherein the entries in the descriptor matrix indicate the relevancy of a descriptor with respect to a text source;
  
  (c) performing a decomposition operation on the descriptor matrix to produce resultant matrices;
  
  (d) using at least one of the resultant matrices to compute the similarity between the first chemical descriptor d_i and the at least one other chemical descriptor d_j; and
  
  (e) outputting at least a subset of the at least one other chemical descriptor ranked in order of similarity to the first chemical descriptor.

Current Assignee: Axontologic, Inc.
Original Assignee: Merck & Co., Inc.
Inventors: Singh, Suresh B.; Fluder, Eugene M. Jr.; Hull, Richard D.
Primary Examiner(s): Lintz, Paul R.
Application Number: US09/624,209
Time in Patent Office: 512 Days
Field of Search: 707/1, 707/3, 707/5, 707/102, 707/22, 707/30
US Class Current: 707/741
CPC Class Codes

B01J 2219/00689   using computers

B01J 2219/00695   Synthesis control routines,...

B01J 2219/007   Simulation or virtual synthesis

G16C 20/30   Prediction of properties of...

G16C 20/40   Searching chemical structur...

G16C 20/70   Machine learning, data mini...

Y10S 707/941   Human sciences

Y10S 707/99931   Database or file accessing

Y10S 707/99933   Query processing, i.e. sear...

Y10S 707/99935   Query augmenting and refini...

Y10S 707/99943   Generating database or data...

Y10S 707/99945   Object-oriented database st...

Y10S 707/99948   Application of database or ...
