Method and system for automated inference creation of physico-chemical interaction knowledge from databases of co-occurrence data
Abstract
Methods and system for automated inference of physico-chemical interaction knowledge from databases of term co-occurrence data. The co-occurrence data includes co-occurrences between chemical or biological molecules or co-occurrences between chemical or biological molecules and biological processes. Likelihood statistics are determined and applied to decide if co-occurrence data reflecting physico-chemical interactions is non-trivial. A next node or an unknown target representing chemical or biological molecules in a biological pathway is selected based on co-occurrence values. The method and system may be used to further facilitate a user's understanding of biological functions, such as cell functions, to design experiments more intelligently and to analyze experimental results more thoroughly. Specifically, the present invention may help drug discovery scientists select better targets for pharmaceutical intervention in the hope of curing diseases. The method and system may also help facilitate the abstraction of knowledge from information for biological experimental data and provide new bioinformatic techniques.
27 Claims
1. A method for measuring a strength of co-occurrence data, comprising:
extracting two or more chemical or biological molecules names from a database record from an inference database, wherein the inference database includes a plurality of inference database records created from an indexed literature database, and wherein the two or more chemical or biological molecule names co-occur in one or more records in an indexed scientific literature database;
determining a Likelihood statistic for a co-occurrence reflecting physico-chemical interactions between a first chemical or biological molecule name-A and a second chemical or biological molecule name-B extracted from the database record;
applying the Likelihood statistic to the co-occurrence to determine if the co-occurrence between the first chemical or biological molecule-A and the second chemical or biological molecule-B is a non-trivial co-occurrence reflecting physico-chemical interactions.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
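The steps of claim 1 can be illustrated with a minimal Python sketch. The claim does not define the form of the Likelihood statistic, so the code assumes one plausible choice: the log of the observed co-occurrence count over the count expected if the two names appeared independently. The function names, the count parameters, and the threshold value are hypothetical and are not taken from the patent.

```python
import math

def likelihood_statistic(n_ab, n_a, n_b, n_total):
    """Log ratio of observed to expected co-occurrence (assumed form).

    n_ab: records mentioning both name-A and name-B
    n_a, n_b: records mentioning name-A, name-B
    n_total: total records in the inference database
    """
    if min(n_ab, n_a, n_b, n_total) <= 0:
        return float("-inf")  # no evidence of co-occurrence
    expected = n_a * n_b / n_total  # expected count under independence
    return math.log(n_ab / expected)

def is_non_trivial(n_ab, n_a, n_b, n_total, threshold=1.0):
    """Apply the statistic: True when it exceeds a hypothetical threshold."""
    return likelihood_statistic(n_ab, n_a, n_b, n_total) > threshold

# Two names that co-occur far more often than chance would predict.
print(is_non_trivial(50, 100, 100, 10000))  # True
# Two names that co-occur exactly as often as chance would predict.
print(is_non_trivial(1, 100, 100, 10000))   # False
```

A log ratio is used here only because it is symmetric around zero: positive values indicate more co-occurrence than chance, negative values less.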
9. A method for contextual querying of co-occurrence data, comprising:
selecting a target node from a first list of nodes connected by a plurality of arcs in a connection network, wherein the connection network includes a plurality of nodes representing a plurality of chemical or biological molecules names and a plurality of arcs connecting the plurality of nodes in a pre-determined order, and wherein the plurality of arcs represent co-occurrence values of physico-chemical interactions between chemical or biological molecules;
creating a second list of nodes by considering simultaneously a plurality of other nodes that are neighbors of the target node as well as neighbors of the plurality of other nodes prior to the target node in the connection network;
selecting a next node from the second list of nodes using the co-occurrence values, wherein the next node is a most likely next node after the target node in the pre-determined order for the connection network based on the co-occurrence values.
Dependent claims: 10, 11, 12, 13, 14, 15
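The contextual query of claim 9 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the connection network is represented as a dictionary of arcs keyed by unordered node pairs, the path is an ordered list whose last element is the target node, and a candidate's score is assumed to be the sum of its co-occurrence values with every node in the path. The function name and data layout are hypothetical.

```python
def select_next_node(cooc, path):
    """Pick the most likely next node after the last node in `path`.

    cooc: dict mapping frozenset({a, b}) -> co-occurrence value (arc weight)
    path: ordered list of known nodes; path[-1] is the target node
    """
    def neighbors(node):
        return {other for pair in cooc if node in pair
                for other in pair if other != node}

    # Second list: neighbors of the target and of the nodes prior to it.
    candidates = set()
    for node in path:
        candidates |= neighbors(node)
    candidates -= set(path)
    if not candidates:
        return None

    def score(candidate):
        # Assumed scoring: total co-occurrence with all nodes in the path.
        return sum(cooc.get(frozenset((candidate, n)), 0.0) for n in path)

    # sorted() makes tie-breaking deterministic.
    return max(sorted(candidates), key=score)

arcs = {
    frozenset(("A", "B")): 5.0,
    frozenset(("B", "C")): 4.0,
    frozenset(("A", "C")): 1.0,
    frozenset(("B", "D")): 3.0,
    frozenset(("A", "D")): 3.0,
}
print(select_next_node(arcs, ["A", "B"]))  # D
```

In this toy network, looking only at the target node B would favor C (4.0 versus 3.0). Considering the prior node A as well shifts the choice to D (6.0 versus 5.0), which is the kind of context the claim describes.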
16. A method for query polling of co-occurrence data, comprising:
selecting a position in a connection network for an unknown target node from a first list of nodes, wherein the connection network includes a plurality of nodes representing a plurality of chemical or biological molecules names and a plurality of arcs connecting the plurality of nodes in a pre-determined order, and wherein the plurality of arcs represent co-occurrence values of physico-chemical interactions between chemical or biological molecules;
determining a second list of nodes prior to the position of unknown target node in the connection network;
determining a third list of nodes subsequent to the position of unknown target node in the connection network;
determining a fourth list of nodes included in both the second list of nodes and the third list of nodes; and
determining an identity for the unknown target node by selecting a node with a from the fourth list of nodes using a Likelihood statistic, wherein the Likelihood statistic includes a co-occurrence value reflecting physico-chemical interactions between a first chemical or biological molecule-A and a second chemical or biological molecule-B.
Dependent claims: 17, 18, 19, 20
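One possible reading of the query polling steps in claim 16 is sketched below. The claim wording leaves the construction of the lists open, so the sketch assumes that the second list holds candidates co-occurring with nodes prior to the unknown position, the third list holds candidates co-occurring with nodes subsequent to it, and the fourth list is their intersection. The scoring rule (summed co-occurrence values) stands in for the Likelihood statistic and is also an assumption, as are all names in the code.

```python
def poll_unknown_node(cooc, prior, subsequent):
    """Guess the identity of an unknown node between `prior` and `subsequent`.

    cooc: dict mapping frozenset({a, b}) -> co-occurrence value
    prior, subsequent: lists of known nodes before and after the gap
    """
    known = set(prior) | set(subsequent)

    def candidates_for(nodes):
        found = set()
        for pair in cooc:
            a, b = tuple(pair)
            if a in nodes:
                found.add(b)
            if b in nodes:
                found.add(a)
        return found - known

    second = candidates_for(set(prior))       # co-occur with prior nodes
    third = candidates_for(set(subsequent))   # co-occur with subsequent nodes
    fourth = second & third                   # supported from both sides
    if not fourth:
        return None

    def score(candidate):
        # Assumed stand-in for the Likelihood statistic.
        return sum(cooc.get(frozenset((candidate, n)), 0.0) for n in known)

    return max(sorted(fourth), key=score)

arcs = {
    frozenset(("A", "X")): 4.0,
    frozenset(("X", "C")): 5.0,
    frozenset(("A", "Y")): 6.0,
    frozenset(("C", "Z")): 7.0,
    frozenset(("A", "W")): 1.0,
    frozenset(("W", "C")): 1.0,
}
print(poll_unknown_node(arcs, ["A"], ["C"]))  # X
```

Here Y and Z each have a strong arc to one side of the gap but are dropped because they are not in the intersection; X wins over W on the summed score.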
21. A method for creating automated biological inferences, comprising:
constructing a connection network using one or more database records from an inference database, wherein the connection network includes a plurality of nodes for chemical or biological molecules and biological processes found to co-occur one or more times, wherein the plurality of nodes are connected by a plurality of arcs in a pre-determined order, and wherein the inference database was created from chemical or biological molecule and biological process information extracted from a structured literature database;
applying Likelihood statistic analysis methods to the connection network to determine possible inferences between the chemical or biological molecules and biological processes;
generating automatically one or more biological inferences regarding relationships between chemical or biological molecules and biological processes using results from the Likelihood statistic analysis methods.
Dependent claims: 22, 23, 24, 25, 26, 27
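The pipeline of claim 21 (construct a network from records, apply a Likelihood statistic, emit inferences) can be sketched end to end. The sketch assumes each inference database record is simply a list of molecule and process terms, and it again assumes a log observed-over-expected ratio as the statistic, since the claim does not specify one. The record contents, function names, and threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math

def build_network(records):
    """Count how often each term and each term pair occurs across records."""
    term_counts = Counter()
    pair_counts = Counter()
    for terms in records:
        unique = sorted(set(terms))           # one count per record
        term_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return term_counts, pair_counts

def generate_inferences(records, threshold=0.5):
    """Return (term_a, term_b, score) for pairs scoring above the threshold."""
    term_counts, pair_counts = build_network(records)
    n = len(records)
    inferences = []
    for (a, b), n_ab in pair_counts.items():
        # Assumed statistic: log(observed / expected under independence).
        score = math.log(n_ab * n / (term_counts[a] * term_counts[b]))
        if score > threshold:
            inferences.append((a, b, round(score, 3)))
    return sorted(inferences, key=lambda item: -item[2])

records = [
    ["insulin", "INSR"],
    ["insulin", "INSR"],
    ["insulin", "INSR"],
    ["p53", "apoptosis"],
    ["p53", "apoptosis"],
    ["insulin", "apoptosis"],
]
for a, b, score in generate_inferences(records, threshold=0.0):
    print(f"{a} is associated with {b} (score {score})")
```

With a threshold of 0.0 the insulin/apoptosis pair is filtered out because it co-occurs less often than independence would predict, while the other two pairs are reported.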
Specification