Graph querying, graph motif mining and the discovery of clusters
First Claim
Patent Images
1. A computer-implemented method for conducting a database graph query, comprising:
- (a) obtaining a first database graph and a second database graph, wherein the first database graph and second database graph each have two or more vertices and one or more edges;
(b) mapping the first database graph to the second database graph, wherein;
(i) each vertex in the first database graph has a corresponding vertex in the second database graph; and
(ii) each edge in the first database graph has a corresponding edge in the second database graph;
(c) creating a graph closure tree comprised of a union of the first database graph and the second database graph based on the mapping, wherein each node of the graph closure tree comprises a graph closure of the node'"'"'s children and each child of a leaf node comprises a database graph; and
(d) conducting a graph query based on the graph closure tree.
3 Assignments
0 Petitions
Accused Products
Abstract
A method for analyzing, querying, and mining graph databases using subgraph and similarity querying. An index structure, known as a closure tree, is defined for topological summarization of a set of graphs. In addition, a significance model is created in which the graphs are transformed into histograms of primitive components. Finally, connected substructures or clusters, comprising paths or trees, are detected in networks found in the graph databases using a random walk technique and a repeated random walk technique.
195 Citations
14 Claims
-
1. A computer-implemented method for conducting a database graph query, comprising:
-
(a) obtaining a first database graph and a second database graph, wherein the first database graph and second database graph each have two or more vertices and one or more edges;
(b) mapping the first database graph to the second database graph, wherein;
(i) each vertex in the first database graph has a corresponding vertex in the second database graph; and
(ii) each edge in the first database graph has a corresponding edge in the second database graph;
(c) creating a graph closure tree comprised of a union of the first database graph and the second database graph based on the mapping, wherein each node of the graph closure tree comprises a graph closure of the node'"'"'s children and each child of a leaf node comprises a database graph; and
(d) conducting a graph query based on the graph closure tree. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A computer-implemented method for determining a significance of frequent subgraphs in a database graph comprising:
-
selecting one or more vertices or edges of a database graph as features;
transforming the selected features into feature vectors, wherein each feature vector comprises a frequency of the selected feature in the database graph;
evaluating the feature vectors; and
determining a statistical significance of the feature vectors based on the evaluating. - View Dependent Claims (9, 10, 11, 12)
-
-
13. A computer-implemented method for finding a significant group of proteins in a genome scale protein interaction network comprising:
-
(a) obtaining graph G=(V,E) representing a genome scale protein interaction network, wherein V comprises a set of nodes/proteins in the graph and E comprises a set of weighted undirected edges between pairs of nodes/proteins, wherein the edges are weighted by a probability of interaction;
(b) beginning on an initial node, moving to a neighboring node based on the weight of connecting edges;
(c) moving to a new neighboring node based on the weight of connecting edges at every time tick for a defined period of time;
(d) teleporting to the initial node and repeating steps (b) and (c) based on a restart probability α
; and
(e) determining a significant group of proteins based on a proximity of a node to the initial node, wherein the proximity is based on a percentage of time spent on the node during steps (b) and (c);
(f) repeating steps (b)-(d) wherein every node in the network is used as the initial node;
(g) inserting a cluster of proteins into a priority queue based on a statistical significance of each cluster;
- View Dependent Claims (14)
-
Specification