Graph querying, graph motif mining and the discovery of clusters
First Claim
Patent Images
1. A computer-implemented method for determining a significance of frequent subgraphs in a database graph comprising:
- (a) selecting one or more vertices or edges of a database graph as features using, as criteria for the selection, a frequency that a vertex or edge occurs in the database graph, a size of a vertex or edge in the database graph, a structural overlap between vertices or edges in the database graph, or a co-occurrence of vertices or edges in the database graph;
(b) transforming the selected features into feature vectors, wherein each feature vector comprises a frequency of the selected features in the database graph;
(c) evaluating the feature vectors by modeling a probability that the selected features occur in a random one of the feature vectors; and
(d) determining a statistical significance of the feature vectors based on the evaluating step (c), by computing a probability of occurrence of the feature vectors in a random one of the features vector based on the modeled probability, and then obtaining a probability distribution on support of the features vector in a database of random vectors using the probability of occurrence.
2 Assignments
0 Petitions
Accused Products
Abstract
A method for analyzing, querying, and mining graph databases using subgraph and similarity querying. An index structure, known as a closure tree, is defined for topological summarization of a set of graphs. In addition, a significance model is created in which the graphs are transformed into histograms of primitive components. Finally, connected substructures or clusters, comprising paths or trees, are detected in networks found in the graph databases using a random walk technique and a repeated random walk technique.
41 Citations
10 Claims
-
1. A computer-implemented method for determining a significance of frequent subgraphs in a database graph comprising:
-
(a) selecting one or more vertices or edges of a database graph as features using, as criteria for the selection, a frequency that a vertex or edge occurs in the database graph, a size of a vertex or edge in the database graph, a structural overlap between vertices or edges in the database graph, or a co-occurrence of vertices or edges in the database graph; (b) transforming the selected features into feature vectors, wherein each feature vector comprises a frequency of the selected features in the database graph; (c) evaluating the feature vectors by modeling a probability that the selected features occur in a random one of the feature vectors; and (d) determining a statistical significance of the feature vectors based on the evaluating step (c), by computing a probability of occurrence of the feature vectors in a random one of the features vector based on the modeled probability, and then obtaining a probability distribution on support of the features vector in a database of random vectors using the probability of occurrence. - View Dependent Claims (2, 3, 4, 5)
-
-
6. A computer-implemented apparatus for determining a significance of frequent subgraphs in a database graph comprising:
-
at least one or more computer systems configured to perform the steps of; (a) selecting one or more vertices or edges of a database graph as features using, as criteria for the selection, a frequency that a vertex or edge occurs in the database graph, a size of a vertex or edge in the database graph, a structural overlap between vertices or edges in the database graph, or a co-occurrence of vertices or edges in the database graph; (b) transforming the selected features into feature vectors, wherein each feature vector comprises a frequency of the selected feature in the database graph; (c) evaluating the feature vectors by modeling a probability that the selected features occur in a random one of the feature vectors; and (d) determining a statistical significance of the feature vectors based on the evaluating step (c), by computing a probability of occurrence of the feature vectors in a random one of the features vector based on the modeled probability, and then obtaining a probability distribution on support of the features vector in a database of random vectors using the probability of occurrence. - View Dependent Claims (7, 8, 9, 10)
-
Specification