WEB GRAPH COMPRESSION THROUGH SCALABLE PATTERN MINING
First Claim
1. A machine-implemented method for compressing a graph including a plurality of nodes and a plurality of links between ones of the plurality of nodes, the machine-implemented method comprising:
- clustering similar nodes of the plurality nodes;
finding a pattern among the clustered similar nodes;
creating a virtual node based on the found pattern; and
substituting links to the virtual node for links corresponding to the found pattern.
2 Assignments
0 Petitions
Accused Products
Abstract
A method and a processing device are provided for compressing a web graph including multiple nodes and links between the multiple nodes. Nodes of the web graph may be clustered into groups including no more than a predetermined number of nodes. A list of links of the clustered nodes may be created and sorted based on a frequency of occurrence of each of the links. A prefix tree may be created based on the sorted list of links. The prefix tree may be walked to find candidate virtual nodes. The candidate virtual nodes may be analyzed according to a selection criteria and a virtual node may be selected. The prefix tree may be adjusted to account for the selection of the virtual node and the virtual node may be added to the web graph.
35 Citations
20 Claims
-
1. A machine-implemented method for compressing a graph including a plurality of nodes and a plurality of links between ones of the plurality of nodes, the machine-implemented method comprising:
-
clustering similar nodes of the plurality nodes; finding a pattern among the clustered similar nodes; creating a virtual node based on the found pattern; and substituting links to the virtual node for links corresponding to the found pattern. - View Dependent Claims (2, 3, 4, 5, 6, 7)
-
-
8. A processing device comprising:
-
at least one processor; and a memory connected to the at least one processor, the memory further comprising; instructions for determining similar nodes of a graph and clustering the similar nodes, each of the nodes representing a respective network document and each link of a respective node being a link to a second respective node, instructions for finding a plurality of patterns among the clustered similar nodes, instructions for selecting one of the found plurality of patterns and creating a virtual node, and instructions for replacing links corresponding to the selected virtual node with links to the virtual node. - View Dependent Claims (9, 10, 11, 12, 13)
-
-
14. A system comprising:
-
a plurality of processing devices networked together, each of the processing devices comprising; at least one processor; and a memory connected to the at least one processor, the memory of a first one of the processing devices comprising; instructions for determining groups of similar nodes of a graph and clustering the groups of similar nodes, instructions for providing respective groups of the groups of similar nodes to respective second ones of the plurality of processing devices, instructions for receiving respective virtual nodes from each of the second ones of the plurality of processing devices, and instructions for replacing links corresponding to the virtual nodes with links to the virtual nodes. - View Dependent Claims (15, 16, 17, 18, 19, 20)
-
Specification