×

Web graph compression through scalable pattern mining

  • US 7,818,303 B2
  • Filed: 01/29/2008
  • Issued: 10/19/2010
  • Est. Priority Date: 01/29/2008
  • Status: Expired due to Fees
First Claim
Patent Images

1. A machine-implemented method for compressing a graph including a plurality of nodes and a plurality of links between ones of the plurality of nodes, the machine-implemented method comprising:

  • clustering similar nodes of the plurality nodes;

    scanning a node list, corresponding to the clustered similar nodes of the plurality of nodes to determine frequencies of occurrence of links of the clustered similar nodes;

    sorting a list of the links of the clustered similar nodes based on the determined frequencies of occurrence of the links of the clustered similar nodes to produce a sorted link list;

    adding each of the links, having a frequency of occurrence greater than a predetermined number, from the sorted link list as an item of a prefix tree, along with an associated node list;

    walking the prefix tree to produce a list of candidate virtual nodes;

    determining a respective compression performance for each pattern corresponding to a respective candidate virtual node according to a compression performance formula Compression(P)=(P.frequency−

    1)(P.size−

    1)−

    1, where Compression(P) is compression performance for a pattern P, P.frequency is a number of the clustered similar nodes having links corresponding to the pattern P, and P.size is a length of the pattern P in the prefix tree;

    sorting the list of candidate virtual nodes based on results of applying the compression performance formula;

    selecting one of the list of candidate virtual nodes as a virtual node based on one of the patterns having a highest determined compression performance;

    creating the virtual node; and

    substituting links to the virtual node for links corresponding to the one of the patterns, whereinthe machine-implemented method is implemented either on one processing device or on a networked plurality of processing devices.

View all claims
  • 2 Assignments
Timeline View
Assignment View
    ×
    ×