METHODS AND SYSTEMS FOR PROCESSING LARGE GRAPHS USING DENSITY-BASED PROCESSES USING MAP-REDUCE

US 20130024479A1
Filed: 06/15/2012
Published: 01/24/2013
Est. Priority Date: 07/20/2011
Status: Active Grant

Abstract

Embodiments are directed to a density-based clustering algorithm that decomposes and reformulates the DBSCAN algorithm to facilitate its performance on the Map-Reduce model. The DBSCAN algorithm is reformulated into a connectivity problem using a density filter method and a partial connectivity detector. The density-based clustering algorithm uses message passing and edge adding to increase the speed of result merging; it also uses message mining techniques to further decrease the number of iterations needed to process the input graph. The algorithm is scalable and can be accelerated by using more machines in a distributed computer network implementing the Map-Reduce program.
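
As an illustration of how a density filter followed by connectivity detection can recover DBSCAN-style clusters, the following is a minimal single-partition sketch in Python. It is not taken from the specification. The function names, the adjacency-dictionary input (assumed to list only neighbors already within the defined distance), and the rule that a non-core node is attached to a single core neighbor's sub-cluster are all assumptions made for the example.

```python
def density_filter(adjacency, min_neighbors):
    """Keep outgoing edges only for core nodes.

    adjacency: dict mapping node ID -> set of neighbor IDs (hypothetical input
    format; neighbors are assumed to be within the defined distance already).
    """
    core = {v for v, nbrs in adjacency.items() if len(nbrs) >= min_neighbors}
    filtered = {v: (set(nbrs) if v in core else set())
                for v, nbrs in adjacency.items()}
    return filtered, core


def detect_subclusters(filtered, core):
    """Group nodes of the filtered graph into sub-clusters (union-find)."""
    parent = {v: v for v in core}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # Core nodes joined by an edge belong to the same sub-cluster.
    for v in core:
        for u in filtered[v]:
            if u in core:
                ru, rv = find(u), find(v)
                if ru != rv:
                    parent[max(ru, rv)] = min(ru, rv)  # smallest ID is the root

    label = {v: find(v) for v in core}
    # Assumption: a non-core node joins one core neighbor's sub-cluster
    # and never bridges two sub-clusters.
    for v in sorted(core):
        for u in sorted(filtered[v]):
            if u not in core:
                label.setdefault(u, label[v])

    clusters = {}
    for node, root in label.items():
        clusters.setdefault(root, set()).add(node)
    return clusters


# Toy partition: nodes 1-3 are core, node 4 is an edge node, 5 and 6 are noise.
example_graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}, 5: {6}, 6: {5}}
example_filtered, example_core = density_filter(example_graph, min_neighbors=2)
example_clusters = detect_subclusters(example_filtered, example_core)
```

In this sketch the key of each sub-cluster is the smallest core node ID it contains, which could also serve as an initial belonging value for a later merge step.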

20 Claims

1. A computer-implemented method for processing a graph comprising graph data in a network comprising a plurality of individual processor-based machines, the method comprising:
- partitioning the graph into a plurality of partitions;
  
  assigning each partition to a respective machine of the plurality of machines;
  
  applying a density-based filter to each partition to produce a plurality of partitions of a filtered graph;
  
  applying a partial connectivity detector process to each of the partitions of the filtered graph to produce sub-clusters of nodes of the filtered graph; and
  
  merging the sub-clusters of nodes through a message based merge process.
  - 2. The method of claim 1 wherein the step of assigning each partition to the respective machine of the plurality of machines comprises applying a splitting function to the graph data, and wherein a number of the plurality of partitions is a function of the number of the plurality of machines, and further wherein the splitting function uses a method selected from one of node ID and hash function.
  - 3. The method of claim 1 wherein the step of applying the density-based filter comprises:
    - applying one or more density constraints to each partition;
      
      processing the vertices of each partition to determine whether each vertex has a minimum number of neighbors within a defined distance specified by the constraints, allowing core nodes to keep edges pointing to neighbors for vertices that have the minimum number of neighbors; and
      
      removing outward edges of edge nodes with vertices that do not have the minimum number of neighbors.
  - 4. The method of claim 3 wherein the step of applying a partial connectivity detector process comprises:
    - identifying all nodes in each partition that are connected to one another as a sub-cluster;
      
      determining if each sub-cluster contains a core node; and
      
      determining a belonging value for each sub-cluster, wherein the belonging value identifies which cluster a respective sub-cluster will be merged into.
  - 5. The method of claim 4 further comprising performing an edge cutting operation that comprises:
    - identifying multiple edges between pairs of sub-clusters; and
      
      removing all edges exceeding a single edge between a pair of sub-clusters if multiple edges exist between the pair of sub-clusters.
  - 6. The method of claim 4 wherein the step of merging the sub-clusters further comprises:
    - establishing neighbor relationships among the sub-clusters based on the belonging value for each sub-cluster; and
      
      assigning an active or halt state to each sub-cluster.
  - 7. The method of claim 6 further comprising applying a message sending process that comprises:
    - broadcasting a message when a sub-cluster is in an active state, the message comprises the belonging value of a broadcasting sub-cluster, wherein a sub-cluster with at least one core node will broadcast a message to all neighbor sub-clusters, and wherein a sub-cluster with no core node is an edge node that broadcasts a message indicating a need to be merged; and
      
      removing sub-clusters without core nodes from the graph.
  - 8. The method of claim 7 further comprising applying a message mining process changing a belonging value of at least one sub-cluster based upon a change of belonging value of at least one other sub-cluster.
  - 9. The method of claim 8 wherein the message mining process comprises:
    - defining a bipartite graph, wherein a first portion of the bipartite graph includes sub-cluster IDs and corresponding belonging values, and a second portion of the bipartite graph includes destination ID values in messages of the sub-cluster.
  - 10. The method of claim 1 wherein the network executes a Map-Reduce program, and wherein the density-based filter is implemented as a Map function of a partial density-based clustering job of the Map-Reduce program, and the partial connectivity detector process is implemented as a Reduce process of the partial density-based clustering job.
  - 11. The method of claim 7 wherein the network executes a Map-Reduce program, and wherein the message sending process is implemented as a Map function of an edge cut and first merge job of the Map-Reduce program, and the merging process is implemented as a Reduce process of the edge cut and first merge job.
  - 12. The method of claim 8 wherein the network executes a Map-Reduce program, and wherein the message sending process is implemented as a Map function of a merge job of the Map-Reduce program, and the message mining process is implemented as a Reduce process of the merge job.
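
The message based merge of claims 6 and 7 can be pictured with a short sketch. The Python below is a hypothetical, single-process illustration, not the claimed implementation: each active sub-cluster broadcasts its belonging value to its neighbors, a receiver adopts a smaller value if one arrives, and a sub-cluster halts once its value stops changing. Adopting the minimum value, the integer IDs, and the rule that a sub-cluster without a core node receives messages but never relays them are assumptions made for the example.

```python
def merge_subclusters(belonging, neighbors, has_core):
    """Propagate belonging values until every sub-cluster has halted.

    belonging: dict sub-cluster ID -> initial belonging value
    neighbors: dict sub-cluster ID -> set of neighboring sub-cluster IDs
    has_core:  dict sub-cluster ID -> True if it contains a core node
    (All three structures are hypothetical input formats.)
    """
    belonging = dict(belonging)
    # Only sub-clusters with a core node start in the active state.
    active = {s for s in belonging if has_core[s]}
    rounds = 0
    while active:
        rounds += 1
        # Map-like phase: each active sub-cluster broadcasts its value.
        inbox = {}
        for s in active:
            for n in neighbors.get(s, ()):
                inbox.setdefault(n, []).append(belonging[s])
        # Reduce-like phase: receivers adopt the smallest value seen.
        next_active = set()
        for s, messages in inbox.items():
            best = min(messages)
            if best < belonging[s]:
                belonging[s] = best
                # Assumption: coreless sub-clusters do not relay messages,
                # so they cannot bridge two separate clusters.
                if has_core[s]:
                    next_active.add(s)
        # Sub-clusters whose value did not change fall back to halt.
        active = next_active
    return belonging, rounds


# Chain 0 - 1 - 2 with a coreless sub-cluster 4 attached to 2; 3 is isolated.
example_belonging = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4}
example_neighbors = {0: {1}, 1: {0, 2}, 2: {1, 4}, 4: {2}}
example_has_core = {0: True, 1: True, 2: True, 3: True, 4: False}
merged, num_rounds = merge_subclusters(
    example_belonging, example_neighbors, example_has_core)
```

In this toy run the chain of sub-clusters converges on a single belonging value one hop per round, which is the kind of iteration count that the message mining process of claims 8 and 9 is described as reducing.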

13. A system for identifying clusters in a large graph using a network of coupled machines, each machine executing a Map-Reduce process, the system comprising:
- a splitting function component partitioning the graph into a plurality of partitions and assigning each partition to a respective machine of the plurality of machines;
  
  a density-based filter applied to each partition to produce a plurality of partitions of a filtered graph;
  
  a partial connectivity detector applied to each of the partitions of the filtered graph to produce sub-clusters of nodes of the filtered graph; and
  
  a merger merging the sub-clusters of nodes through a message based merge process.
  - 14. The system of claim 13 wherein the density-based filter is configured to:
    - apply one or more density constraints to each partition;
      
      process the vertices of each partition to determine whether each vertex has a minimum number of neighbors within a defined distance specified by the constraints, allow core nodes to keep edges pointing to neighbors for vertices that have the minimum number of neighbors; and
      
      remove outward edges of edge nodes with vertices that do not have the minimum number of neighbors.
  - 15. The system of claim 14 wherein the partial connectivity detector is configured to:
    - identify all nodes in each partition that are connected to one another as a sub-cluster;
      
      determine if each sub-cluster contains a core node; and
      
      determine a belonging value for each sub-cluster, wherein the belonging value identifies which cluster a respective sub-cluster will be merged into.
  - 16. The system of claim 14 further comprising an edge cutting component identifying multiple edges between pairs of sub-clusters, and removing all edges exceeding a single edge between a pair of sub-clusters if multiple edges exist between the pair of sub-clusters.
  - 17. The system of claim 16 wherein the merger is configured to:
    - establish neighbor relationships among the sub-clusters based on the belonging value for each sub-cluster; and
      
      assign an active or halt state to each sub-cluster.
  - 18. The system of claim 17 further comprising a message sending component configured to:
    - broadcast a message when a sub-cluster is in an active state, the message comprises the belonging value of a broadcasting sub-cluster, wherein a sub-cluster with at least one core node will broadcast a message to all neighbor sub-clusters, and wherein a sub-cluster with no core node is an edge node that broadcasts a message indicating a need to be merged; and
      
      remove sub-clusters without core nodes from the graph.
  - 19. The system of claim 18 further comprising a message miner configured to change a belonging value of at least one sub-cluster based upon a change of belonging value of at least one other sub-cluster.
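
The edge cutting recited in claims 5 and 16 keeps at most one edge between any pair of sub-clusters. A minimal Python sketch of that idea follows. The function name, the input formats, the choice to keep the first edge encountered, and the decision to ignore edges inside a single sub-cluster are assumptions for illustration only.

```python
def cut_edges(edges, subcluster_of):
    """Keep one representative edge per pair of distinct sub-clusters.

    edges: iterable of (u, v) node pairs
    subcluster_of: dict node ID -> sub-cluster ID
    (Both are hypothetical input formats.)
    """
    kept = {}
    for u, v in edges:
        a, b = subcluster_of[u], subcluster_of[v]
        if a == b:
            continue  # assumption: intra-sub-cluster edges are not needed here
        pair = (min(a, b), max(a, b))  # treat the pair as unordered
        kept.setdefault(pair, (u, v))  # the first edge seen is the one kept
    return kept


example_membership = {1: "A", 2: "A", 3: "B", 4: "B", 5: "C"}
example_edges = [(1, 3), (2, 4), (2, 3), (4, 5), (1, 2)]
example_kept = cut_edges(example_edges, example_membership)
```

Here three edges connect sub-clusters A and B, and only one of them survives, so later merge rounds exchange one message per neighboring pair rather than one per edge.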

20. A non-volatile, machine-readable medium containing one or more sequences of instructions for controlling access to an application program on a computer network, the instructions configured to cause a processor to:
- partition a graph to be processed by a network of coupled computers into a plurality of partitions;
  
  assign each partition to a respective computer of a plurality of computers comprising the network;
  
  apply a density-based filter to each partition to produce a plurality of partitions of a filtered graph;
  
  apply a partial connectivity detector process to each of the partitions of the filtered graph to produce sub-clusters of nodes of the filtered graph; and
  
  merge the sub-clusters of nodes through a message based merge process.
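
The partition and assign steps, together with the hash-based splitting function mentioned in claim 2, might look like the following Python sketch. It is an assumption-laden illustration: it uses one partition per machine, a CRC32 checksum from the standard library as the hash, and an adjacency dictionary as the graph format, none of which is specified by the claims.

```python
import zlib


def assign_partition(node_id, num_machines):
    """Map a node ID to a machine index with a run-stable hash.

    zlib.crc32 is used instead of the built-in hash(), which is salted
    per process for strings and so would not be stable across machines.
    """
    return zlib.crc32(str(node_id).encode("utf-8")) % num_machines


def split_graph(adjacency, num_machines):
    """Split an adjacency dict into one partition per machine (assumption)."""
    partitions = [dict() for _ in range(num_machines)]
    for node, nbrs in adjacency.items():
        partitions[assign_partition(node, num_machines)][node] = set(nbrs)
    return partitions


example_adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}
example_partitions = split_graph(example_adjacency, num_machines=3)
```

Each node lands in exactly one partition together with its outgoing edge list, so a per-partition density filter can count neighbors without contacting other machines.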

Current Assignee: Salesforce.com, Inc.
Original Assignee: Salesforce.com, Inc.
Inventors: Koister, Jari; Gong, Nan
Granted Patent: US 8,521,782 B2
US Class Current: 707/798
CPC Class Codes: G06F 9/5066 Algorithms for mapping a pl...
