METHODS AND SYSTEMS FOR PROCESSING LARGE GRAPHS USING DENSITY-BASED PROCESSES USING MAP-REDUCE
First Claim
1. A computer-implemented method for processing a graph comprising graph data in a network comprising a plurality of individual processor-based machines, the method comprising:
- partitioning the graph into a plurality of partitions;
assigning each partition to a respective machine of the plurality of machines;
applying a density-based filter to each partition to produce a plurality of partitions of a filtered graph;
applying a partial connectivity detector process to each of the partitions of the filtered graph to produce sub-clusters of nodes of the filtered graph; and
merging the sub-clusters of nodes through a message based merge process.
2 Assignments
0 Petitions
Accused Products
Abstract
Embodiments are directed to a density-based clustering algorithm that decomposes and reformulates the DBSCAN algorithm to facilitate its performance on the Map-Reduce model. The DBSCAN algorithm is reformulated into connectivity problem using a density filter method and a partial connectivity detector. The density-based clustering algorithm uses message passing and edge adding to increase the speed of result merging, it also uses message mining techniques to further decrease the number of iterations to process the input graph. The algorithm is scalable, and can be accelerated by using more machines in a distributed computer network implementing the Map-Reduce program.
31 Citations
20 Claims
-
1. A computer-implemented method for processing a graph comprising graph data in a network comprising a plurality of individual processor-based machines, the method comprising:
-
partitioning the graph into a plurality of partitions; assigning each partition to a respective machine of the plurality of machines; applying a density-based filter to each partition to produce a plurality of partitions of a filtered graph; applying a partial connectivity detector process to each of the partitions of the filtered graph to produce sub-clusters of nodes of the filtered graph; and merging the sub-clusters of nodes through a message based merge process. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
-
-
13. A system for identifying clusters in a large graph using a network of coupled machines, each machine executing a Map-Reduce process, the system comprising:
-
a splitting function component partitioning the graph into a plurality of partitions and assigning each partition to a respective machine of the plurality of machines; a density-based filter applied to each partition to produce a plurality of partitions of a filtered graph; a partial connectivity detector applied to each of the partitions of the filtered graph to produce sub-clusters of nodes of the filtered graph; and a merger merging the sub-clusters of nodes through a message based merge process. - View Dependent Claims (14, 15, 16, 17, 18, 19)
-
-
20. A non-volatile, machine-readable medium containing one or more sequences of instructions for controlling access to an application program on a computer network, the instructions configured to cause a processor to:
-
partition a graph to be processed by a network of coupled computers into a plurality of partitions; assign each partition to a respective computer of a plurality of computers comprising the network; apply a density-based filter to each partition to produce a plurality of partitions of a filtered graph; apply a partial connectivity detector process to each of the partitions of the filtered graph to produce sub-clusters of nodes of the filtered graph; and merge the sub-clusters of nodes through a message based merge process.
-
Specification