Methods and systems for using map-reduce for large-scale analysis of graph-based data
First Claim
1. A computer-implemented method for processing graph data, the method comprising:
- executing a Markov Clustering algorithm (MCL) to find clusters of vertices of the graph data;
organizing the graph data by column by calculating a probability percentage for each column of a similarity matrix of the graph data to produce column data;
generating a probability matrix of states of the column data;
performing an expansion of the probability matrix by computing a power of the probability matrix using a Map-Reduce model executed in a processor-based computing device; and
organizing the probability matrix into a set of sub-matrices to find the least amount of data needed for the Map-Reduce model given that two lines of data in the probability matrix are required to compute a single value for the power of the probability matrix.
2 Assignments
0 Petitions
Accused Products
Abstract
Embodiments are described for a method for processing graph data by executing a Markov Clustering algorithm (MCL) to find clusters of vertices of the graph data, organizing the graph data by column by calculating a probability percentage for each column of a similarity matrix of the graph data to produce column data, generating a probability matrix of states of the column data, performing an expansion of the probability matrix by computing a power of the matrix using a Map-Reduce model executed in a processor-based computing device; and organizing the probability matrix into a set of sub-matrices to find the least amount of data needed for the Map-Reduce model given that two lines of data in the matrix are required to compute a single value for the power of the matrix. One of at least two strategies may be used to computing the power of the matrix (matrix square, M2) based on simplicity of execution or improved memory usage.
-
Citations
22 Claims
-
1. A computer-implemented method for processing graph data, the method comprising:
-
executing a Markov Clustering algorithm (MCL) to find clusters of vertices of the graph data; organizing the graph data by column by calculating a probability percentage for each column of a similarity matrix of the graph data to produce column data; generating a probability matrix of states of the column data; performing an expansion of the probability matrix by computing a power of the probability matrix using a Map-Reduce model executed in a processor-based computing device; and organizing the probability matrix into a set of sub-matrices to find the least amount of data needed for the Map-Reduce model given that two lines of data in the probability matrix are required to compute a single value for the power of the probability matrix. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
-
-
11. A system for processing graph data in a distributed computing network coupling one or more server computers to a plurality of workstation computers, comprising:
-
a first component executing a Markov Clustering algorithm (MCL) to find clusters of vertices of the graph data and organizing the graph data by column by calculating a probability percentage for each column of a similarity matrix of the graph data to produce column data; and a Map-Reduce component implemented on a Hadoop distributed file system to generate a probability matrix of states of the column data and performing an expansion of the probability matrix by computing a power of the probability matrix using, and to organize the probability matrix into a set of sub-matrices to find the least amount of data needed for the Map-Reduce model given that two lines of data in the probability matrix are required to compute a single value for the power of the matrix. - View Dependent Claims (12, 13, 14, 15, 16, 17, 18)
-
-
19. A non-volatile, non-transitory machine-readable medium containing one or more sequences of instructions for processing large-scale graph data in a distributed computing environment through a computer network coupling client computers to a server computer, the instructions configured to cause a processor to:
-
execute a Markov Clustering algorithm (MCL) to find clusters of vertices of the graph data; organize the graph data by column by calculating a probability percentage for each column of a similarity matrix of the graph data to produce column data; generate a probability matrix of states of the column data; perform an expansion of the probability matrix by computing a power of the probability matrix using a Map-Reduce model executed in a processor-based computing device; and organize the probability matrix into a set of sub-matrices to find the least amount of data needed for the Map-Reduce model given that two lines of data in the probability matrix are required to compute a single value for the power of the probability matrix. - View Dependent Claims (20, 21, 22)
-
Specification