Methods and apparatus for distributed community finding
First Claim
1. A method comprising:
- parsing patent data to generate a set of nodes;
selecting at least one node of the set of nodes;
determining initial links from meta data associated with the patent data for the at least one node;
creating links among the set of nodes based on the metadata;
identifying a set of seed nodes;
determining a community structure for the set of seed nodes, the community structure including a plurality of communities; and
assigning concepts to the plurality of communities,wherein determining the community structure comprises;
initiating a percolation message from a source node of a linked network, the linked network comprising a plurality of nodes and a plurality of edges, each edge connecting at least two of the plurality of nodes, wherein a node is a neighbor if the node is connected to another node in the plurality of nodes by an edge, wherein the percolation message comprises a percolation probability and an identifier of the source node, and wherein initiating a percolation message from the source node comprises transmitting the percolation message to each neighbor of the source node with the percolation probability;
propagating the percolation message through the linked network, wherein propagating the percolation message through the linked network comprises;
transmitting the percolation message from each node that receives the percolation message to each neighbor of each node that receives the percolation message; and
transmitting a response to the source node from each node that receives the percolation message;
collecting each response to the percolation message at the source node; and
storing a list of nodes that transmitted the response at the source node.
6 Assignments
0 Petitions
Accused Products
Abstract
Methods and apparatus for a new approach to the problem of finding communities in complex networks relating to a social definition of communities and percolation are disclosed. Instead of partitioning the graph into separate subgraphs from top to bottom a local algorithm (communities of each vertex) allows overlapping of communities. The performance of an algorithm on synthetic, randomly-generated graphs and real-world networks is used to benchmark this method against others. An heuristic is provided to generate a list of communities for networks using a local community finding algorithm. Unlike diffusion based algorithms, The provided algorithm finds overlapping communities and provides a means to measure confidence in community structure. It features locality and low complexity for exploring the communities for a subset of network nodes, without the need for exploring the whole graph.
156 Citations
60 Claims
-
1. A method comprising:
-
parsing patent data to generate a set of nodes; selecting at least one node of the set of nodes; determining initial links from meta data associated with the patent data for the at least one node; creating links among the set of nodes based on the metadata; identifying a set of seed nodes; determining a community structure for the set of seed nodes, the community structure including a plurality of communities; and assigning concepts to the plurality of communities, wherein determining the community structure comprises; initiating a percolation message from a source node of a linked network, the linked network comprising a plurality of nodes and a plurality of edges, each edge connecting at least two of the plurality of nodes, wherein a node is a neighbor if the node is connected to another node in the plurality of nodes by an edge, wherein the percolation message comprises a percolation probability and an identifier of the source node, and wherein initiating a percolation message from the source node comprises transmitting the percolation message to each neighbor of the source node with the percolation probability; propagating the percolation message through the linked network, wherein propagating the percolation message through the linked network comprises; transmitting the percolation message from each node that receives the percolation message to each neighbor of each node that receives the percolation message; and transmitting a response to the source node from each node that receives the percolation message; collecting each response to the percolation message at the source node; and storing a list of nodes that transmitted the response at the source node. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15)
-
-
11. A method comprising:
-
parsing patent data to generate a set of nodes; selecting at least one node of the set of nodes; determining initial links from metadata associated with the patent data for the at least one node; creating links among the set of nodes based on the metadata; identifying a set of seed nodes; determining a community structure for the set of seed nodes, the community structure including a plurality of communities; and assigning concepts to the plurality of communities, wherein determining the community structure comprises; initiating a plurality of percolation messages from each of a plurality of source nodes at a plurality of percolation probabilities, wherein the plurality of percolation probabilities are selected from a set of values between 0 and 1; determining a list of nodes that received the plurality of percolation messages for each of the plurality of source nodes; aggregating the list of nodes for each of the plurality of source nodes to determine a local neighborhood for each of the plurality of source nodes, wherein each node is weighted based on a number of times each node received the plurality of percolation messages; plotting the local neighborhood size versus percolation probabilities for each of the local neighborhoods in a plot; locating phase transition points in the plot; and determining nested local communities for the plurality of source nodes by combining the local neighborhoods at the phase transition points.
-
-
16. A non-transitory program storage device readable by a machine, embodying a program of instructions executable by the machine to perform a method, the method comprising:
-
parsing patent data to generate a set of nodes; selecting at least one node of the set of nodes; determining initial links from meta data associated with the patent data for the at least one node; creating links among the set of nodes based on the metadata; identifying a set of seed nodes; determining a community structure for the set of seed nodes, the community structure including a plurality of communities; and assigning concepts to the plurality of communities, wherein determining the community structure comprises; initiating a percolation message from a source node of a linked network, the linked network comprising a plurality of nodes and a plurality of edges, each edge connecting at least two of the plurality of nodes, wherein a node is a neighbor if the node is connected to another node in the plurality of nodes by an edge, wherein the percolation message comprises a percolation probability and an identifier of the source node, and wherein initiating a percolation message from the source node comprises transmitting the percolation message to each neighbor of the source node with the percolation probability; propagating the percolation message through the linked network, wherein propagating the percolation message through the linked network comprises; transmitting the percolation message from each node that receives the percolation message to each neighbor of each node that receives the percolation message; and transmitting a response to the source node from each node that receives the percolation message; collecting each response to the percolation message at the source node; and storing a list of nodes that transmitted the response at the source node. - View Dependent Claims (17, 18, 19, 20, 21, 22, 23, 24, 25, 27, 28, 29, 30)
-
-
26. A non-transitory program device readable by a machine, embodying a program of instructions executable by the machine to perform a method, the method comprising:
-
parsing patent data to generate a set of nodes; selecting at least one node of the set of nodes; determining initial links from meta data associated with the patent data for the at least one node; creating links among the set of nodes based on the metadata; identifying a set of seed nodes; determining a community structure for the set of seed nodes, the community structure including a plurality of communities; and assigning concepts to the plurality of communities, wherein determining the community structure comprises; initiating a plurality of percolation messages from each of a plurality of source nodes at a plurality of percolation probabilities, wherein the plurality of percolation probabilities are selected from a set of values between 0 and 1; determining a list of nodes that received the plurality of percolation messages for each of the plurality of source nodes; aggregating the list of nodes for each of the plurality of source nodes to determine a local neighborhood for each of the plurality of source nodes, wherein each node is weighted based on a number of times each node received the plurality of percolation messages; plotting the local neighborhood size versus percolation probabilities for each of the local neighborhoods in a plot; locating phase transition points in the plot; and determining nested local communities for the plurality of source nodes by combining the local neighborhoods at the phase transition points.
-
-
31. An apparatus comprising:
-
memory; one or more computers configured to; parse patent data to generate a set of nodes; select at least one node of the set of nodes; determine initial links from meta data associated with the patent data for the at least one node; create links among the set of nodes based on the metadata; identify a set of seed nodes; determine a community structure for the set of seed nodes, the community structure including a plurality of communities; and assign concepts to the plurality of communities, wherein determining the community structure comprises; initiating a percolation message from a source node of a linked network, the linked network comprising a plurality of nodes and a plurality of edges, each edge connecting at least two of the plurality of nodes, wherein a node is a neighbor if the node is connected to another node in the plurality of nodes by an edge, wherein the percolation message comprises a percolation probability and an identifier of the source node, and wherein initiating a percolation message from the source node comprises transmitting the percolation message to each neighbor of the source node with the percolation probability; propagating the percolation message through the linked network, wherein propagating the percolation message through the linked network comprises; transmitting the percolation message from each node that receives the percolation message to each neighbor of each node that receives the percolation message; and transmitting a response to the source node from each node that receives the percolation message; collecting each response to the percolation message at the source node; and storing a list of nodes that transmitted the response at the source node. - View Dependent Claims (32, 33, 34, 35, 36, 37, 38, 39, 40, 42, 43, 44, 45)
-
-
41. An apparatus comprising:
-
memory; one or more computers configured to; parse patent data to generate a set of nodes; select at least one node of the set of nodes; determine initial links from meta data associated with the patent data for the at least one node; create links among the set of nodes based on the metadata; identify a set of seed nodes; determine a community structure for the set of seed nodes, the community structure including a plurality of communities; and assign concepts to the plurality of communities, wherein determining the community structure comprises; initiating a plurality of percolation messages from each of a plurality of source nodes at a plurality of percolation probabilities, wherein the plurality of percolation probabilities are selected from a set of values between 0 and 1; determining a list of nodes that received the plurality of percolation messages for each of the plurality of source nodes; aggregating the list of nodes for each of the plurality of source nodes to determine a local neighborhood for each of the plurality of source nodes, wherein each node is weighted based on a number of times each node received the plurality of percolation messages; plotting the local neighborhood size versus percolation probabilities for each of the local neighborhoods in a plot; locating phase transition points in the plot; and determining nested local communities for the plurality of source nodes by combining the local neighborhoods at the phase transition points.
-
-
46. An apparatus comprising:
-
means for parsing patent data to generate a set of nodes; means for selecting at least one node of the set of nodes; means for determining initial links from meta data associated with the patent data for the at least one node; means for creating links among the set of nodes based on the metadata; means for identifying a set of seed nodes; means for determining a community structure for the set of seed nodes, the community structure including a plurality of communities; and means for assigning concepts to the plurality of communities, wherein the means for determining the community structure comprises; means for initiating a percolation message from a source node of a linked network, the linked network comprising a plurality of nodes and a plurality of edges, each edge connecting at least two of the plurality of nodes, wherein a node is a neighbor if the node is connected to another node in the plurality of nodes by an edge, wherein the percolation message comprises a percolation probability and an identifier of the source node, and wherein initiating a percolation message from the source node comprises transmitting the percolation message to each neighbor of the source node with the percolation probability; means for propagating the percolation message through the linked network, wherein propagating the percolation message through the linked network comprises; means for transmitting the percolation message from each node that receives the percolation message to each neighbor of each node that receives the percolation message; and means for transmitting a response to the source node from each node that receives the percolation message; means for collecting each response to the percolation message at the source node; and means for storing a list of nodes that transmitted the response at the source node. - View Dependent Claims (47, 48, 49, 50, 51, 52, 53, 54, 55, 57, 58, 59, 60)
-
-
56. An apparatus comprising:
-
means for parsing patent data to generate a set of nodes; means for selecting at least one node of the set of nodes; means for determining initial links from meta data associated with the patent data for the at least one node; means for creating links among the set of nodes based on the metadata; means for identifying a set of seed nodes; means for determining a community structure for the set of seed nodes, the community structure including a plurality of communities; and means for assigning concepts to the plurality of communities, wherein determining the community structure comprises; means for initiating a plurality of percolation messages from each of a plurality of source nodes at a plurality of percolation probabilities, wherein the plurality of percolation probabilities are selected from a set of values between 0 and 1; means for determining a list of nodes that received the plurality of percolation messages for each of the plurality of source nodes; means for aggregating the list of nodes for each of the plurality of source nodes to determine a local neighborhood for each of the plurality of source nodes, wherein each node is weighted based on a number of times each node received the plurality of percolation messages; means for plotting the local neighborhood size versus percolation probabilities for each of the local neighborhoods in a plot; means for locating phase transition points in the plot; and means for determining nested local communities for the plurality of source nodes by combining the local neighborhoods at the phase transition points.
-
Specification