Method for identifying network similarity by matching neighborhood topology

US 8,000,262 B2
Filed: 04/18/2008
Issued: 08/16/2011
Est. Priority Date: 04/18/2008
Status: Active Grant

Abstract

A method of computing a measure of similarity between nodes of first and second networks is described. In particular, sets of pairwise scores are computed to find nodes in the individual networks that are good matches to one another. Thus, a pairwise score, referred to as R_ij, is computed for a node i in the first network and a node j in the second network. Similar pairwise scores are computed for each of the nodes in each network. The goal of this process is to identify node pairs that exhibit high R_ij values. According to the technique described herein, the intuition is that nodes i and j are a good match if their neighbors are a good match. This technique produces a measure of “network similarity.” If node feature data also is available, the intuition may be expanded such that nodes i and j are considered a good match if their neighbors are a good match (network similarity) and their node features are a good match (node similarity). Node feature data typically is domain-specific. Using the similarity scores, a common subgraph between the first and second networks then can be computed.
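The intuition in the abstract (a pair is a good match if its neighbors are good matches) can be illustrated with a short Python sketch for the unit-weight case. This is not the patentees' implementation: the function name `network_similarity`, the dict-of-sets graph representation, the uniform starting scores, the fixed iteration count, and the toy graphs are all assumptions made for illustration, and the sketch assumes neither graph has isolated nodes.

```python
from itertools import product


def network_similarity(adj1, adj2, iterations=50):
    """Iteratively compute pairwise scores R[(i, j)] for unit edge weights.

    adj1 and adj2 map each node to the set of its neighbours (assumed
    undirected, with no isolated nodes).
    """
    pairs = list(product(adj1, adj2))
    # Assumption: start from a uniform score over all node pairs.
    R = {pair: 1.0 / len(pairs) for pair in pairs}
    for _ in range(iterations):
        updated = {
            # Each neighbouring pair (u, v) spreads its score equally over
            # the |N(u)| * |N(v)| pairs it could support.
            (i, j): sum(
                R[(u, v)] / (len(adj1[u]) * len(adj2[v]))
                for u in adj1[i] for v in adj2[j]
            )
            for i, j in pairs
        }
        total = sum(updated.values())
        # Rescale so the scores keep summing to 1.
        R = {pair: score / total for pair, score in updated.items()}
    return R


# Hypothetical toy graphs: each is a triangle with one pendant node.
g1 = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
g2 = {"w": {"x", "y"}, "x": {"w", "y"}, "y": {"w", "x", "z"}, "z": {"y"}}

scores = network_similarity(g1, g2)
best = max(scores, key=scores.get)
print(best, round(scores[best], 4))
```

On these toy graphs the highest-scoring pair should be the two degree-3 nodes. With topology alone, symmetric nodes (here a and b) receive identical scores, which is one reason the abstract also mentions node feature data.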

21 Claims

1. A non-transitory computer-readable storage medium storing a computer readable program of computer instructions, wherein the computer readable program when executed on a computer causes the computer to carry out operations to identify a common subgraph between first and second networks, the first network G₁ having a set of nodes V₁ and a set of edges E₁, the second network G₂ having a set of nodes V₂ and a set of edges E₂, where N(a) is a set of neighbors of a given node a, the set being of size |N(a)|, with each edge e of a network having an edge weight w(e), the operations comprising:
- for each of a set of node pairs (i, j), where i is a node from the first network and j is a node from the second network, and where u is a neighbor of i and v is a neighbor of j, computing a similarity score R_i,j equal to a support value provided to the node pair (i, j) by |N(i)||N(j)| possible matches between neighbors of i and j, where each neighboring node pair (u, v) distributes back its score R_uv among |N(u)||N(v)| possible matches between neighbors of u and v;
- using the similarity scores to construct the common subgraph.
- Dependent claims (2, 3, 4, 5):
  - 2. The computer-readable storage medium as described in claim 1 where the support value is a total support value provided to the node pair by each of the |N(i)||N(j)| possible matches between neighbors of i and j.
  - 3. The computer-readable storage medium as described in claim 1 where the support value is a maximum support value provided to the node pair by any of the |N(i)||N(j)| possible matches between neighbors of i and j.
  - 4. The computer-readable storage medium as described in claim 1 wherein the edge weight w(e) of each edge e is equal to 1.
  - 5. The computer-readable storage medium as described in claim 1 wherein the common subgraph is constructed by the following sub-steps:
    - selecting from the similarity scores a set of scores that capture mutually-consistent pairwise matches; and
    - extracting node mappings from the set of scores to identify one or more conserved edges between the first and second network.
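Claims 2 and 3 differ only in how the contributions from neighboring pairs are combined: claim 2 takes the total, claim 3 the maximum. The Python sketch below shows a single update for one node pair under unit edge weights. The function name `support_value`, its `mode` parameter, and the toy star graphs are assumptions introduced for illustration and do not come from the patent text.

```python
def support_value(R, adj1, adj2, i, j, mode="total"):
    """One score update for the pair (i, j) from the current scores R.

    mode="total" sums the contributions of all neighbouring pairs (as in
    claim 2); mode="max" keeps only the largest one (as in claim 3).
    """
    contributions = [
        # (u, v) spreads its score over |N(u)| * |N(v)| possible matches.
        R[(u, v)] / (len(adj1[u]) * len(adj2[v]))
        for u in adj1[i] for v in adj2[j]
    ]
    if not contributions:
        return 0.0
    return sum(contributions) if mode == "total" else max(contributions)


# Hypothetical toy graphs: two three-node stars with centres "a" and "x".
star1 = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
star2 = {"x": {"y", "z"}, "y": {"x"}, "z": {"x"}}
uniform = {(i, j): 1.0 / 9 for i in star1 for j in star2}

total_support = support_value(uniform, star1, star2, "a", "x", mode="total")
max_support = support_value(uniform, star1, star2, "a", "x", mode="max")
print(total_support, max_support)
```

Here the pair of centres has four supporting leaf pairs, so the total is four times the maximum. The two modes generally rank pairs differently once scores stop being uniform.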

6. An article comprising a non-transitory tangible machine-readable medium that stores a program, the program being executable by a machine to perform a method of identifying a graph that is substantially isomorphic to subgraphs of first and second networks, the first network G₁ having a set of nodes V₁ and a set of edges E₁, the second network G₂ of nodes V₂ and a set of edges E₂, where N(a) is a set of neighbors of a given node a, the set being of size |N(a)|, with each edge e of a network having an edge weight w(e), the method comprising:
- requiring a set of constraints R_i,j to hold for all possible pairs of (i, j), where i is a node from the first network and j is a node from the second network, and where u is a neighbor of i and v is a neighbor of j, where;

  $$R_{ij} = \sum_{u \in N(i)} \sum_{v \in N(j)} \frac{w(i,u)\, w(j,v)}{\sum_{r \in N(i)} w(r,u) \, \sum_{q \in N(j)} w(q,v)} \, R_{uv}$$

  where i ∈ V₁ and j ∈ V₂;
- computing a vector R by solving the constraints; and
- extracting node mappings from vector R to identify the graph.
- Dependent claims (7, 8, 9, 10):
  - 7. The article as described in claim 6 wherein vector R is computed by identifying a principal eigenvector of a matrix

    $$A[i,j][u,v] = \begin{cases} \dfrac{w(i,u)\, w(j,v)}{\sum_{r \in N(i)} w(r,u) \, \sum_{q \in N(j)} w(q,v)} & \text{if } (i,u) \in E_1 \text{ and } (j,v) \in E_2, \\ 0 & \text{otherwise,} \end{cases}$$

    where A is a |V₁||V₂| × |V₁||V₂| matrix and A[i,j][u,v] is an entry at row (i,j) and column (u,v) of the matrix.
  - 8. The article as described in claim 6 wherein the edge weight w(e) of each edge in each network is 1, such that:

    $$R_{ij} = \sum_{u \in N(i)} \sum_{v \in N(j)} \frac{1}{|N(u)|\,|N(v)|} \, R_{uv}$$

    where i ∈ V₁ and j ∈ V₂.
  - 9. The article as described in claim 6 wherein the edge weight w(e) of at least one edge in the first network or the second network is 0 < w(e) ≤ 1.
  - 10. The article as described in claim 6 wherein the set of constraints also includes a set of node similarity scores, where constraint R_ij depends on scores R_uv and, in addition, on node similarity between i and j.
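Claim 7 computes R as a principal eigenvector of a matrix A indexed by node pairs. The Python sketch below builds such a matrix for weighted graphs and approximates the eigenvector by power iteration. Three things are assumptions rather than claim language: the names `build_matrix` and `principal_eigenvector`; the dict-of-dicts weight representation; and, most importantly, the normalising sums, which are taken over the neighbors of u and of v (the weighted degrees of u and v). That is the reading under which unit weights give the 1/(|N(u)||N(v)|) factor of claim 8. Power iteration is likewise just one possible eigenvector method.

```python
def build_matrix(w1, w2):
    """Build A[(i, j)][(u, v)] as a dense list-of-lists.

    w1 and w2 map node -> {neighbour: positive edge weight} (symmetric).
    """
    pairs = [(i, j) for i in w1 for j in w2]
    index = {pair: k for k, pair in enumerate(pairs)}
    # Assumption: normalise by the weighted degree of u and of v.
    s1 = {u: sum(w1[u].values()) for u in w1}
    s2 = {v: sum(w2[v].values()) for v in w2}
    A = [[0.0] * len(pairs) for _ in pairs]
    for (i, j), row in index.items():
        for u, w_iu in w1[i].items():
            for v, w_jv in w2[j].items():
                # Non-zero only when (i, u) and (j, v) are both edges.
                A[row][index[(u, v)]] = w_iu * w_jv / (s1[u] * s2[v])
    return pairs, A


def principal_eigenvector(A, iterations=200):
    """Approximate the principal eigenvector by power iteration."""
    x = [1.0 / len(A)] * len(A)
    for _ in range(iterations):
        y = [sum(a * b for a, b in zip(row, x)) for row in A]
        norm = sum(y)
        x = [value / norm for value in y]
    return x


# Hypothetical weighted toy graphs (triangle plus pendant node).
w1 = {"a": {"b": 1.0, "c": 0.5}, "b": {"a": 1.0, "c": 0.5},
      "c": {"a": 0.5, "b": 0.5, "d": 1.0}, "d": {"c": 1.0}}
w2 = {"w": {"x": 1.0, "y": 0.5}, "x": {"w": 1.0, "y": 0.5},
      "y": {"w": 0.5, "x": 0.5, "z": 1.0}, "z": {"y": 1.0}}

pairs, A = build_matrix(w1, w2)
R = dict(zip(pairs, principal_eigenvector(A)))
print(max(R, key=R.get))
```

The dense matrix has |V₁||V₂| rows and columns, so this form is only practical for small graphs. A sparse representation, or iterating the constraint directly without forming A, would be the natural alternative.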

11. A computer program product comprising a non-transitory tangible machine-readable medium that stores a program, the program being executable by a machine to perform a method of identifying a common subgraph between first and second networks, the first network G₁ having a set of nodes V₁ and a set of edges E₁, the second network G₂ having a set of nodes V₂ and a set of edges E₂, where N(a) is a set of neighbors of a given node a, the set being of size |N(a)|, with each edge e of a network having an edge weight w(e), the method comprising:
- for each of a set of node pairs (i, j), where i is a node from the first network and j is a node from the second network, and where u is a neighbor of i and v is a neighbor of j, computing a similarity score R_i,j according to the following equation;

  $$R_{ij} = \sum_{u \in N(i)} \sum_{v \in N(j)} \frac{1}{|N(u)|\,|N(v)|} \, R_{uv}$$

  where i ∈ V₁ and j ∈ V₂; and
- using the similarity scores to identify the common subgraph.
- Dependent claims (12, 13, 14, 15, 16, 17):
  - 12. The computer program product as described in claim 11 wherein the edge weight w(e) of each edge e is equal to 1.
  - 13. The computer program product as described in claim 11 wherein the common subgraph is constructed by the following sub-steps of the method:
    - selecting from the similarity scores a set of scores that capture mutually-consistent pairwise matches; and
    - extracting node mappings from the set of scores to identify one or more conserved edges between the first and second network.
  - 14. The computer program product as described in claim 11 where the first and second networks are each a protein interaction network having a node corresponding to a protein and an edge corresponding to an interaction between proteins.
  - 15. The computer program product as described in claim 11 where the first and second networks each represent a set of web pages, with a node corresponding to a web page and an edge corresponding to a link between web pages.
  - 16. The computer program product as described in claim 11 where the first and second networks each represent a web page, with a node corresponding to a portion of a web page and an edge corresponding to a link between web page portions.
  - 17. The computer program product as described in claim 11 where the first and second networks each represent a digital image, with a node corresponding to a feature within an image and an edge corresponding to a spatial relationship of the feature.
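Claims 5 and 13 describe selecting mutually-consistent pairwise matches and then identifying conserved edges, without fixing a selection procedure. The Python sketch below assumes one simple option: a greedy one-to-one selection that repeatedly takes the highest remaining score. The function names `greedy_mapping` and `conserved_edges`, the one-to-one constraint, and the example scores are all hypothetical.

```python
def greedy_mapping(R):
    """Pick a one-to-one node mapping by descending score (an assumption)."""
    mapping, used = {}, set()
    for (i, j), _score in sorted(R.items(), key=lambda kv: kv[1], reverse=True):
        # Keep a pair only if neither node has already been matched.
        if i not in mapping and j not in used:
            mapping[i] = j
            used.add(j)
    return mapping


def conserved_edges(mapping, adj1, adj2):
    """Edges of the first network whose mapped endpoints are also adjacent."""
    return {
        frozenset((i, u))
        for i in mapping
        for u in adj1[i]
        if u in mapping and mapping[u] in adj2[mapping[i]]
    }


# Hypothetical scores and two single-edge graphs.
example_scores = {("a", "x"): 0.5, ("a", "y"): 0.3,
                  ("b", "x"): 0.4, ("b", "y"): 0.2}
adj1 = {"a": {"b"}, "b": {"a"}}
adj2 = {"x": {"y"}, "y": {"x"}}

mapping = greedy_mapping(example_scores)
print(mapping, conserved_edges(mapping, adj1, adj2))
```

Greedy selection is fast but not guaranteed to maximise the total score. A maximum-weight bipartite matching would be the usual alternative when an optimal one-to-one assignment matters.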

18. A non-transitory computer-readable storage medium storing a computer readable program of computer instructions, wherein the computer readable program when executed on a computer causes the computer to carry out operations to match first and second networks, the first network G₁ having a set of nodes V₁ and a set of edges E₁, the second network G₂ of nodes V₂, where N(a) is a set of neighbors of a given node a, the set being of size |N(a)|, with each edge e having an edge weight w(e), the operations comprising:
- establishing a set of constraints as a convex combination of a set of network similarity scores and a set of node similarity scores, wherein the set of constraints conform to the following equation;

  $$R_{ij} = \alpha \left( \sum_{u \in N(i)} \sum_{v \in N(j)} \frac{w(i,u)\, w(j,v)}{\sum_{r \in N(i)} w(r,u) \, \sum_{q \in N(j)} w(q,v)} \, R_{uv} \right) + (1 - \alpha)\, B_{ij}$$

  where B is a set of node similarity scores between the nodes of the first and second networks, with score scaled by a uniform multiple such that Σ B_ij = 1, where i ∈ V₁ and j ∈ V₂ and where 0 < α ≤ 1;
- computing a vector R by identifying a principal eigenvector of a matrix A[i,j][u,v], where

  $$A[i,j][u,v] = \begin{cases} \alpha \displaystyle\sum_{u \in N(i)} \sum_{v \in N(j)} \frac{w(i,u)\, w(j,v)}{\sum_{r \in N(i)} w(r,u) \, \sum_{q \in N(j)} w(q,v)} + (1 - \alpha)\, B_{ij} & \text{if } (i,u) \in E_1 \text{ and } (j,v) \in E_2, \\ 0 & \text{otherwise,} \end{cases}$$

  where A is a |V₁||V₂| × |V₁||V₂| matrix and A[i,j][u,v] is an entry at row (i,j) and column (u,v); and
- extracting node mappings from vector R to match the first and second networks.
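The convex combination in claim 18 can be illustrated by iterating the constraint equation directly instead of forming the matrix A. The Python sketch below does that; it is an illustrative fixed-point iteration, not the claimed eigenvector computation. The function name `combined_scores`, the default `alpha`, the iteration count, the toy graphs, and the choice of B are assumptions, as is the normalisation by the weighted degrees of u and v.

```python
def combined_scores(w1, w2, B, alpha=0.6, iterations=100):
    """Iterate R = alpha * (network term) + (1 - alpha) * B.

    w1 and w2 map node -> {neighbour: weight}; B maps every pair (i, j)
    to a non-negative node similarity score.
    """
    # Assumption: normalise by the weighted degree of u and of v.
    s1 = {u: sum(w1[u].values()) for u in w1}
    s2 = {v: sum(w2[v].values()) for v in w2}
    # Scale B by a uniform multiple so that its entries sum to 1.
    total_B = sum(B.values())
    B = {pair: b / total_B for pair, b in B.items()}
    R = dict(B)
    for _ in range(iterations):
        R = {
            (i, j): alpha * sum(
                w_iu * w_jv / (s1[u] * s2[v]) * R[(u, v)]
                for u, w_iu in w1[i].items()
                for v, w_jv in w2[j].items()
            ) + (1 - alpha) * B[(i, j)]
            for i in w1 for j in w2
        }
    return R


# Hypothetical unit-weight toy graphs (triangle plus pendant node).
net1 = {"a": {"b": 1.0, "c": 1.0}, "b": {"a": 1.0, "c": 1.0},
        "c": {"a": 1.0, "b": 1.0, "d": 1.0}, "d": {"c": 1.0}}
net2 = {"w": {"x": 1.0, "y": 1.0}, "x": {"w": 1.0, "y": 1.0},
        "y": {"w": 1.0, "x": 1.0, "z": 1.0}, "z": {"y": 1.0}}

# Hypothetical node similarity: only the pair (a, w) has feature support.
node_sim = {(i, j): 0.0 for i in net1 for j in net2}
node_sim[("a", "w")] = 1.0

combined = combined_scores(net1, net2, node_sim)
print(combined[("a", "w")] > combined[("a", "x")])
```

In this example topology alone cannot distinguish w from x as a partner for a, because the two are symmetric in the second graph. The node similarity term breaks the tie in favor of (a, w).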

19. A computer program product comprising a non-transitory tangible machine-readable medium that stores a program, the program being executable by a machine to perform a method of identifying a measure of similarity between nodes of first and second networks, the method comprising:
- identifying a node pair (i, j), where i is a node from the first network and j is a node from the second network, and where u is a neighbor of i and v is a neighbor of j; and
- computing a network similarity score R_i,j for the node pair to be equal to a total support provided by all supporting node pairs adjacent to the node pair, each supporting node pair providing its support in proportion to a number of supporting node pairs it has to support;

  wherein

  $$R_{ij} = \sum_{u \in N(i)} \sum_{v \in N(j)} \frac{1}{|N(u)|\,|N(v)|} \, R_{uv}$$

  where i ∈ V₁ and j ∈ V₂.

20. A computer program product comprising a non-transitory tangible machine-readable medium that stores a program, the program being executable by a machine to perform a method of identifying a measure of similarity among nodes of multiple networks, the method comprising:
- for each pair of networks, generating pairwise alignment data by;
  - identifying a node pair (i, j), where i is a node from a first network and j is a node from a second network, and where u is a neighbor of i and v is a neighbor of j; and
  - computing a network similarity score R_i,j for the node pair to be equal to a total support provided by all supporting node pairs adjacent to the node pair, each supporting node pair providing its support in proportion to a number of supporting node pairs it has to support;

    wherein

    $$R_{ij} = \sum_{u \in N(i)} \sum_{v \in N(j)} \frac{1}{|N(u)|\,|N(v)|} \, R_{uv}$$

    where i ∈ V₁ and j ∈ V₂; and
- applying an algorithm to the pairwise alignment data to find alignment data for the multiple networks.
- Dependent claims (21):
  - 21. The computer program product as described in claim 20 wherein the algorithm is a greedy algorithm.
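Claims 20 and 21 state only that an algorithm, optionally greedy, combines the pairwise alignment data into a multi-network alignment. The Python sketch below shows one hypothetical greedy procedure: pairs are visited in descending score order, and two groups of matched nodes are merged only if the result contains at most one node per network. The function name `greedy_multi_alignment`, the `(network, node)` tuple representation, the one-node-per-network rule, and the example scores are assumptions, not details from the claims.

```python
def greedy_multi_alignment(pairwise_scores):
    """Greedily merge scored cross-network node pairs into clusters.

    pairwise_scores maps ((net_a, node_a), (net_b, node_b)) to a score.
    Returns a list of frozensets, each holding one node per network at most.
    """
    cluster_of = {}  # (net, node) -> the set currently containing it
    ranked = sorted(pairwise_scores.items(), key=lambda kv: kv[1], reverse=True)
    for (p, q), _score in ranked:
        cp = cluster_of.setdefault(p, {p})
        cq = cluster_of.setdefault(q, {q})
        if cp is cq:
            continue  # already aligned together
        # Assumption: a cluster may hold at most one node from each network.
        if {net for net, _ in cp} & {net for net, _ in cq}:
            continue
        merged = cp | cq
        for member in merged:
            cluster_of[member] = merged
    unique = {frozenset(c) for c in cluster_of.values() if len(c) > 1}
    return sorted(unique, key=lambda c: sorted(c))


# Hypothetical pairwise alignment scores across three networks A, B and C.
pairwise = {
    (("A", "a1"), ("B", "b1")): 0.9,
    (("B", "b1"), ("C", "c1")): 0.8,
    (("A", "a2"), ("C", "c1")): 0.7,
    (("A", "a2"), ("B", "b2")): 0.6,
}

clusters = greedy_multi_alignment(pairwise)
for cluster in clusters:
    print(sorted(cluster))
```

In this example the pair scoring 0.7 is skipped, because c1 already sits in a cluster containing a node from network A. Node a2 is later grouped with b2 instead.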

Current Assignee: Bonnie Berger Leighton, Rohit Singh
Original Assignee: Bonnie Berger Leighton, Rohit Singh
Inventors: Singh, Rohit; Leighton, Bonnie Berger
Primary Examiner(s): Ton; Dang T
Assistant Examiner(s): Preval; Lionel
Application Number: US12/105,815
Publication Number: US 20090262664A1
Time in Patent Office: 1,215 Days
Field of Search: 379/254, 703/2, 707/3, 715/853
US Class Current: 370/254
CPC Class Codes:
- G06F 16/24578   using ranking
- G06F 16/9024   Graphs; Linked lists G06F16...
- H04L 41/12   Discovery or management of ...
- Y10S 707/99932   Access augmentation or opti...
