Dynamically ranking nodes and labels in a hyperlinked database
First Claim
1. A method for computing probabilistic measures for a labeled directed graph, said method comprising:
- accessing, by a processor, data representative of a labelled directed graph of nodes and labels on said nodes connected by directed edges;
computing, by said processor for said directed graph, flow values across all of said nodes for said labels;
iteratively determining, by said processor from the directed graph, a matrix of values representative of the influence between pairs of nodes in said directed graph, wherein each entry in said matrix is determined based on said flow values, comprises a reachability measure from a first node to a second node in a given pair of nodes through all of multiple possible paths between said first node and said second node and encodes a total flow value observed by said second node when a unit of flow is sent from said first node to said second node such that said reachability measure indicates the effect that said first node has on said second node,wherein at least some of said paths have different lengths,wherein a number of iterations used, during said iteratively determining, is limited to a user-selected maximum path length to be considered, andwherein said matrix is stored as a sparse matrix in which Bij represents the total influence that node j exerts on node i, as measured by equationB=I+W+W2+W3+ . . . Wtmax, where I, W, W2, W3, and Wtmax represent influences through paths of length zero, one, two, three, and said user-selected maximum path length, respectively;
determining, by said processor from said matrix and from a distribution of a particular label in said directed graph, multiple probabilistic measures associated with said nodes and said labels of said directed graph, wherein said probabilistic measures comprise at least a measure of the importance of a node, a measure of the importance of a label, a measure of the importance of a node to a label and a measure of the importance of a label to a node; and
generating rankings by said processor based on said probabilistic measures, said generating comprising at least ranking of said nodes for any given label, ranking of said labels for any given node, ranking of said labels and ranking of said nodes.
1 Assignment
0 Petitions
Accused Products
Abstract
The World Wide Web (WWW) can be modelled as a labelled directed graph G(V,E,L), in which V is the set of nodes, E is the set of edges, and L is a label function that maps edges to labels. This model, when applied to the WWW, indicates that V is a set of hypertext documents or objects, E is a set of hyperlinks connecting the documents in V, and the edge-label function represents the anchor-text corresponding to the hyperlinks. One can find a probabilistic ranking of the nodes for any given label, a ranking of the labels for any given node, and rankings of labels and pages using flow based models. Further, the flows can be computing using sparse matrix operations.
-
Citations
13 Claims
-
1. A method for computing probabilistic measures for a labeled directed graph, said method comprising:
-
accessing, by a processor, data representative of a labelled directed graph of nodes and labels on said nodes connected by directed edges; computing, by said processor for said directed graph, flow values across all of said nodes for said labels; iteratively determining, by said processor from the directed graph, a matrix of values representative of the influence between pairs of nodes in said directed graph, wherein each entry in said matrix is determined based on said flow values, comprises a reachability measure from a first node to a second node in a given pair of nodes through all of multiple possible paths between said first node and said second node and encodes a total flow value observed by said second node when a unit of flow is sent from said first node to said second node such that said reachability measure indicates the effect that said first node has on said second node, wherein at least some of said paths have different lengths, wherein a number of iterations used, during said iteratively determining, is limited to a user-selected maximum path length to be considered, and wherein said matrix is stored as a sparse matrix in which Bij represents the total influence that node j exerts on node i, as measured by equation B=I+W+W2+W3+ . . . Wtmax, where I, W, W2, W3, and Wtmax represent influences through paths of length zero, one, two, three, and said user-selected maximum path length, respectively; determining, by said processor from said matrix and from a distribution of a particular label in said directed graph, multiple probabilistic measures associated with said nodes and said labels of said directed graph, wherein said probabilistic measures comprise at least a measure of the importance of a node, a measure of the importance of a label, a measure of the importance of a node to a label and a measure of the importance of a label to a node; and generating rankings by said processor based on said probabilistic measures, said generating comprising at least ranking of said nodes for any given label, ranking of said labels for any given node, ranking of said labels and ranking of said nodes. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 13)
-
-
9. A computer program product for computing probabilistic measures for a labelled directed graph, said computer program product comprising a computer software recorded on a storage medium and accessible by a computer system for executing a method comprising:
-
accessing data representative of a labelled directed graph of nodes and labels on said nodes connected by directed edges; computing, for said directed graph, flow values across all of said nodes for said labels; iteratively determining, from the directed graph, a matrix of values representative of the influence between pairs of nodes in said directed graph, wherein each entry in said matrix is determined based on said flow values, comprises a reachability measure from a first node to a second node in a given pair of nodes through all of multiple possible paths between said first node and said second node and encodes a total flow value observed by said second node when a unit of flow is sent from said first node to said second node such that said reachability measure indicates the effect that said first node has on said second node, wherein at least some of said paths have different lengths, wherein a number of iterations used, during said iteratively determining, is limited to a user-selected maximum path length to be considered, and wherein said matrix is stored as a sparse matrix in which Bij represents the total influence that node j exerts on node i, as measured by equation B=I+W+W2+W3+ . . . Wtmax, where I, W, W2, W3, and Wtmax represent influences through paths of length zero, one, two, three, and said user-selected maximum path length, respectively; determining, from said matrix and from a distribution of a particular label in said directed graph, multiple probabilistic measures associated with said nodes and said labels of said directed graph, wherein said probabilistic measures comprise at least a measure of the importance of a node, a measure of the importance of a label, a measure of the importance of a node to a label and a measure of the importance of a label to a node; and generating rankings based on said probabilistic measures, said generating comprising at least ranking of said nodes for any given label, ranking of said labels for any given node, ranking of said labels and ranking of said nodes. - View Dependent Claims (11, 12)
-
-
10. A computer system for computing probabilistic measures for a labelled directed graph, said computer system comprising a computer software program recorded on a storage medium accessible by said computer system for executing a method comprising:
-
accessing data representative of a labelled directed graph of nodes and labels on said nodes connected by directed edges; computing, for said directed graph, flows values across all of said nodes for said labels; iteratively determining, from the directed graph, a matrix of values representative of the influence between pairs of nodes in said directed graph, wherein each entry in said matrix is determined based on said flow values, comprises a reachability measure from a first node to a second node in a given pair of nodes through all of multiple possible paths between said first node and said second node and encodes a total flow value observed by said second node when a unit of flow is sent from said first node to said second node such that said reachability measure indicates the effect that said first node has on said second node, wherein at least some of said paths have different lengths, wherein a number of iterations used, during said iteratively determining, is limited to a user-selected maximum path length to be considered, and wherein said matrix is stored as a sparse matrix in which Bij represents the total influence that node j exerts on node i, as measured by equation B=I+W+W2+W3+ . . . Wtmax, where I, W, W2, W3, and Wtmax represent influences through paths of length zero, one, two, three, and said user-selected maximum path length, respectively; determining, from said matrix and from a distribution of a particular label in said directed graph, multiple probabilistic measures associated with said nodes and said labels of said directed graph, wherein said probabilistic measures comprise at least a measure of the importance of a node, a measure of the importance of a label, a measure of the importance of a node to a label and a measure of the importance of a label to a node; and generating rankings based on said probabilistic measures, said generating comprising at least ranking of said nodes for any given label, ranking of said labels for any given node, ranking of said labels and ranking of said nodes.
-
Specification