Method for detecting link spam in hyperlinked databases
First Claim
1. A computer-implemented method for identifying nodes that are beneficiaries of node importance inflating links in a directed graph of linked nodes, wherein the directed graph of linked nodes corresponds to a linked database, and wherein the nodes correspond to documents within the linked database, the method comprising:
- computing, for each of at least a subset of the nodes in the directed graph, a respective quantity corresponding to a derivative of a node importance function;
for each node in the subset, comparing the respective computed quantity with a threshold; and
identifying at least a portion of the subset for which the comparison produces a predefined result.
2 Assignments
0 Petitions
Accused Products
Abstract
Methods for facilitating the identification of link spamming in a linked database include calculating a spam likelihood value for nodes in a directed graph of linked nodes are disclosed. The spam likelihood value is computed from an importance of the node and a derivative value of the importance function with respect to a coupling factor. The likelihood that the node'"'"'s importance is inflated by link spam is estimated by calculating the ratio of the magnitude of the derivative value for the node to the rank for the node. Alternatively, the spam likelihood may be computed directly from a component of the principal eigenvector of A evaluated at two values of the parameter c. The normalized derivative value can also be used to provide an order of importance in a list of nodes.
54 Citations
52 Claims
-
1. A computer-implemented method for identifying nodes that are beneficiaries of node importance inflating links in a directed graph of linked nodes, wherein the directed graph of linked nodes corresponds to a linked database, and wherein the nodes correspond to documents within the linked database, the method comprising:
-
computing, for each of at least a subset of the nodes in the directed graph, a respective quantity corresponding to a derivative of a node importance function; for each node in the subset, comparing the respective computed quantity with a threshold; and identifying at least a portion of the subset for which the comparison produces a predefined result. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47)
-
-
10. A computer-implemented method for selecting nodes in a directed graph of linked nodes, wherein the directed graph of linked nodes corresponds to a linked database, and wherein the nodes correspond to documents within the linked database, the method comprising:
-
computing, for each of at least a portion of the nodes in the directed graph, a respective quantity for the respective node corresponding to a derivative of a node importance function; and selecting a predetermined percentage of nodes in accordance with the respective quantities. - View Dependent Claims (11, 12)
-
-
13. A computer program product for use with a computer system, the computer program product comprising a computer readable storage medium and a computer program mechanism embedded therein, the computer program mechanism comprising:
-
instructions for computing, for each of at least a subset of nodes in a directed graph of linked nodes, a respective quantity corresponding to a derivative of a node importance function; instructions for comparing the respective computed quantity with a threshold for each node in the subset; and instructions for identifying at least a portion of the subset for which the comparison produces a predefined result; wherein the directed graph of linked nodes corresponds to a linked database, and wherein the nodes correspond to documents within the linked database. - View Dependent Claims (14, 15, 16, 17, 18, 19, 20)
-
-
21. A computer program product, for use with a computer system, the computer program product comprising a computer readable storage medium and a computer program mechanism embedded therein, the computer program mechanism comprising:
-
instructions for computing, for each of at least a portion of nodes in a directed graph of linked nodes, a respective quantity for the respective node corresponding to a derivative of a node importance function; and instructions for selecting a predetermined percentage of nodes in accordance with the respective quantities; wherein the directed graph of linked nodes corresponds to a linked database, and wherein the nodes correspond to documents within the linked database. - View Dependent Claims (22, 23)
-
-
24. A system for identifying nodes that are beneficiaries of node importance inflating links in a directed graph of linked nodes, the system comprising:
-
memory; one or more processors; one or more programs stored in the memory and configured for execution by the one or more processors, the one or more programs including; instructions to compute, for each of at least a subset of the nodes in the directed graph, a respective quantity corresponding to a derivative of a node importance function; instructions to compare, for each node in the subset, the respective computed quantity with a threshold; and instructions to identify at least a portion of the subset for which the comparison produces a predefined result; wherein the directed graph of linked nodes corresponds to a linked database, and wherein the nodes correspond to documents within the linked database. - View Dependent Claims (25, 26, 27, 28, 29, 30, 31)
-
-
32. A system for ordering nodes in a directed graph of linked nodes, the system comprising:
-
memory; one or more processors; one or more programs stored in the memory and configured for execution by the one or more processors, the one or more programs including; instructions to compute, for each of at least a portion of the nodes in the direct graph, a respective quantity for the respective node corresponding to a derivative of a node importance function; and instructions to select a predetermined percentage of nodes in accordance with the respective quantities; wherein the directed graph of linked nodes corresponds to a linked database, and wherein the nodes correspond to documents within the linked database. - View Dependent Claims (33, 34)
-
-
35. A system for identifying nodes that are beneficiaries of node importance inflating links in a directed graph of linked nodes, wherein the directed graph of linked nodes corresponds to a linked database, and wherein the nodes correspond to documents within the linked database, the system comprising:
-
means for computing, for each of at least a subset of the nodes in the directed graph, a respective quantity corresponding to a derivative of a node importance function; means for, for each node in the subset, comparing the respective computed quantity with a threshold; and means for identifying at least a portion of the subset for which the comparison produces a predefined result.
-
-
48. A computer-implemented method for identifying link spam in a linked database, the method comprising:
-
calculating a matrix A(c) representing the linked database, where the matrix A(c) is an N×
N matrix that is a function of a link coupling coefficient c, where N is a number of nodes in the linked database, wherein the nodes correspond to documents within the linked database;calculating a principal eigenvector of A(a), denoted x(a) and a principal eigenvector of A(b), denoted x(b) where a and b are two distinct values of the coefficient c; and for a node, calculating a spam likelihood value S from components of x(a) and x(b) corresponding to the node. - View Dependent Claims (49, 50, 51, 52)
-
Specification