LOCALLY COMPUTABLE SPAM DETECTION FEATURES AND ROBUST PAGERANK
First Claim
1. A computer-implemented system that facilitates reducing spam in search results, comprising:
- an interface that obtains web graph information; and
- a spam detection component that determines one or more features based at least in part on the web graph information, the one or more features indicate pages of the web graph that are spam.
Abstract
The claimed subject matter provides a system and/or a method that facilitates reducing spam in search results. An interface can obtain web graph information that represents a web of pages. A spam detection component can determine one or more features based at least in part on the web graph information. The one or more features can provide indications that a particular page of the web graph is spam. In addition, a robust rank component is provided that limits the amount of contribution a single page can provide to a target page.
20 Claims
1. A computer-implemented system that facilitates reducing spam in search results, comprising:
- an interface that obtains web graph information; and
- a spam detection component that determines one or more features based at least in part on the web graph information, the one or more features indicate pages of the web graph that are spam.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
17. A computer-implemented method that facilitates reducing a rank of spam pages in a web graph, comprising:
- evaluating a contribution vector of a target page in the web graph, the contribution vector includes individual contributions of other pages in the web graph to the target page;
- ascertaining a supporting set of the target page, the supporting set includes pages in the contribution vector that provide a contribution above a pre-determined threshold;
- determining if contributions from pages in the supporting set exceed a predetermined maximum value; and
- restricting the determined contributions to no more than the predetermined maximum value.
Dependent claims: 18, 19
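The four steps of claim 17 can be illustrated with a minimal Python sketch. It assumes the contribution of page u to a target page is u's personalized PageRank score at the target divided by the number of pages, so that the contributions sum to the target's ordinary PageRank. The claim itself does not fix that definition, and the function names, the teleport probability `alpha`, the example graph, and the threshold and cap values are all hypothetical. Computing one personalized PageRank per page is done here only for clarity on a toy graph.

```python
def personalized_pagerank(graph, source, alpha=0.15, iters=100):
    """Power iteration for a random walk that restarts at `source`."""
    rank = {u: 0.0 for u in graph}
    rank[source] = 1.0
    for _ in range(iters):
        nxt = {u: 0.0 for u in graph}
        nxt[source] += alpha  # restart mass
        for u, out in graph.items():
            if not out:
                # Assumption: a dangling page sends its mass back to the source.
                nxt[source] += (1 - alpha) * rank[u]
                continue
            share = (1 - alpha) * rank[u] / len(out)
            for w in out:
                nxt[w] += share
        rank = nxt
    return rank


def contribution_vector(graph, target, alpha=0.15):
    """Step 1: contribution of every page u to `target` (assumed definition)."""
    n = len(graph)
    return {u: personalized_pagerank(graph, u, alpha)[target] / n for u in graph}


def supporting_set(contrib, threshold):
    """Step 2: pages whose contribution exceeds the threshold."""
    return {u for u, c in contrib.items() if c > threshold}


def robust_rank(contrib, threshold, cap):
    """Steps 3 and 4: cap the contribution of each supporting page at `cap`."""
    support = supporting_set(contrib, threshold)
    return sum(min(c, cap) if u in support else c for u, c in contrib.items())


# Hypothetical toy graph: pages a, b, c all link to t; d links to a.
graph = {"a": ["t"], "b": ["t"], "c": ["t"], "t": ["a"], "d": ["a"]}
contrib = contribution_vector(graph, "t")
plain = sum(contrib.values())                       # ordinary PageRank of t
robust = robust_rank(contrib, threshold=0.08, cap=0.09)
```

In this sketch the cap only affects pages in the supporting set, so a page that relies on a few very large contributors loses more rank than one supported by many small contributors.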
20. A system that facilitates reducing spam created via engineered link structures, comprising:
- means for obtaining an approximation of a contribution vector for a target page;
- means for evaluating an approximate supporting set based upon the approximation of the contribution vector;
- means for determining one or more unsupervised learning features according to the approximate supporting set;
- means for ascertaining one or more supervised learning features based at least in part on the approximate supporting set and a set of preexisting labels; and
- means for labeling the target page as one of spam or non spam based at least in part on the unsupervised learning features or the supervised learning features.
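The pipeline recited in claim 20 can be sketched as follows. The claim does not name an approximation algorithm, so this example assumes a backward-push style local computation that explores only pages near the target. The specific features, the 0.5 cutoff, the minimum support size, and all function and variable names are hypothetical illustrations, and the final threshold rule merely stands in for whatever trained classifier a real system would use.

```python
from collections import defaultdict


def approx_contributions(in_links, out_degree, target, alpha=0.15, eps=1e-4):
    """Backward push (assumed method): estimate each page's personalized
    PageRank score at `target`, leaving at most `eps` residual per page.
    Scores are unnormalized; dividing by the page count would rescale them."""
    estimate = defaultdict(float)
    residual = defaultdict(float)
    residual[target] = 1.0
    stack = [target]
    while stack:
        u = stack.pop()
        r = residual[u]
        if r <= eps:
            continue
        residual[u] = 0.0          # clear first so self-loops are handled
        estimate[u] += alpha * r
        for w in in_links.get(u, ()):
            residual[w] += (1 - alpha) * r / out_degree[w]
            if residual[w] > eps:
                stack.append(w)
    return dict(estimate)


def approx_supporting_set(contrib, threshold):
    return {u for u, c in contrib.items() if c > threshold}


def unsupervised_features(contrib, support):
    total = sum(contrib.values())
    in_support = sum(contrib[u] for u in support)
    return {"support_size": len(support),
            "support_share": in_support / total if total else 0.0}


def supervised_features(support, labels):
    labeled = [u for u in support if u in labels]
    spam = [u for u in labeled if labels[u] == "spam"]
    return {"labeled_spam_fraction": len(spam) / len(labeled) if labeled else 0.0}


def label_page(unsup, sup, min_support=2, spam_cutoff=0.5):
    """Hypothetical rule standing in for a trained classifier."""
    if sup["labeled_spam_fraction"] > spam_cutoff:
        return "spam"
    if unsup["support_size"] < min_support:
        return "spam"
    return "non spam"


# Hypothetical toy graph: s1, s2, s3 each link only to t; t links to all three.
in_links = {"t": ["s1", "s2", "s3"], "s1": ["t"], "s2": ["t"], "s3": ["t"]}
out_degree = {"t": 3, "s1": 1, "s2": 1, "s3": 1}
labels = {"s1": "spam", "s2": "spam"}   # preexisting labels (assumed)

contrib = approx_contributions(in_links, out_degree, "t")
support = approx_supporting_set(contrib, threshold=0.1)
unsup = unsupervised_features(contrib, support)
sup = supervised_features(support, labels)
label = label_page(unsup, sup)
```

Because the push only touches pages whose residual exceeds `eps`, the amount of work depends on the neighborhood of the target rather than on the size of the whole graph, which is what makes features of this kind locally computable.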
Specification