Locally computable spam detection features and robust pagerank
Abstract
The claimed subject matter provides a system and/or a method that facilitates reducing spam in search results. An interface can obtain web graph information that represents a web of pages. A spam detection component can determine one or more features based at least in part on the web graph information. The one or more features can indicate that a particular page of the web graph is spam. In addition, a robust rank component is provided that limits the amount of contribution a single page can provide to a target page.
24 Citations
19 Claims
1. A computer-implemented system that facilitates reducing spam in search results, comprising:
- one or more processors; and
- memory, communicatively coupled to the one or more processors, for storing:
  - an interface that obtains web graph information associated with a web graph;
  - a spam detection component that determines one or more features based at least in part on the web graph information, the one or more features indicating pages of the web graph that are spam; and
  - a robust rank component that ranks at least one page of the web graph, the robust rank component including a contribution limiting component that restricts a contribution of a page in a supporting set of the at least one page, the contribution limiting component decreasing the contribution of the page in the supporting set to a value no greater than a predetermined threshold.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A method that facilitates reducing a rank of spam pages in a web graph, comprising:
- evaluating, by a computing device, a contribution vector of a target page in the web graph, the contribution vector including individual contributions of other pages in the web graph to the target page;
- ascertaining, by the computing device, a supporting set of the target page, the supporting set including pages in the contribution vector that provide a contribution above a pre-determined threshold;
- determining, by the computing device, if contributions from pages in the supporting set exceed a predetermined maximum value; and
- restricting, by the computing device, the determined contributions to no more than the predetermined maximum value.
Dependent claims: 17, 18
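The steps recited in claim 16 can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the contribution vector is already available as a dictionary from page identifier to contribution value (the claim does not say how it is evaluated), and it assumes that the rank of the target page is the sum of the restricted contributions. The function names, page names, and numeric values are hypothetical.

```python
from typing import Dict, Set


def supporting_set(contributions: Dict[str, float], threshold: float) -> Set[str]:
    """Pages whose contribution to the target page is above the threshold."""
    return {page for page, value in contributions.items() if value > threshold}


def restrict_contributions(
    contributions: Dict[str, float], threshold: float, max_value: float
) -> Dict[str, float]:
    """Cap the contribution of every supporting-set page at max_value."""
    support = supporting_set(contributions, threshold)
    return {
        page: min(value, max_value) if page in support else value
        for page, value in contributions.items()
    }


def robust_rank(
    contributions: Dict[str, float], threshold: float, max_value: float
) -> float:
    """Assumption: the rank is the sum of the restricted contributions."""
    return sum(restrict_contributions(contributions, threshold, max_value).values())


# Hypothetical contribution vector for one target page.
example = {"a": 0.40, "b": 0.05, "c": 0.02, "d": 0.01}
capped = restrict_contributions(example, threshold=0.04, max_value=0.04)
rank = robust_rank(example, threshold=0.04, max_value=0.04)
```

In this example pages "a" and "b" form the supporting set and are both capped at 0.04, so a single heavily contributing page can no longer dominate the rank of the target page.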
19. A system that facilitates reducing spam created via engineered link structures, comprising:
- one or more processors; and
- computer-executable instructions that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
  - obtaining an approximation of a contribution vector for a target page;
  - evaluating an approximate supporting set based upon the approximation of the contribution vector;
  - determining one or more unsupervised learning features according to the approximate supporting set;
  - ascertaining one or more supervised learning features based at least in part on the approximate supporting set and a set of preexisting labels; and
  - labeling the target page as one of spam or non spam based at least in part on the unsupervised learning features or the supervised learning features.
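The operations recited in claim 19 can be sketched in the same way. The claim does not name specific features or a decision rule, so everything concrete below is an assumption made for illustration: the two unsupervised features (supporting-set size and the share of total contribution it carries), the single supervised feature (fraction of labeled supporting pages that are spam), the cutoffs in `label_page`, and all page names and values. The approximate contribution vector is taken as a given dictionary.

```python
from typing import Dict, Set


def approximate_supporting_set(approx: Dict[str, float], threshold: float) -> Set[str]:
    """Pages whose approximate contribution is above the threshold."""
    return {page for page, value in approx.items() if value > threshold}


def unsupervised_features(approx: Dict[str, float], support: Set[str]) -> Dict[str, float]:
    """Hypothetical label-free features of the approximate supporting set."""
    total = sum(approx.values())
    in_support = sum(approx[page] for page in support)
    return {
        "support_size": float(len(support)),
        "support_share": in_support / total if total > 0 else 0.0,
    }


def supervised_features(support: Set[str], known_labels: Dict[str, str]) -> Dict[str, float]:
    """Hypothetical feature that uses preexisting labels of supporting pages."""
    labeled = [page for page in support if page in known_labels]
    spam = sum(1 for page in labeled if known_labels[page] == "spam")
    return {"labeled_spam_fraction": spam / len(labeled) if labeled else 0.0}


def label_page(
    approx: Dict[str, float],
    known_labels: Dict[str, str],
    threshold: float,
    max_share: float = 0.9,          # hypothetical cutoff
    max_spam_fraction: float = 0.5,  # hypothetical cutoff
) -> str:
    """Label the target page from either feature group (illustrative rule only)."""
    support = approximate_supporting_set(approx, threshold)
    unsup = unsupervised_features(approx, support)
    sup = supervised_features(support, known_labels)
    if unsup["support_share"] > max_share or sup["labeled_spam_fraction"] > max_spam_fraction:
        return "spam"
    return "non-spam"


# Hypothetical inputs: two heavy contributors, one already labeled spam.
farm_like = {"farm1": 0.30, "farm2": 0.25, "blog": 0.01}
diffuse = {"a": 0.02, "b": 0.02, "c": 0.02}
labels = {"farm1": "spam"}
```

A real system would more plausibly feed such features to a trained classifier; the fixed cutoffs here only show where the unsupervised and supervised features enter the labeling step.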
Specification