Link-based spam detection
First Claim
1. A computer implemented method of ranking search hits in a search result set, the method comprising:
receiving a query from a user;
generating a list of hits related to the query, wherein each of the hits has a relevance to the query, wherein at least one hit is pointed to by a link in a boosting document, and wherein the link in the boosting document artificially elevates the relevance of the at least one hit to the query;
determining a first measure for said at least one hit, wherein the first measure is a link-based popularity measure for said at least one hit;
determining a second measure for said at least one hit, wherein the second measure is a trustworthiness measure for said at least one hit indicative of the likelihood that said at least one hit is a reputable document;
generating a metric for said at least one hit, based at least in part on a discrepancy between the first measure and the second measure;
wherein the metric is representative of the number of boosting documents that contain links, to said at least one hit, which artificially elevate the relevance of said at least one hit to the query;
comparing a threshold value to a value that is based, at least in part, on the metric;
processing the list of hits to form a modified list based in part on the comparing, wherein said at least one hit is either excluded from said modified list, or is presented in said modified list with a lower relevance than was attributed to said at least one hit in said list of hits; and
transmitting the modified list to the user as a response to said query.
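The claimed steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the `Hit` fields, the `spam_metric` and `rerank` helpers, the use of a simple difference as the "discrepancy", the relative form of the compared value, and the `threshold` and `penalty` defaults are all hypothetical choices that the claim does not specify.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Hit:
    url: str
    relevance: float   # relevance to the query as originally scored
    popularity: float  # first measure: link-based popularity (assumed given)
    trust: float       # second measure: trustworthiness (assumed given)


def spam_metric(hit: Hit) -> float:
    """Assumed metric: popularity not backed by trust, floored at zero."""
    return max(hit.popularity - hit.trust, 0.0)


def rerank(hits, threshold=0.5, penalty=0.1, exclude=False):
    """Form the modified list: exclude or demote hits whose value exceeds the threshold."""
    modified = []
    for hit in hits:
        # Value based on the metric: share of popularity unexplained by trust.
        value = spam_metric(hit) / hit.popularity if hit.popularity > 0 else 0.0
        if value > threshold:
            if exclude:
                continue  # drop the hit from the modified list
            hit = replace(hit, relevance=hit.relevance * penalty)  # demote it
        modified.append(hit)
    # Present hits in order of (possibly lowered) relevance.
    return sorted(modified, key=lambda h: h.relevance, reverse=True)


# Hypothetical scores: the first hit is popular but barely trusted.
hits = [
    Hit("spam.example", relevance=0.9, popularity=0.8, trust=0.1),
    Hit("news.example", relevance=0.7, popularity=0.5, trust=0.45),
]
demoted = rerank(hits)
filtered = rerank(hits, exclude=True)
```

Both outcomes named in the claim are covered: with `exclude=True` a flagged hit is removed, otherwise it stays in the list with a lower relevance than it originally had.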
Abstract
A computer-implemented method of ranking search hits in a search result set. The method includes receiving a query from a user and generating a list of hits related to the query, where each hit has a relevance to the query, where one or more boosting linked documents point to the hits, and where the boosting linked documents affect the relevance of the hits to the query. The method associates a metric with each of at least a subset of the hits; the metric is representative of the number of boosting linked documents that point to the hit and artificially inflate its relevance. The method then compares the metric, which is representative of the size of a spam farm pointing to the hit, with a threshold value, processes the list of hits to form a modified list based in part on the comparison, and transmits the modified list to the user.
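Neither the first claim nor the abstract says how the popularity and trustworthiness measures are obtained. As one assumed possibility, both can be estimated with a PageRank-style power iteration: popularity with a uniform random jump over all pages, and trust with the jump restricted to a set of known-good seed pages. The toy graph, the `biased_rank` helper, the seed set, and the damping value below are all hypothetical and chosen only to show how a spam farm inflates the discrepancy.

```python
def biased_rank(links, teleport, damping=0.85, iterations=50):
    """PageRank-style power iteration with an arbitrary teleport vector.

    links: dict mapping every node to the list of nodes it links to.
    teleport: dict of per-node jump weights (nodes not listed get 0).
    """
    score = {n: teleport.get(n, 0.0) for n in links}
    for _ in range(iterations):
        nxt = {n: (1 - damping) * teleport.get(n, 0.0) for n in links}
        for src, targets in links.items():
            if not targets:
                continue  # dangling node: its score is simply not propagated
            share = damping * score[src] / len(targets)
            for dst in targets:
                nxt[dst] += share
        score = nxt
    return score


# Toy web: two trusted seeds, one honest page, and a three-page spam farm
# whose only purpose is to link to "target".
links = {
    "seed1": ["seed2", "honest"],
    "seed2": ["seed1", "honest"],
    "honest": ["seed1"],
    "target": [],
    "farm1": ["target"],
    "farm2": ["target"],
    "farm3": ["target"],
}
uniform = {n: 1 / len(links) for n in links}
trusted = {n: uniform[n] for n in ("seed1", "seed2")}  # jump only to seeds

popularity = biased_rank(links, uniform)  # first measure
trust = biased_rank(links, trusted)       # second measure

# Relative discrepancy: fraction of popularity that no trusted page vouches for.
relative = {n: (popularity[n] - trust[n]) / popularity[n] for n in links}
```

In this sketch no trusted page links, directly or indirectly, to `target`, so its trust score stays at zero and all of its popularity counts as discrepancy, while `honest` keeps most of its popularity because the seeds link to it.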
10 Claims
1. A computer implemented method of ranking search hits in a search result set, the method comprising:
receiving a query from a user;
generating a list of hits related to the query, wherein each of the hits has a relevance to the query, wherein at least one hit is pointed to by a link in a boosting document, and wherein the link in the boosting document artificially elevates the relevance of the at least one hit to the query;
determining a first measure for said at least one hit, wherein the first measure is a link-based popularity measure for said at least one hit;
determining a second measure for said at least one hit, wherein the second measure is a trustworthiness measure for said at least one hit indicative of the likelihood that said at least one hit is a reputable document;
generating a metric for said at least one hit, based at least in part on a discrepancy between the first measure and the second measure;
wherein the metric is representative of the number of boosting documents that contain links, to said at least one hit, which artificially elevate the relevance of said at least one hit to the query;
comparing a threshold value to a value that is based, at least in part, on the metric;
processing the list of hits to form a modified list based in part on the comparing, wherein said at least one hit is either excluded from said modified list, or is presented in said modified list with a lower relevance than was attributed to said at least one hit in said list of hits; and
transmitting the modified list to the user as a response to said query.
Dependent claims: 2, 3, 4, 5.
6. A computer implemented computer-readable storage medium storing instructions for ranking search hits in a search result set, the instructions including instructions for performing the steps of:
receiving a query from a user;
generating a list of hits related to the query, wherein each of the hits has a relevance to the query, wherein at least one hit is pointed to by a link in a boosting document, and wherein the link in the boosting document artificially elevates the relevance of the at least one hit to the query;
determining a first measure for said at least one hit, wherein the first measure is a link-based popularity measure for said at least one hit;
determining a second measure for said at least one hit, wherein the second measure is a trustworthiness measure for said at least one hit indicative of the likelihood that said at least one hit is a reputable document;
generating a metric for said at least one hit, based at least in part on a discrepancy between the first measure and the second measure;
wherein the metric is representative of the number of boosting documents that contain links, to said at least one hit, which artificially elevate the relevance of said at least one hit to the query;
comparing a threshold value to a value that is based, at least in part, on the metric;
processing the list of hits to form a modified list based in part on the comparing, wherein said at least one hit is either excluded from said modified list, or is presented in said modified list with a lower relevance than was attributed to said at least one hit in said list of hits; and
transmitting the modified list to the user as a response to said query.
Dependent claims: 7, 8, 9, 10.
Specification