Search engine spam detection using external data
2 Assignments
0 Petitions
Abstract
Evaluating an electronic document in connection with a search. An external source provides data for use in evaluating an electronic document retrieved by a search engine. A first confidence level of the electronic document is determined based on the externally provided data. The first confidence level indicates a likelihood that the electronic document is undesirable. A second confidence level of the electronic document is determined based on attributes of the electronic document. The second confidence level indicates a likelihood that the electronic document is unsatisfactory with respect to a search. A rating for the electronic document, generated as a function of the determined first confidence level and the determined second confidence level, is used to categorize the electronic document as unsatisfactory in connection with a received search request.
75 Citations
25 Claims
1. A method of evaluating an electronic document in connection with a search, said method comprising:
parsing an electronic document to identify a first and a second attribute of the electronic document, said electronic document being retrievable by a search engine in response to a search request from a user and a determination by the search engine that the electronic document is relevant to the requested search, said first attribute corresponding to an electronic mail message attribute, said second attribute characterizing a pattern for manipulating a relevance determination of the electronic document with respect to the search request;
receiving information from a source external to the search engine, said received information including the electronic mail message attribute relating to an undesirable electronic mail message;
determining a first confidence level of the electronic document based on the first attribute of said electronic document, said first confidence level indicating a likelihood that the electronic document is associated with the undesirable electronic mail message;
determining a second confidence level of the electronic document based on the second attribute of said electronic document, said second confidence level indicating a likelihood that the electronic document is unsatisfactory with respect to the search request;
generating a rating for the electronic document as a function of the determined first confidence level and the determined second confidence level; and
designating the electronic document as unsatisfactory in connection with the search request based on the generated rating of the electronic document.
Dependent claims: 2, 3, 4, 5, 6, 7, 13, 14, 15, 16, 17.
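The steps of claim 1 can be illustrated with a minimal Python sketch. The claim does not say which attributes are parsed or how the rating is computed, so every concrete choice here is a hypothetical assumption: the first attribute is taken to be the set of hosts the page links to (compared against hosts that an external feed reports seeing in spam e-mail), the second attribute is a keyword-stuffing ratio, and the rating is a weighted sum with illustrative weights `w1`, `w2` and an illustrative cutoff `threshold`.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical attribute extractors; the claim does not prescribe them.
LINK_RE = re.compile(r'href="https?://([^/"]+)', re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")
WORD_RE = re.compile(r"[a-z0-9]+")


@dataclass
class Evaluation:
    first_confidence: float   # likelihood of association with spam e-mail
    second_confidence: float  # likelihood of relevance manipulation
    rating: float
    unsatisfactory: bool


def parse_attributes(html: str) -> tuple[set[str], list[str]]:
    """Return (linked hosts, lowercase visible terms) for one document."""
    hosts = {h.lower() for h in LINK_RE.findall(html)}
    terms = WORD_RE.findall(TAG_RE.sub(" ", html).lower())
    return hosts, terms


def first_confidence(hosts: set[str], spam_hosts: set[str]) -> float:
    """Share of linked hosts that the external feed reported in spam e-mail."""
    return len(hosts & spam_hosts) / len(hosts) if hosts else 0.0


def second_confidence(terms: list[str]) -> float:
    """Keyword-stuffing signal: fraction of terms taken by the most repeated one."""
    if not terms:
        return 0.0
    _, top = Counter(terms).most_common(1)[0]
    return top / len(terms)


def evaluate(html: str, spam_hosts: set[str],
             w1: float = 0.6, w2: float = 0.4, threshold: float = 0.5) -> Evaluation:
    hosts, terms = parse_attributes(html)
    c1 = first_confidence(hosts, spam_hosts)
    c2 = second_confidence(terms)
    rating = w1 * c1 + w2 * c2  # rating as a function of both confidence levels
    return Evaluation(c1, c2, rating, rating >= threshold)
```

With a feed containing `cheap-pills.example`, a page that links to that host and repeats a single term throughout scores high on both confidence levels and is designated unsatisfactory, while an ordinary recipe page without such links or repetition is not. A weighted sum is only one simple combining function; a trained classifier over the two confidence levels could take its place.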
8. A system for evaluating an electronic document in connection with a search, said system comprising:
a processor for receiving a search request from a user and for identifying an electronic document based on a determination that the electronic document is relevant to the received search request;
a memory area storing data provided by a source external to the processor, said data including an electronic mail message attribute relating to an undesirable electronic mail message;
said processor being configured to parse the electronic document to identify a first and a second attribute of the electronic document, said first attribute corresponding to the electronic mail message attribute, said second attribute characterizing a pattern for manipulating a relevance determination of the electronic document with respect to the search request;
said processor being further configured to determine a first confidence level of the electronic document based on the first attribute of said electronic document, said first confidence level indicating a likelihood that the electronic document is associated with an undesirable electronic mail message;
said processor being further configured to establish a second confidence level of the electronic document based on the second attribute of the electronic document, said second confidence level indicating a likelihood that the electronic document is unsatisfactory with respect to a search based on one or more attributes of the electronic document;
said processor being further configured to generate a rating for the electronic document as a function of the determined first confidence level and the established second confidence level and to categorize the electronic document as unsatisfactory in connection with the received search request based on the generated rating of the electronic document.
Dependent claims: 9, 10, 18, 19, 20, 21.
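Claim 8 places the externally supplied data in a memory area that the processor consults. One plausible shape for that store is a set of normalized domains that is replaced wholesale whenever the external source publishes a new snapshot. The sketch below assumes details the claim does not give: a hypothetical `ExternalSpamStore` class, a feed format of one domain per line with `#` comments, and parent-domain matching so that subdomains of a reported host also count.

```python
from typing import Iterable


class ExternalSpamStore:
    """Hypothetical memory area holding domains reported in undesirable e-mail."""

    def __init__(self) -> None:
        self._domains: set[str] = set()

    @staticmethod
    def _normalize(host: str) -> str:
        host = host.strip().lower().rstrip(".")
        return host[4:] if host.startswith("www.") else host

    def load(self, lines: Iterable[str]) -> int:
        """Replace the stored data with a new feed snapshot; skip blanks and '#' comments."""
        fresh = {
            self._normalize(line)
            for line in lines
            if line.strip() and not line.lstrip().startswith("#")
        }
        self._domains = fresh  # single assignment swaps in the new snapshot
        return len(fresh)

    def matches(self, host: str) -> bool:
        """True if the host or any of its parent domains appears in the feed."""
        labels = self._normalize(host).split(".")
        return any(".".join(labels[i:]) in self._domains for i in range(len(labels)))

    def first_confidence(self, hosts: Iterable[str]) -> float:
        """Share of a document's linked hosts that match the feed."""
        host_list = list(hosts)
        if not host_list:
            return 0.0
        return sum(self.matches(h) for h in host_list) / len(host_list)
```

Swapping the whole set in one assignment keeps lookups consistent while a new feed is loaded. Parent-domain matching is an assumption aimed at spammers who rotate subdomains; it would also flag legitimate subdomains of a reported host.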
11. One or more computer volatile or nonvolatile media having computer-executable components for evaluating an electronic document in connection with a search, said computer-readable media comprising:
a query component to receive a search request from a user and to identify an electronic document based on a determination that the electronic document is relevant to the received search request;
an external component to provide data for use in evaluating whether the electronic document is undesirable, said data including an electronic mail message attribute relating to an undesirable electronic mail message;
an internal component configured to:
  parse the electronic document to identify a first and a second attribute of the electronic document, said first attribute corresponding to the electronic mail message attribute, said second attribute characterizing a pattern for manipulating a relevance determination of the electronic document with respect to the search request;
  determine a first confidence level of the electronic document based on the first attribute of said electronic document, said first confidence level indicating a likelihood that the electronic document is associated with an undesirable electronic mail message; and
  establish a second confidence level of the electronic document based on the second attribute of the electronic document, said second confidence level indicating a likelihood that the electronic document is unsatisfactory with respect to a search based on one or more attributes of the electronic document;
an analyzing component to generate a rating for the electronic document as a function of the determined first confidence level and the established second confidence level; and
wherein the query component is configured to classify the electronic document as unsatisfactory in connection with the received search request based on the generated rating of the electronic document.
Dependent claims: 12, 22, 23, 24, 25.
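Claim 11 divides the same work among query, external, internal, and analyzing components. The sketch below mirrors that split in Python. The class names follow the claim, but the substring relevance test, the repetition-ratio heuristic, the equal weights, the fixed stand-in feed, and the `(doc_id, unsatisfactory)` result shape are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


def external_component() -> set[str]:
    """Hypothetical stand-in for a feed of domains seen in undesirable e-mail."""
    return {"spam.example"}


@dataclass
class InternalComponent:
    spam_domains: set[str]  # data provided by the external component

    def confidences(self, doc: dict) -> tuple[float, float]:
        """Return (first, second) confidence levels for one document."""
        links = doc.get("links", [])
        # First confidence: share of linked domains reported in spam e-mail.
        c1 = sum(h in self.spam_domains for h in links) / len(links) if links else 0.0
        # Second confidence: repetition ratio as a crude keyword-stuffing signal.
        words = doc.get("text", "").lower().split()
        c2 = 1 - len(set(words)) / len(words) if words else 0.0
        return c1, c2


@dataclass
class AnalyzingComponent:
    w1: float = 0.5
    w2: float = 0.5

    def rate(self, c1: float, c2: float) -> float:
        """Combine the two confidence levels into a single rating."""
        return self.w1 * c1 + self.w2 * c2


@dataclass
class QueryComponent:
    index: dict[str, dict]  # doc id -> {"text": ..., "links": [...]}
    internal: InternalComponent
    analyzer: AnalyzingComponent
    threshold: float = 0.5

    def search(self, query: str) -> list[tuple[str, bool]]:
        """Return relevant doc ids, each flagged True if classified unsatisfactory."""
        results = []
        for doc_id, doc in self.index.items():
            if query.lower() not in doc.get("text", "").lower():
                continue  # hypothetical relevance test: substring match
            rating = self.analyzer.rate(*self.internal.confidences(doc))
            results.append((doc_id, rating >= self.threshold))
        return results
```

Searching a two-document index for `pills` returns both hits, with the page that links to `spam.example` and repeats `cheap pills` flagged as unsatisfactory. Here the query component only labels results; whether a real engine would drop, demote, or annotate them is left open.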
Specification