
Detecting spam documents in a phrase based information retrieval system

  • US 7,603,345 B2
  • Filed: 06/28/2006
  • Issued: 10/13/2009
  • Est. Priority Date: 07/26/2004
  • Status: Expired due to Fees
First Claim

1. A computer implemented method for identifying spam documents in an information retrieval system, the method comprising:

  • maintaining a list of phrases in a memory, each phrase associated with a list of related phrases;

  • determining, for a document that contains a first phrase from the list of phrases, a number of the related phrases related to the first phrase expected to be present in the document;

  • determining for the document, and for the first phrase in the document, an actual number of related phrases present in the document; and

  • identifying the document as a spam document by comparing the actual number of related phrases present in the document with the expected number of related phrases, wherein determining the number of related phrases related to the first phrase expected to be present in the document includes:

      • traversing an index of a plurality of documents;

      • for each of the indexed documents:

          • determining a set of phrases in the indexed document from the list of phrases, and for each phrase in the set, determining a number of related phrases also in the indexed document; and

      • determining the expected number of related phrases based on the determined number of related phrases, related to the first phrase, in the indexed documents.
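The steps of claim 1 can be illustrated with a small, non-authoritative Python sketch. It rests on several assumptions that the claim does not specify. Phrase extraction is taken as already done, so each indexed document is simply the set of listed phrases it contains. The "expected number" is taken to be the mean per-document related-phrase count for the first phrase. The spam decision uses a hypothetical rule: flag the document when its actual count exceeds `ratio` times the expected count. The names `related_counts`, `expected_related`, `is_spam`, `ratio`, and the toy data are illustrative only and do not come from the patent.

```python
from collections import defaultdict

# Assumed toy data model (hypothetical): the "list of phrases" maps each phrase
# to its related phrases; the "index" maps a document id to the set of listed
# phrases that occur in that document.
PhraseList = dict[str, set[str]]
Index = dict[str, set[str]]


def related_counts(doc_phrases: set[str], phrases: PhraseList) -> dict[str, int]:
    """For each listed phrase in the document, count its related phrases also present."""
    return {p: len(phrases[p] & doc_phrases) for p in doc_phrases if p in phrases}


def expected_related(index: Index, phrases: PhraseList) -> dict[str, float]:
    """Traverse the index and average, per phrase, the related-phrase counts.

    Using the mean is an assumption; the claim only says the expected number
    is "based on" the per-document counts.
    """
    per_phrase: dict[str, list[int]] = defaultdict(list)
    for doc_phrases in index.values():
        for p, n in related_counts(doc_phrases, phrases).items():
            per_phrase[p].append(n)
    return {p: sum(ns) / len(ns) for p, ns in per_phrase.items()}


def is_spam(doc_phrases: set[str], first_phrase: str, phrases: PhraseList,
            expected: dict[str, float], ratio: float = 3.0) -> bool:
    """Compare actual vs. expected related-phrase counts (hypothetical threshold rule)."""
    if first_phrase not in doc_phrases:
        raise ValueError("document must contain first_phrase")
    actual = len(phrases[first_phrase] & doc_phrases)
    return actual > ratio * expected.get(first_phrase, 0.0)


# Toy usage (illustrative data only).
phrases: PhraseList = {
    "australian shepherd": {"blue merle", "red merle", "agility", "herding"},
}
index: Index = {
    "d1": {"australian shepherd", "herding"},
    "d2": {"australian shepherd", "blue merle"},
    "d3": {"australian shepherd"},
}
expected = expected_related(index, phrases)  # mean of 1, 1, 0 for the phrase
spam_doc = {"australian shepherd", "blue merle", "red merle", "agility", "herding"}
print(is_spam(spam_doc, "australian shepherd", phrases, expected))  # True
```

In this toy index an ordinary document mentioning "australian shepherd" carries about 0.67 of its related phrases, while `spam_doc` carries all four, so it exceeds the assumed threshold. A real system would likely use a statistically grounded cutoff rather than a fixed multiple.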

  • 2 Assignments