Detecting spam documents in a phrase based information retrieval system
First Claim
1. A computer implemented method for identifying spam documents in an information retrieval system, the method comprising:
maintaining a list of phrases, each phrase associated with a list of related phrases;
determining a number of related phrases expected to be present in a document for any phrase on the list of phrases;
determining for a document, and for at least one phrase in the document, an actual number of related phrases present in the document; and
identifying the document as a spam document by comparing the actual number of related phrases present in the document with the expected number of related phrases.
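The four steps of claim 1 can be illustrated with a minimal sketch. It is not the patented implementation: the `RELATED` and `EXPECTED` tables, the substring matching, the `tolerance` factor, and the choice to flag a document when the actual count greatly exceeds the expected count are all assumptions made for illustration. The claim itself only requires that the actual and expected numbers be compared.

```python
from typing import Dict, Set

# Hypothetical phrase list: each phrase maps to its list of related phrases.
RELATED: Dict[str, Set[str]] = {
    "australian shepherd": {
        "blue merle", "red merle", "agility training", "herding dog", "aussie",
    },
}

# Hypothetical expected number of related phrases per phrase.
# How this number is determined is not described in this section.
EXPECTED: Dict[str, float] = {"australian shepherd": 2.0}


def actual_related_count(text: str, phrase: str,
                         related: Dict[str, Set[str]]) -> int:
    """Count the related phrases of `phrase` that occur in `text`."""
    lowered = text.lower()
    return sum(1 for r in related.get(phrase, set()) if r in lowered)


def is_spam(text: str, related: Dict[str, Set[str]],
            expected: Dict[str, float], tolerance: float = 2.0) -> bool:
    """Flag a document whose actual related-phrase count for some phrase
    exceeds the expected count by more than `tolerance` times.

    The direction and form of the comparison are assumptions; the claim
    only states that the two numbers are compared.
    """
    lowered = text.lower()
    for phrase, expected_count in expected.items():
        if phrase not in lowered:
            continue  # only phrases present in the document are checked
        actual = actual_related_count(text, phrase, related)
        if actual > expected_count * tolerance:
            return True
    return False


normal_doc = "Our Australian Shepherd is a friendly herding dog."
stuffed_doc = ("australian shepherd blue merle red merle "
               "agility training herding dog aussie")
normal_flag = is_spam(normal_doc, RELATED, EXPECTED)
stuffed_flag = is_spam(stuffed_doc, RELATED, EXPECTED)
```

Under these assumptions the ordinary sentence contains one related phrase and is not flagged, while the keyword-stuffed text contains all five and is flagged.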
Abstract
An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to their included phrases. A spam document is identified based on the number of related phrases included in a document.
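Indexing documents by their included phrases can be pictured as a simple inverted index from phrase to document identifiers. The sketch below is an assumption-laden illustration, not the system's actual index structure: the function name `build_phrase_index`, the sample documents, and the case-insensitive substring matching are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_phrase_index(docs: Dict[int, str],
                       phrases: Iterable[str]) -> Dict[str, List[int]]:
    """Map each phrase to the IDs of the documents that contain it."""
    phrase_list = list(phrases)
    index: Dict[str, List[int]] = defaultdict(list)
    for doc_id, text in docs.items():
        lowered = text.lower()
        for phrase in phrase_list:
            if phrase in lowered:  # naive substring match (assumption)
                index[phrase].append(doc_id)
    return dict(index)


# Hypothetical sample documents keyed by document ID.
docs = {
    1: "Blue merle puppies for sale",
    2: "Herding dog training tips",
    3: "A blue merle herding dog",
}
index = build_phrase_index(docs, ["blue merle", "herding dog"])
```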
11 Claims
Specification