
Spam email detection based on n-grams with feature selection

  • US 7,912,907 B1
  • Filed: 10/07/2005
  • Issued: 03/22/2011
  • Est. Priority Date: 10/07/2005
  • Status: Active Grant
First Claim

1. A computer implemented method for identifying spam email messages, the method comprising the steps of:

  • tokenizing an email message into a collection of overlapping n-grams;

  • comparing the collection of n-grams to n-grams of known artifacts found in email messages due to how the email messages were produced and transmitted, wherein the known artifacts comprise machine-generated text artifacts included in the email messages by email service providers;

  • removing n-grams that match an n-gram of a known artifact from the collection;

  • comparing the remaining n-grams in the collection to n-grams of known spam email messages; and

  • determining whether the email message comprises spam based on results of the second comparing step.
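The claimed steps can be sketched in Python as follows. This is a minimal illustration, not the patented implementation, and it rests on assumptions the claim does not state: n-grams are overlapping 4-character sequences of the lowercased, whitespace-collapsed text; the artifact and known-spam references are plain sets of n-grams; and a message counts as spam when at least half of its remaining n-grams occur in the known-spam set. The names `ngrams`, `build_index`, `is_spam`, the `threshold` parameter, and the footer text "Sent via ExampleMail, the hypothetical free email service" are all hypothetical.

```python
from typing import Iterable, Set


def ngrams(text: str, n: int = 4) -> Set[str]:
    """Overlapping character n-grams (assumption: lowercase, whitespace collapsed)."""
    normalized = " ".join(text.lower().split())
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def build_index(texts: Iterable[str], n: int = 4) -> Set[str]:
    """Union of the n-grams of every reference text (artifacts or known spam)."""
    index: Set[str] = set()
    for text in texts:
        index |= ngrams(text, n)
    return index


def is_spam(message: str, artifact_index: Set[str], spam_index: Set[str],
            n: int = 4, threshold: float = 0.5) -> bool:
    # Step 1: tokenize the message into overlapping n-grams.
    grams = ngrams(message, n)
    # Steps 2-3: compare against known-artifact n-grams and remove every match.
    remaining = grams - artifact_index
    if not remaining:
        return False  # nothing left to judge; treated as not spam (assumption)
    # Step 4: compare the remaining n-grams to n-grams of known spam.
    overlap = len(remaining & spam_index) / len(remaining)
    # Step 5: decide from the second comparison (hypothetical fixed threshold).
    return overlap >= threshold


# Hypothetical provider footer and a single hypothetical known-spam message.
artifacts = build_index(["Sent via ExampleMail, the hypothetical free email service"])
known_spam = build_index(["Buy cheap watches now at our online store"])
incoming = ("Buy cheap watches now at our online store! "
            "Sent via ExampleMail, the hypothetical free email service")

print(is_spam(incoming, artifacts, known_spam))  # True
print(is_spam(incoming, set(), known_spam))      # False: footer n-grams dilute the overlap
```

The second call shows why the artifact-removal step comes before the spam comparison: with an empty artifact set, the provider footer's n-grams stay in the denominator, and the same message falls below the threshold even though its body matches known spam exactly.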

  • 2 Assignments