Apparatus and method for auxiliary classification for generating features for a spam filtering model

  • US 8,112,484 B1
  • Filed: 05/31/2006
  • Issued: 02/07/2012
  • Est. Priority Date: 05/31/2006
  • Status: Active Grant
First Claim

1. A system for filtering spam email messages, the system comprising:

  • a memory for storing program code and a processor for processing program code to generate a plurality of modules further comprising;

    a base spam filter feature extractor comprising first machine learning logic performing machine learning operations to detect base spam features from a training corpus of ham and spam email messages the base spam filter feature extractor performing the operations of;

    analyzing an email message to detect whether the email message contains features within a base spam feature set;

    firing one or more of the base spam features based, at least in part, on the analysis;

    assigning a weight to each of the base spam features according to how well the base spam features correctly differentiate between ham and spam email messages;

    an auxiliary obfuscation model feature extractor comprising second machine learning logic performing machine learning operations to detect text obfuscation within the training corpus of ham and spam email messages, the auxiliary obfuscation model feature extractor comprising an obfuscation feature set for detecting obfuscation within email messages, the auxiliary obfuscation model feature extractor performing the operations of;

    analyzing an email message to detect whether the email message contains features within the obfuscation feature set;

    firing one or more of the obfuscation detection features based, at least in part, on the analysis;

    assigning a weight to each of the obfuscation detection features according to how well the obfuscation detection features correctly detect text obfuscation in email messages;

    an auxiliary obfuscation detection module to receive an indication of the different sets of features and associated weights detected by the auxiliary obfuscation model feature extractor and to apply the associated weights to the detected features in a stream of incoming email messages; and

    a base spam filter module to receive an indication of the base spam features and associated weights from the base spam filter feature extractor and the weights applied by the auxiliary obfuscation detection module to the stream of incoming email messages, the base spam filter module to apply base spam filter weights to the base spam features detected in the stream of incoming email messages and to determine whether an email message is spam based on a combined weights of the base spam features and the weights of the obfuscation features applied by the auxiliary obfuscation detection module;

    summing the weights of the base spam features and the weights of features applied by the auxiliary obfuscation model feature extractor to generate a spam score; and

    identifying the email message as spam if the spam score is above a specified threshold value.
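
The scoring step recited at the end of the claim (summing the weights of fired base features and fired obfuscation features, then comparing the total against a threshold) can be illustrated with a minimal Python sketch. This is not the patented implementation. The feature detectors, the weights, the threshold, and all names (`WeightedFeature`, `fired_weight_sum`, `spam_score`, `is_spam`) are hypothetical, and the weights are hard-coded here, whereas the claim describes them as learned from a training corpus of ham and spam.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class WeightedFeature:
    """A feature detector paired with its weight (both hypothetical)."""
    detect: Callable[[str], bool]
    weight: float


# Hypothetical base spam feature set; real weights would come from training.
BASE_FEATURES: Dict[str, WeightedFeature] = {
    "contains_free": WeightedFeature(lambda m: "free" in m.lower(), 1.5),
    "contains_winner": WeightedFeature(lambda m: "winner" in m.lower(), 1.0),
}

# Hypothetical obfuscation feature set: letters interrupted by digit or
# symbol substitutions, as in "v1agra".
_SUBSTITUTION = re.compile(r"[a-z]+[013@$]+[a-z]+", re.IGNORECASE)
OBFUSCATION_FEATURES: Dict[str, WeightedFeature] = {
    "char_substitution": WeightedFeature(
        lambda m: _SUBSTITUTION.search(m) is not None, 2.0
    ),
}


def fired_weight_sum(message: str, features: Dict[str, WeightedFeature]) -> float:
    """Sum the weights of every feature that fires on the message."""
    return sum(f.weight for f in features.values() if f.detect(message))


def spam_score(message: str) -> float:
    """Combine base-feature weights with obfuscation-feature weights."""
    return (fired_weight_sum(message, BASE_FEATURES)
            + fired_weight_sum(message, OBFUSCATION_FEATURES))


def is_spam(message: str, threshold: float = 3.0) -> bool:
    """Flag the message as spam if its score is above the threshold."""
    return spam_score(message) > threshold
```

In this sketch, a message such as "Get FREE v1agra now" fires one base feature and one obfuscation feature, so its combined score exceeds the assumed threshold of 3.0, while a message that fires no features scores zero and is treated as ham.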

  • 4 Assignments