Apparatus and method for auxiliary classification for generating features for a spam filtering model
First Claim
1. A system for filtering spam email messages, the system comprising:
- a memory for storing program code and a processor for processing program code to generate a plurality of modules further comprising;
a base spam filter feature extractor comprising first machine learning logic performing machine learning operations to detect base spam features from a training corpus of ham and spam email messages the base spam filter feature extractor performing the operations of;
analyzing an email message to detect whether the email message contains features within a base spam feature set;
firing one or more of the base spam features based, at least in part, on the analysis;
assigning a weight to each of the base spam features according to how well the base spam features correctly differentiate between ham and spam email messages;
an auxiliary obfuscation model feature extractor comprising second machine learning logic performing machine learning operations to detect text obfuscation within the training corpus of ham and spam email messages, the auxiliary obfuscation model feature extractor comprising an obfuscation feature set for detecting obfuscation within email messages, the auxiliary obfuscation model feature extractor performing the operations of;
analyzing an email message to detect whether the email message contains features within the obfuscation feature set;
firing one or more of the obfuscation detection features based, at least in part, on the analysis;
assigning a weight to each of the obfuscation detection features according to how well the obfuscation detection features correctly detect text obfuscation in email messages;
an auxiliary obfuscation detection module to receive an indication of the different sets of features and associated weights detected by the auxiliary obfuscation model feature extractor and to apply the associated weights to the detected features in a stream of incoming email messages; and
a base spam filter module to receive an indication of the base spam features and associated weights from the base spam filter feature extractor and the weights applied by the auxiliary obfuscation detection module to the stream of incoming email messages, the base spam filter module to apply base spam filter weights to the base spam features detected in the stream of incoming email messages and to determine whether an email message is spam based on a combined weights of the base spam features and the weights of the obfuscation features applied by the auxiliary obfuscation detection module;
summing the weights of the base spam features and the weights of features applied by the auxiliary obfuscation model feature extractor to generate a spam score; and
identifying the email message as spam if the spam score is above a specified threshold value.
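The operations recited in claim 1 (firing features, assigning weights, summing, and thresholding) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the feature names, the regular expressions, the smoothed log-odds weighting, and the threshold value are hypothetical and are not taken from the patent, which does not specify a particular learning algorithm or feature set.

```python
import math
import re
from typing import Callable, Dict, List

Feature = Callable[[str], bool]

# Hypothetical feature sets; the claim does not enumerate concrete features.
BASE_FEATURES: Dict[str, Feature] = {
    "mentions_free": lambda m: "free" in m.lower(),
    "has_url": lambda m: "http://" in m or "https://" in m,
}
OBFUSCATION_FEATURES: Dict[str, Feature] = {
    # Letters separated by punctuation, e.g. "v.i.a.g.r.a"
    "punct_split_word": lambda m: re.search(
        r"(?:[A-Za-z][.\-_*]){3,}[A-Za-z]", m) is not None,
    # A digit embedded between letters, e.g. "v1agra"
    "digit_in_word": lambda m: re.search(
        r"[A-Za-z]\d+[A-Za-z]", m) is not None,
}


def fire(features: Dict[str, Feature], message: str) -> List[str]:
    """Return the names of the features that fire on the message."""
    return [name for name, test in features.items() if test(message)]


def learn_weights(features: Dict[str, Feature],
                  positives: List[str],
                  negatives: List[str]) -> Dict[str, float]:
    """Assumed weighting scheme: smoothed log-odds of a feature firing on
    positive examples versus negative examples. A feature that separates
    the two classes well receives a weight far from zero."""
    weights = {}
    for name, test in features.items():
        p = (sum(test(m) for m in positives) + 1) / (len(positives) + 2)
        q = (sum(test(m) for m in negatives) + 1) / (len(negatives) + 2)
        weights[name] = math.log(p / q)
    return weights


def spam_score(message: str,
               base_weights: Dict[str, float],
               obfuscation_weights: Dict[str, float]) -> float:
    """Sum the weights of fired base features and fired obfuscation features."""
    base = sum(base_weights[f] for f in fire(BASE_FEATURES, message))
    aux = sum(obfuscation_weights[f] for f in fire(OBFUSCATION_FEATURES, message))
    return base + aux


def is_spam(message: str,
            base_weights: Dict[str, float],
            obfuscation_weights: Dict[str, float],
            threshold: float = 1.5) -> bool:
    """Identify the message as spam if the summed score exceeds the threshold."""
    return spam_score(message, base_weights, obfuscation_weights) > threshold
```

In this sketch the base weights would be learned with spam messages as positives and ham messages as negatives, while the obfuscation weights would be learned with obfuscated messages as positives and unobfuscated messages as negatives. That labeling of the training corpus is also an assumption.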
Abstract
A computer-implemented system and method are described for integrating a series of auxiliary spam detection models with a base spam detection model, thereby improving the accuracy and efficiency of the overall spam detection engine. For example, a system according to one embodiment of the invention comprises: a base spam filter feature extractor to detect a first set of features from incoming email messages; one or more auxiliary model feature extractors, each of the auxiliary model feature extractors to detect a different set of features from the incoming email messages; one or more auxiliary detection modules, each of the auxiliary detection modules to receive an indication of the different sets of features detected by a corresponding one of the auxiliary model feature extractor modules and to apply weights to the detected features; and a base spam filter module to receive an indication of the first set of features from the base spam filter feature extractor and the weights generated by the auxiliary detection modules, the base spam filter module to assign base spam filter weights to the first set of features and to determine whether an email message is spam based on the weights of the first set of features and the weights of the different set of features identified by the auxiliary model feature extractors.
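The arrangement described in the abstract, a base filter that combines its own weighted features with weights supplied by one or more auxiliary detection modules, might be structured as in the following sketch. The class names `AuxiliaryModule` and `BaseSpamFilter`, the summation of weights, and the default threshold are illustrative assumptions, not the patent's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# An extractor returns the names of the features it detects in a message.
Extractor = Callable[[str], List[str]]


@dataclass
class AuxiliaryModule:
    """One auxiliary model: its own feature extractor plus its own weights."""
    name: str
    extract: Extractor
    weights: Dict[str, float]

    def weighted_features(self, message: str) -> Dict[str, float]:
        """Apply this module's weights to the features it detects."""
        return {f: self.weights.get(f, 0.0) for f in self.extract(message)}


@dataclass
class BaseSpamFilter:
    """Base filter that also consumes the weights from auxiliary modules."""
    extract: Extractor
    weights: Dict[str, float]
    auxiliaries: List[AuxiliaryModule] = field(default_factory=list)
    threshold: float = 1.0  # hypothetical value

    def score(self, message: str) -> float:
        # Weights of the base (first) feature set.
        total = sum(self.weights.get(f, 0.0) for f in self.extract(message))
        # Weights contributed by each auxiliary detection module.
        for aux in self.auxiliaries:
            total += sum(aux.weighted_features(message).values())
        return total

    def is_spam(self, message: str) -> bool:
        return self.score(message) > self.threshold
```

Keeping each auxiliary model behind the same small interface means a further model can be added to `auxiliaries` without changing the base filter's scoring logic.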
9 Claims
1. A system for filtering spam email messages, the system comprising:
- a memory for storing program code and a processor for processing program code to generate a plurality of modules further comprising;
a base spam filter feature extractor comprising first machine learning logic performing machine learning operations to detect base spam features from a training corpus of ham and spam email messages the base spam filter feature extractor performing the operations of;
analyzing an email message to detect whether the email message contains features within a base spam feature set;
firing one or more of the base spam features based, at least in part, on the analysis;
assigning a weight to each of the base spam features according to how well the base spam features correctly differentiate between ham and spam email messages;
an auxiliary obfuscation model feature extractor comprising second machine learning logic performing machine learning operations to detect text obfuscation within the training corpus of ham and spam email messages, the auxiliary obfuscation model feature extractor comprising an obfuscation feature set for detecting obfuscation within email messages, the auxiliary obfuscation model feature extractor performing the operations of;
analyzing an email message to detect whether the email message contains features within the obfuscation feature set;
firing one or more of the obfuscation detection features based, at least in part, on the analysis;
assigning a weight to each of the obfuscation detection features according to how well the obfuscation detection features correctly detect text obfuscation in email messages;
an auxiliary obfuscation detection module to receive an indication of the different sets of features and associated weights detected by the auxiliary obfuscation model feature extractor and to apply the associated weights to the detected features in a stream of incoming email messages; and
a base spam filter module to receive an indication of the base spam features and associated weights from the base spam filter feature extractor and the weights applied by the auxiliary obfuscation detection module to the stream of incoming email messages, the base spam filter module to apply base spam filter weights to the base spam features detected in the stream of incoming email messages and to determine whether an email message is spam based on a combined weights of the base spam features and the weights of the obfuscation features applied by the auxiliary obfuscation detection module;
summing the weights of the base spam features and the weights of features applied by the auxiliary obfuscation model feature extractor to generate a spam score; and
identifying the email message as spam if the spam score is above a specified threshold value.
- View Dependent Claims (2, 3)
4. A computer-implemented method for filtering spam email messages comprising:
providing a base spam filter feature extractor executed by a processor, comprising first machine learning logic performing machine learning operations to detect base spam features from a training corpus of ham and spam email messages the base spam filter feature extractor performing the operations of;
analyzing an email message to detect whether the email message contains features within a base spam feature set;
firing one or more of the base spam features based, at least in part, on the analysis;
assigning a weight to each of the base spam features according to how well the base spam features correctly differentiate between ham and spam email messages;
providing an auxiliary obfuscation model feature extractor comprising second machine learning logic performing machine learning operations to detect text obfuscation within the training corpus of ham and spam email messages, the auxiliary obfuscation model feature extractor comprising an obfuscation feature set for detecting obfuscation within email messages, the auxiliary obfuscation model feature extractor performing the operations of;
analyzing an email message to detect whether the email message contains features within the obfuscation feature set;
firing one or more of the obfuscation detection features based, at least in part, on the analysis;
assigning a weight to each of the obfuscation detection features according to how well the obfuscation detection features correctly detect text obfuscation in email messages;
providing an auxiliary obfuscation detection module to receive an indication of the different sets of features and associated weights detected by the auxiliary obfuscation model feature extractor and to apply the associated weights to the detected features in a stream of incoming email messages; and
providing a base spam filter module to receive an indication of the base spam features and associated weights from the base spam filter feature extractor and the weights applied by the auxiliary obfuscation detection module to the stream of incoming email messages, the base spam filter module to apply base spam filter weights to the base spam features detected in the stream of incoming email messages and to determine whether an email message is spam based on a combined weights of the base spam features and the weights of the obfuscation features applied by the auxiliary obfuscation detection module;
summing the weights of the base spam features and the weights of features applied by the auxiliary obfuscation model feature extractor to generate a spam score; and
identifying the email message as spam if the spam score is above a specified threshold value.
- View Dependent Claims (5, 6)
7. A non-transitory machine-readable medium having program code stored thereon which, when executed by a machine, causes the machine to perform the operations of:
providing a base spam filter feature extractor comprising first machine learning logic performing machine learning operations to detect base spam features from a training corpus of ham and spam email messages the base spam filter feature extractor performing the operations of;
analyzing an email message to detect whether the email message contains features within a base spam feature set;
firing one or more of the base spam features based, at least in part, on the analysis;
assigning a weight to each of the base spam features according to how well the base spam features correctly differentiate between ham and spam email messages;
providing an auxiliary obfuscation model feature extractor comprising second machine learning logic performing machine learning operations to detect text obfuscation within the training corpus of ham and spam email messages, the auxiliary obfuscation model feature extractor comprising an obfuscation feature set for detecting obfuscation within email messages, the auxiliary obfuscation model feature extractor performing the operations of;
analyzing an email message to detect whether the email message contains features within the obfuscation feature set;
firing one or more of the obfuscation detection features based, at least in part, on the analysis;
assigning a weight to each of the obfuscation detection features according to how well the obfuscation detection features correctly detect text obfuscation in email messages;
providing an auxiliary obfuscation detection module to receive an indication of the different sets of features and associated weights detected by the auxiliary obfuscation model feature extractor and to apply the associated weights to the detected features in a stream of incoming email messages; and
providing a base spam filter module to receive an indication of the base spam features and associated weights from the base spam filter feature extractor and the weights applied by the auxiliary obfuscation detection module to the stream of incoming email messages, the base spam filter module to apply base spam filter weights to the base spam features detected in the stream of incoming email messages and to determine whether an email message is spam based on a combined weights of the base spam features and the weights of the obfuscation features applied by the auxiliary obfuscation detection module;
summing the weights of the base spam features and the weights of features applied by the auxiliary obfuscation model feature extractor to generate a spam score; and
identifying the email message as spam if the spam score is above a specified threshold value.
- View Dependent Claims (8, 9)
Specification