Adaptive threshold based spam classification
First Claim
1. A computer implemented method for using a dynamically adaptive decision threshold for detecting spam during a current time period, the method comprising:
- using a computer to perform steps comprising:
calculating an estimated spam email occurrence probability for the current time period according to a statistical time series prediction methodology taking into account a previous ratio between a number of emails received in at least one previous time period adjudicated to be clean using a fixed decision threshold and a number of emails received in at least one previous time period adjudicated to be spam using the fixed decision threshold;
calculating an adaptive decision threshold to use for the current time period to adjudicate whether emails received during the current time period are spam, the adaptive decision threshold being based on a misclassification cost ratio and the estimated spam email occurrence probability;
determining, for each email received during the current time period, a likelihood of the email being spam;
adjudicating whether each email received during the current time period is spam by comparing the determined likelihood of the email received being spam to the adaptive decision threshold;
comparing, for each email received during the current time period, the determined likelihood of the email being spam to the fixed decision threshold to determine a number of emails received in the current time period adjudicated as spam and a number of emails received in the current time period adjudicated as clean; and
calculating a current ratio between the number of emails received in the current time period adjudicated as spam using the fixed decision threshold and the number of emails received in the current time period adjudicated as clean using the fixed decision threshold, the current ratio used to calculate an adaptive decision threshold for a future time period.
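The steps of claim 1 can be sketched as a per-period loop. This is a minimal illustration under stated assumptions, not the patented implementation: the claim does not name a specific time series method, so simple exponential smoothing stands in for it; the threshold formula is the standard minimum-expected-cost rule for scores assumed to be calibrated under a balanced prior; and the names `FIXED_THRESHOLD`, `estimate_spam_probability`, `adaptive_threshold`, `process_period`, and the smoothing weight `alpha` are all hypothetical.

```python
FIXED_THRESHOLD = 0.5  # assumed value; the claim only requires that it be fixed


def estimate_spam_probability(previous_ratios, alpha=0.3, default=0.5):
    """Predict the spam probability for the current period.

    previous_ratios holds spam:clean ratios from earlier periods, oldest first.
    Exponential smoothing is an assumed stand-in for the claimed
    "statistical time series prediction methodology".
    """
    if not previous_ratios:
        return default  # no history yet: fall back to an assumed neutral prior
    smoothed = previous_ratios[0]
    for ratio in previous_ratios[1:]:
        smoothed = alpha * ratio + (1 - alpha) * smoothed
    # Convert a spam:clean ratio r into a probability r / (1 + r).
    return smoothed / (1.0 + smoothed)


def adaptive_threshold(spam_probability, cost_ratio):
    """Combine the predicted spam probability with the misclassification cost ratio.

    cost_ratio is assumed to be cost(false positive) / cost(false negative).
    """
    p = min(max(spam_probability, 1e-6), 1 - 1e-6)  # keep the result inside (0, 1)
    weighted_clean = cost_ratio * (1 - p)
    return weighted_clean / (weighted_clean + p)


def process_period(scores, previous_ratios, cost_ratio):
    """Run one time period over per-email spam likelihood scores in [0, 1]."""
    p_spam = estimate_spam_probability(previous_ratios)
    threshold = adaptive_threshold(p_spam, cost_ratio)

    # Adjudicate each email against the adaptive threshold.
    verdicts = [score > threshold for score in scores]

    # Separately count against the fixed threshold, so the statistics fed to
    # future periods do not depend on the threshold they are used to compute.
    fixed_spam = sum(1 for score in scores if score > FIXED_THRESHOLD)
    fixed_clean = len(scores) - fixed_spam
    current_ratio = fixed_spam / max(fixed_clean, 1)  # guard against zero clean emails

    return verdicts, threshold, current_ratio


if __name__ == "__main__":
    verdicts, threshold, ratio = process_period(
        scores=[0.9, 0.4, 0.6, 0.1], previous_ratios=[4.0], cost_ratio=1.0
    )
    print(verdicts, round(threshold, 3), ratio)
```

In this sketch, a history showing four spam emails per clean email pushes the threshold well below the fixed value, so a borderline score of 0.4 is adjudicated as spam, while the ratio stored for the next period is still computed from the fixed threshold.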
Abstract
A spam classification manager uses a dynamically adaptive decision threshold for detecting spam email messages. For each of a plurality of time periods, the spam classification manager calculates an adaptive decision threshold to use to adjudicate whether or not received email messages comprise spam. The threshold is based on ratios between clean and spam emails received in previous time periods, as well as a misclassification cost ratio. The spam classification manager determines a likelihood of each incoming email message received during the time period being spam, and adjudicates whether each message in fact comprises spam by comparing the determined likelihood to the threshold. The spam classification manager keeps track of incoming email messages received during the time period adjudicated to be spam and adjudicated to be clean, and uses that information in the calculation of adaptive thresholds for future time periods.
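One common way a spam occurrence probability and a misclassification cost ratio can combine into a single threshold is the minimum-expected-cost decision rule. The derivation below is an illustrative assumption, not a formula stated in the abstract or claims. It assumes that the per-message likelihood $s$ is a spam probability calibrated under a balanced prior, that $\pi$ is the estimated spam occurrence probability for the period, and that $\lambda$ is the cost of a false positive divided by the cost of a false negative.

```latex
\begin{align}
  \underbrace{\frac{s}{1-s}}_{\text{likelihood ratio}} \cdot
  \underbrace{\frac{\pi}{1-\pi}}_{\text{prior odds}} &> \lambda \\
  \Longleftrightarrow \quad s &> t(\pi,\lambda)
    = \frac{\lambda\,(1-\pi)}{\lambda\,(1-\pi) + \pi}
\end{align}
```

Under these assumptions, $\pi = 0.5$ and $\lambda = 1$ give $t = 0.5$; a higher estimated spam probability lowers the threshold, and a higher relative cost of false positives raises it.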
11 Claims
1. A computer implemented method for using a dynamically adaptive decision threshold for detecting spam during a current time period, the method comprising:
- using a computer to perform steps comprising:
calculating an estimated spam email occurrence probability for the current time period according to a statistical time series prediction methodology taking into account a previous ratio between a number of emails received in at least one previous time period adjudicated to be clean using a fixed decision threshold and a number of emails received in at least one previous time period adjudicated to be spam using the fixed decision threshold;
calculating an adaptive decision threshold to use for the current time period to adjudicate whether emails received during the current time period are spam, the adaptive decision threshold being based on a misclassification cost ratio and the estimated spam email occurrence probability;
determining, for each email received during the current time period, a likelihood of the email being spam;
adjudicating whether each email received during the current time period is spam by comparing the determined likelihood of the email received being spam to the adaptive decision threshold;
comparing, for each email received during the current time period, the determined likelihood of the email being spam to the fixed decision threshold to determine a number of emails received in the current time period adjudicated as spam and a number of emails received in the current time period adjudicated as clean; and
calculating a current ratio between the number of emails received in the current time period adjudicated as spam using the fixed decision threshold and the number of emails received in the current time period adjudicated as clean using the fixed decision threshold, the current ratio used to calculate an adaptive decision threshold for a future time period.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer readable storage medium containing a computer program product for using a dynamically adaptive decision threshold for detecting spam during a current time period, the computer program product comprising:
- program code for performing the following steps:
calculating an estimated spam email occurrence probability for the current time period according to a statistical time series prediction methodology taking into account a previous ratio between a number of emails received in at least one previous time period adjudicated to be clean using a fixed decision threshold and a number of emails received in at least one previous time period adjudicated to be spam using the fixed decision threshold;
calculating an adaptive decision threshold to use for the current time period to adjudicate whether emails received during the current time period are spam, the adaptive decision threshold being based on a misclassification cost ratio and the estimated spam email occurrence probability;
determining, for each email received during the current time period, a likelihood of the email being spam;
adjudicating whether each email received during the current time period is spam by comparing the determined likelihood of the email received being spam to the adaptive decision threshold;
comparing, for each email received during the current time period, the determined likelihood of the email being spam to the fixed decision threshold to determine a number of emails received in the current time period adjudicated as spam and a number of emails received in the current time period adjudicated as clean; and
calculating a current ratio between the number of emails received in the current time period adjudicated as spam using the fixed decision threshold and the number of emails received in the current time period adjudicated as clean using the fixed decision threshold, the current ratio used to calculate an adaptive decision threshold for a future time period.
Dependent claims: 9
10. A computer system having a computer readable storage medium having computer program instructions embodied therein for using a dynamically adaptive decision threshold for detecting spam during a current time period, the computer program instructions comprising:
- a plurality of software portions configured to perform the following steps:
calculating an estimated spam email occurrence probability for the current time period according to a statistical time series prediction methodology taking into account a previous ratio between a number of emails received in at least one previous time period adjudicated to be clean using a fixed decision threshold and a number of emails received in at least one previous time period adjudicated to be spam using the fixed decision threshold;
calculating an adaptive decision threshold to use for the current time period to adjudicate whether emails received during the current time period are spam, the adaptive decision threshold being based on a misclassification cost ratio and the estimated spam email occurrence probability;
determining, for each email received during the current time period, a likelihood of the email being spam;
adjudicating whether each email received during the current time period is spam by comparing the determined likelihood of the email received being spam to the adaptive decision threshold;
comparing, for each email received during the current time period, the determined likelihood of the email being spam to the fixed decision threshold to determine a number of emails received in the current time period adjudicated as spam and a number of emails received in the current time period adjudicated as clean; and
calculating a current ratio between the number of emails received in the current time period adjudicated as spam using the fixed decision threshold and the number of emails received in the current time period adjudicated as clean using the fixed decision threshold, the current ratio used to calculate an adaptive decision threshold for a future time period.
Dependent claims: 11
Specification