Fuzzy logic voting method and system for classifying e-mail using inputs from multiple spam classifiers
Abstract
A method, and corresponding system, for identifying e-mail messages as being unwanted junk or spam. The method includes converting the outputs of a set of e-mail classification tools into a standardized format, such as a probability having a value between zero and one. The standardized outputs of the classification tools are then input to a voting mechanism which uses a voting algorithm based on fuzzy logic to combine the standardized outputs into a single classification result. The use of a fuzzy logic algorithm creates a more useful result as the classifier results are not merely averaged. In one embodiment, the single classification result is itself a probability that is provided to a spam classifier or comparator that functions to compare the single classification result to a spam threshold value and based on the comparison to classify the e-mail message as spam or not spam.
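The conversion and comparison steps described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the two input formats (a yes/no verdict and a bounded numeric score), the helper names `from_verdict`, `from_score`, and `exceeds_threshold`, the 0.9 confidence value, and the use of `>=` in the comparison are all hypothetical and are not taken from the patent text.

```python
def from_verdict(flagged: bool, confidence: float = 0.9) -> float:
    """Map a yes/no verdict to a probability in [0, 1].

    `confidence` is an assumed tuning value, not something the source specifies.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return confidence if flagged else 1.0 - confidence


def from_score(score: float, lo: float, hi: float) -> float:
    """Linearly rescale a bounded score to [0, 1], clamping out-of-range values."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return min(1.0, max(0.0, (score - lo) / (hi - lo)))


def exceeds_threshold(p_combined: float, threshold: float) -> bool:
    """Comparator: treat the message as spam when the combined probability
    reaches the threshold (the `>=` choice is an assumption)."""
    return p_combined >= threshold


if __name__ == "__main__":
    # Two tools reporting in different formats, both standardized to [0, 1].
    print(from_verdict(True))
    print(from_score(7.5, lo=0.0, hi=10.0))
    print(exceeds_threshold(0.97, threshold=0.9))
```

Once every tool reports in the same format, the voting step only has to deal with probabilities, regardless of how each tool expresses its result natively.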
20 Claims
1. A method for classifying an e-mail message received over a digital communications network as unwanted junk e-mail or spam, comprising:
accessing an output from a first e-mail classification tool and an output from a second e-mail classification tool differing from the first e-mail classification tool, wherein the outputs are indicative of whether the e-mail message is spam and differ in format;
converting the outputs from the first and second e-mail classification tools into first and second standardized outputs, respectively, having a predetermined standardized numerical format;
generating a single classification output by combining the first and second standardized outputs; and
providing the single classification output to a comparator for comparison with a spam threshold value for determining whether the e-mail message corresponding to the single classification output is spam.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A voting method for use in combining outputs of two or more outputs from e-mail classification tools, comprising:
retrieving a first classification output corresponding to a classification process performed by a first e-mail classifier on an e-mail;
retrieving a second classification output corresponding to a classification process performed by a second e-mail classifier on the e-mail; and
generating a combined e-mail classification result by inputting the first and second classification outputs into a voting formula comprising:
Pcombined = (P1 × P2) / ((P1 × P2) + (1 − P1)(1 − P2))
wherein Pcombined is the combined e-mail classification result, P1 is the first classification output, and P2 is the second classification output and wherein the combined e-mail classification result, the first classification output, and the second classification outputs have values between 0 and 1.
Dependent claims: 10, 11, 12, 13, 14
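The voting formula of claim 9 can be tried out numerically with a short sketch like the one below. The function name `combine` and the handling of the degenerate case (one input exactly 0 and the other exactly 1, where the denominator vanishes) are assumptions made for illustration; the claim itself does not say what happens there.

```python
def combine(p1: float, p2: float) -> float:
    """Two-input voting formula: (P1*P2) / (P1*P2 + (1-P1)*(1-P2))."""
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("inputs must lie in [0, 1]")
    both_spam = p1 * p2
    both_not_spam = (1.0 - p1) * (1.0 - p2)
    denominator = both_spam + both_not_spam
    if denominator == 0.0:
        # Assumption: total contradiction (0 versus 1) is reported as undecided.
        return 0.5
    return both_spam / denominator


if __name__ == "__main__":
    print(combine(0.9, 0.8))  # two tools that agree: result lies above both inputs
    print(combine(0.9, 0.5))  # an undecided tool (0.5) leaves the other's value in place
    print(combine(0.9, 0.1))  # equally confident, opposite votes cancel out
```

With inputs 0.9 and 0.8 the formula gives 0.72 / (0.72 + 0.02), roughly 0.973, whereas a plain average would give 0.85. Agreement between tools reinforces the result, and an input of 0.5 acts as a neutral vote, which is one way to read the abstract's remark that the classifier results are not merely averaged.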
15. An e-mail handling system, comprising:
a set of classification tools for processing an e-mail message and generating a set of classification results indicating whether the tools determined the e-mail message to be spam, the classification results comprising at least two formats;
a conversion mechanism processing the classification results to convert each of the classification results into a predetermined standardized format, wherein the predetermined standardized format comprises a probability indicating a likelihood the e-mail message is spam; and
a voting mechanism operating to input the standardized classification results as input to a voting formula to generate a combined classification output comprising a probability that the e-mail message is spam.
Dependent claims: 16, 17, 18, 19, 20
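One possible way to wire the three elements of claim 15 together is sketched below. Everything named here is hypothetical: the two toy tools, their output formats, the converters, and the `handle` function are illustrative only. The N-input product form used in `vote` is an assumed generalization; for two inputs it reduces to the formula recited in claim 9.

```python
import math
from typing import Any, Callable, Sequence, Tuple

Tool = Callable[[str], Any]          # message text -> tool-specific result
Converter = Callable[[Any], float]   # tool-specific result -> probability


def vote(probs: Sequence[float]) -> float:
    """Assumed N-input form: prod(p) / (prod(p) + prod(1 - p))."""
    if not probs:
        raise ValueError("need at least one standardized result")
    spam = math.prod(probs)
    not_spam = math.prod(1.0 - p for p in probs)
    if spam + not_spam == 0.0:
        return 0.5  # assumption: fully contradictory inputs are undecided
    return spam / (spam + not_spam)


def handle(message: str, pipeline: Sequence[Tuple[Tool, Converter]]) -> float:
    """Run each tool, standardize its result, then vote."""
    standardized = [convert(tool(message)) for tool, convert in pipeline]
    return vote(standardized)


def keyword_tool(message: str) -> bool:
    """Toy tool with a yes/no output format."""
    return "free money" in message.lower()


def caps_tool(message: str) -> float:
    """Toy tool with a 0-100 score: percentage of letters in upper case."""
    letters = [c for c in message if c.isalpha()]
    return 100.0 * sum(c.isupper() for c in letters) / max(1, len(letters))


PIPELINE = [
    (keyword_tool, lambda flagged: 0.9 if flagged else 0.1),
    (caps_tool, lambda score: min(1.0, max(0.0, score / 100.0))),
]

if __name__ == "__main__":
    print(handle("FREE MONEY NOW", PIPELINE))
    print(handle("Minutes from Tuesday, free money is not on the agenda", PIPELINE))
```

Pairing each tool with its own converter keeps the voting mechanism unaware of native output formats, so adding a tool only requires adding one more pair to the pipeline.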
Specification