Fuzzy logic voting method and system for classifying e-mail using inputs from multiple spam classifiers

US 7,051,077 B2
Filed: 06/22/2004
Issued: 05/23/2006
Est. Priority Date: 06/30/2003
Status: Active Grant

First Claim

1. A method for classifying an e-mail message received over a digital communications network as unwanted junk e-mail or spam, comprising:

accessing an output from a first e-mail classification tool and an output from a second e-mail classification tool differing from the first e-mail classification tool, wherein the outputs are indicative of whether the e-mail message is spam and differ in format;

converting the outputs from the first and second e-mail classification tools into first and second standardized outputs, respectively, having a predetermined standardized numerical format;

generating a single classification output by combining the first and second standardized outputs; and

providing the single classification output to a comparator for comparison with a spam threshold value for determining whether the e-mail message corresponding to the single classification output is spam.

Abstract

A method, and corresponding system, for identifying e-mail messages as being unwanted junk or spam. The method includes converting the outputs of a set of e-mail classification tools into a standardized format, such as a probability having a value between zero and one. The standardized outputs of the classification tools are then input to a voting mechanism which uses a voting algorithm based on fuzzy logic to combine the standardized outputs into a single classification result. The use of a fuzzy logic algorithm creates a more useful result as the classifier results are not merely averaged. In one embodiment, the single classification result is itself a probability that is provided to a spam classifier or comparator that functions to compare the single classification result to a spam threshold value and based on the comparison to classify the e-mail message as spam or not spam.
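
The flow described in the abstract (standardize, vote, compare to a threshold) can be sketched as follows. This is a minimal illustration, not the patented implementation: the two converter functions, the 0-to-10 score scale, the 0.9 confidence assigned to a yes/no verdict, the 0.5 default threshold, and the handling of a zero denominator are all assumptions introduced here. Only the voting formula itself is taken from the claims.

```python
def score_to_probability(score: float, max_score: float = 10.0) -> float:
    """Map a hypothetical 0..max_score spam score onto a probability."""
    return min(max(score / max_score, 0.0), 1.0)


def verdict_to_probability(is_spam: bool, confidence: float = 0.9) -> float:
    """Map a hypothetical yes/no verdict onto a probability."""
    return confidence if is_spam else 1.0 - confidence


def vote(p1: float, p2: float) -> float:
    """Combine two probabilities with the voting formula given in the claims."""
    numerator = p1 * p2
    denominator = numerator + (1.0 - p1) * (1.0 - p2)
    if denominator == 0.0:
        # One input is exactly 0 and the other exactly 1.
        # Assumption: treat that total disagreement as undecided.
        return 0.5
    return numerator / denominator


def classify(score: float, is_spam: bool, threshold: float = 0.5) -> bool:
    """Return True when the combined probability exceeds the spam threshold."""
    p1 = score_to_probability(score)      # first tool: numeric score
    p2 = verdict_to_probability(is_spam)  # second tool: boolean verdict
    return vote(p1, p2) > threshold


if __name__ == "__main__":
    print(classify(8.0, True))   # both tools lean toward spam
    print(classify(2.0, False))  # both tools lean toward not spam
```

The two tools deliberately report in different formats, which is why each gets its own converter before the vote.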

20 Claims

1. A method for classifying an e-mail message received over a digital communications network as unwanted junk e-mail or spam, comprising:
- accessing an output from a first e-mail classification tool and an output from a second e-mail classification tool differing from the first e-mail classification tool, wherein the outputs are indicative of whether the e-mail message is spam and differ in format;
  
  converting the outputs from the first and second e-mail classification tools into first and second standardized outputs, respectively, having a predetermined standardized numerical format;
  
  generating a single classification output by combining the first and second standardized outputs; and
  
  providing the single classification output to a comparator for comparison with a spam threshold value for determining whether the e-mail message corresponding to the single classification output is spam.
- - 2. The method of claim 1, wherein the single classification output generating comprises inputting the first and second standardized outputs into a voting algorithm, the voting algorithm being based on fuzzy logic not outputting an average of the first and second standardized outputs or using Boolean or conditional logic.
  - 3. The method of claim 2, wherein the single classification output comprises a confidence level greater than either of the first and second standardized outputs when the first and second standardized outputs both indicate the message is spam or is not spam.
  - 4. The method of claim 2, wherein the predetermined standardized numerical format is a decimal probability between 0 and 1 and wherein the voting algorithm comprises an equation including:
    - $$P_{\text{combined}} = \frac{P_1 \times P_2}{(P_1 \times P_2) + (1 - P_1)(1 - P_2)}$$
      
      wherein $P_{\text{combined}}$ is the single classification output, $P_1$ is the first standardized output, and $P_2$ is the second standardized output.
  - 5. The method of claim 4, further comprising accessing an output from a third e-mail classification tool differing from the first and second e-mail classification tools, wherein the output from the third tool differs in format from the outputs from the first and second tools, wherein the converting is performed upon the output from the third tool to produce a third standardized output having the predetermined standardized format, and further wherein the single classification output comprises performing the voting algorithm in an iterative fashion with the first, second, and third standardized outputs.
  - 6. The method of claim 1, wherein the converting comprises inputting the output from the first e-mail classification tool into a first conversion algorithm and the output from the second e-mail classification tool into a second conversion algorithm.
  - 7. The method of claim 6, wherein the first and second conversion algorithms each comprise a tuning parameter affecting a value of the first and second standardized outputs and wherein the method further comprises receiving a tuning instruction and altering one of the tuning parameters based on the tuning instruction.
  - 8. The method of claim 6, further comprising accessing an output from a third e-mail classification tool and the converting comprises inputting the output from the third e-mail classification tool into a third conversion algorithm to generate a third standardized output, wherein the generating of the single classification output comprises iteratively combining the first, second, and third standardized outputs.
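
The voting formula of claim 4 and its iterative use in claims 5 and 8 can be sketched in a few lines. The function names `vote` and `vote_all` and the sample probabilities are illustrative assumptions, and the sketch assumes every input lies strictly between 0 and 1 so the denominator is never zero.

```python
from functools import reduce


def vote(p1: float, p2: float) -> float:
    """Voting formula from claim 4; inputs assumed strictly inside (0, 1)."""
    numerator = p1 * p2
    return numerator / (numerator + (1.0 - p1) * (1.0 - p2))


def vote_all(probabilities: list) -> float:
    """Apply the formula iteratively: the running result becomes the next P1."""
    return reduce(vote, probabilities)


agree_spam = vote(0.7, 0.8)          # both tools lean toward spam
agree_ham = vote(0.3, 0.2)           # both tools lean toward not spam
disagree = vote(0.7, 0.3)            # the tools cancel each other out
three = vote_all([0.7, 0.8, 0.6])    # three tools, combined pairwise

if __name__ == "__main__":
    print(agree_spam, agree_ham, disagree, three)
```

When both inputs agree, the result is more extreme than either input, which matches the behavior recited in claim 3; a plain average of 0.7 and 0.8 would give 0.75 instead.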

9. A voting method for use in combining outputs of two or more outputs from e-mail classification tools, comprising:
- retrieving a first classification output corresponding to a classification process performed by a first e-mail classifier on an e-mail;
  
  retrieving a second classification output corresponding to a classification process performed by a second e-mail classifier on the e-mail; and
  
  generating a combined e-mail classification result by inputting the first and second classification outputs into a voting formula comprising;
  
  $$P_{\text{combined}} = \frac{P_1 \times P_2}{(P_1 \times P_2) + (1 - P_1)(1 - P_2)}$$
  
  wherein $P_{\text{combined}}$ is the combined e-mail classification result, $P_1$ is the first classification output, and $P_2$ is the second classification output and wherein the combined e-mail classification result, the first classification output, and the second classification outputs have values between 0 and 1.
- - 10. The method of claim 9, further including retrieving a third classification output corresponding to a classification process performed by a third e-mail classifier on the e-mail and wherein the generating comprises performing the voting formula iteratively on the first, second, and third classification outputs.
  - 11. The method of claim 9, further comprising prior to the generating of the combined e-mail classification result, converting the first and second classification outputs to a standardized form.
  - 12. The method of claim 11 wherein the standardized form comprises a probability.
  - 13. The method of claim 9 further comprising receiving the e-mail, performing the classification process with the first e-mail classifier on the e-mail to generate the first classification output, and performing the classification process with the second e-mail classifier on the e-mail to generate the second classification output.
  - 14. The method of claim 13 further comprising comparing the combined e-mail classification output to a spam threshold value and when the comparing determines the spam threshold value is exceeded, classifying the e-mail as spam.
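
A short worked derivation may help show why applying the voting formula iteratively, as in claim 10, gives a well-defined result. The derivation is not part of the patent text; it follows algebraically from the formula of claim 9, assuming each $P_i$ lies strictly between 0 and 1.

```latex
\begin{aligned}
1 - P_{\text{combined}}
  &= \frac{(1 - P_1)(1 - P_2)}{(P_1 \times P_2) + (1 - P_1)(1 - P_2)} \\[4pt]
\frac{P_{\text{combined}}}{1 - P_{\text{combined}}}
  &= \frac{P_1}{1 - P_1} \cdot \frac{P_2}{1 - P_2}
\end{aligned}
```

In other words, the formula multiplies the odds of the two inputs. Feeding the combined result back in with a third output multiplies in the third odds term, so the final value does not depend on the order in which the outputs are combined. For example, with $P_1 = 0.7$, $P_2 = 0.8$, and $P_3 = 0.6$ the odds are $7/3$, $4$, and $3/2$; their product is $14$, giving $P_{\text{combined}} = 14/15 \approx 0.933$.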

15. An e-mail handling system, comprising:
- a set of classification tools for processing an e-mail message and generating a set of classification results indicating whether the tools determined the e-mail message to be spam, the classification results comprising at least two formats;
  
  a conversion mechanism processing the classification results to convert each of the classification results into a predetermined standardized format, wherein the predetermined standardized format comprises a probability indicating a likelihood the e-mail message is spam; and
  
  a voting mechanism operating to input the standardized classification results as input to a voting formula to generate a combined classification output comprising a probability that the e-mail message is spam.
- - 16. The system of claim 15, further comprising a spam classifier comparing the combined classification output with a threshold value and based on the comparing, classifying the e-mail message as spam or as not spam.
  - 17. The system of claim 15, further comprising a tuning module gathering historical data representative of an effectiveness of one of the classification tools and based on the gathered historical data, generating a tuning instruction to the conversion mechanism to modify the conversion processing performed on the classification result corresponding to the one classification tool to alter a value of the standardized classification result for the one classification tool.
  - 18. The system of claim 15, wherein the voting formula comprises:
    - $$P_{\text{combined}} = \frac{P_1 \times P_2}{(P_1 \times P_2) + (1 - P_1)(1 - P_2)}$$
      
      wherein $P_{\text{combined}}$ is the combined classification output, $P_1$ is a first one of the standardized classification results, and $P_2$ is a second one of the standardized classification results; and
      
      wherein the standardized classification results are input into the voting formula iteratively with a first pair of the standardized classification results being $P_1$ in the second iteration and a third one of the standardized classification being $P_2$.
  - 19. The system of claim 15, wherein the set of classification tools comprises at least three differing spam classification devices processing the e-mail message and generating the set of classification results.
  - 20. The system of claim 19, wherein the predetermined standardized format is a decimal probability and the standardized classification results range from greater than zero to less than one.
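
The conversion mechanism and tuning module of claims 15, 17, and 20 might be sketched as below. The claims do not say how a tuning instruction changes the conversion, so everything here is a hypothetical design: the `Converter` class, the `weight` parameter that pulls a tool's output toward 0.5, the accuracy-to-weight rule in `tuning_instruction`, and the `EPSILON` clamp used to keep results strictly between zero and one.

```python
from dataclasses import dataclass

EPSILON = 1e-6  # assumption: keeps standardized results strictly inside (0, 1)


@dataclass
class Converter:
    """Hypothetical conversion step for one classification tool."""
    max_raw: float       # largest raw value the tool can report
    weight: float = 1.0  # tuning parameter: 0 ignores the tool, 1 trusts it fully

    def convert(self, raw: float) -> float:
        p = raw / self.max_raw
        # Pull the probability toward the neutral value 0.5 as weight shrinks.
        p = 0.5 + self.weight * (p - 0.5)
        return min(max(p, EPSILON), 1.0 - EPSILON)

    def tune(self, weight: float) -> None:
        """Apply a tuning instruction by replacing the weight."""
        self.weight = min(max(weight, 0.0), 1.0)


def tuning_instruction(history: list) -> float:
    """Derive a weight from (predicted_spam, actually_spam) pairs.

    Assumption: 50% accuracy maps to weight 0 and 100% maps to weight 1.
    """
    if not history:
        return 1.0
    hits = sum(predicted == actual for predicted, actual in history)
    accuracy = hits / len(history)
    return max(0.0, 2.0 * accuracy - 1.0)


if __name__ == "__main__":
    converter = Converter(max_raw=10.0)
    print(converter.convert(9.0))  # fully trusted tool
    past = [(True, True), (True, False), (False, False), (True, True)]
    converter.tune(tuning_instruction(past))
    print(converter.convert(9.0))  # same raw score, now discounted
```

A tool with a weaker track record then contributes a value closer to 0.5, which leaves the combined vote largely unchanged when it is fed into the voting formula.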

Current Assignee: Musarubra US LLC (Musarubra US SellCo LLC)
Original Assignee: MX Logic Incorporated (McAfee, LLC)
Inventors: Lin, Wei
Primary Examiner(s): Eng, David Y.
Application Number: US10/873,882
Publication Number: US 20040267893A1
Time in Patent Office: 700 Days
Field of Search: 709/207, 709/217, 709/203, 709/223, 709/225
US Class Current: 709/207
CPC Class Codes: H04L 51/212 using filtering or selectiv...
