Detecting spam email using multiple spam classifiers
Abstract
A method for detecting undesirable emails combines input from two or more spam classifiers to provide improved classification effectiveness and robustness. The method includes obtaining a score from each of a plurality of constituent spam classifiers by applying them to a given input email. The method further includes obtaining a combined spam score from a combined spam classifier that takes as input the plurality of constituent spam classifier scores, the combined spam classifier being computed automatically in accordance with a specified false-positive vs. false-negative tradeoff. The method further includes identifying the given input email as an undesirable email if the combined spam score indicates that the input e-mail is undesirable.
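The flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the two constituent classifiers (`keyword_score`, `shouting_score`), the linear form of the combined classifier, and the weights, bias, and threshold are all hypothetical choices made for the example.

```python
from typing import Callable, Sequence

Classifier = Callable[[str], float]

def keyword_score(email: str) -> float:
    """Hypothetical constituent classifier: fraction of trigger phrases present."""
    phrases = ("free", "winner", "click here")
    text = email.lower()
    return sum(p in text for p in phrases) / len(phrases)

def shouting_score(email: str) -> float:
    """Hypothetical constituent classifier: fraction of letters in upper case."""
    letters = [c for c in email if c.isalpha()]
    if not letters:
        return 0.0
    return sum(c.isupper() for c in letters) / len(letters)

def combined_score(scores: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Assumed combined classifier: a weighted sum of the constituent scores."""
    return bias + sum(w * s for w, s in zip(weights, scores))

def is_undesirable(email: str, classifiers: Sequence[Classifier],
                   weights: Sequence[float], bias: float,
                   threshold: float = 0.0) -> bool:
    # Step 1: obtain a score from each constituent classifier.
    scores = [clf(email) for clf in classifiers]
    # Step 2: feed those scores to the combined classifier.
    # Step 3: flag the e-mail if the combined score crosses the threshold.
    return combined_score(scores, weights, bias) > threshold

CLASSIFIERS = (keyword_score, shouting_score)
WEIGHTS = (1.0, 1.0)   # illustrative values only
BIAS = -0.5            # illustrative value only
```

In the method described here, the combined classifier's parameters would be computed automatically from a labeled corpus under a chosen false-positive vs. false-negative tradeoff rather than set by hand as above.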
11 Citations
18 Claims
-
1. A method of detecting whether a first e-mail is undesirable, the method comprising:
- inputting the first e-mail to each of a plurality of constituent spam classifiers;
obtaining at least one score from each of the plurality of constituent spam classifiers indicating the degree to which the first e-mail is deemed spam;
obtaining a combined spam score from a combined spam classifier that takes as input the at least one score from the plurality of constituent spam classifiers, the combined spam classifier being computed automatically in accordance with a false-positive vs. false-negative tradeoff; and
identifying the first e-mail as an undesirable e-mail if the combined spam score indicates that the first e-mail is undesirable;
wherein step of computing the combined spam classifier comprises:
compiling a labeled e-mail corpus consisting of a plurality of e-mails that have been labeled according to the degree to which the plurality of e-mails are deemed to be spam;
computing scores of the plurality of constituent spam classifiers on each e-mail in the labeled e-mail corpus;
establishing a set of one or more sample false-positive vs. false-negative tradeoffs;
analyzing, for each sample false-positive vs. false-negative tradeoff, the computed scores of the plurality of constituent spam classifiers on each e-mail in the labeled e-mail corpus to compute a set of combined spam classifiers, each of which best achieves a corresponding sample false-positive vs. false-negative tradeoff;
selecting a false-positive vs. false-negative tradeoff; and
computing from the false-positive vs. false-negative tradeoff, a set of sample false-positive vs. false-negative tradeoffs and a set of corresponding best combined classifiers a best combined classifier for the false-positive vs. false-negative tradeoff, and wherein the false-positive vs. false-negative tradeoffs are specified by penalty functions, and the combined spam classifier associated with a given penalty function is computed by an optimization procedure that yields the combined spam classifier for which the value of the given penalty function is minimal on the labeled e-mail corpus.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
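The training steps recited in claim 1 can be illustrated with a small sketch. Everything concrete here is an assumption made for the example: the toy corpus, the penalty function (a cost-weighted count of false positives and false negatives), the linear combined classifier, the exhaustive grid search used as the optimization procedure, and the rule for deriving a classifier for a newly selected tradeoff (picking whichever sample-optimal classifier has the lowest penalty under it). The claim itself does not prescribe any of these choices.

```python
from itertools import product

# Hypothetical labeled corpus: (scores from two constituent classifiers, is_spam).
CORPUS = [
    ((0.9, 0.8), True),
    ((0.7, 0.2), True),
    ((0.2, 0.6), True),
    ((0.4, 0.3), False),
    ((0.1, 0.1), False),
    ((0.3, 0.5), False),
]

GRID = [i / 10 for i in range(11)]  # candidate weights and thresholds

def penalty(params, fp_cost, fn_cost, corpus=CORPUS):
    """Assumed penalty function: cost-weighted count of errors on the corpus."""
    w1, w2, threshold = params
    total = 0.0
    for (s1, s2), is_spam in corpus:
        flagged = w1 * s1 + w2 * s2 > threshold
        if flagged and not is_spam:
            total += fp_cost      # legitimate mail flagged as spam
        elif not flagged and is_spam:
            total += fn_cost      # spam let through
    return total

def best_classifier(fp_cost, fn_cost):
    """Grid search for the parameters with minimal penalty on the corpus."""
    return min(product(GRID, repeat=3),
               key=lambda p: penalty(p, fp_cost, fn_cost))

# One best combined classifier per sample tradeoff (fp_cost, fn_cost).
SAMPLE_TRADEOFFS = [(1, 1), (5, 1), (10, 1)]
SAMPLE_BEST = {t: best_classifier(*t) for t in SAMPLE_TRADEOFFS}

def classifier_for(fp_cost, fn_cost):
    """For a selected tradeoff, reuse the sample-optimal classifier that
    scores best under it (one simple way to derive it from the samples)."""
    return min(SAMPLE_BEST.values(),
               key=lambda p: penalty(p, fp_cost, fn_cost))
```

Precomputing classifiers for a few sample tradeoffs means a later change of tradeoff, for example `classifier_for(3, 1)`, needs no new search over the full parameter grid.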
-
16. A method for detecting undesirable e-mail, the method comprising:
- inputting a first e-mail to each of a plurality of constituent spam classifiers;
obtaining at least one score from each of the plurality of constituent spam classifiers indicating the degree to which the first e-mail is deemed spam;
obtaining a combined spam score from a combined spam classifier that takes as input the at least one score from each of the plurality of constituent spam classifiers, at least one of the plurality of constituent spam classifiers being a member of a similarity-detection family; and
identifying the first e-mail as an undesirable e-mail if the combined spam score indicates that the first e-mail is undesirable;
wherein step of computing the combined spam classifier comprises:
compiling a labeled e-mail corpus consisting of a plurality of e-mails that have been labeled according to the degree to which the plurality of e-mails are deemed to be spam;
computing scores of the plurality of constituent spam classifiers on each e-mail in the labeled e-mail corpus;
establishing a set of one or more sample false-positive vs. false-negative tradeoffs;
analyzing, for each sample false-positive vs. false-negative tradeoff, the computed scores of the plurality of constituent spam classifiers on each e-mail in the labeled e-mail corpus to compute a set of combined spam classifiers, each of which best achieves a corresponding sample false-positive vs. false-negative tradeoff;
selecting a false-positive vs. false-negative tradeoff; and
computing from the false-positive vs. false-negative tradeoff, a set of sample false-positive vs. false-negative tradeoffs and a set of corresponding best combined classifiers a best combined classifier for the false-positive vs. false-negative tradeoff, and wherein the false-positive vs. false-negative tradeoffs are specified by penalty functions, and the combined spam classifier associated with a given penalty function is computed by an optimization procedure that yields the combined spam classifier for which the value of the given penalty function is minimal on the labeled e-mail corpus.
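Claim 16 requires that at least one constituent classifier belong to a similarity-detection family, without this section saying how such a classifier works. As one plausible illustration only, the sketch below scores an e-mail by its highest Jaccard similarity, over word 3-shingles, to a set of known spam messages. The function names, the shingle size, and the choice of Jaccard similarity are assumptions for the example.

```python
def shingles(text: str, k: int = 3) -> set:
    """Set of overlapping k-word tuples; short texts yield a single tuple."""
    words = text.lower().split()
    if not words:
        return set()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def similarity_score(email: str, known_spam: list) -> float:
    """Highest similarity between the e-mail and any known spam message."""
    target = shingles(email)
    return max((jaccard(target, shingles(s)) for s in known_spam), default=0.0)
```

A score produced this way could serve as one of the inputs to the combined spam classifier alongside the scores of the other constituent classifiers.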
-
17. A non-transitory computer readable medium including computer instructions for detecting whether a first e-mail is undesirable, the computer instructions including instructions for:
- inputting the first e-mail to each of a plurality of constituent spam classifiers;
obtaining at least one score from each of the plurality of constituent spam classifiers indicating the degree to which the first e-mail is deemed spam;
obtaining a combined spam score from a combined spam classifier that takes as input the at least one score from the plurality of constituent spam classifiers, the combined spam classifier being computed automatically in accordance with a false-positive vs. false-negative tradeoff; and
identifying the first e-mail as an undesirable e-mail if the combined spam score indicates that the first e-mail is undesirable;
wherein step of computing the combined spam classifier comprises:
compiling a labeled e-mail corpus consisting of a plurality of e-mails that have been labeled according to the degree to which the plurality of e-mails are deemed to be spam;
computing scores of the plurality of constituent spam classifiers on each e-mail in the labeled e-mail corpus;
establishing a set of one or more sample false-positive vs. false-negative tradeoffs;
analyzing, for each sample false-positive vs. false-negative tradeoff, the computed scores of the plurality of constituent spam classifiers on each e-mail in the labeled e-mail corpus to compute a set of combined spam classifiers, each of which best achieves a corresponding sample false-positive vs. false-negative tradeoff;
selecting a false-positive vs. false-negative tradeoff; and
computing from the false-positive vs. false-negative tradeoff, a set of sample false-positive vs. false-negative tradeoffs and a set of corresponding best combined classifiers a best combined classifier for the false-positive vs. false-negative tradeoff, and wherein the false-positive vs. false-negative tradeoffs are specified by penalty functions, and the combined spam classifier associated with a given penalty function is computed by an optimization procedure that yields the combined spam classifier for which the value of the given penalty function is minimal on the labeled e-mail corpus.
-
18. An information processing system for detecting whether a first e-mail is undesirable, comprising:
- a processor configured for:
inputting the first e-mail to each of a plurality of constituent spam classifiers;
obtaining at least one score from each of the plurality of constituent spam classifiers indicating the degree to which the first e-mail is deemed spam;
obtaining a combined spam score from a combined spam classifier that takes as input the at least one score from the plurality of constituent spam classifiers, the combined spam classifier being computed automatically in accordance with a false-positive vs. false-negative tradeoff; and
identifying the first e-mail as an undesirable e-mail if the combined spam score indicates that the first e-mail is undesirable;
wherein step of computing the combined spam classifier comprises:
compiling a labeled e-mail corpus consisting of a plurality of e-mails that have been labeled according to the degree to which the plurality of e-mails are deemed to be spam;
computing scores of the plurality of constituent spam classifiers on each e-mail in the labeled e-mail corpus;
establishing a set of one or more sample false-positive vs. false-negative tradeoffs;
analyzing, for each sample false-positive vs. false-negative tradeoff, the computed scores of the plurality of constituent spam classifiers on each e-mail in the labeled e-mail corpus to compute a set of combined spam classifiers, each of which best achieves a corresponding sample false-positive vs. false-negative tradeoff;
selecting a false-positive vs. false-negative tradeoff; and
computing from the false-positive vs. false-negative tradeoff, a set of sample false-positive vs. false-negative tradeoffs and a set of corresponding best combined classifiers a best combined classifier for the false-positive vs. false-negative tradeoff, and wherein the false-positive vs. false-negative tradeoffs are specified by penalty functions, and the combined spam classifier associated with a given penalty function is computed by optimization procedure that yields the combined spam classifier for which the value of the given penalty function is minimal on the labeled e-mail corpus.
Specification