Techniques for identifying spam e-mail
Abstract
In one embodiment, a support vector machine is employed to compute a spam threshold and weights of tokens and heuristic rules. An incoming e-mail is parsed to determine if it contains one or more of the tokens. Tokens identified to be in the e-mail are then used to determine if the e-mail satisfies one or more heuristic rules. The weights of tokens found in the e-mail and the weights of the heuristic rules satisfied by the e-mail may be employed in the computation of a spam score. The spam score may be compared to the spam threshold to determine if the e-mail is spam or legitimate.
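The scoring flow described in the abstract can be sketched roughly as follows. This is an illustration only, not the patented implementation: the token list, the single co-occurrence rule, every weight, and the threshold value are hypothetical placeholders (the abstract says the real values come from a support vector machine), and the choice that a score strictly above the threshold means spam is an assumption.

```python
import re
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical values for illustration only; per the abstract, the real
# weights and threshold would be computed by a support vector machine.
TOKEN_WEIGHTS: Dict[str, float] = {
    "free": 0.8,
    "click": 0.6,
    "viagra": 1.5,
    "meeting": -0.9,
}

# A heuristic rule is modeled here as a predicate over the identified tokens.
Rule = Callable[[Set[str]], bool]
RULE_WEIGHTS: List[Tuple[Rule, float]] = [
    # Hypothetical rule: "free" and "click" both appear in the e-mail.
    (lambda found: {"free", "click"} <= found, 0.7),
]

SPAM_THRESHOLD = 1.0  # hypothetical


def identify_tokens(email_text: str) -> Set[str]:
    """Parse the e-mail and keep only the tokens selected for spam detection."""
    words = re.findall(r"[a-z0-9']+", email_text.lower())
    return {w for w in words if w in TOKEN_WEIGHTS}


def spam_score(email_text: str) -> float:
    """Sum the weights of identified tokens and of satisfied heuristic rules."""
    found = identify_tokens(email_text)
    token_part = sum(TOKEN_WEIGHTS[t] for t in sorted(found))
    rule_part = sum(weight for rule, weight in RULE_WEIGHTS if rule(found))
    return token_part + rule_part


def is_spam(email_text: str) -> bool:
    """Assumption: a score strictly above the threshold is classified as spam."""
    return spam_score(email_text) > SPAM_THRESHOLD
```

With these placeholder values, an e-mail containing both "free" and "click" picks up the two token weights plus the rule weight, while an e-mail containing only "meeting" gets a negative score and is treated as legitimate.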
40 Citations
20 Claims
1. A method of identifying spam e-mail, the method to be performed by a computer having a processor and memory, the memory comprising computer-readable program code configured to be executed by the processor so the computer can perform the method, the method comprising:
the computer identifying one or more of the tokens in an e-mail, wherein the tokens are selected for use in identifying spam e-mail, and wherein weights of the tokens, weights of heuristic rules that describe relationships between the tokens, and a threshold are computed using a support vector machine;
the computer using identified tokens to determine if the e-mail satisfies one or more of the heuristic rules;
the computer using weights of the identified tokens and a weight of a satisfied heuristic rule to generate a spam score; and
the computer comparing the spam score to the threshold to determine if the e-mail is a spam e-mail or a legitimate e-mail.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A computer comprising computer-readable storage medium and a processor, the processor being configured to execute computer-readable program code in the computer-readable storage medium, the computer readable storage medium comprising:
a token parser comprising computer-readable program code configured for execution by the processor to identify tokens present in an incoming e-mail;
a heuristic rule engine comprising computer-readable program code configured for execution by the processor to determine if the tokens identified by the token parser satisfy a heuristic rule; and
a classifier comprising computer-readable program code configured for execution by the processor to determine whether or not the e-mail is spam based on weights assigned to identified tokens and a weight of a satisfied heuristic rule, the weights of the tokens and the weight of the satisfied heuristic rule being computed using a support vector machine.
Dependent claims: 11, 12, 13, 14
15. A method of identifying spam e-mail, the method to be performed by a computer having a processor and memory, the memory comprising computer-readable program code configured to be executed by the processor so the computer can perform the method, the method comprising:
the computer selecting tokens among a plurality of tokens for use in identifying spam e-mail;
the computer creating heuristic rules describing relationships between the selected tokens;
the computer creating a training data matrix having a plurality of rows, each of the rows having components that correspond to a selected token or a heuristic rule;
the computer inputting the training data matrix to a support vector machine to compute weights of the selected tokens and weights of the heuristic rules, wherein the weights of selected tokens and the weights of the heuristic rules are employed to identify spam e-mails.
Dependent claims: 16, 17, 18, 19, 20
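The training steps recited in claim 15 can be illustrated with a minimal sketch. Everything named here is hypothetical: the selected tokens, the single heuristic rule, the six training e-mails, and the hyperparameters. The trainer is a plain subgradient-descent fit of a linear SVM (hinge loss with L2 regularization), used as a simple stand-in; the claim does not specify which SVM solver is employed. Reading the threshold off as the negated bias is also an assumption.

```python
import re
from typing import Callable, List, Sequence, Set, Tuple

# Hypothetical selected tokens and one hypothetical heuristic rule.
TOKENS: List[str] = ["free", "click", "meeting", "report"]
RULES: List[Callable[[Set[str]], bool]] = [
    lambda found: {"free", "click"} <= found,  # both tokens co-occur
]


def to_row(email_text: str) -> List[float]:
    """Build one training-matrix row: one component per token, then per rule."""
    words = re.findall(r"[a-z0-9']+", email_text.lower())
    found = {w for w in words if w in TOKENS}
    token_part = [1.0 if t in found else 0.0 for t in TOKENS]
    rule_part = [1.0 if rule(found) else 0.0 for rule in RULES]
    return token_part + rule_part


def decision(w: Sequence[float], b: float, row: Sequence[float]) -> float:
    """Linear decision value w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, row)) + b


def train_linear_svm(
    rows: Sequence[Sequence[float]],
    labels: Sequence[int],          # +1 for spam, -1 for legitimate
    epochs: int = 200,
    lr: float = 0.1,
    lam: float = 0.01,
) -> Tuple[List[float], float]:
    """Stand-in trainer: subgradient descent on the L2-regularized hinge loss."""
    w = [0.0] * len(rows[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            margin = y * decision(w, b, x)
            w = [wi * (1.0 - lr * lam) for wi in w]   # regularization shrink
            if margin < 1.0:                          # point violates the margin
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


# Hypothetical labeled training e-mails.
SPAM = ["click for free stuff", "free free click now", "click here free"]
LEGIT = ["meeting report attached", "please send the report", "meeting tomorrow"]

matrix = [to_row(text) for text in SPAM + LEGIT]
labels = [1] * len(SPAM) + [-1] * len(LEGIT)
weights, bias = train_linear_svm(matrix, labels)
threshold = -bias  # assumption: spam when the weighted sum exceeds -bias
```

The first four entries of `weights` correspond to the selected tokens and the last entry to the heuristic rule, mirroring the row layout of the training data matrix.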
Specification