Recognizing spam email
First Claim
1. A method for estimating a probability of spamminess for email messages, comprising:
- using a processor device, performing steps of:
reviewing a sequence of IP addresses used in transmission of an email message, beginning with a last IP address and proceeding backward to an originating IP address, with intermediate IP addresses in between;
comparing a current IP address in the sequence of IP addresses with an IP address in a training set;
when the current IP address does not match any IP address in the training set, combining statistics of nearby IP addresses by:
building a tree of known IP addresses, wherein a root of said tree has up to 256 first level sub trees, each first level sub tree corresponding to various possible first bytes of an IP address;
wherein each node n in the tree represents an IP address; and
at each node n in the tree:
storing a count Sn of spam messages in which the IP address the node n represents has appeared;
storing a count NSn of non-spam messages in which the IP address the node n represents has appeared; and
computing a ratio that is a measure of spamminess s of the node n;
wherein the ratio that is the measure of spamminess s of the node n is computed by dividing the spam count Sn by the total number of messages that have come through the address, which is Sn/(Sn+NSn);
determining an overall score for the message based on the ratio of spamminess of each node along the message path;
wherein the overall score is calculated as a weighted average of the spamminess s of the nodes, with the weight equal to 1/(s*(1−s)); and
determining a probability that the message is spam based on the overall score being greater than a defined spam threshold.
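Written out, the scoring step is a weighted average over the nodes n on the message path, with s_n = Sn/(Sn+NSn). Because w_n s_n simplifies to 1/(1 − s_n), the weights favor nodes whose spamminess is close to 0 or 1. The two-node numbers below are an illustrative worked example, not data from the patent; note also that the weight is undefined when s_n is exactly 0 or 1, a case the claim does not address.

```latex
s_n = \frac{S_n}{S_n + NS_n}, \qquad
w_n = \frac{1}{s_n(1 - s_n)}, \qquad
\text{score} = \frac{\sum_n w_n s_n}{\sum_n w_n}
             = \frac{\sum_n \frac{1}{1 - s_n}}{\sum_n \frac{1}{s_n(1 - s_n)}}

\text{e.g. } s_1 = 0.9,\ s_2 = 0.5:\quad
w_1 = \tfrac{1}{0.09} \approx 11.1,\quad
w_2 = \tfrac{1}{0.25} = 4,\quad
\text{score} = \frac{10 + 2}{11.1 + 4} \approx 0.79
```

For comparison, an unweighted average of the same two nodes would be 0.7; the weighting pulls the score toward the more decisive node.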
1 Assignment
0 Petitions
Abstract
A system includes at least one router for routing email messages from a sender node to a destination node; a system memory; a network interface; a database; and a processor configured for: extracting delivery path information from a received email message; determining a network path for the email message using the delivery path information; comparing the delivery path information with a plurality of prior delivery paths; determining a measure of similarity between the network path of the received email message and one or more of the prior delivery paths; and determining a spam score for the received email message based on the measure of similarity.
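The abstract does not say how the delivery path is extracted or how similarity is measured, so the following is only a minimal sketch under loudly stated assumptions. It uses Python's standard `email` package to read the `Received` headers and takes the first bracketed IPv4 literal in each as a hop (a simplification; real headers vary widely). Similarity is assumed to be the Jaccard index of the two hop sets, and the spam score is assumed to be a similarity-weighted share of spam among labelled prior paths, falling back to a neutral 0.5 when nothing overlaps. The names `delivery_path`, `similarity`, and `path_spam_score` are hypothetical.

```python
import re
from email import message_from_string
from typing import List, Sequence, Tuple

# Simplification (assumption): take the first bracketed IPv4 literal in each
# Received header, e.g. "from mx.example ([192.0.2.1]) by ...".
_IP_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def delivery_path(raw_message: str) -> List[str]:
    """Hop IPs in header order: most recent relay first, origin last."""
    msg = message_from_string(raw_message)
    hops = []
    for header in msg.get_all("Received") or []:
        match = _IP_IN_BRACKETS.search(str(header))
        if match:
            hops.append(match.group(1))
    return hops


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard similarity of two hop sets (an assumed measure)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0  # avoid 0/0 when both paths are empty
    return len(sa & sb) / len(sa | sb)


def path_spam_score(path: Sequence[str],
                    prior: Sequence[Tuple[Sequence[str], bool]]) -> float:
    """Similarity-weighted share of spam among prior (path, was_spam) pairs.

    Returns 0.5 (neutral, an assumption) when no prior path overlaps at all.
    """
    total = spam = 0.0
    for prior_path, was_spam in prior:
        sim = similarity(path, prior_path)
        total += sim
        if was_spam:
            spam += sim
    return spam / total if total else 0.5
```

The resulting score could then be compared against a threshold, as in the claims, to reach a spam decision.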
11 Citations
17 Claims
1. A method for estimating a probability of spamminess for email messages, comprising:
- using a processor device, performing steps of:
reviewing a sequence of IP addresses used in transmission of an email message, beginning with a last IP address and proceeding backward to an originating IP address, with intermediate IP addresses in between;
comparing a current IP address in the sequence of IP addresses with an IP address in a training set;
when the current IP address does not match any IP address in the training set, combining statistics of nearby IP addresses by:
building a tree of known IP addresses, wherein a root of said tree has up to 256 first level sub trees, each first level sub tree corresponding to various possible first bytes of an IP address;
wherein each node n in the tree represents an IP address; and
at each node n in the tree:
storing a count Sn of spam messages in which the IP address the node n represents has appeared;
storing a count NSn of non-spam messages in which the IP address the node n represents has appeared; and
computing a ratio that is a measure of spamminess s of the node n;
wherein the ratio that is the measure of spamminess s of the node n is computed by dividing the spam count Sn by the total number of messages that have come through the address, which is Sn/(Sn+NSn);
determining an overall score for the message based on the ratio of spamminess of each node along the message path;
wherein the overall score is calculated as a weighted average of the spamminess s of the nodes, with the weight equal to 1/(s*(1−s)); and
determining a probability that the message is spam based on the overall score being greater than a defined spam threshold.
(Dependent claims: 2, 3, 4, 5, 6, 7)
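The steps of claim 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it handles IPv4 only; every node along an address's bytes accumulates Sn and NSn, so a prefix node such as 203.0.113 can supply the "nearby" statistics for an unseen address (our reading of the combining step, via longest-prefix match); s is clamped to [0.01, 0.99] because the weight 1/(s*(1−s)) is undefined at 0 and 1; and the default threshold of 0.9 and the names `IpTree`, `score_path`, and `is_spam` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Assumption: clamp s into [EPS, 1 - EPS]; the weight 1/(s*(1-s)) is undefined at 0 and 1.
EPS = 0.01


@dataclass
class Node:
    spam: int = 0     # Sn: spam messages seen through this address (or prefix)
    nonspam: int = 0  # NSn: non-spam messages seen through this address (or prefix)
    children: Dict[int, "Node"] = field(default_factory=dict)

    def spamminess(self) -> float:
        """s = Sn / (Sn + NSn); only called on nodes that have counts."""
        return self.spam / (self.spam + self.nonspam)


class IpTree:
    """Byte-wise tree of IPv4 addresses; the root has up to 256 first-level subtrees."""

    def __init__(self) -> None:
        self.root = Node()

    def add(self, ip: str, is_spam: bool) -> None:
        """Record one training message that passed through `ip`."""
        node = self.root
        for byte in map(int, ip.split(".")):
            node = node.children.setdefault(byte, Node())
            # Assumption: every prefix node on the way also accumulates counts,
            # so it can stand in for unseen addresses that share the prefix.
            if is_spam:
                node.spam += 1
            else:
                node.nonspam += 1

    def lookup(self, ip: str) -> Optional[Node]:
        """Exact node if `ip` was seen; else the deepest matching prefix; else None."""
        node, best = self.root, None
        for byte in map(int, ip.split(".")):
            node = node.children.get(byte)
            if node is None:
                break
            best = node
        return best


def score_path(tree: IpTree, path: List[str]) -> Optional[float]:
    """Weighted average of s with weight 1/(s*(1-s)).

    `path` is in Received-header order (last hop first), so iterating forward
    walks backward toward the originating address.
    """
    num = den = 0.0
    for ip in path:
        node = tree.lookup(ip)
        if node is None:
            continue  # no statistics for this address or any prefix of it
        s = min(max(node.spamminess(), EPS), 1.0 - EPS)
        w = 1.0 / (s * (1.0 - s))
        num += w * s
        den += w
    return num / den if den else None


def is_spam(tree: IpTree, path: List[str], threshold: float = 0.9) -> bool:
    """Spam if the overall score exceeds the (hypothetical) threshold."""
    score = score_path(tree, path)
    return score is not None and score > threshold
```

For example, after training on messages that passed through 203.0.113.7, an unseen hop 203.0.113.9 is scored from the 203.0.113 prefix node rather than being skipped.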
8. An information processing system for estimating a probability of spamminess for email messages, comprising:
a system memory; and
a processor device operably coupled with the system memory and performing steps of:
reviewing a sequence of IP addresses used in transmission of an email message, beginning with a last IP address and proceeding backward to an originating IP address, with intermediate IP addresses in between;
comparing a current IP address in the sequence of IP addresses with an IP address in a training set;
when the current IP address does not match any IP address in the training set, combining statistics of nearby IP addresses by:
building a tree of known IP addresses, wherein a root of said tree has up to 256 first level sub trees, each first level sub tree corresponding to various possible first bytes of an IP address;
wherein each node n in the tree represents an IP address; and
at each node n in the tree:
storing a count Sn of spam messages in which the IP address the node n represents has appeared;
storing a count NSn of non-spam messages in which the IP address the node n represents has appeared; and
computing a ratio that is a measure of spamminess s of the node n;
wherein the ratio that is the measure of spamminess s of the node n is computed by dividing the spam count Sn by the total number of messages that have come through the address, which is Sn/(Sn+NSn);
determining an overall score for the message based on the ratio of spamminess of each node along the message path;
wherein the overall score is calculated as a weighted average of the spamminess s of the nodes, with the weight equal to 1/(s*(1−s)); and
determining a probability that the message is spam based on the overall score being greater than a defined spam threshold.
(Dependent claims: 9, 10, 11, 12, 13)
14. A non-transitory computer readable storage medium comprising program code for estimating a probability of spamminess for email messages that, when executed, enables a computer to perform steps of:
reviewing a sequence of IP addresses used in transmission of an email message, beginning with a last IP address and proceeding backward to an originating IP address, with intermediate IP addresses in between;
comparing a current IP address in the sequence of IP addresses with an IP address in a training set;
when the current IP address does not match any IP address in the training set, combining statistics of nearby IP addresses by:
building a tree of known IP addresses, wherein a root of said tree has up to 256 first level sub trees, each first level sub tree corresponding to various possible first bytes of an IP address;
wherein each node n in the tree represents an IP address; and
at each node n in the tree:
storing a count Sn of spam messages in which the IP address the node n represents has appeared;
storing a count NSn of non-spam messages in which the IP address the node n represents has appeared; and
computing a ratio that is a measure of spamminess s of the node n;
wherein the ratio that is the measure of spamminess s of the node n is computed by dividing the spam count Sn by the total number of messages that have come through the address, which is Sn/(Sn+NSn);
determining an overall score for the message based on the ratio of spamminess of each node along the message path;
wherein the overall score is calculated as a weighted average of the spamminess s of the nodes, with the weight equal to 1/(s*(1−s)); and
determining a probability that the message is spam based on the overall score being greater than a defined spam threshold.
(Dependent claims: 15, 16, 17)
Specification