System for email processing and analysis
First Claim
1. A computer-implemented method for analyzing an email message to determine the likelihood that the email message is spam, the method comprising:
(a) for each of a plurality of message attributes, determining a message attribute occurrence frequency within a collection of known spam email messages, the message attribute occurrence frequency specifying how often one of the plurality of message attributes occurs in the collection of known spam email messages;
(b) for each of the plurality of message attributes, determining a message attribute occurrence frequency within a collection of known non-spam email messages;
(c) determining a spam probability weight for each of the plurality of message attributes based at least in part on the determined message attribute occurrence frequencies determined at steps (a) and (b), wherein step (c) comprises, when determining the spam probability weight for each of the plurality of message attributes,
(c.1) weighting more highly newer email messages within the collection of known spam email messages than older email messages within the collection of known spam email messages, and
(c.2) weighting more highly newer email messages within the collection of known non-spam email messages than older email messages within the collection of known non-spam email messages,
wherein the weighting more highly newer email messages within the collection of known non-spam email messages at (c.2) is to a lesser extent than the weighting more highly newer email messages within the collection of known spam messages at (c.1);
(d) detecting a further email message;
(e) identifying which, if any, of the message attributes are associated with the further email message; and
(f) determining a value indicative of a likelihood that the further email message is spam based at least in part on the message attributes identified at step (e) as being associated with the further email message and spam probability weights corresponding to the identified message attributes;
wherein at least step (f) is performed using a hardware processor.
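Steps (a) through (f) can be illustrated with a minimal Python sketch. The claim does not specify how recency is weighted or how the per-attribute weights are combined, so everything below is an assumption made for illustration: an exponential half-life decay (shorter for spam than for non-spam, so that newer non-spam is favored "to a lesser extent"), a smoothed ratio as the spam probability weight, and a simple mean as the final value. The names `recency_weight`, `weighted_frequency`, `spam_weights`, and `spam_score` are hypothetical.

```python
from collections import defaultdict


def recency_weight(age_days, half_life_days):
    """Assumed decay: a message loses half its weight every half_life_days."""
    return 0.5 ** (max(0.0, age_days) / half_life_days)


def weighted_frequency(messages, now, half_life_days):
    """Steps (a)/(b): recency-weighted share of messages containing each attribute.

    messages is a list of (sent_day, set_of_attributes) pairs.
    """
    totals = defaultdict(float)
    total_weight = 0.0
    for sent_day, attributes in messages:
        w = recency_weight(now - sent_day, half_life_days)
        total_weight += w
        for attribute in attributes:
            totals[attribute] += w
    if total_weight == 0.0:
        return {}
    return {a: t / total_weight for a, t in totals.items()}


def spam_weights(spam, ham, now, spam_half_life=7.0, ham_half_life=30.0, eps=1e-3):
    """Step (c): one weight per attribute, in (0, 1).

    The longer non-spam half-life means newer non-spam is still favored,
    but less strongly than newer spam (assumed values, not from the source).
    """
    spam_freq = weighted_frequency(spam, now, spam_half_life)
    ham_freq = weighted_frequency(ham, now, ham_half_life)
    weights = {}
    for attribute in set(spam_freq) | set(ham_freq):
        s = spam_freq.get(attribute, 0.0)
        h = ham_freq.get(attribute, 0.0)
        weights[attribute] = (s + eps) / (s + h + 2 * eps)  # smoothed ratio
    return weights


def spam_score(attributes, weights):
    """Steps (e)/(f): mean weight of the attributes found in a further message."""
    known = [weights[a] for a in attributes if a in weights]
    if not known:
        return 0.5  # no evidence either way
    return sum(known) / len(known)


# Example corpora: (day sent, attributes present in the message)
spam = [(10.0, {"viagra", "free"}), (0.0, {"free"})]
ham = [(10.0, {"meeting"}), (0.0, {"free", "meeting"})]
weights = spam_weights(spam, ham, now=10.0)
print(spam_score({"viagra", "free"}, weights), spam_score({"meeting"}, weights))
```

A half-life is only one way to express "newer counts more"; a sliding window or a linear ramp would satisfy the same claim language, provided the non-spam side is discounted more gently than the spam side.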
Abstract
Various features are provided for analyzing and processing email messages including determining if an email message is unwanted, and blocking unwanted messages. Email traffic is monitored by analyzing email messages addressed to known invalid email addresses. Email messages addressed to invalid email addresses are sent to a central control site for analysis. One embodiment tries to ensure that the distance between the invalid addresses and closest valid addresses is significant enough so that the invalid addresses are not inadvertently used for non-spam purposes. Another embodiment of the invention provides for distributed “thin client” processes to run on computer systems or other processing platforms. The thin clients emulate an open relay computer. Attempts at exploiting the apparent open relay computer are reported to a control center and the relay of email messages can be inhibited. Another embodiment provides for analysis and tuning of rules to detect spam and legitimate email. The approach adjusts various factors according to changing, current email data that is gathered from present, or recent, email traffic. Another embodiment takes into account statistics of erroneous and intentional misspellings. Groups of similar content items (e.g., words, phrases, images, ASCII text, etc.) are correlated and analysis can proceed after substitution of items in the group with other items in the group so that a more accurate detection of “sameness” of content can be achieved. Another embodiment uses authentication and security methods for validating email senders, detecting the sameness of messages, tracking the reputation of the sender, and tracking the behavior of the sender. Another embodiment profiles users to intelligently organize user data, including adapting spam detection according to a user's perceived interests.
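The substitution idea mentioned in the abstract (replacing items in a group of similar content items before comparing messages) might look roughly like the following Python sketch. The abstract does not say how groups are built or how a representative is chosen; here the groups are supplied by hand, the representative is simply the alphabetically smallest member, and messages are compared as whitespace-separated lowercase tokens. The function names and sample groups are hypothetical.

```python
def build_canonical_map(groups):
    """Map every item in a group to one representative of that group.

    groups is an iterable of sets of lowercase strings (assumed input format).
    """
    canonical = {}
    for group in groups:
        representative = min(group)  # any deterministic choice would do
        for item in group:
            canonical[item] = representative
    return canonical


def normalize(text, canonical):
    """Lowercase, split on whitespace, and substitute group representatives."""
    return [canonical.get(token, token) for token in text.lower().split()]


def same_content(first, second, canonical):
    """Treat two messages as the same if they match after substitution."""
    return normalize(first, canonical) == normalize(second, canonical)


# Hand-written groups of intentional misspellings (illustrative only)
groups = [{"viagra", "v1agra", "vi@gra"}, {"free", "fr33"}]
canonical = build_canonical_map(groups)
print(same_content("Buy V1agra fr33", "buy vi@gra free", canonical))
```

Exact equality after substitution is the strictest possible test of "sameness"; a similarity threshold over the normalized tokens would be a natural relaxation.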
21 Claims
12. A non-transitory machine-readable storage medium including instructions executable by a processor for analyzing an email message to determine the likelihood that the email message is spam, the machine-readable storage medium comprising:
one or more instructions for determining a message attribute occurrence frequency within a collection of known spam email messages for each of a plurality of message attributes, the message attribute occurrence frequency specifying how often one of the plurality of message attributes occurs in the collection of known spam email messages;
one or more instructions for determining a message attribute occurrence frequency within a collection of known non-spam email messages for each of the plurality of message attributes;
one or more instructions for determining a spam probability weight for the message attributes based at least in part on the determined message attribute occurrence frequencies determined for each of the plurality of message attributes within the collection of known spam messages and within the collection of known non-spam messages, wherein the one or more instructions for determining the spam probability weight for each of the plurality of message attributes comprises one or more instructions for, weighting more highly newer email messages within the collection of known spam email messages than older email messages within the collection of known spam email messages, and weighting more highly newer email messages within the collection of known non-spam email messages than older email messages within the collection of known non-spam email messages, wherein the weighting more highly newer email messages within the collection of known non-spam email messages is to a lesser extent than the weighting more highly newer email messages within the collection of known spam messages;
one or more instructions for detecting a further email message;
one or more instructions for identifying which, if any, of the message attributes are associated with the further email message; and
one or more instructions for determining a value indicative of a likelihood that the further email message is spam, based at least in part on the identified message attributes for the further email message and spam probability weights corresponding to the identified message attributes.
21. A system for analyzing an email message to determine the likelihood that the email message is spam, the system comprising at least one processor and memory, and a non-transitory computer-readable medium having thereon one or more programs, which when executed by the at least one processor, cause the system to:
determine a message attribute occurrence frequency within a collection of spam email messages for each of a plurality of message attributes, the message attribute occurrence frequency specifying how often one of the plurality of message attributes occurs in the collection of known spam email messages rather than how often the one of the plurality of message attributes occurs in an individual spam email message;
determine a message attribute occurrence frequency within a collection of non-spam email messages for each of the plurality of message attributes;
determine a spam probability weight for each of the plurality of message attributes based at least in part on the determined message attribute occurrence frequencies determined for each of the plurality of message attributes, wherein determine a spam probability weight for each of the plurality of message attributes weights more highly newer email messages within the collection of known spam email messages than older email messages within the collection of known spam email messages, and weights more highly newer email messages within the collection of known non-spam email messages than older email messages within the collection of known non-spam email messages, but to a lesser extent than newer email messages are weighted more highly within the collection of known spam email messages;
detect a further email message;
identify which, if any, of the message attributes are associated with the further email message; and
determine a value indicative of a likelihood that the further email message is spam based at least in part on the identified message attributes and for the further email message corresponding spam probability weights corresponding to identified message attributes.
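Claim 21 distinguishes how often an attribute occurs across the collection from how often it occurs inside an individual message. One possible reading, sketched below in Python, is that the collection-level frequency is the fraction of messages containing the attribute at least once, so that a single message repeating the attribute many times does not inflate it. This interpretation, the token-list message format, and the function names are assumptions for illustration only.

```python
def collection_frequency(messages, attribute):
    """Fraction of messages in the collection that contain the attribute."""
    if not messages:
        return 0.0
    hits = sum(1 for tokens in messages if attribute in tokens)
    return hits / len(messages)


def per_message_count(tokens, attribute):
    """Number of times the attribute appears within one message."""
    return tokens.count(attribute)


# Four messages, each a list of tokens (illustrative data)
messages = [
    ["free", "free", "free"],
    ["hello"],
    ["free", "offer"],
    ["meeting"],
]
print(collection_frequency(messages, "free"))   # share of messages containing "free"
print(per_message_count(messages[0], "free"))   # repeats inside the first message
```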
Specification