Detecting unwanted electronic mail messages based on probabilistic analysis of referenced resources
First Claim
1. A method, comprising:
before receiving any electronic message;
retrieving a whitelist comprising a plurality of first network resource identifiers that are known not to be associated with threats;
retrieving a particular first network resource identifier from the whitelist;
generating a first list of properties for the particular first network resource identifier;
training, using the properties of the first list, a probabilistic filter to recognize future electronic mail messages that are not associated with threats;
repeating the retrieving and training for all the first network resource identifiers in the whitelist;
retrieving a blocklist comprising a plurality of second network resource identifiers that are known to be associated with threats;
retrieving a particular second network resource identifier from the blocklist;
generating a second list of properties for the particular second network resource identifier;
training, using the properties of the second list, the probabilistic filter to recognize future electronic mail messages that are associated with threats;
repeating the retrieving and training for all the second network resource identifiers in the blocklist;
wherein the network resource identifiers are uniform resource locators (URLs);
wherein generating properties comprises obtaining information from “whois” queries, based on a domain name owner for a domain name contained in the particular first network resource identifier or the particular second network resource identifier.
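The training steps recited in the claim can be illustrated with a minimal sketch. It is not the patented implementation: the names `url_properties`, `PropertyFilter`, `train_from_lists`, and the in-memory `FAKE_WHOIS` table are hypothetical, the "whois" lookup is replaced by a static dictionary so the example runs offline, and the filter simply counts how often each property appears under each list.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical stand-in for real "whois" queries: domain name -> domain name owner.
FAKE_WHOIS = {
    "example.com": "Example Corp",
    "example.net": "Anon Holdings",
}

def url_properties(url, whois_table=FAKE_WHOIS):
    """Generate a list of properties for one URL (host, TLD, whois owner)."""
    host = (urlsplit(url).hostname or "").lower()
    props = ["host:" + host, "tld:" + host.rsplit(".", 1)[-1]]
    owner = whois_table.get(host)  # assumption: the table is keyed by host name
    if owner is not None:
        props.append("owner:" + owner)
    return props

class PropertyFilter:
    """Per-class property counts, the raw material of a probabilistic filter."""

    def __init__(self):
        self.counts = {"ham": Counter(), "threat": Counter()}
        self.trained = {"ham": 0, "threat": 0}

    def train(self, props, label):
        # Count each property once per training example.
        self.counts[label].update(set(props))
        self.trained[label] += 1

def train_from_lists(whitelist, blocklist):
    """Repeat retrieve / generate / train for every URL in both lists."""
    flt = PropertyFilter()
    for url in whitelist:
        flt.train(url_properties(url), "ham")
    for url in blocklist:
        flt.train(url_properties(url), "threat")
    return flt

flt = train_from_lists(["http://example.com/a"], ["http://example.net/x"])
```

All training here happens before any message is seen, which mirrors the "before receiving any electronic message" limitation; a real system would issue live whois queries and would likely extract many more properties.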
Abstract
In one embodiment, detecting unwanted electronic mail messages based on probabilistic analysis of referenced resources comprises receiving a whitelist and a blocklist each having a plurality of network resource identifiers that have appeared in prior messages; retrieving a particular network resource identifier; generating a list of properties for the particular network resource identifier; training a probabilistic filter using the properties; and repeating the retrieving, generating and training for all the network resource identifiers in the whitelist and blocklist. Thereafter, when an electronic mail message is received and contains a URL or other network resource identifier, a spam score or threat score can be generated for the message by testing properties of the network resource identifier using the trained probabilistic filter.
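The scoring step described in the abstract could be sketched as follows. This assumes a naive Bayes style filter with add-one smoothing, which is one common kind of probabilistic filter and is not confirmed by the source as the method used; the function name `threat_score` and its parameters are hypothetical, and only properties present on the URL are considered.

```python
import math

def threat_score(props, threat_counts, ham_counts, n_threat, n_ham):
    """Estimate P(threat | props) from per-class property counts.

    threat_counts / ham_counts map a property string to the number of
    blocklist / whitelist URLs that had it; n_threat and n_ham are the
    (non-zero) numbers of URLs trained from each list.
    """
    total = n_threat + n_ham
    log_threat = math.log(n_threat / total)  # class prior
    log_ham = math.log(n_ham / total)
    for p in set(props):
        # Add-one smoothing keeps unseen properties from zeroing the product.
        log_threat += math.log((threat_counts.get(p, 0) + 1) / (n_threat + 2))
        log_ham += math.log((ham_counts.get(p, 0) + 1) / (n_ham + 2))
    # Normalise the two log scores into a probability between 0 and 1.
    top = max(log_threat, log_ham)
    e_threat = math.exp(log_threat - top)
    e_ham = math.exp(log_ham - top)
    return e_threat / (e_threat + e_ham)

# Toy counts: three blocklist URLs shared one owner, three whitelist URLs another.
threat_counts = {"owner:Anon Holdings": 3}
ham_counts = {"owner:Example Corp": 3}
score = threat_score(["owner:Anon Holdings"], threat_counts, ham_counts, 3, 3)
```

A message containing a URL would be scored by generating the URL's properties the same way as during training and passing them to the function; a property never seen in training leaves the score at the class prior.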
31 Claims
1. A method, comprising:
before receiving any electronic message;
retrieving a whitelist comprising a plurality of first network resource identifiers that are known not to be associated with threats;
retrieving a particular first network resource identifier from the whitelist;
generating a first list of properties for the particular first network resource identifier;
training, using the properties of the first list, a probabilistic filter to recognize future electronic mail messages that are not associated with threats;
repeating the retrieving and training for all the first network resource identifiers in the whitelist;
retrieving a blocklist comprising a plurality of second network resource identifiers that are known to be associated with threats;
retrieving a particular second network resource identifier from the blocklist;
generating a second list of properties for the particular second network resource identifier;
training, using the properties of the second list, the probabilistic filter to recognize future electronic mail messages that are associated with threats;
repeating the retrieving and training for all the second network resource identifiers in the blocklist;
wherein the network resource identifiers are uniform resource locators (URLs);
wherein generating properties comprises obtaining information from “whois” queries, based on a domain name owner for a domain name contained in the particular first network resource identifier or the particular second network resource identifier.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computer-readable tangible volatile or non-volatile storage medium storing one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform:
before receiving an electronic message;
retrieving a whitelist comprising a plurality of first network resource identifiers that are known not to be associated with threats;
retrieving a particular first network resource identifier from the whitelist;
generating a first list of properties for the particular first network resource identifier;
training, using the properties of the first list, a probabilistic filter to recognize future electronic mail messages that are not associated with threats;
repeating the retrieving and training for all the first network resource identifiers in the whitelist;
retrieving a blocklist comprising a plurality of second network resource identifiers that are known to be associated with threats;
retrieving a particular second network resource identifier from the blocklist;
generating a second list of properties for the particular second network resource identifier;
training, using the properties of the second list, the probabilistic filter to recognize future electronic mail messages that are associated with threats;
repeating the retrieving and training for all the second network resource identifiers in the blocklist;
wherein the network resource identifiers are uniform resource locators (URLs);
wherein generating properties comprises obtaining information from “whois” queries, based on a domain name owner for a domain name contained in the particular first network resource identifier or the particular second network resource identifier.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. An apparatus, comprising:
one or more processors;
means for retrieving, before receiving any electronic message, a whitelist comprising a plurality of first network resource identifiers that are known not to be associated with threats;
means for retrieving a particular first network resource identifier from the whitelist;
means for generating a first list of properties for the particular first network resource identifier;
means for training, using the properties of the first list, a probabilistic filter to recognize future electronic mail messages that are not associated with threats;
means for repeating execution of the retrieving and training means for all the first network resource identifiers in the whitelist;
means for retrieving a blocklist comprising a plurality of second network resource identifiers that are known to be associated with threats;
means for retrieving a particular second network resource identifier from the blocklist;
means for generating a second list of properties for the particular second network resource identifier;
means for training, using the properties of the second list, the probabilistic filter to recognize future electronic mail messages that are associated with threats;
means for repeating the retrieving and training for all the second network resource identifiers in the blocklist;
wherein the network resource identifiers are uniform resource locators (URLs);
wherein generating properties comprises obtaining information from “whois” queries, based on a domain name owner for a domain name contained in the particular first network resource identifier or the particular second network resource identifier.
Dependent claims: 18, 19, 20, 21, 22, 23, 31
24. An electronic mail server apparatus, comprising:
one or more processors;
logic encoded in one or more media for execution and when executed operable to cause the one or more processors to perform;
before receiving any electronic message;
retrieving a whitelist comprising a plurality of first network resource identifiers that are known not to be associated with threats;
retrieving a particular first network resource identifier from the whitelist;
generating a first list of properties for the particular first network resource identifier;
training, using the properties of the first list, a probabilistic filter to recognize future electronic mail messages that are not associated with threats;
repeating the retrieving and training for all the first network resource identifiers in the whitelist;
retrieving a blocklist comprising a plurality of second network resource identifiers that are known to be associated with threats;
retrieving a particular second network resource identifier from the blocklist;
generating a second list of properties for the particular second network resource identifier;
training, using the properties of the second list, the probabilistic filter to recognize future electronic mail messages that are associated with threats;
repeating the retrieving and training for all the second network resource identifiers in the blocklist;
wherein the network resource identifiers are uniform resource locators (URLs);
wherein generating properties comprises obtaining information from “whois” queries, based on a domain name owner for a domain name contained in the particular first network resource identifier or the particular second network resource identifier.
Dependent claims: 25, 26, 27, 28, 29, 30
Specification