Detecting unwanted electronic mail messages based on probabilistic analysis of referenced resources

US 20070078936A1
Filed: 05/05/2006
Published: 04/05/2007
Est. Priority Date: 05/05/2005
Status: Active Grant

Abstract

In one embodiment, detecting unwanted electronic mail messages based on probabilistic analysis of referenced resources comprises receiving a whitelist and a blocklist each having a plurality of network resource identifiers that have appeared in prior messages; retrieving a particular network resource identifier; generating a list of properties for the particular network resource identifier; training a probabilistic filter using the properties; and repeating the retrieving, generating and training for all the network resource identifiers in the whitelist and blocklist. Thereafter, when an electronic mail message is received and contains a URL or other network resource identifier, a spam score or threat score can be generated for the message by testing properties of the network resource identifier using the trained probabilistic filter.
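
Illustrative sketch (not part of the patent text): the train-then-score loop described in the abstract could look roughly like the following, assuming a naive Bayes model over string-valued properties. The class name `PropertyFilter`, the `"good"`/`"bad"` labels, the property strings, and the choice of naive Bayes with add-one smoothing are all assumptions made for illustration; the source only specifies "a probabilistic filter". Only properties present on the tested identifier contribute to the score, which is a simplification.

```python
import math
from collections import Counter


class PropertyFilter:
    """Tiny naive Bayes filter over string-valued properties (illustrative only)."""

    def __init__(self):
        self.counts = {"good": Counter(), "bad": Counter()}
        self.totals = {"good": 0, "bad": 0}

    def train(self, properties, label):
        # Count each distinct property once per training identifier.
        for prop in set(properties):
            self.counts[label][prop] += 1
        self.totals[label] += 1

    def probability_bad(self, properties):
        # Sum log-likelihood ratios with add-one smoothing, assuming equal priors.
        log_odds = 0.0
        for prop in set(properties):
            p_bad = (self.counts["bad"][prop] + 1) / (self.totals["bad"] + 2)
            p_good = (self.counts["good"][prop] + 1) / (self.totals["good"] + 2)
            log_odds += math.log(p_bad / p_good)
        return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical property lists standing in for whitelist and blocklist entries.
spam_filter = PropertyFilter()
for props in (["tld:com", "word:news"], ["tld:org", "word:docs"]):
    spam_filter.train(props, "good")
for props in (["tld:biz", "word:pills"], ["tld:biz", "word:cheap"]):
    spam_filter.train(props, "bad")

score = spam_filter.probability_bad(["tld:biz", "word:pills"])
print(f"{score:.3f}")  # 0.857
```

With this toy data the two matching properties contribute likelihood ratios of 3 and 2, so the odds are 6 to 1 and the output is 6/7. A production filter would need far more training data and a numerically safer sigmoid for extreme log-odds.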

156 Citations

39 Claims

1. A method, comprising:
- retrieving a whitelist comprising a plurality of first network resource identifiers that have been included in past electronic mail messages;
  
  retrieving a particular first network resource identifier from the whitelist;
  
  generating a first list of properties for the particular first network resource identifier;
  
  training, using the properties, a probabilistic filter;
  
  repeating the extracting, retrieving and training for all the first network resource identifiers in the whitelist;
  
  retrieving a blocklist comprising a plurality of second network resource identifiers that have been included in past electronic mail messages associated with spam or threats;
  
  retrieving a particular second network resource identifier from the blocklist;
  
  generating a second list of properties for the particular second network resource identifier;
  
  training, using the properties, the probabilistic filter;
  
  repeating the extracting, retrieving and training for all the second network resource identifiers in the blocklist.
- View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 37)
- - 2. The method of claim 1, further comprising:
    - receiving a third network resource identifier;
      
      testing the third network resource identifier using the trained probabilistic filter and receiving a probability output indicating a probability that the third network resource identifier is associated with spam or threats;
      
      adding the third network resource identifier to a blacklist when the probability output is greater than a first specified threshold.
  - 3. The method of claim 1, wherein generating the second list of properties comprises:
    - extracting a domain portion of the second network resource identifier;
      
      retrieving from a domain name system one or more mail exchange records associated with the extracted domain portion;
      
      retrieving from the domain name system each address record for each mail server that is identified in the mail exchange records;
      
      retrieving a reputation score value associated with network addresses of each of the address records;
      
      adding the network resource identifier to the blacklist when an average reputation score value is less than a specified threshold.
  - 4. The method of claim 1 wherein the network resource identifiers are uniform resource locators (URLs).
  - 5. The method of claim 1, wherein generating the second list of properties comprises:
    - extracting a domain portion of the second network resource identifier;
      
      retrieving from a domain name system one or more name server records associated with the extracted domain portion;
      
      retrieving from the domain name system each address record for each mail server that is identified in the name server records;
      
      retrieving a reputation score value associated with network addresses of each of the address records;
      
      adding the network resource identifier to the blacklist when an average reputation score value is less than a specified threshold.
  - 6. The method of claim 3 or claim 5 further comprising sending the blacklist to a plurality of messaging gateway appliances that are coupled to the network.
  - 7. The method of claim 3 or claim 5, wherein the blacklist is separate from the blocklist recited in claim 1.
  - 8. The method of claim 3 or claim 5, further comprising:
    - receiving a copy of the blacklist at a messaging gateway;
      
      at the messaging gateway, receiving an electronic mail message containing a uniform resource locator (URL);
      
      extracting the URL and determining whether the URL is in the copy of the blacklist;
      
      modifying a threat score value associated with the electronic mail message when the URL is in the copy of the blacklist.
  - 9. The method of claim 1, wherein the threats comprise any of viruses, phishing attacks, and pharming attacks.
  - 37. The method of claim 1, wherein the properties comprise any of:
    - information obtained from DNS queries based on the particular first or second network resource identifier including any of names, IP addresses, and servers;
      
      web pages;
      
      server software that the particular first or second network resource identifier is using;
      
      information obtained from “whois” queries, based on both a domain name owner for a domain name contained in the particular first or second network resource identifier and a network block owner; and
      
      words extracted from the particular first or second network resource identifier.
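
Illustrative sketch (not part of the patent text): the reputation check recited in claims 3 and 5 could be approximated as below. The three lookup callables (`lookup_mx`, `lookup_a`, `lookup_reputation`), the numeric reputation scale, and the example hosts and addresses are hypothetical stand-ins; no real DNS or reputation service is queried. As a simplification, the full hostname of the URL is used as the "domain portion".

```python
from statistics import mean
from urllib.parse import urlsplit


def average_mail_server_reputation(url, lookup_mx, lookup_a, lookup_reputation):
    """Average the reputation of every address behind the domain's mail servers."""
    domain = urlsplit(url).hostname  # None when the URL has no scheme/host
    scores = []
    for mail_host in lookup_mx(domain):
        for address in lookup_a(mail_host):
            scores.append(lookup_reputation(address))
    return mean(scores) if scores else None


def update_blacklist(url, blacklist, threshold, **lookups):
    """Add the URL to the blacklist when the average score is below the threshold."""
    avg = average_mail_server_reputation(url, **lookups)
    if avg is not None and avg < threshold:
        blacklist.add(url)
    return avg


# Hypothetical in-memory tables replacing DNS and reputation lookups.
MX = {"example.test": ["mx1.example.test", "mx2.example.test"]}
A = {"mx1.example.test": ["192.0.2.1"],
     "mx2.example.test": ["192.0.2.2", "192.0.2.3"]}
REPUTATION = {"192.0.2.1": -4.0, "192.0.2.2": -2.0, "192.0.2.3": -3.0}

blacklist = set()
avg = update_blacklist(
    "http://example.test/offer", blacklist, threshold=-1.0,
    lookup_mx=lambda d: MX.get(d, []),
    lookup_a=lambda h: A.get(h, []),
    lookup_reputation=lambda ip: REPUTATION[ip],
)
print(avg, blacklist)
```

Passing the lookups as parameters keeps the sketch testable; the name-server variant in claim 5 would differ only in which record type the first lookup returns.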

10. A computer-readable tangible storage medium carrying one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform:
- retrieving a whitelist comprising a plurality of first network resource identifiers that have been included in past electronic mail messages;
  
  retrieving a particular first network resource identifier from the whitelist;
  
  generating a first list of properties for the particular first network resource identifier;
  
  training, using the properties, a probabilistic filter;
  
  repeating the extracting, retrieving and training for all the first network resource identifiers in the whitelist;
  
  retrieving a blocklist comprising a plurality of second network resource identifiers that have been included in past electronic mail messages associated with spam or threats;
  
  retrieving a particular second network resource identifier from the blocklist;
  
  generating a second list of properties for the particular second network resource identifier;
  
  training, using the properties, the probabilistic filter;
  
  repeating the extracting, retrieving and training for all the second network resource identifiers in the blocklist.
- View Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18, 39)
- - 11. The computer-readable medium of claim 10, further comprising:
    - receiving a third network resource identifier;
      
      testing the third network resource identifier using the trained probabilistic filter and receiving a probability output indicating a probability that the third network resource identifier is associated with spam or threats;
      
      adding the third network resource identifier to a blacklist when the probability output is greater than a first specified threshold.
  - 12. The computer-readable medium of claim 10, wherein generating the second list of properties comprises:
    - extracting a domain portion of the second network resource identifier;
      
      retrieving from a domain name system one or more mail exchange records associated with the extracted domain portion;
      
      retrieving from the domain name system each address record for each mail server that is identified in the mail exchange records;
      
      retrieving a reputation score value associated with network addresses of each of the address records;
      
      adding the network resource identifier to the blacklist when an average reputation score value is less than a specified threshold.
  - 13. The computer-readable medium of claim 10 wherein the network resource identifiers are uniform resource locators (URLs).
  - 14. The computer-readable medium of claim 10, wherein generating the second list of properties comprises:
    - extracting a domain portion of the second network resource identifier;
      
      retrieving from a domain name system one or more name server records associated with the extracted domain portion;
      
      retrieving from the domain name system each address record for each mail server that is identified in the name server records;
      
      retrieving a reputation score value associated with network addresses of each of the address records;
      
      adding the network resource identifier to the blacklist when an average reputation score value is less than a specified threshold.
  - 15. The computer-readable medium of claim 12 or claim 14 further comprising sending the blacklist to a plurality of messaging gateway appliances that are coupled to the network.
  - 16. The computer-readable medium of claim 12 or claim 14, wherein the blacklist is separate from the blocklist recited in claim 10.
  - 17. The computer-readable medium of claim 12 or claim 14, further comprising:
    - receiving a copy of the blacklist at a messaging gateway;
      
      at the messaging gateway, receiving an electronic mail message containing a uniform resource locator (URL);
      
      extracting the URL and determining whether the URL is in the copy of the blacklist;
      
      modifying a threat score value associated with the electronic mail message when the URL is in the copy of the blacklist.
  - 18. The computer-readable medium of claim 10, wherein the threats comprise any of viruses, phishing attacks, and pharming attacks.
  - 39. The computer-readable medium of claim 10, wherein the properties comprise any of:
    - information obtained from DNS queries based on the particular first or second network resource identifier including any of names, IP addresses, and servers;
      
      web pages;
      
      server software that the particular first or second network resource identifier is using;
      
      information obtained from “whois” queries, based on both a domain name owner for a domain name contained in the particular first or second network resource identifier and a network block owner; and
      
      words extracted from the particular first or second network resource identifier.
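
Illustrative sketch (not part of the patent text): one of the property sources listed above is "words extracted from" the network resource identifier itself. A minimal way to build such a property list from a URL, assuming simple lexical splitting, is shown below. The function name `lexical_properties` and the `host:`, `tld:`, `label:`, and `word:` prefixes are hypothetical; properties that depend on DNS, whois, or web page retrieval are deliberately left out.

```python
import re
from urllib.parse import urlsplit


def lexical_properties(url):
    """Return a list of string properties derived only from the URL text."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    props = []
    if host:
        props.append("host:" + host)
        labels = host.split(".")
        props.append("tld:" + labels[-1])
        props.extend("label:" + label for label in labels[:-1])
    # Split the path and query string into lowercase alphanumeric words.
    text = (parts.path + " " + parts.query).lower()
    props.extend("word:" + word for word in re.findall(r"[a-z0-9]+", text))
    return props


example = lexical_properties("http://Shop.Example.test/cheap-pills/buy.php?id=42")
print(example)
```

The prefixes keep a word such as `test` in the path distinct from the same string used as a top-level domain, so a filter can weight them separately.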

19. An apparatus, comprising:
- means for retrieving a whitelist comprising a plurality of first network resource identifiers that have been included in past electronic mail messages;
  
  means for retrieving a particular first network resource identifier from the whitelist;
  
  means for generating a first list of properties for the particular first network resource identifier;
  
  means for training, using the properties, a probabilistic filter;
  
  means for repeating execution of the extracting, retrieving and training means for all the first network resource identifiers in the whitelist;
  
  means for retrieving a blocklist comprising a plurality of second network resource identifiers that have been included in past electronic mail messages associated with spam or threats;
  
  means for retrieving a particular second network resource identifier from the blocklist;
  
  means for generating a second list of properties for the particular second network resource identifier;
  
  means for training, using the properties, the probabilistic filter;
  
  means for repeating the extracting, retrieving and training for all the second network resource identifiers in the blocklist.
- View Dependent Claims (20, 21, 22, 23, 24, 25, 26, 27, 38)
- - 20. The apparatus of claim 19, further comprising:
    - means for receiving a third network resource identifier;
      
      means for testing the third network resource identifier using the trained probabilistic filter and for receiving a probability output indicating a probability that the third network resource identifier is associated with spam or threats;
      
      means for adding the third network resource identifier to a blacklist when the probability output is greater than a first specified threshold.
  - 21. The apparatus of claim 19, wherein generating the second list of properties comprises:
    - means for extracting a domain portion of the second network resource identifier;
      
      means for retrieving from a domain name system one or more mail exchange records associated with the extracted domain portion;
      
      means for retrieving from the domain name system each address record for each mail server that is identified in the mail exchange records;
      
      means for retrieving a reputation score value associated with network addresses of each of the address records;
      
      means for adding the network resource identifier to the blacklist when an average reputation score value is less than a specified threshold.
  - 22. The apparatus of claim 19 wherein the network resource identifiers are uniform resource locators (URLs).
  - 23. The apparatus of claim 19, wherein generating the second list of properties comprises:
    - extracting a domain portion of the second network resource identifier;
      
      retrieving from a domain name system one or more name server records associated with the extracted domain portion;
      
      retrieving from the domain name system each address record for each mail server that is identified in the name server records;
      
      retrieving a reputation score value associated with network addresses of each of the address records;
      
      adding the network resource identifier to the blacklist when an average reputation score value is less than a specified threshold.
  - 24. The apparatus of claim 21 or claim 23 further comprising means for sending the blacklist to a plurality of messaging gateway appliances that are coupled to the network.
  - 25. The apparatus of claim 21 or claim 23, wherein the blacklist is separate from the blocklist recited in claim 19.
  - 26. The apparatus of claim 21 or claim 23, further comprising:
    - means for receiving a copy of the blacklist at a messaging gateway;
      
      at the messaging gateway, means for receiving an electronic mail message containing a uniform resource locator (URL);
      
      means for extracting the URL and determining whether the URL is in the copy of the blacklist;
      
      means for modifying a threat score value associated with the electronic mail message when the URL is in the copy of the blacklist.
  - 27. The apparatus of claim 19, wherein the threats comprise any of viruses, phishing attacks, and pharming attacks.
  - 38. The apparatus of claim 19 or claim 28, wherein the properties comprise any of:
    - information obtained from DNS queries based on the particular first or second network resource identifier including any of names, IP addresses, and servers;
      
      web pages;
      
      server software that the particular first or second network resource identifier is using;
      
      information obtained from “whois” queries, based on both a domain name owner for a domain name contained in the particular first or second network resource identifier and a network block owner; and
      
      words extracted from the particular first or second network resource identifier.
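
Illustrative sketch (not part of the patent text): the gateway-side steps of claims 8, 17, and 26 (extract each URL from a message, look it up in the local blacklist copy, and modify the threat score on a match) might be sketched as follows. The regular expression, the exact-string matching rule, and the additive `penalty` value are assumptions; the claims only say the score is "modified".

```python
import re

# Deliberately simple URL matcher; real messages need MIME and HTML handling.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def adjust_threat_score(message_body, blacklist_copy, score=0.0, penalty=5.0):
    """Raise the threat score once for every blacklisted URL in the message."""
    hits = []
    for raw in URL_PATTERN.findall(message_body):
        url = raw.rstrip(".,;:!?)")  # drop trailing sentence punctuation
        if url in blacklist_copy:
            hits.append(url)
    return score + penalty * len(hits), hits


body = "Visit http://bad.example.test/offer. Or see https://ok.example.test/docs"
new_score, matched = adjust_threat_score(
    body, {"http://bad.example.test/offer"}, score=1.0
)
print(new_score, matched)  # 6.0 ['http://bad.example.test/offer']
```

Holding the blacklist copy in a set makes each lookup constant time on average, which matters when a gateway checks every URL in every message.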

28. An electronic mail server, comprising:
- one or more processors;
  
  logic encoded in one or more media for execution and when executed operable to cause the one or more processors to perform:
  
  retrieving a whitelist comprising a plurality of first network resource identifiers that have been included in past electronic mail messages;
  
  retrieving a particular first network resource identifier from the whitelist;
  
  generating a first list of properties for the particular first network resource identifier;
  
  training, using the properties, a probabilistic filter;
  
  repeating the extracting, retrieving and training for all the first network resource identifiers in the whitelist;
  
  retrieving a blocklist comprising a plurality of second network resource identifiers that have been included in past electronic mail messages associated with spam or threats;
  
  retrieving a particular second network resource identifier from the blocklist;
  
  generating a second list of properties for the particular second network resource identifier;
  
  training, using the properties, the probabilistic filter;
  
  repeating the extracting, retrieving and training for all the second network resource identifiers in the blocklist.
- View Dependent Claims (29, 30, 31, 32, 33, 34, 35, 36)
- - 29. The apparatus of claim 28, wherein the logic when executed is further operable to perform:
    - receiving a third network resource identifier;
      
      testing the third network resource identifier using the trained probabilistic filter and receiving a probability output indicating a probability that the third network resource identifier is associated with spam or threats;
      
      adding the third network resource identifier to a blacklist when the probability output is greater than a first specified threshold.
  - 30. The apparatus of claim 28, wherein the logic for generating the second list of properties comprises further logic that when executed is operable to perform:
    - extracting a domain portion of the second network resource identifier;
      
      retrieving from a domain name system one or more mail exchange records associated with the extracted domain portion;
      
      retrieving from the domain name system each address record for each mail server that is identified in the mail exchange records;
      
      retrieving a reputation score value associated with network addresses of each of the address records;
      
      adding the network resource identifier to the blacklist when an average reputation score value is less than a specified threshold.
  - 31. The apparatus of claim 28 wherein the network resource identifiers are uniform resource locators (URLs).
  - 32. The apparatus of claim 28, wherein the logic for generating the second list of properties comprises further logic that when executed is operable to perform:
    - extracting a domain portion of the second network resource identifier;
      
      retrieving from a domain name system one or more name server records associated with the extracted domain portion;
      
      retrieving from the domain name system each address record for each mail server that is identified in the name server records;
      
      retrieving a reputation score value associated with network addresses of each of the address records;
      
      adding the network resource identifier to the blacklist when an average reputation score value is less than a specified threshold.
  - 33. The apparatus of claim 30 or claim 32 further comprising sending the blacklist to a plurality of messaging gateway appliances that are coupled to the network.
  - 34. The apparatus of claim 30 or claim 32, wherein the blacklist is separate from the blocklist recited in claim 28.
  - 35. The apparatus of claim 30 or claim 32, further comprising:
    - receiving a copy of the blacklist at a messaging gateway;
      
      at the messaging gateway, receiving an electronic mail message containing a uniform resource locator (URL);
      
      extracting the URL and determining whether the URL is in the copy of the blacklist;
      
      modifying a threat score value associated with the electronic mail message when the URL is in the copy of the blacklist.
  - 36. The apparatus of claim 28, wherein the threats comprise any of viruses, phishing attacks, and pharming attacks.
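
Illustrative sketch (not part of the patent text): the thresholding step of claims 2, 11, 20, and 29 adds a newly received identifier to the blacklist only when the filter's probability output is greater than a specified threshold. The function name, the `probability_of_threat` callable, and the 0.9 threshold below are hypothetical; the claims do not state a threshold value.

```python
def add_probable_threats(identifiers, probability_of_threat, blacklist, threshold=0.9):
    """Blacklist each identifier whose probability output exceeds the threshold."""
    added = []
    for identifier in identifiers:
        # Strictly greater than, matching the wording "greater than" in the claims.
        if probability_of_threat(identifier) > threshold:
            blacklist.add(identifier)
            added.append(identifier)
    return added


# Hypothetical probability outputs standing in for a trained filter.
outputs = {
    "http://a.example.test/": 0.95,
    "http://b.example.test/": 0.90,
    "http://c.example.test/": 0.10,
}
new_blacklist = set()
added = add_probable_threats(outputs, outputs.get, new_blacklist)
print(added)  # ['http://a.example.test/']
```

An identifier scoring exactly at the threshold is not added, which follows from the strict comparison.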


Current Assignee: Ironport Systems (Cisco Systems, Inc.)
Original Assignee: Ironport Systems (Cisco Systems, Inc.)
Inventors: Quinlan, Daniel; Kehl, Jason; Wescott, Jeffrey

Granted Patent: US 7,836,133 B2
US Class Current: 709/206
CPC Class Codes

G06Q 10/107   Computer-aided management o...

H04L 51/212   using filtering or selectiv...

H04L 51/234   for tracking messages

H04L 61/4511   using domain name system [DNS]

H04L 63/123   received data contents, e.g...

H04L 63/126   the source of the received ...

H04L 63/145   the attack involving the pr...
