Detecting unwanted electronic mail messages based on probabilistic analysis of referenced resources

US 7,836,133 B2
Filed: 05/05/2006
Issued: 11/16/2010
Est. Priority Date: 05/05/2005
Status: Active Grant

Abstract

In one embodiment, detecting unwanted electronic mail messages based on probabilistic analysis of referenced resources comprises receiving a whitelist and a blocklist each having a plurality of network resource identifiers that have appeared in prior messages; retrieving a particular network resource identifier; generating a list of properties for the particular network resource identifier; training a probabilistic filter using the properties; and repeating the retrieving, generating and training for all the network resource identifiers in the whitelist and blocklist. Thereafter, when an electronic mail message is received and contains a URL or other network resource identifier, a spam score or threat score can be generated for the message by testing properties of the network resource identifier using the trained probabilistic filter.
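
The train-then-score flow described in the abstract can be illustrated with a minimal sketch. It assumes a naive Bayes filter, which is one kind of probabilistic filter and is not stated in the text above; the class `PropertyFilter`, the property tokens, and the example URLs are all hypothetical, and the property lists are written by hand rather than generated from real lookups.

```python
import math
from collections import Counter


class PropertyFilter:
    """Minimal naive Bayes filter over property tokens (illustrative only)."""

    def __init__(self):
        self.counts = {"good": Counter(), "bad": Counter()}
        self.totals = {"good": 0, "bad": 0}

    def train(self, properties, label):
        # Count each distinct property once per training identifier.
        for prop in set(properties):
            self.counts[label][prop] += 1
        self.totals[label] += 1

    def score(self, properties):
        # Sum per-property log-odds with add-one smoothing and equal priors,
        # then map the total to a probability that the identifier is a threat.
        log_odds = 0.0
        for prop in set(properties):
            p_bad = (self.counts["bad"][prop] + 1) / (self.totals["bad"] + 2)
            p_good = (self.counts["good"][prop] + 1) / (self.totals["good"] + 2)
            log_odds += math.log(p_bad / p_good)
        return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical, hand-written property lists standing in for real lookups.
WHITELIST = {
    "http://example.com/": ["owner:Example Corp", "ns:ns1.example.com"],
    "http://example.org/docs": ["owner:Example Corp", "ns:ns1.example.org"],
}
BLOCKLIST = {
    "http://bad.example/offer": ["owner:Shady LLC", "ns:ns1.bad.example"],
    "http://worse.example/win": ["owner:Shady LLC", "ns:ns1.worse.example"],
}

spam_filter = PropertyFilter()
for properties in WHITELIST.values():
    spam_filter.train(properties, "good")
for properties in BLOCKLIST.values():
    spam_filter.train(properties, "bad")

# A URL seen later whose domain owner matches the blocklisted entries.
threat_probability = spam_filter.score(["owner:Shady LLC", "ns:ns9.new.example"])
```

In this sketch the unseen name server contributes nothing either way, so the score is driven by the shared owner property. A production filter would use far more properties and training data.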

31 Claims

1. A method, comprising:
- before receiving any electronic message;
  
  retrieving a whitelist comprising a plurality of first network resource identifiers that are known not to be associated with threats;
  
  retrieving a particular first network resource identifier from the whitelist;
  
  generating a first list of properties for the particular first network resource identifier;
  
  training, using the properties of the first list, a probabilistic filter to recognize future electronic mail messages that are not associated with threats;
  
  repeating the retrieving and training for all the first network resource identifiers in the whitelist;
  
  retrieving a blocklist comprising a plurality of second network resource identifiers that are known to be associated with threats;
  
  retrieving a particular second network resource identifier from the blocklist;
  
  generating a second list of properties for the particular second network resource identifier;
  
  training, using the properties of the second list, the probabilistic filter to recognize future electronic mail messages that are associated with threats;
  
  repeating the retrieving and training for all the second network resource identifiers in the blocklist;
  
  wherein the network resource identifiers are uniform resource locators (URLs);
  
  wherein generating properties comprises obtaining information from “whois” queries, based on a domain name owner for a domain name contained in the particular first network resource identifier or the particular second network resource identifier.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The method of claim 1, further comprising:
    - receiving a third network resource identifier;
      
      testing the third network resource identifier using the trained probabilistic filter and receiving a probability output indicating a probability that the third network resource identifier is associated with threats;
      
      adding the third network resource identifier to a blacklist when the probability output is greater than a first specified threshold.
  - 3. The method of claim 1, wherein generating the second list of properties comprises:
    - extracting a domain portion of the second network resource identifier;
      
      retrieving from a domain name system one or more mail exchange records associated with the extracted domain portion;
      
      retrieving from the domain name system each address record for each mail server that is identified in the mail exchange records;
      
      retrieving a reputation score value associated with network addresses of each of the address records;
      
      adding the network resource identifier to a blacklist when an average reputation score value is less than a specified threshold.
  - 4. The method of claim 1, wherein generating the second list of properties comprises:
    - extracting a domain portion of the second network resource identifier;
      
      retrieving from a domain name system one or more name server records associated with the extracted domain portion;
      
      retrieving from the domain name system each address record for each mail server that is identified in the name server records;
      
      retrieving a reputation score value associated with network addresses of each of the address records;
      
      adding the network resource identifier to a blacklist when an average reputation score value is less than a specified threshold.
  - 5. The method of claim 3 or claim 4 further comprising sending the blacklist to a plurality of messaging gateway appliances that are coupled to the network.
  - 6. The method of claim 3 or claim 4, further comprising:
    - receiving a copy of the blacklist at a messaging gateway;
      
      at the messaging gateway, receiving an electronic mail message containing a uniform resource locator (URL);
      
      extracting the URL and determining whether the URL is in the copy of the blacklist;
      
      modifying a threat score value associated with the electronic mail message when the URL is in the copy of the blacklist.
  - 7. The method of claim 1, wherein the threats comprise any of viruses, phishing attacks, and pharming attacks.
  - 8. The method of claim 1, wherein the properties comprise any of:
    - information obtained from DNS queries based on the particular first or second network resource identifier including any of names, IP addresses, and servers;
      
      web pages;
      
      server software that the particular first or second network resource identifier is using;
      
      information obtained from “whois” queries, based on a network block owner; and
      
      words extracted from the particular first or second network resource identifier.
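
The lookup chain recited in claims 3 and 4 (domain portion, then mail exchange or name server records, then address records, then an averaged reputation score) can be sketched as below. This is only an illustration under stated assumptions: the three lookup callables are hypothetical stand-ins injected by the caller so the example runs without network access, the reputation scale and threshold are invented, and the URL host is used as the "domain portion" for simplicity.

```python
from statistics import mean
from urllib.parse import urlsplit


def average_server_reputation(url, lookup_records, lookup_addresses, reputation):
    """Average the reputation of every address behind the URL's domain."""
    domain = urlsplit(url).hostname  # simplification: host used as the domain portion
    scores = []
    for server in lookup_records(domain):          # MX or NS records
        for address in lookup_addresses(server):   # address (A) records
            scores.append(reputation(address))
    return mean(scores) if scores else None


def update_blacklist(url, blacklist, threshold, **lookups):
    """Add the URL to the blacklist when its average reputation is below threshold."""
    average = average_server_reputation(url, **lookups)
    if average is not None and average < threshold:
        blacklist.add(url)
    return average


# Hypothetical in-memory tables standing in for DNS and a reputation service.
MX = {"bad.example": ["mx1.bad.example"]}
A = {"mx1.bad.example": ["192.0.2.10", "192.0.2.11"]}
REPUTATION = {"192.0.2.10": -8.0, "192.0.2.11": -6.0}

blacklist = set()
average = update_blacklist(
    "http://bad.example/offer",
    blacklist,
    threshold=-5.0,  # assumed scale: more negative means worse reputation
    lookup_records=lambda domain: MX.get(domain, []),
    lookup_addresses=lambda host: A.get(host, []),
    reputation=lambda address: REPUTATION.get(address, 0.0),
)
```

Passing the lookups in as callables keeps the sketch testable; swapping the MX table for a name server table gives the claim 4 variant without changing the function.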

9. A computer-readable tangible volatile or non-volatile storage medium storing one or more sequences of instructions which, when executed by one or more processors, cause the one or more processors to perform:
- before receiving an electronic message;
  
  retrieving a whitelist comprising a plurality of first network resource identifiers that are known not to be associated with threats;
  
  retrieving a particular first network resource identifier from the whitelist;
  
  generating a first list of properties for the particular first network resource identifier;
  
  training, using the properties of the first list, a probabilistic filter to recognize future electronic mail messages that are not associated with threats;
  
  repeating the retrieving and training for all the first network resource identifiers in the whitelist;
  
  retrieving a blocklist comprising a plurality of second network resource identifiers that are known to be associated with threats;
  
  retrieving a particular second network resource identifier from the blocklist;
  
  generating a second list of properties for the particular second network resource identifier;
  
  training, using the properties of the second list, the probabilistic filter to recognize future electronic mail messages that are associated with threats;
  
  repeating the retrieving and training for all the second network resource identifiers in the blocklist;
  
  wherein the network resource identifiers are uniform resource locators (URLs);
  
  wherein generating properties comprises obtaining information from “whois” queries, based on a domain name owner for a domain name contained in the particular first network resource identifier or the particular second network resource identifier.
- Dependent claims (10, 11, 12, 13, 14, 15, 16):
  - 10. The computer-readable medium of claim 9, wherein the instructions which when executed by one or more processors cause the one or more processors further to perform:
    - receiving a third network resource identifier;
      
      testing the third network resource identifier using the trained probabilistic filter and receiving a probability output indicating a probability that the third network resource identifier is associated with threats;
      
      adding the third network resource identifier to a blacklist when the probability output is greater than a first specified threshold.
  - 11. The computer-readable medium of claim 9, wherein generating the second list of properties comprises:
    - extracting a domain portion of the second network resource identifier;
      
      retrieving from a domain name system one or more mail exchange records associated with the extracted domain portion;
      
      retrieving from the domain name system each address record for each mail server that is identified in the mail exchange records;
      
      retrieving a reputation score value associated with network addresses of each of the address records;
      
      adding the network resource identifier to a blacklist when an average reputation score value is less than a specified threshold.
  - 12. The computer-readable medium of claim 9, wherein generating the second list of properties comprises:
    - extracting a domain portion of the second network resource identifier;
      
      retrieving from a domain name system one or more name server records associated with the extracted domain portion;
      
      retrieving from the domain name system each address record for each mail server that is identified in the name server records;
      
      retrieving a reputation score value associated with network addresses of each of the address records;
      
      adding the network resource identifier to a blacklist when an average reputation score value is less than a specified threshold.
  - 13. The computer-readable medium of claim 11 or claim 12, wherein the instructions which when executed by one or more processors cause the one or more processors further to perform:
    - sending the blacklist to a plurality of messaging gateway appliances that are coupled to the network.
  - 14. The computer-readable medium of claim 11 or claim 12, wherein the instructions which when executed by one or more processors cause the one or more processors further to perform:
    - receiving a copy of the blacklist at a messaging gateway;
      
      at the messaging gateway, receiving an electronic mail message containing a uniform resource locator (URL);
      
      extracting the URL and determining whether the URL is in the copy of the blacklist;
      
      modifying a threat score value associated with the electronic mail message when the URL is in the copy of the blacklist.
  - 15. The computer-readable medium of claim 9, wherein the threats comprise any of viruses, phishing attacks, and pharming attacks.
  - 16. The computer-readable medium of claim 9, wherein the properties comprise any of:
    - information obtained from DNS queries based on the particular first or second network resource identifier including any of names, IP addresses, and servers;
      
      web pages;
      
      server software that the particular first or second network resource identifier is using;
      
      information obtained from “whois” queries, based on a network block owner; and
      
      words extracted from the particular first or second network resource identifier.
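
The gateway-side check recited in claims 6 and 14 (extract URLs from a received message, look them up in a local copy of the blacklist, and modify the threat score on a match) might look like the following sketch. The regular expression, the function names, and the fixed `penalty` value are assumptions chosen for illustration; the claims do not say how URLs are extracted or by how much the score changes.

```python
import re

# Deliberately simple pattern; real messages need more careful URL parsing.
URL_PATTERN = re.compile(r"https?://[^\s<>]+", re.IGNORECASE)


def extract_urls(body):
    """Return URLs found in a message body, trimming trailing punctuation."""
    return [match.rstrip(".,;:!?)") for match in URL_PATTERN.findall(body)]


def adjust_threat_score(body, blacklist_copy, base_score=0.0, penalty=5.0):
    """Raise the message's threat score once per blacklisted URL it contains."""
    score = base_score
    for url in extract_urls(body):
        if url in blacklist_copy:
            score += penalty
    return score


# Hypothetical blacklist copy and messages.
blacklist_copy = {"http://bad.example/offer"}
spam_score = adjust_threat_score(
    "Claim your prize at http://bad.example/offer. Thanks", blacklist_copy
)
clean_score = adjust_threat_score(
    "Minutes are posted at http://example.com/minutes", blacklist_copy
)
```

Exact string matching is the simplest possible lookup; a gateway would more likely normalize URLs before comparing them.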

17. An apparatus, comprising:
- one or more processors;
  
  means for retrieving, before receiving any electronic message, a whitelist comprising a plurality of first network resource identifiers that are known not to be associated with threats;
  
  means for retrieving a particular first network resource identifier from the whitelist;
  
  means for generating a first list of properties for the particular first network resource identifier;
  
  means for training, using the properties of the first list, a probabilistic filter to recognize future electronic mail messages that are not associated with threats;
  
  means for repeating execution of the retrieving and training means for all the first network resource identifiers in the whitelist;
  
  means for retrieving a blocklist comprising a plurality of second network resource identifiers that are known to be associated with threats;
  
  means for retrieving a particular second network resource identifier from the blocklist;
  
  means for generating a second list of properties for the particular second network resource identifier;
  
  means for training, using the properties of the second list, the probabilistic filter to recognize future electronic mail messages that are associated with threats;
  
  means for repeating the retrieving and training for all the second network resource identifiers in the blocklist;
  
  wherein the network resource identifiers are uniform resource locators (URLs);
  
  wherein generating properties comprises obtaining information from “whois” queries, based on a domain name owner for a domain name contained in the particular first network resource identifier or the particular second network resource identifier.
- Dependent claims (18, 19, 20, 21, 22, 23, 31):
  - 18. The apparatus of claim 17, further comprising:
    - means for receiving a third network resource identifier;
      
      means for testing the third network resource identifier using the trained probabilistic filter and for receiving a probability output indicating a probability that the third network resource identifier is associated with threats;
      
      means for adding the third network resource identifier to a blacklist when the probability output is greater than a first specified threshold.
  - 19. The apparatus of claim 17, wherein means for generating the second list of properties comprises:
    - means for extracting a domain portion of the second network resource identifier;
      
      means for retrieving from a domain name system one or more mail exchange records associated with the extracted domain portion;
      
      means for retrieving from the domain name system each address record for each mail server that is identified in the mail exchange records;
      
      means for retrieving a reputation score value associated with network addresses of each of the address records;
      
      means for adding the network resource identifier to a blacklist when an average reputation score value is less than a specified threshold.
  - 20. The apparatus of claim 17, wherein generating the second list of properties comprises:
    - extracting a domain portion of the second network resource identifier;
      
      retrieving from a domain name system one or more name server records associated with the extracted domain portion;
      
      retrieving from the domain name system each address record for each mail server that is identified in the name server records;
      
      retrieving a reputation score value associated with network addresses of each of the address records;
      
      adding the network resource identifier to a blacklist when an average reputation score value is less than a specified threshold.
  - 21. The apparatus of claim 19 or claim 20 further comprising means for sending the blacklist to a plurality of messaging gateway appliances that are coupled to the network.
  - 22. The apparatus of claim 19 or claim 20, further comprising:
    - means for receiving a copy of the blacklist at a messaging gateway;
      
      at the messaging gateway, means for receiving an electronic mail message containing a uniform resource locator (URL);
      
      means for extracting the URL and determining whether the URL is in the copy of the blacklist;
      
      means for modifying a threat score value associated with the electronic mail message when the URL is in the copy of the blacklist.
  - 23. The apparatus of claim 17, wherein the threats comprise any of viruses, phishing attacks, and pharming attacks.
  - 31. The apparatus of claim 17 or claim 24, wherein the properties comprise any of:
    - information obtained from DNS queries based on the particular first or second network resource identifier including any of names, IP addresses, and servers;
      
      web pages;
      
      server software that the particular first or second network resource identifier is using;
      
      information obtained from “whois” queries, based on a network block owner; and
      
      words extracted from the particular first or second network resource identifier.
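
Of the property sources listed above, only those derivable from the identifier itself can be shown without network access. The sketch below covers the host, a rough domain portion, and words extracted from the URL; the `host:`, `domain:`, and `word:` token prefixes and the function name are hypothetical, and DNS, whois, web page, and server software properties are omitted because they require live lookups.

```python
import re
from urllib.parse import urlsplit


def url_properties(url):
    """Build a list of property tokens from the text of a URL alone."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    properties = [f"host:{host}"]

    # Rough heuristic: treat the last two labels as the domain portion.
    # This is wrong for suffixes such as "co.uk"; a real system would
    # consult a public suffix list.
    labels = host.split(".")
    if len(labels) >= 2:
        properties.append("domain:" + ".".join(labels[-2:]))

    # Words of three or more letters taken from the path and query string.
    words = re.findall(r"[a-z]{3,}", f"{parts.path} {parts.query}".lower())
    properties.extend(f"word:{word}" for word in sorted(set(words)))
    return properties


properties = url_properties("http://www.Example.com/cheap/pills?ref=deal")
```

Tokens of this kind could be fed directly to a probabilistic filter as training or test properties.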

24. An electronic mail server apparatus, comprising:
- one or more processors;
  
  logic encoded in one or more media for execution and when executed operable to cause the one or more processors to perform;
  
  before receiving any electronic message;
  
  retrieving a whitelist comprising a plurality of first network resource identifiers that are known not to be associated with threats;
  
  retrieving a particular first network resource identifier from the whitelist;
  
  generating a first list of properties for the particular first network resource identifier;
  
  training, using the properties of the first list, a probabilistic filter to recognize future electronic mail messages that are not associated with threats;
  
  repeating the retrieving and training for all the first network resource identifiers in the whitelist;
  
  retrieving a blocklist comprising a plurality of second network resource identifiers that are known to be associated with threats;
  
  retrieving a particular second network resource identifier from the blocklist;
  
  generating a second list of properties for the particular second network resource identifier;
  
  training, using the properties of the second list, the probabilistic filter to recognize future electronic mail messages that are associated with threats;
  
  repeating the retrieving and training for all the second network resource identifiers in the blocklist;
  
  wherein the network resource identifiers are uniform resource locators (URLs);
  
  wherein generating properties comprises obtaining information from “whois” queries, based on a domain name owner for a domain name contained in the particular first network resource identifier or the particular second network resource identifier.
- Dependent claims (25, 26, 27, 28, 29, 30):
  - 25. The apparatus of claim 24, wherein the logic when executed is further operable to perform:
    - receiving a third network resource identifier;
      
      testing the third network resource identifier using the trained probabilistic filter and receiving a probability output indicating a probability that the third network resource identifier is associated with threats;
      
      adding the third network resource identifier to a blacklist when the probability output is greater than a first specified threshold.
  - 26. The apparatus of claim 24, wherein the logic for generating the second list of properties comprises further logic that when executed is operable to perform:
    - extracting a domain portion of the second network resource identifier;
      
      retrieving from a domain name system one or more mail exchange records associated with the extracted domain portion;
      
      retrieving from the domain name system each address record for each mail server that is identified in the mail exchange records;
      
      retrieving a reputation score value associated with network addresses of each of the address records;
      
      adding the network resource identifier to a blacklist when an average reputation score value is less than a specified threshold.
  - 27. The apparatus of claim 24, wherein the logic for generating the second list of properties comprises further logic that when executed is operable to perform:
    - extracting a domain portion of the second network resource identifier;
      
      retrieving from a domain name system one or more name server records associated with the extracted domain portion;
      
      retrieving from the domain name system each address record for each mail server that is identified in the name server records;
      
      retrieving a reputation score value associated with network addresses of each of the address records;
      
      adding the network resource identifier to a blacklist when an average reputation score value is less than a specified threshold.
  - 28. The apparatus of claim 26 or claim 27, wherein the logic when executed is further operable to perform:
    - sending the blacklist to a plurality of messaging gateway appliances that are coupled to the network.
  - 29. The apparatus of claim 26 or claim 27, wherein the logic when executed is further operable to perform:
    - receiving a copy of the blacklist at a messaging gateway;
      
      at the messaging gateway, receiving an electronic mail message containing a uniform resource locator (URL);
      
      extracting the URL and determining whether the URL is in the copy of the blacklist;
      
      modifying a threat score value associated with the electronic mail message when the URL is in the copy of the blacklist.
  - 30. The apparatus of claim 24, wherein the threats comprise any of viruses, phishing attacks, and pharming attacks.
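
The thresholding step recited in claims 2, 10, 18, and 25 (test a third identifier with the trained filter and blacklist it when the probability output exceeds a specified threshold) reduces to a short sketch. The function name, the default threshold of 0.9, and the stubbed probabilities are assumptions; the claims do not fix a threshold value.

```python
def screen_identifier(url, probability_of_threat, blacklist, threshold=0.9):
    """Add url to blacklist when the filter's probability exceeds threshold."""
    probability = probability_of_threat(url)
    if probability > threshold:
        blacklist.add(url)
    return probability


# Stub standing in for the trained filter; the values are made up.
STUB_PROBABILITIES = {
    "http://bad.example/offer": 0.97,
    "http://example.com/": 0.02,
}

blacklist = set()
for candidate in STUB_PROBABILITIES:
    screen_identifier(candidate, STUB_PROBABILITIES.get, blacklist)
```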

Current Assignee: Ironport Systems (Cisco Systems, Inc.)
Original Assignee: Ironport Systems (Cisco Systems, Inc.)
Inventors: Kehl, Jason; Wescott, Jeffrey; Quinlan, Daniel
Primary Examiner(s): BATES, KEVIN T

Application Number: US11/418,823
Publication Number: US 20070078936A1
Time in Patent Office: 1,656 Days
Field of Search: 709/206, 709/207, 726/22, 726/25
US Class Current: 709/206
CPC Class Codes

G06Q 10/107   Computer-aided management o...

H04L 51/212   using filtering or selectiv...

H04L 51/234   for tracking messages

H04L 61/4511   using domain name system [DNS]

H04L 63/123   received data contents, e.g...

H04L 63/126   the source of the received ...

H04L 63/145   the attack involving the pr...
