Intelligent SPAM detection system using statistical analysis

US 7,016,939 B1
Filed: 07/26/2001
Issued: 03/21/2006
Est. Priority Date: 07/26/2001
Status: Expired due to Term


Assignments: 11
Petitions: 0
Abstract

A system, method and computer program product are provided for detecting an unwanted message. First, an electronic mail message is received. Text in the electronic mail message is decomposed. Statistics associated with the text are gathered using a statistical analyzer. The statistics are analyzed for determining whether the electronic mail message is an unwanted message.
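
The decomposition and statistics-gathering steps summarized above can be illustrated with a minimal sketch. It is not the patented implementation: the function name `gather_statistics`, the regular expressions, and the reading of "words capitalized" as all-uppercase words are assumptions made for this example. The statistics computed are the ones named in claims 2 to 5.

```python
import re

# Hypothetical patterns chosen for illustration; the patent record does not
# specify how URLs, telephone numbers, or words are recognized.
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)
PHONE_RE = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")
WORD_RE = re.compile(r"[A-Za-z']+")
PUNCT_RE = re.compile(r"[!?.,;:$]")


def gather_statistics(text):
    """Decompose message text into words and return simple text statistics."""
    # Remove URLs first so their letters and dots are not counted as
    # ordinary words or punctuation.
    stripped = URL_RE.sub(" ", text)
    words = WORD_RE.findall(stripped)
    total = len(words)
    # Assumption: a "capitalized" word is one written entirely in upper case.
    capitalized = sum(1 for w in words if len(w) > 1 and w.isupper())
    punctuation = len(PUNCT_RE.findall(stripped))
    return {
        "capitalized_ratio": capitalized / total if total else 0.0,
        "punctuation_ratio": punctuation / total if total else 0.0,
        "url_count": len(URL_RE.findall(text)),
        "phone_count": len(PHONE_RE.findall(text)),
    }


if __name__ == "__main__":
    sample = "BUY NOW!!! Visit http://example.com or call 555-123-4567 today."
    print(gather_statistics(sample))
```

The returned dictionary plays the role of the results table in claim 8: its entries would be passed as inputs to whatever analysis stage follows.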

31 Claims

1. A method for detecting an unwanted message, comprising:
- (a) receiving an electronic mail message;
  
  (b) decomposing text in the electronic mail message;
  
  (c) gathering statistics associated with the text using a statistical analyzer; and
  
  (d) analyzing the statistics for determining whether the electronic mail message is an unwanted message;
  
  wherein the statistics gathered using the statistical analyzer include results of an analysis of a uniform resource locator (URL) in the electronic mail message text;
  
  wherein the statistics gathered using the statistical analyzer include results of an analysis of e-mail addresses in the electronic mail message text;
  
  wherein the statistics gathered using the statistical analyzer include results of a message header field analysis;
  
  wherein the statistics are sent to a neural network engine, wherein the neural network engine compares the statistics to predetermined weights for determining whether the electronic mail message is an unwanted message;
  
  wherein the neural network engine is taught to recognize unwanted messages;
  
  wherein examples are provided to the neural network engine, wherein the examples are of wanted messages and unwanted messages, and each of the examples is associated with a desired output;
  
  wherein each of the examples are processed by the neural network engine for generating the weights, wherein each of the weights is used to denote wanted and unwanted messages;
  
  wherein the neural network engine utilizes an adaptive linear combination for adjusting the weights.
- - 2. The method as recited in claim 1, wherein the statistics gathered using the statistical analyzer include a ratio of words capitalized to total number of words.
  - 3. The method as recited in claim 1, wherein the statistics gathered using the statistical analyzer include a punctuation to word ratio.
  - 4. The method as recited in claim 1, wherein the statistics gathered using the statistical analyzer include a number of uniform resource locators (URLs) in the text.
  - 5. The method as recited in claim 1, wherein the statistics gathered using the statistical analyzer include at least one telephone number in the text.
  - 6. The method as recited in claim 1, wherein the statistics gathered using the statistical analyzer include results of an analysis of character type.
  - 7. The method as recited in claim 1, wherein the statistics gathered using the statistical analyzer include a ratio of words capitalized to total number of words, a punctuation to word ratio, a number of URLs in the text, a number of telephone numbers in the text, addresses in the text, and results of a message header field analysis.
  - 8. The method as recited in claim 1, wherein the statistics are placed in a results table, wherein entries in the table are passed as inputs to the neural network engine.
  - 9. The method as recited in claim 1, wherein logic associated with the neural network engine is updated based on the processing by the neural network engine.
  - 10. The method as recited in claim 9, wherein the neural network engine is updated to recognize an unwanted message, the message is identified as an unwanted message, the features of the message that make the message unwanted are identified, and the identified features are stored and used by the neural network to identify subsequent unwanted messages.
  - 11. The method as recited in claim 1, wherein the neural network engine analyzes previous user input for determining whether the message is unwanted.
  - 12. The method as recited in claim 1, wherein the adaptive linear combination is presented input vectors and desired responses for the adjusting weights until outputs are close to the desired responses.
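
Claim 12 describes an adaptive linear combination that is presented with input vectors and desired responses and adjusts its weights until its outputs are close to those responses. A minimal sketch of that idea follows, using the Widrow-Hoff least-mean-squares update. The update rule, the +1/-1 coding of unwanted/wanted, the learning rate, the tolerance, and the example feature values are all assumptions for illustration; the claim text does not specify them.

```python
def adaline_output(weights, bias, x):
    """Linear combination of the input statistics (no squashing function)."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


def train_adaline(examples, learning_rate=0.1, tolerance=0.01, max_epochs=2000):
    """Adjust weights until outputs are close to the desired responses.

    examples: list of (input_vector, desired_response) pairs, where the
    desired response is +1.0 for unwanted and -1.0 for wanted (assumed coding).
    """
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        squared_error = 0.0
        for x, desired in examples:
            error = desired - adaline_output(weights, bias, x)
            # Widrow-Hoff (LMS) rule: move each weight in proportion to the
            # error and to the input it multiplies.
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
            squared_error += error * error
        if squared_error / len(examples) < tolerance:
            break  # outputs are close enough to the desired responses
    return weights, bias


def is_unwanted(weights, bias, x):
    """Classify by the sign of the linear output."""
    return adaline_output(weights, bias, x) > 0.0


# Hypothetical training set: (capitalized-word ratio, punctuation-to-word ratio).
EXAMPLES = [
    ([0.9, 0.2], 1.0),   # unwanted
    ([0.3, 0.9], 1.0),   # unwanted
    ([0.1, 0.2], -1.0),  # wanted
    ([0.2, 0.1], -1.0),  # wanted
]

if __name__ == "__main__":
    w, b = train_adaline(EXAMPLES)
    print(is_unwanted(w, b, [0.8, 0.8]), is_unwanted(w, b, [0.05, 0.05]))
```

Each input vector here holds only two statistics to keep the example small. A fuller feature set would simply lengthen the vectors, although features on very different scales (ratios versus raw counts) would likely need rescaling for this update rule to stay stable.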

13. A computer program product having computer-executable codes embodied in a computer-readable medium for detecting an unwanted message, comprising:
- (a) computer code for receiving an electronic mail message;
  
  (b) computer code for decomposing text in the electronic mail message;
  
  (c) computer code for gathering statistics associated with the text using a statistical analyzer; and
  
  (d) computer code for analyzing the statistics for determining whether the electronic mail message is an unwanted message;
  
  wherein the statistics gathered using the statistical analyzer include results of an analysis of a uniform resource locator (URL) in the electronic mail message text;
  
  wherein the statistics gathered using the statistical analyzer include results of an analysis of e-mail addresses in the electronic mail message text;
  
  wherein the statistics gathered using the statistical analyzer include results of a message header field analysis;
  
  wherein the statistics gathered using the statistical analyzer include results of an analysis of a uniform resource locator (URL) in the electronic mail message text;
  
  wherein the statistics gathered using the statistical analyzer include results of an analysis of e-mail addresses in the electronic mail message text;
  
  wherein the statistics gathered using the statistical analyzer include results of a message header field analysis;
  
  wherein the statistics are sent to a neural network engine, wherein the neural network engine compares the statistics to predetermined weights for determining whether the electronic mail message is an unwanted message;
  
  wherein the neural network engine is taught to recognize unwanted messages;
  
  wherein examples are provided to the neural network engine, wherein the examples are of wanted messages and unwanted messages, and each of the examples is associated with a desired output;
  
  wherein each of the examples are processed by the neural network engine for generating the weights, wherein each of the weights is used to denote wanted and unwanted messages;
  
  wherein the neural network engine utilizes an adaptive linear combination for adjusting the weights.
- - 14. The computer program product as recited in claim 13, wherein the statistics gathered using the statistical analyzer include a ratio of words capitalized to total number of words.
  - 15. The computer program product as recited in claim 13, wherein the statistics gathered using the statistical analyzer include a punctuation to word ratio.
  - 16. The computer program product as recited in claim 13, wherein the statistics gathered using the statistical analyzer include a number of uniform resource locators (URLs) in the text.
  - 17. The computer program product as recited in claim 13, wherein the statistics gathered using the statistical analyzer include at least one telephone number in the text.
  - 18. The computer program product as recited in claim 13, wherein the statistics gathered using the statistical analyzer include results of an analysis of character type.
  - 19. The computer program product as recited in claim 13, wherein logic associated with the neural network engine is updated based on the processing by the neural network engine.
  - 20. The computer program product as recited in claim 19, wherein the neural network engine is updated to recognize an unwanted message, the message is identified as an unwanted message, the features of the message that make the message unwanted are identified, and the identified features are stored and used by the neural network to identify subsequent unwanted messages.
  - 21. The computer program product as recited in claim 13, wherein the neural network engine analyzes previous user input for determining whether the message is unwanted.

22. A system for detecting an unwanted message, comprising:
- (a) a statistical analyzer for gathering statistics associated with text retrieved from an electronic mail message; and
  
  (b) a neural network engine coupled to the statistical analyzer for analyzing the statistics;
  
  (c) wherein the neural network engine determines whether the electronic mail message is an unwanted message;
  
  wherein the statistics gathered using the statistical analyzer include results of an analysis of a uniform resource locator (URL) in the electronic mail message text;
  
  wherein the statistics gathered using the statistical analyzer include results of an analysis of e-mail addresses in the electronic mail message text;
  
  wherein the statistics gathered using the statistical analyzer include results of a message header field analysis;
  
  wherein the statistics gathered using the statistical analyzer include results of an analysis of a uniform resource locator (URL) in the electronic mail message text;
  
  wherein the statistics gathered using the statistical analyzer include results of an analysis of e-mail addresses in the electronic mail message text;
  
  wherein the statistics gathered using the statistical analyzer include results of a message header field analysis;
  
  wherein the statistics are sent to the neural network engine, wherein the neural network engine compares the statistics to predetermined weights for determining whether the electronic mail message is an unwanted message;
  
  wherein the neural network engine is taught to recognize unwanted messages;
  
  wherein examples are provided to the neural network engine, wherein the examples are of wanted messages and unwanted messages, and each of the examples is associated with a desired output;
  
  wherein each of the examples are processed by the neural network engine for generating the weights, wherein each of the weights is used to denote wanted and unwanted messages;
  
  wherein the neural network engine utilizes an adaptive linear combination for adjusting the weights.
- - 23. The system as recited in claim 22, wherein the statistics gathered using the statistical analyzer include a ratio of words capitalized to total number of words.
  - 24. The system as recited in claim 22, wherein the statistics gathered using the statistical analyzer include a punctuation to word ratio.
  - 25. The system as recited in claim 22, wherein the statistics gathered using the statistical analyzer include a number of uniform resource locators (URLs) in the text.
  - 26. The system as recited in claim 22, wherein the statistics gathered using the statistical analyzer include at least one telephone number in the text.
  - 27. The system as recited in claim 22, wherein the statistics gathered using the statistical analyzer include results of an analysis of character type.
  - 28. The system as recited in claim 22, wherein logic associated with the neural network engine is updated based on the processing by the neural network engine.
  - 29. The system as recited in claim 28, wherein the neural network engine is updated to recognize an unwanted message, the message is identified as an unwanted message, the features of the message that make the message unwanted are identified, and the identified features are stored and used by the neural network to identify subsequent unwanted messages.
  - 30. The system as recited in claim 22, wherein the neural network engine analyzes previous user input for determining whether the message is unwanted.

31. A method for detecting an unwanted message, comprising:
- (a) receiving an electronic mail message;
  
  (b) decomposing text in the electronic mail message;
  
  (c) gathering statistics associated with the text using a statistical analyzer, wherein the statistics gathered using the statistical analyzer include at least three of a ratio of words capitalized to total number of words, a punctuation to word ratio, a number of URLs in the text, a telephone number in the text, results of an analysis of a uniform resource locator (URL) in the electronic mail message text, results of an analysis of e-mail addresses in the electronic mail message text, results of an analysis of character type, and results of a message header field analysis; and
  
  (d) analyzing the statistics for determining whether the electronic mail message is an unwanted message;
  
  wherein the statistics gathered using the statistical analyzer include results of an analysis of a uniform resource locator (URL) in the electronic mail message text;
  
  wherein the statistics gathered using the statistical analyzer include results of an analysis of e-mail addresses in the electronic mail message text;
  
  wherein the statistics gathered using the statistical analyzer include results of a message header field analysis;
  
  wherein the statistics are sent to a neural network engine, wherein the neural network engine compares the statistics to predetermined weights for determining whether the electronic mail message is an unwanted message;
  
  wherein the neural network engine is taught to recognize unwanted messages;
  
  wherein examples are provided to the neural network engine, wherein the examples are of wanted messages and unwanted messages, and each of the examples is associated with a desired output;
  
  wherein each of the examples are processed by the neural network engine for generating the weights, wherein each of the weights is used to denote wanted and unwanted messages;
  
  wherein the neural network engine utilizes an adaptive linear combination for adjusting the weights.
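
The independent claims require statistics from an analysis of URLs, of e-mail addresses in the message text, and of message header fields, but the claim text does not say what those analyses measure. The sketch below shows one hypothetical reading using Python's standard library. The specific signals (URLs whose host is a bare IP address, a missing Subject header, a Reply-To address that differs from From) and the name `analyze_message` are assumptions for illustration, not features taken from the patent.

```python
import re
from email import message_from_string
from email.utils import parseaddr
from urllib.parse import urlparse

# Hypothetical patterns; real-world URL and address syntax is broader.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IP_HOST_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def _hostname(url):
    """Return the URL's host, or an empty string if it cannot be parsed."""
    try:
        return urlparse(url).hostname or ""
    except ValueError:
        return ""


def analyze_message(raw):
    """Gather URL, e-mail address, and header-field statistics from a raw message."""
    msg = message_from_string(raw)
    payload = msg.get_payload()
    # Only single-part text bodies are handled in this sketch.
    body = payload if isinstance(payload, str) else ""
    urls = URL_RE.findall(body)
    from_addr = parseaddr(msg.get("From", ""))[1]
    reply_addr = parseaddr(msg.get("Reply-To", ""))[1]
    return {
        "url_count": len(urls),
        "ip_host_urls": sum(1 for u in urls if IP_HOST_RE.match(_hostname(u))),
        "body_email_count": len(EMAIL_RE.findall(body)),
        "missing_subject": msg.get("Subject") is None,
        "reply_to_differs": bool(reply_addr)
        and reply_addr.lower() != from_addr.lower(),
    }


if __name__ == "__main__":
    raw = (
        "From: Deals <deals@example.com>\n"
        "Reply-To: claims@example.net\n"
        "Subject: You won\n"
        "\n"
        "Visit http://192.0.2.10/win or http://example.org/ "
        "and write to prize@example.net\n"
    )
    print(analyze_message(raw))
```

Boolean entries can be converted to 0 or 1 before being combined with the other statistics into a single numeric input vector.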


Current Assignee: McAfee, LLC
Original Assignee: McAfee, Inc. (McAfee, LLC)
Inventors: Jagger, Luke D.; Rothwell, Anton C.; Dennis, William R.; Clarke, David R.
Primary Examiner(s): BAROT, BHARAT
Application Number: US09/916,599
Time in Patent Office: 1,699 Days
Field of Search: 709/206, 709/207
US Class Current: 709/206
CPC Class Codes: H04L 51/212 using filtering or selectiv...
