Online adaptive filtering of messages

US 8,214,437 B1
Filed: 12/23/2003
Issued: 07/03/2012
Est. Priority Date: 07/21/2003
Status: Active Grant

Abstract

In general, a two or more stage spam filtering system is used to filter spam in an e-mail system. One stage includes a global e-mail classifier that classifies e-mail as it enters the e-mail system. The parameters of the global e-mail classifier generally may be determined by the policies of the e-mail system owner and generally are set to classify as spam only those e-mails that are likely to be considered spam by a significant number of users of the e-mail system. Another stage includes personal e-mail classifiers at the individual mailboxes of the e-mail system users. The parameters of the personal e-mail classifiers generally are set by the users through retraining, such that the personal e-mail classifiers are refined to track the subjective perceptions of their respective users as to what e-mails are spam e-mails. Retraining data for the personal e-mail classifiers may be aggregated, and a subset of the aggregate may be chosen for use in retraining the global e-mail classifier.
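
The staged arrangement described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `ThresholdClassifier`, `route`, `toy_score`, the word list, the threshold values (0.9 and 0.3), and the three outcome labels are all hypothetical. The only properties taken from the text are that the global stage flags fewer messages as spam (modeled here as a higher probability threshold) and that messages it passes go on to the personal stage.

```python
from dataclasses import dataclass
from typing import Callable

# A scorer maps a message to an estimated probability that it is spam.
Scorer = Callable[[str], float]


@dataclass
class ThresholdClassifier:
    score: Scorer
    threshold: float  # message is classified as spam when score >= threshold

    def is_spam(self, message: str) -> bool:
        return self.score(message) >= self.threshold


def route(message: str,
          global_clf: ThresholdClassifier,
          personal_clf: ThresholdClassifier) -> str:
    """Return 'blocked', 'spam_folder', or 'inbox' (hypothetical outcomes)."""
    if global_clf.is_spam(message):
        return "blocked"        # assumed handling at the gateway stage
    if personal_clf.is_spam(message):
        return "spam_folder"    # assumed handling at the mailbox stage
    return "inbox"


# Toy scorer (assumption): fraction of words found in a small spam word list.
SPAM_WORDS = {"free", "winner", "prize"}


def toy_score(message: str) -> float:
    words = message.lower().split()
    return sum(w in SPAM_WORDS for w in words) / len(words) if words else 0.0


# The global stage uses the higher threshold, so it is the less stringent one.
global_clf = ThresholdClassifier(toy_score, threshold=0.9)
personal_clf = ThresholdClassifier(toy_score, threshold=0.3)
```

In practice each stage would have its own model; sharing one toy scorer here only keeps the example short and makes the effect of the two thresholds easy to see.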

24 Claims

1. A method of handling messages in a messaging system that includes a message gateway and individual message boxes for users of the system, wherein a message addressed to a user is delivered to the user's message box after passing through the message gateway, the method comprising:
- knowingly biasing a global, scoring e-mail classifier relative to a personal, scoring e-mail classifier such that the global, scoring e-mail classifier is less stringent than the personal, scoring e-mail classifier as to what is classified as spam, wherein the global, scoring e-mail classifier and the personal, scoring e-mail classifier are probabilistic e-mail classifiers such that, to classify a message, the global, scoring e-mail classifier and the personal, scoring e-mail classifier use respective internal models to determine a probability measure for the message and compare the probability measure to a classification threshold;
  
  receiving messages at the message gateway;
  
  inputting the received messages into the global, scoring e-mail classifier to classify the input messages as spam or non-spam;
  
  handling at least one of the messages input into the global, scoring e-mail classifier based on whether the global, scoring e-mail classifier classified the at least one message as spam or non-spam;
  
  outputting at least one message from the global, scoring e-mail classifier, wherein the outputted message has been classified as non-spam by the global, scoring e-mail classifier;
  
  inputting the outputted message from the global, scoring e-mail classifier into the personal, scoring e-mail classifier to classify the at least one outputted message as spam or non-spam;
  
  handling the at least one outputted message input into the personal, scoring e-mail classifier based on whether the personal, scoring e-mail classifier classified the at least one outputted message as spam or non-spam;
  
  receiving an indication from a user to change the classification of the at least one outputted message;
  
  in response to the indication, changing the classification of the at least one outputted message;
  
  generating retraining data based on the change to the classification of the at least one outputted message; and
  
  retraining the personal, scoring e-mail classifier based on the generated retraining data such that the personal, scoring e-mail classifier's internal model is refined to track the user's subjective perceptions as to what messages constitute spam messages.
- Dependent claims (2, 3, 4, 5, 6, 7, 8):
  - 2. The method according to claim 1, further comprising training the global, scoring e-mail classifier using a training set of messages to develop the internal model of the global, scoring e-mail classifier.
  - 3. The method according to claim 2, wherein the training set of messages comprises messages that are known to be spam messages to a significant number of users of the messaging system.
  - 4. The method according to claim 3, further comprising collecting the training set of messages through feedback from the users of the messaging system.
  - 5. The method according to claim 1, wherein to knowingly bias the global, scoring e-mail classifier related to the personal, scoring e-mail classifier, the global, scoring e-mail classifier is trained based on higher misclassification costs than the personal, scoring e-mail classifier.
  - 6. The method according to claim 1, wherein the messages are at least one of e-mails, instant messages, or SMS messages.
  - 7. The method according to claim 1, wherein the global, scoring e-mail classifier is configured such that classifying messages as spam or non-spam comprises classifying messages into subcategories of spam or non-spam.
  - 8. The method according to claim 1, wherein the personal, scoring e-mail classifier is configured such that classifying messages as spam or non-spam comprises classifying messages into subcategories of spam or non-spam.
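
The feedback steps at the end of claim 1 (a user changes a message's classification, the change becomes retraining data, and the personal classifier is retrained) can be illustrated with a small online model. The sketch below assumes a multinomial naive Bayes classifier with add-one smoothing; the claims only require a probabilistic classifier compared against a threshold, so the model choice, the class name `PersonalClassifier`, the helper `apply_user_correction`, and the labels `"spam"` and `"ham"` are all assumptions made for illustration.

```python
import math
from collections import Counter


class PersonalClassifier:
    """Tiny multinomial naive Bayes with add-one smoothing, updated online."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, message: str, label: str) -> None:
        # Incremental update: no need to revisit earlier training messages.
        self.doc_counts[label] += 1
        self.word_counts[label].update(message.lower().split())

    def spam_probability(self, message: str) -> float:
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed class prior.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in message.lower().split():
                count = self.word_counts[label][word]
                # The extra +1 in the denominator reserves mass for unseen words.
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Normalize the two log scores into a probability of spam.
        return 1.0 / (1.0 + math.exp(log_scores["ham"] - log_scores["spam"]))

    def classify(self, message: str) -> str:
        return "spam" if self.spam_probability(message) > self.threshold else "ham"


def apply_user_correction(clf: PersonalClassifier, message: str, new_label: str):
    """Turn a user's relabeling into a retraining example and apply it."""
    retraining_example = (message, new_label)
    clf.train(*retraining_example)
    return retraining_example


clf = PersonalClassifier()
clf.train("weekly team newsletter", "ham")
clf.train("cheap pills online", "spam")

message = "weekly newsletter deals"
before = clf.classify(message)               # leans non-spam on the initial data
apply_user_correction(clf, message, "spam")  # this user considers it spam
after = clf.classify(message)
```

With this toy data a single correction is enough to flip the decision; a real model would move more gradually, and how strongly each correction is weighted is a design choice the claims leave open.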

9. A non-transitory computer-readable medium including a set of instructions which, when executed by a processor, performs a method of handling messages in a messaging system that includes a message gateway and individual message boxes for users of the system, wherein a message addressed to a user is delivered to the user's message box after passing through the message gateway, the method comprising:
- knowingly biasing a global, scoring e-mail classifier relative to a personal, scoring e-mail classifier such that the global, scoring e-mail classifier is less stringent than the personal, scoring e-mail classifier as to what is classified as spam, wherein the global, scoring e-mail classifier and the personal, scoring e-mail classifier are probabilistic e-mail classifiers such that, to classify a message, the global, scoring e-mail classifier and the personal, scoring e-mail classifier use respective internal models to determine a probability measure for the message and compare the probability measure to a classification threshold;
  
  receiving messages at the message gateway;
  
  inputting the received messages into the global, scoring e-mail classifier to classify the input messages as spam or non-spam;
  
  handling at least one of the messages input into the global, scoring e-mail classifier based on whether the global, scoring e-mail classifier classified the at least one message as spam or non-spam;
  
  outputting at least one message from the global, scoring e-mail classifier, wherein the outputted message has been classified as non-spam by the global, scoring e-mail classifier;
  
  inputting the outputted message from the global, scoring e-mail classifier into the personal, scoring e-mail classifier to classify the at least one outputted message as spam or non-spam;
  
  handling the at least one outputted message input into the personal, scoring e-mail classifier based on whether the personal, scoring e-mail classifier classified the at least one outputted message as spam or non-spam;
  
  receiving an indication from a user to change the classification of the at least one outputted message;
  
  in response to the indication, changing the classification of the at least one outputted message;
  
  generating retraining data based on the change to the classification of the at least one outputted message; and
  
  retraining the personal, scoring e-mail classifier based on the generated retraining data such that the personal, scoring e-mail classifier's internal model is refined to track the user's subjective perceptions as to what messages constitute spam messages.
- Dependent claims (10, 11, 12, 13, 14, 15, 16):
  - 10. The non-transitory computer-readable medium according to claim 9, further comprising instructions executable by the processor to perform a method including:
    - training the global, scoring e-mail classifier using a training set of messages to develop the internal model of the global, scoring e-mail classifier.
  - 11. The non-transitory computer-readable medium according to claim 10, wherein the training set of messages comprises messages that are known to be spam messages to a significant number of users of the messaging system.
  - 12. The non-transitory computer-readable medium according to claim 11, further comprising instructions executable by the processor to perform a method including:
    - collecting the training set of messages through feedback from the users of the messaging system.
  - 13. The non-transitory computer-readable medium according to claim 9, wherein to knowingly bias the global, scoring e-mail classifier related to the personal, scoring e-mail classifier, the global, scoring e-mail classifier is trained based on higher misclassification costs than the personal, scoring e-mail classifier.
  - 14. The non-transitory computer-readable medium according to claim 9, wherein the messages are at least one of e-mails, instant messages, or SMS messages.
  - 15. The non-transitory computer-readable medium according to claim 9, wherein the global, scoring e-mail classifier is configured such that classifying messages as spam or non-spam comprises classifying messages into subcategories of spam or non-spam.
  - 16. The non-transitory computer-readable medium according to claim 9, wherein the personal, scoring e-mail classifier is configured such that classifying messages as spam or non-spam comprises classifying messages into subcategories of spam or non-spam.
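
Claims 5, 13, and 21 tie the bias between the two classifiers to misclassification costs. One standard way to see how costs relate to stringency is the expected-cost decision rule, sketched below. Note the substitution: the claims speak of training with different costs, whereas this sketch applies the costs at decision time to derive a threshold. The function name `spam_threshold`, the reading of "higher misclassification costs" as a higher cost for flagging legitimate mail, and the cost values are all assumptions.

```python
def spam_threshold(cost_false_positive: float, cost_false_negative: float) -> float:
    """Spam probability above which calling a message spam is the cheaper decision.

    For a message with spam probability p:
      expected cost of calling it spam     = (1 - p) * cost_false_positive
      expected cost of calling it non-spam = p * cost_false_negative
    Calling it spam is cheaper when p > cost_fp / (cost_fp + cost_fn).
    """
    if cost_false_positive <= 0 or cost_false_negative <= 0:
        raise ValueError("costs must be positive")
    return cost_false_positive / (cost_false_positive + cost_false_negative)


# Hypothetical costs: wrongly flagging legitimate mail is assumed to be far
# more costly at the gateway than in an individual mailbox.
global_threshold = spam_threshold(cost_false_positive=99.0, cost_false_negative=1.0)
personal_threshold = spam_threshold(cost_false_positive=9.0, cost_false_negative=1.0)
```

Under these assumed costs the global threshold comes out higher than the personal one, which matches the claimed relationship: the global classifier is less stringent about what it classifies as spam.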

17. A system for handling messages in a messaging system that includes a message gateway and individual message boxes for users of the system, wherein a message addressed to a user is delivered to the user's message box after passing through the message gateway, the system comprising:
- a storage medium that stores a set of instructions;
  
  at least one processor that executes the set of instructions to perform a method, the method comprising;
  
  knowingly biasing a global, scoring e-mail classifier relative to a personal, scoring e-mail classifier such that the global, scoring e-mail classifier is less stringent than the personal, scoring e-mail classifier as to what is classified as spam, wherein the global, scoring e-mail classifier and the personal, scoring e-mail classifier are probabilistic e-mail classifiers such that, to classify a message, the global, scoring e-mail classifier and the personal, scoring e-mail classifier use respective internal models to determine a probability measure for the message and compare the probability measure to a classification threshold;
  
  receiving messages at the message gateway;
  
  inputting the received messages into the global, scoring e-mail classifier to classify the input messages as spam or non-spam;
  
  handling at least one of the messages input into the global, scoring e-mail classifier based on whether the global, scoring e-mail classifier classified the at least one message as spam or non-spam;
  
  outputting at least one message from the global, scoring e-mail classifier, wherein the outputted message has been classified as non-spam by the global, scoring e-mail classifier;
  
  inputting the outputted message from the global, scoring e-mail classifier into the personal, scoring e-mail classifier to classify the at least one outputted message as spam or non-spam;
  
  handling the at least one outputted message input into the personal, scoring e-mail classifier based on whether the personal, scoring e-mail classifier classified the at least one outputted message as spam or non-spam;
  
  receiving an indication from a user to change the classification of the at least one outputted message;
  
  in response to the indication, changing the classification of the at least one outputted message;
  
  generating retraining data based on the change to the classification of the at least one outputted message; and
  
  retraining the personal, scoring e-mail classifier based on the generated retraining data such that the personal, scoring e-mail classifier's internal model is refined to track the user's subjective perceptions as to what messages constitute spam messages.
- Dependent claims (18, 19, 20, 21, 22, 23, 24):
  - 18. The system according to claim 17, wherein the at least one processor further executes the set of instructions to perform a method including:
    - training the global, scoring e-mail classifier using a training set of messages to develop the internal model of the global, scoring e-mail classifier.
  - 19. The system according to claim 18, wherein the training set of messages comprises messages that are known to be spam messages to a significant number of users of the messaging system.
  - 20. The system according to claim 19, wherein the at least one processor further executes the set of instructions to perform a method including:
    - collecting the training set of messages through feedback from the users of the messaging system.
  - 21. The system according to claim 17, wherein to knowingly bias the global, scoring e-mail classifier related to the personal, scoring e-mail classifier, the global, scoring e-mail classifier is trained based on higher misclassification costs than the personal, scoring e-mail classifier.
  - 22. The system according to claim 17, wherein the messages are at least one of e-mails, instant messages, or SMS messages.
  - 23. The system according to claim 17, wherein the global, scoring e-mail classifier is configured such that classifying messages as spam or non-spam comprises classifying messages into subcategories of spam or non-spam.
  - 24. The system according to claim 17, wherein the personal, scoring e-mail classifier is configured such that classifying messages as spam or non-spam comprises classifying messages into subcategories of spam or non-spam.
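
Claims 3 and 4 (and their counterparts 11, 12, 19, and 20) describe a global training set built from user feedback and made up of messages known to be spam to a significant number of users. A minimal sketch of that selection step follows. The report format of `(user_id, message_id)` pairs, the function name `select_global_training_spam`, and the use of a fixed `min_reporters` count to stand in for "a significant number" are assumptions; the claims do not define how significance is measured.

```python
from collections import defaultdict
from typing import Iterable


def select_global_training_spam(
    reports: Iterable[tuple[str, str]],  # (user_id, message_id) spam reports
    min_reporters: int,
) -> set[str]:
    """Return ids of messages reported as spam by at least `min_reporters` distinct users."""
    reporters = defaultdict(set)
    for user_id, message_id in reports:
        # Using a set means repeat reports from one user count only once.
        reporters[message_id].add(user_id)
    return {mid for mid, users in reporters.items() if len(users) >= min_reporters}


# Hypothetical feedback log: three users report m1, one user reports m2 twice.
reports = [
    ("alice", "m1"), ("bob", "m1"), ("carol", "m1"),
    ("alice", "m2"), ("alice", "m2"),
]
consensus_spam = select_global_training_spam(reports, min_reporters=3)
```

Counting distinct reporters rather than raw reports keeps a single user's personal preferences from leaking into the shared model, which fits the division of labor between the global and personal classifiers.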

Current Assignee: Verizon Media, Inc. (Verizon Communications Inc.), Yahoo Assets LLC
Original Assignee: AOL Inc. (Apollo Global Management, Inc.)
Inventors: Alspector, Joshua; Kolcz, Aleksander
Primary Examiner(s): Bengzon, Greg C
Assistant Examiner(s): Gupta, Muktesh G
Application Number: US10/743,015
Time in Patent Office: 3,115 Days
Field of Search: 709/224, 709/206, 709/207
US Class Current: 709/206
CPC Class Codes:
G06F 16/353   into predefined classes
G06Q 10/107   Computer-aided management o...
H04L 51/212   using filtering or selectiv...
H04L 51/48   Message addressing, e.g. ad...
