Feedback loop for spam prevention
Abstract
The subject invention provides for a feedback loop system and method that facilitate classifying items in connection with spam prevention in server and/or client-based architectures. The invention makes use of a machine-learning approach as applied to spam filters and, in particular, randomly samples incoming email messages so that examples of both legitimate and junk/spam mail are obtained to generate sets of training data. Users who are identified as spam-fighters are asked to vote on whether each of a selection of their incoming email messages is legitimate mail or junk mail. A database stores the properties for each mail and voting transaction, such as user information, message properties and content summary, and polling results for each message, to generate training data for machine learning systems. The machine learning systems facilitate creating improved spam filter(s) that are trained to recognize both legitimate mail and spam mail and to distinguish between them.
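The polling step described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `PollRecord`, `sample_for_polling`, `record_vote`, and `training_data` are hypothetical, the "database" is a plain in-memory list, and the sampling rate is an assumed parameter.

```python
import random
from dataclasses import dataclass


@dataclass
class PollRecord:
    """One voting transaction: who voted, on which message, and how."""
    user: str
    msg_id: str
    summary: str   # content summary of the message
    is_spam: bool  # the user's vote


def sample_for_polling(messages, spam_fighters, rate, rng):
    """Randomly pick messages addressed to opted-in spam-fighters.

    messages: iterable of (msg_id, recipient, summary) tuples.
    Selection ignores message content, so both legitimate mail and
    spam can end up in the polled set.
    """
    return [m for m in messages
            if m[1] in spam_fighters and rng.random() < rate]


def record_vote(db, message, is_spam):
    """Append a voting transaction to the in-memory store (a list)."""
    msg_id, recipient, summary = message
    db.append(PollRecord(recipient, msg_id, summary, is_spam))


def training_data(db):
    """Turn stored votes into (summary, label) pairs for a learner."""
    return [(r.summary, r.is_spam) for r in db]
```

Because selection depends only on the recipient and a random draw, the polled set is not biased toward messages that an existing filter already flags.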
14 Claims
1. A cross-validation method that facilitates verifying reliability and trustworthiness of user classifications comprising:
- excluding one or more suspected user's classifications from data employed to train a spam filter, performed at a server side, based on the user incorrectly classifying email as spam or as not spam;
- training the spam filter using all other available user classifications;
- running the suspected user's polling messages through the trained spam filter to determine how it would have classified the messages compared to the suspected user's classifications;
- validating users' trustworthiness to classify email based on their classification of incoming email as spam that is known or subsequently determined to be spam or their classification of incoming email as not spam that is known or subsequently determined to not be spam; and
- discounting existing and future classifications provided by users who are determined to be untrustworthy until the users are determined to be trustworthy.
Dependent claims: 2
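The steps of claim 1 amount to a leave-one-user-out check, which can be sketched as follows. This is an illustration under stated assumptions, not the claimed method itself: the token-count "filter" is a deliberately simple stand-in for a real learner, and `train_filter`, `classify`, `agreement_with_filter`, `trusted_votes`, and the 0.5 threshold are all hypothetical.

```python
from collections import Counter


def train_filter(votes):
    """Train a toy filter from (user, text, is_spam) votes.

    Returns per-token counts of spam and non-spam messages.
    """
    spam, ham = Counter(), Counter()
    for _user, text, is_spam in votes:
        (spam if is_spam else ham).update(set(text.lower().split()))
    return spam, ham


def classify(model, text):
    """Return True (spam) if spam evidence outweighs non-spam evidence."""
    spam, ham = model
    tokens = set(text.lower().split())
    return sum(spam[t] - ham[t] for t in tokens) > 0


def agreement_with_filter(votes, suspect):
    """Exclude the suspect's votes, train on everyone else, then measure
    how often the filter agrees with the suspect's own labels."""
    held_out = [v for v in votes if v[0] == suspect]
    others = [v for v in votes if v[0] != suspect]
    if not held_out:
        return None
    model = train_filter(others)
    agree = sum(classify(model, text) == label for _, text, label in held_out)
    return agree / len(held_out)


def trusted_votes(votes, suspects, threshold=0.5):
    """Drop votes from suspects whose agreement falls below the threshold.

    Re-running this on later data lets a user regain trust once their
    agreement rises again.
    """
    untrusted = set()
    for user in suspects:
        score = agreement_with_filter(votes, user)
        if score is not None and score < threshold:
            untrusted.add(user)
    return [v for v in votes if v[0] not in untrusted], untrusted
```

Recomputing the agreement score on each run, rather than storing a permanent flag, is one way to model discounting a user only "until the users are determined to be trustworthy".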
3. A computer readable storage media encoded with a computer program for a system that facilitates classifying items in connection with spam prevention, the computer program comprising:
- a component that receives a set of the items;
- a component that identifies intended recipients of the items, and tags a subset of the items to be polled, the subset of items corresponding to a subset of recipients that are known spam fighting users; and
- a feedback component that receives information relating to the spam fighting users' classification of the polled items, and employs the information in connection with training a spam filter and populating a spam list unless one or more of the spam fighting users have been determined untrustworthy.
Dependent claims: 4, 5, 6, 7, 8, 9, 10, 11, 12
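One way to picture the components recited in claim 3 is the sketch below. It rests on several assumptions that the claim does not state: the names `Item`, `tag_for_polling`, and `FeedbackComponent` are hypothetical, the spam list is modeled as a set of sender addresses, and training data is kept as a plain list of (item id, label) pairs.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    item_id: str
    sender: str
    recipients: list
    polled_for: set = field(default_factory=set)  # users asked to vote


def tag_for_polling(items, spam_fighters):
    """Identify recipients and tag items addressed to known spam fighters."""
    for item in items:
        item.polled_for = set(item.recipients) & set(spam_fighters)
    return [item for item in items if item.polled_for]


@dataclass
class FeedbackComponent:
    untrusted: set = field(default_factory=set)
    training_data: list = field(default_factory=list)
    spam_list: set = field(default_factory=set)  # assumed: sender addresses

    def receive(self, item, user, is_spam):
        """Use a vote for training and the spam list, unless the voter is
        untrusted or was never polled on this item."""
        if user in self.untrusted or user not in item.polled_for:
            return False
        self.training_data.append((item.item_id, is_spam))
        if is_spam:
            self.spam_list.add(item.sender)
        return True
```

Keeping the trust check inside `receive` means an untrusted user's vote never reaches either the training data or the spam list.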
13. A server implementing a system that facilitates classifying items in connection with spam prevention, comprising:
- a processor for executing a computer program encoded in a memory;
- the memory encoded with the computer program, the computer program comprising:
- a component that receives a set of the items;
- a component that identifies intended recipients of the items, and tags a subset of the items to be polled, the subset of items corresponding to a subset of recipients that are known spam fighting users; and
- a feedback component that receives information relating to the spam fighting users' classification of the polled items, and employs the information in connection with training a spam filter and populating a spam list unless the spam fighting users have been determined untrustworthy.
14. An email architecture employing a system that facilitates classifying items in connection with spam prevention, comprising:
- a processor for executing a computer program encoded in a memory;
- the memory encoded with the computer program, the computer program comprising:
- a component that receives a set of the items;
- a component that identifies intended recipients of the items, and tags a subset of the items to be polled, the subset of items corresponding to a subset of recipients that are known spam fighting users; and
- a feedback component that receives information relating to the spam fighting users' classification of the polled items, and employs the information in connection with training a spam filter and populating a spam list unless the spam fighting users have been determined untrustworthy.
Specification