Adaptive junk message filtering system
Abstract
The invention relates to a system for filtering messages. The system includes a seed filter having an associated false positive rate and false negative rate. A new filter is also provided for filtering the messages and is evaluated against the false positive rate and false negative rate of the seed filter: the data used to determine the seed filter's rates are also used to determine a new false positive rate and a new false negative rate for the new filter as a function of a threshold. The new filter is employed in lieu of the seed filter if there exists a threshold at which the new false positive rate and the new false negative rate are together considered better than the false positive rate and false negative rate of the seed filter.
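The seed-versus-new-filter comparison in the abstract can be sketched as a threshold sweep. This is a minimal sketch under stated assumptions, not the patented implementation: it assumes each filter outputs a junk probability, that the seed filter labels at a fixed threshold of 0.5, that the true labels come from user corrections, and that "together considered better" means the new rates are no worse on both counts and strictly lower on at least one. The names `error_rates` and `find_better_threshold` are hypothetical.

```python
# Sketch: should a new filter replace the seed filter? (assumptions noted above)
from typing import Optional, Sequence, Tuple


def error_rates(scores: Sequence[float], labels: Sequence[bool],
                threshold: float) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate) at `threshold`.

    labels[i] is True when message i is junk according to user corrections;
    a filter labels a message junk when its score exceeds the threshold.
    """
    good = sum(1 for y in labels if not y)
    junk = len(labels) - good
    fp = sum(1 for s, y in zip(scores, labels) if s > threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s <= threshold and y)
    return (fp / good if good else 0.0, fn / junk if junk else 0.0)


def find_better_threshold(seed_scores: Sequence[float],
                          new_scores: Sequence[float],
                          labels: Sequence[bool],
                          seed_threshold: float = 0.5) -> Optional[float]:
    """Return a threshold at which the new filter beats the seed filter, else None."""
    seed_fpr, seed_fnr = error_rates(seed_scores, labels, seed_threshold)
    # Each distinct score (plus -inf, meaning "label everything junk") yields
    # every distinct labeling the new filter can produce.
    for t in [float("-inf")] + sorted(set(new_scores)):
        fpr, fnr = error_rates(new_scores, labels, t)
        no_worse = fpr <= seed_fpr and fnr <= seed_fnr
        if no_worse and (fpr < seed_fpr or fnr < seed_fnr):
            return t
    return None


labels = [True, True, False, False, True, False]
seed_scores = [0.9, 0.4, 0.6, 0.2, 0.7, 0.1]
new_scores = [0.8, 0.7, 0.3, 0.1, 0.9, 0.2]
print(error_rates(seed_scores, labels, 0.5))                  # (0.333..., 0.333...)
print(find_better_threshold(seed_scores, new_scores, labels))  # 0.2
```

The dominance test is only one reading of "together considered better"; a weighted cost that penalises false positives more heavily than false negatives would be an equally plausible choice and would slot into the same loop.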
10 Claims
1. A system that facilitates adaptive data filtering, comprising:
a processor;
a memory communicatively coupled to the processor, the memory having stored therein computer-executable instructions configured to implement the data filtering system, including:
a first filter configured to label messages as junk based upon junk information associated with the messages, wherein the first filter is associated with a first accuracy rate;
a second filter configured to label the messages as junk based upon junk information associated with the messages, wherein the second filter is initially associated with the first accuracy rate;
a filter output configured to receive labeled and unlabeled messages from the first filter and the second filter;
a user correction component configured to receive user actions overriding the initial labeling of the messages received at the filter output and calculate a first accuracy rate based upon the user actions; and
a filter control component configured to:
train the second filter utilizing a threshold and the user actions, wherein if the probability that a message is junk exceeds the threshold, then the filter is trained to label the message as junk;
calculate a second accuracy rate for the second filter; and
route subsequently received messages to the second filter in lieu of the first filter if the second accuracy rate is better than the first accuracy rate.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9.
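The user correction component and filter control component of claim 1 can be illustrated with a short Python sketch. It is a hedged illustration, not the claimed implementation: it assumes an accuracy rate is the fraction of a filter's labels that agree with the labels left after user overrides, that the second filter is scored against those same user-corrected labels, and that "better" means strictly higher. `labels_after_user_actions`, `accuracy`, and `FilterRouter` are hypothetical names, and the training step is not modelled.

```python
# Sketch of claim 1's accuracy tracking and routing (hypothetical names).
from typing import List, Sequence


def labels_after_user_actions(first_labels: Sequence[bool],
                              overridden: Sequence[bool]) -> List[bool]:
    """A user override flips the label the first filter assigned."""
    return [label != flip for label, flip in zip(first_labels, overridden)]


def accuracy(predicted: Sequence[bool], user_labels: Sequence[bool]) -> float:
    """Fraction of a filter's labels that agree with the user's final labels."""
    if not user_labels:
        raise ValueError("need at least one user-labeled message")
    agree = sum(1 for p, u in zip(predicted, user_labels) if p == u)
    return agree / len(user_labels)


class FilterRouter:
    """Routes new messages to whichever filter has the better accuracy rate."""

    def __init__(self, first_accuracy: float) -> None:
        self.first_accuracy = first_accuracy
        # Per the claim, the second filter starts with the first filter's rate,
        # so it cannot take over until it has been measured.
        self.second_accuracy = first_accuracy
        self.active = "first"

    def update(self, second_labels: Sequence[bool],
               user_labels: Sequence[bool]) -> str:
        self.second_accuracy = accuracy(second_labels, user_labels)
        # Strictly better is required; a tie keeps the first filter.
        self.active = ("second" if self.second_accuracy > self.first_accuracy
                       else "first")
        return self.active


first_labels = [True, False, True, False]
overridden = [False, True, False, False]   # the user marked one passed message as junk
user_labels = labels_after_user_actions(first_labels, overridden)
router = FilterRouter(accuracy(first_labels, user_labels))    # first accuracy 0.75
print(router.update([True, True, True, False], user_labels))  # second
```

Requiring a strict improvement means a second filter that merely matches the first never displaces it, which is consistent with the claim assigning the first accuracy rate to the second filter at the outset.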
10. A method having stored computer-executable instructions that are executable on a processor that facilitates adaptive data filtering, the method comprising:
labeling messages by a first filter as junk based upon junk information associated with the messages, wherein the first filter is associated with a first accuracy rate;
labeling the messages by a second filter as junk based upon junk information associated with the messages, wherein the second filter is initially associated with the first accuracy rate;
receiving, by a filter output, labeled and unlabeled messages from the first filter and the second filter;
receiving, by a user correction component, user actions overriding the initial labeling of the messages received at the filter output, and calculating a first accuracy rate based upon the user actions; and
including a filter control component configured to:
training the second filter utilizing a threshold and the user actions, wherein if a probability that a message is junk exceeds the threshold, then the filter is trained to label the message as junk;
calculating a second accuracy rate for the second filter; and
routing subsequently received messages to the second filter in lieu of the first filter if the second accuracy rate is better than the first accuracy rate;
wherein the junk information includes at least one of sender information, source IP address, sender name, sender e-mail address, sender domain name, unintelligible alphanumeric strings in identifier fields, terms and phrases in message text, features in message text, or embedded links to pop-up advertisements.
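Claim 10 enumerates the kinds of junk information a filter may use. The sketch below turns a raw message into a feature dictionary covering several of them (sender name, address and domain, a source IP address, an unintelligible sender identifier, suspect terms, and embedded links) using only Python's standard `email` and `re` modules. It is an illustration under assumptions, not the patent's method: the term list, the "unintelligible" heuristic, and counting every http(s) link as a possible pop-up advertisement link are hypothetical choices, and `junk_features` and `looks_unintelligible` are invented names.

```python
# Sketch: extract some of claim 10's "junk information" from a raw message.
import re
from email import message_from_string
from email.utils import parseaddr

LINK_RE = re.compile(r"https?://\S+", re.IGNORECASE)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
SUSPECT_TERMS = ("free", "prize", "click here")  # hypothetical term list


def looks_unintelligible(identifier: str) -> bool:
    """Assumed heuristic: 10+ lowercase letters/digits, at least 3 of them digits."""
    ident = identifier.lower()
    return (bool(re.fullmatch(r"[a-z0-9]{10,}", ident))
            and sum(ch.isdigit() for ch in ident) >= 3)


def junk_features(raw: str) -> dict:
    msg = message_from_string(raw)
    name, address = parseaddr(msg.get("From", ""))
    local, _, domain = address.partition("@")
    # The last Received header is the hop closest to the original sender.
    received = msg.get_all("Received") or []
    ip_match = IPV4_RE.search(received[-1]) if received else None
    body = msg.get_payload()
    if not isinstance(body, str):  # multipart messages are out of scope here
        body = ""
    text = body.lower()
    features = {
        "sender_name": name,
        "sender_address": address.lower(),
        "sender_domain": domain.lower(),
        "source_ip": ip_match.group(0) if ip_match else None,
        "odd_sender_id": looks_unintelligible(local),
        "link_count": len(LINK_RE.findall(body)),
    }
    for term in SUSPECT_TERMS:
        features["term:" + term] = term in text
    return features


raw = (
    "Received: from mail.example.net ([203.0.113.7]) by mx.example.com\n"
    "From: Deals <xk7q9z2w81@promo.example.net>\n"
    "Subject: Hi\n"
    "\n"
    "Click http://ads.example.net/popup now to claim a free prize\n"
)
print(junk_features(raw))
```

Received headers below the first trusted server can be forged, so a production filter would usually take the IP address from the first hop outside its own network; reading the bottom-most header simply keeps the sketch short.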
Specification