Learning framework for online applications
First Claim
1. A computer implemented method for detecting spam messages, comprising:
determining a first stage probability of whether a received message is a spam message, wherein the first stage probability is determined by evaluating the received message in relation to a subset of test messages, wherein each subset test message in the subset of test messages was previously identified as either valid or spam;
receiving an indication that a first stage classifier is unsure, based on the first stage probability, as to whether the received message is a spam message;
determining that the first stage probability is greater than a lower limit for combining probabilities and is less than an upper limit for combining probabilities, wherein the lower limit for combining probabilities indicates a probability value below which the first stage probability will not be combined with a second stage probability to determine whether the received message is a spam message, and wherein the upper limit for combining probabilities indicates a probability value above which the received message is marked as a spam message without combining the first stage probability with the second stage probability;
determining a second stage probability of whether the received message is a spam message, wherein the second stage probability is determined by evaluating the received message in relation to a subset-specific master set of test messages, which includes the subset of test messages, wherein each subset-specific master set test message in the subset-specific master set of test messages was previously identified as either valid or spam;
computing a combined probability based on the first stage probability and the second stage probability;
determining that the combined probability is greater than a threshold probability at which a threshold classification ratio is highest, wherein the classification ratio comprises a ratio of correctly identified spam messages over incorrectly identified spam messages.
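The decision flow recited in the claim can be sketched roughly as follows. This is only an illustration under stated assumptions: the claim does not say how the two probabilities are combined, so the weighted average in `combine` is hypothetical, as are the names `Limits` and `classify`, the handling of a first stage probability at or below the lower limit (treated here as valid), and the handling of values exactly equal to a limit.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Limits:
    lower: float      # at or below this, the first stage result is not combined
    upper: float      # at or above this, the message is marked spam outright
    threshold: float  # cutoff applied to the combined probability


def combine(p1: float, p2: float, weight: float = 0.5) -> float:
    """Hypothetical combiner (weighted average); the claim only says the
    combined probability is 'based on' both stage probabilities."""
    return weight * p1 + (1.0 - weight) * p2


def classify(p1: float, second_stage: Callable[[], float], limits: Limits) -> str:
    """Return 'spam' or 'valid' for a message whose first stage probability is p1.

    second_stage is called only when the first stage probability falls
    strictly between the lower and upper limits.
    """
    if limits.lower < p1 < limits.upper:
        p2 = second_stage()
        return "spam" if combine(p1, p2) > limits.threshold else "valid"
    if p1 >= limits.upper:
        return "spam"   # marked as spam without combining
    return "valid"      # assumption: the claim only says no combining happens here


limits = Limits(lower=0.3, upper=0.9, threshold=0.6)
print(classify(0.5, lambda: 0.9, limits))
```

Passing the second stage as a callable keeps the more expensive master-set evaluation from running when the first stage alone is decisive.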
Abstract
Learning to detect, and detecting, spam messages using a multi-stage combination of probability calculations based on individual and aggregate training sets of previously identified messages. During a preliminary phase, classifiers are trained, and lower and upper limit probabilities and a combined probability threshold are iteratively determined, using a multi-stage combination of probability calculations based on minor and major subsets of messages previously categorized as valid or spam. During a live phase, a first stage classifier uses only a particular subset, and a second stage classifier uses a master set of previously categorized messages. If a newly received message cannot be categorized with certainty by the first stage classifier, and a computed first stage probability is within the previously determined lower and upper limits, the first and second stage probabilities are combined. If the combined probability is greater than the previously determined combined probability threshold, the received message is marked as spam.
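One way to picture the preliminary-phase search for the combined probability threshold is a scan over candidate values that keeps the one with the highest classification ratio. The sketch below is an assumption-laden illustration, not the patented procedure: it reads "incorrectly identified spam messages" as valid messages marked as spam, adds 1 to the denominator to avoid division by zero, and uses the hypothetical names `classification_ratio` and `best_threshold` and a made-up scored sample.

```python
from typing import Iterable, List, Tuple

Scored = List[Tuple[float, bool]]  # (combined probability, truly spam?)


def classification_ratio(scored: Scored, threshold: float) -> float:
    """Correctly identified spam over incorrectly identified spam at a threshold."""
    correct = sum(1 for p, is_spam in scored if p > threshold and is_spam)
    incorrect = sum(1 for p, is_spam in scored if p > threshold and not is_spam)
    return correct / (incorrect + 1)  # +1 is an assumption to avoid dividing by zero


def best_threshold(scored: Scored, candidates: Iterable[float]) -> float:
    """Return the candidate threshold at which the classification ratio is highest."""
    return max(candidates, key=lambda t: classification_ratio(scored, t))


# Illustrative data only: previously categorized messages with combined probabilities.
sample = [(0.9, True), (0.8, True), (0.7, False), (0.4, True), (0.2, False)]
print(best_threshold(sample, [0.1, 0.3, 0.5, 0.75]))  # → 0.75
```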
20 Claims
1. A computer implemented method for detecting spam messages, comprising:
determining a first stage probability of whether a received message is a spam message, wherein the first stage probability is determined by evaluating the received message in relation to a subset of test messages, wherein each subset test message in the subset of test messages was previously identified as either valid or spam;
receiving an indication that a first stage classifier is unsure, based on the first stage probability, as to whether the received message is a spam message;
determining that the first stage probability is greater than a lower limit for combining probabilities and is less than an upper limit for combining probabilities, wherein the lower limit for combining probabilities indicates a probability value below which the first stage probability will not be combined with a second stage probability to determine whether the received message is a spam message, and wherein the upper limit for combining probabilities indicates a probability value above which the received message is marked as a spam message without combining the first stage probability with the second stage probability;
determining a second stage probability of whether the received message is a spam message, wherein the second stage probability is determined by evaluating the received message in relation to a subset-specific master set of test messages, which includes the subset of test messages, wherein each subset-specific master set test message in the subset-specific master set of test messages was previously identified as either valid or spam;
computing a combined probability based on the first stage probability and the second stage probability;
determining that the combined probability is greater than a threshold probability at which a threshold classification ratio is highest, wherein the classification ratio comprises a ratio of correctly identified spam messages over incorrectly identified spam messages.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
12. A system for detecting spam messages, comprising:
a processor;
a communication interface in communication with the processor and in communication with an electronic network;
a memory in communication with the processor and storing computer readable instructions that cause the processor to perform a plurality of operations, including:
determining a first stage probability of whether a received message is a spam message, wherein the first stage probability is determined by evaluating the received message in relation to a subset of test messages, wherein each subset test message in the subset of test messages was previously identified as either valid or spam;
receiving an indication that a first stage classifier is unsure, based on the first stage probability, as to whether the received message is a spam message;
determining that the first stage probability is greater than a lower limit for combining probabilities and is less than an upper limit for combining probabilities, wherein the lower limit for combining probabilities indicates a probability value below which the first stage probability will not be combined with a second stage probability to determine whether the received message is a spam message, and wherein the upper limit for combining probabilities indicates a probability value above which the received message is marked as a spam message without combining the first stage probability with the second stage probability;
determining a second stage probability of whether the received message is a spam message, wherein the second stage probability is determined by evaluating the received message in relation to a master set of test messages, which includes the subset of test messages, wherein each master set test message in the master set of test messages was previously identified as either valid or spam;
computing a combined probability based on the first stage probability and the second stage probability;
determining that the combined probability is greater than a threshold probability at which a threshold classification ratio is highest, wherein the classification ratio comprises a ratio of correctly identified spam messages over incorrectly identified spam messages.
Dependent claims: 13, 14, 15, 16, 17, 18
19. A computer implemented method for detecting spam messages, comprising:
receiving an indication that a first stage classifier is unsure as to whether a received message is a spam message;
determining that a first stage probability is greater than a lower limit and less than an upper limit, wherein the first stage probability is determined from a user set of messages, each of which was identified by one user as either valid or spam;
determining a second stage probability of the received message being a spam message, wherein the second stage probability is determined from a multiple user set of messages, each of which was identified as either valid or spam by multiple users;
computing a combined probability based on the first stage probability and the second stage probability;
determining that the combined probability is greater than a threshold probability that is determined from the lower limit.
Dependent claims: 20
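To illustrate the difference between a probability derived from one user's set of messages and one derived from a multiple-user set, here is a toy sketch. The claims do not specify the kind of classifier, so the token-count scorer below is purely an assumption chosen for brevity; `train`, `spam_probability`, and all sample messages are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

Model = Tuple[Counter, Counter]  # (token counts in spam, token counts in valid)


def train(messages: Iterable[Tuple[str, bool]]) -> Model:
    """Count tokens separately for messages labeled spam and labeled valid."""
    spam, valid = Counter(), Counter()
    for text, is_spam in messages:
        (spam if is_spam else valid).update(text.lower().split())
    return spam, valid


def spam_probability(model: Model, text: str) -> float:
    """Toy score: share of smoothed token evidence that comes from spam."""
    spam, valid = model
    tokens = text.lower().split()
    s = sum(spam[t] for t in tokens) + 1   # add-one smoothing (assumption)
    v = sum(valid[t] for t in tokens) + 1
    return s / (s + v)


# Messages labeled by one user, and messages labeled by other users.
user_set = [("cheap pills now", True), ("lunch at noon", False)]
other_users = [("win cheap prize", True), ("project meeting notes", False),
               ("prize inside", True)]
multi_user_set = user_set + other_users  # the larger set includes the user's set

p1 = spam_probability(train(user_set), "cheap prize")        # first stage
p2 = spam_probability(train(multi_user_set), "cheap prize")  # second stage
print(round(p1, 3), round(p2, 3))
```

In this made-up example the multiple-user set has seen "prize" in spam while the single user's set has not, so the second stage score is higher than the first.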
Specification