Group based spam classification
Abstract
An e-mail filter classifies received e-mails so that e-mails in some of the classes can be filtered, blocked, or marked. The filter may include a classifier, which can determine whether an e-mail belongs to a particular class, and an e-mail grouper, which can detect e-mails that are substantially similar but possibly not identical. The grouper identifies groups of substantially similar e-mails in the incoming e-mail stream. For each group, the classifier determines whether one or more test e-mails from the group belong to the particular class. Based on the results for those test e-mails, the classifier then designates the class of the other e-mails in the group.
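The abstract does not say how the grouper decides that two e-mails are "substantially similar". The sketch below shows one common approach, offered only as an illustrative assumption and not as the patented technique: each e-mail is reduced to its set of overlapping word 3-grams ("shingles"), two e-mails are compared by Jaccard similarity, and a single greedy pass assigns each e-mail to the first group whose representative is similar enough. The names `shingles`, `jaccard`, `group_similar`, `threshold`, and `k` are hypothetical.

```python
import re
from typing import List, Set


def shingles(text: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-word shingles in a lower-cased text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts become a single shingle of all their words.
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_similar(emails: List[str], threshold: float = 0.5, k: int = 3) -> List[List[int]]:
    """Greedy single-pass grouping (an assumed strategy).

    Each e-mail joins the first group whose representative (that group's
    first member) has Jaccard similarity >= threshold; otherwise it starts
    a new group. Groups are returned as lists of e-mail indices.
    """
    groups: List[List[int]] = []
    reps: List[Set[str]] = []
    for i, body in enumerate(emails):
        sh = shingles(body, k)
        for group, rep in zip(groups, reps):
            if jaccard(sh, rep) >= threshold:
                group.append(i)
                break
        else:
            # No sufficiently similar group exists, so start a new one.
            groups.append([i])
            reps.append(sh)
    return groups
```

For example, two advertisements that differ only in their last word share six of their eight distinct shingles (similarity 0.75) and land in the same group, while an unrelated message starts a group of its own. Production systems typically replace the pairwise comparison with MinHash or similar hashing so that grouping scales to a large e-mail stream.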
43 Claims
1. A method of classifying e-mail as spam, the method comprising:
clustering received e-mails into groups of substantially similar e-mails;
selecting a set of one or more test e-mails from at least one of the groups, wherein a proportion of the e-mails in the set are spam e-mails;
determining the proportion of spam e-mails in the set of test e-mails;
classifying the e-mails in the at least one group as spam when the proportion of spam e-mails in the set exceeds a predetermined threshold proportion.

8. A computer-usable medium having a computer program embodied thereon for classifying e-mail as spam, the computer program comprising instructions for causing a computer to perform the following operations:
cluster received e-mails into groups of substantially similar e-mails;
select a set of one or more test e-mails from at least one of the groups, wherein a proportion of the e-mails in the set are spam e-mails;
determine the proportion of spam e-mails in the set of test e-mails;
classify the e-mails in the at least one group as spam when the proportion of spam e-mails in the set exceeds a predetermined threshold proportion.

15. An apparatus for classifying e-mail as spam, the apparatus comprising:
means for clustering received e-mails into groups of substantially similar e-mails;
means for selecting a set of one or more test e-mails from at least one of the groups, wherein a proportion of the e-mails in the set are spam e-mails;
means for determining the proportion of spam e-mails in the set of test e-mails;
means for classifying the e-mails in the at least one group as spam when the proportion of spam e-mails in the set exceeds a predetermined threshold proportion.

22. A method of classifying e-mails, the method comprising:
clustering received e-mails into groups of substantially similar e-mails;
selecting one or more test e-mails from at least one of the groups;
determining a class for the one or more test e-mails;
classifying at least one non-test e-mail in the at least one group based on the determined class of the one or more test e-mails.

33. A computer-usable medium having a computer program embodied thereon for classifying e-mails, the computer program comprising instructions for causing a computer to perform the following operations:
cluster received e-mails into groups of substantially similar e-mails;
select one or more test e-mails from at least one of the groups;
determine a class for the one or more test e-mails;
classify at least one non-test e-mail in the at least one group based on the determined class of the one or more test e-mails.
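The per-group steps in claims 1 and 22 can be sketched as follows. The sketch makes several assumptions that are not in the claims. The per-e-mail classifier is a caller-supplied stand-in (the hypothetical `is_spam` and `classify` callables). Test e-mails are drawn uniformly at random with a fixed seed. For claim 22, each non-test e-mail inherits the most common class among the test e-mails; a majority vote is one reasonable reading of "based on the determined class", not the only one. The function names and the `sample_size` and `threshold` parameters are also hypothetical.

```python
import random
from collections import Counter
from typing import Callable, List, Optional, Sequence


def classify_group_as_spam(
    group: Sequence[str],
    is_spam: Callable[[str], bool],
    sample_size: int = 3,
    threshold: float = 0.5,
    rng: Optional[random.Random] = None,
) -> bool:
    """Claim 1 style: label the whole group spam when the proportion of
    spam among the sampled test e-mails strictly exceeds `threshold`."""
    if not group:
        raise ValueError("group must contain at least one e-mail")
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    tests = rng.sample(list(group), min(sample_size, len(group)))
    proportion = sum(is_spam(e) for e in tests) / len(tests)
    return proportion > threshold


def classify_group(
    group: Sequence[str],
    classify: Callable[[str], str],
    sample_size: int = 3,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Claim 22 style: classify the sampled test e-mails, then give every
    non-test e-mail the most common test class (assumed majority vote)."""
    if not group:
        return []
    rng = rng or random.Random(0)
    test_idx = rng.sample(range(len(group)), min(sample_size, len(group)))
    test_labels = {i: classify(group[i]) for i in test_idx}
    majority = Counter(test_labels.values()).most_common(1)[0][0]
    # Test e-mails keep their own class; the rest inherit the majority class.
    return [test_labels.get(i, majority) for i in range(len(group))]
```

Sampling a few test e-mails instead of running the classifier on every member is what makes the group approach cheaper when the classifier is expensive, for example when it involves human review. The trade-off is that a small sample can misjudge a mixed group, so `sample_size` and `threshold` would need tuning against real traffic.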
Specification