Using distinguishing properties to classify messages
First Claim
1. A method for e-mail message classification, the method comprising:
receiving from a recipient of an e-mail message an indication that the e-mail message is spam;
extracting content from the e-mail message before deleting the e-mail message from an inbox of the recipient, wherein the extracted content excludes information that does not distinguish the e-mail message;
generating a plurality of signatures based on distinguishing properties identified in the extracted content, and wherein the distinguishing properties exclude:
variable portions of contact information related to a sender of the e-mail message, contact information lacking a required character, and contact information including a forbidden character;
storing the plurality of signatures as a combination of signatures in a database of signatures;
tracking a number of times the combination of signatures appears in the database of signatures;
updating the database of signatures based on the tracked number of times the combination of signatures appears;
determining whether the tracked number of times the combination of signatures appears exceeds a predetermined threshold; and
classifying a subsequently received e-mail message as spam when a set of signatures based on the subsequently received e-mail message matches all of the stored combination of signatures for a previously received spam message and the tracked number of times the combination of signatures appears exceeds the predetermined threshold.
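The storing, tracking, threshold, and classifying steps recited above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `SignatureDatabase`, the use of SHA-256 as the signature function, and the in-memory counter are all hypothetical choices that the claim does not specify.

```python
import hashlib
from collections import Counter


def signature(prop: str) -> str:
    """Hash one distinguishing property (SHA-256 is an assumed choice)."""
    return hashlib.sha256(prop.encode("utf-8")).hexdigest()


class SignatureDatabase:
    """Hypothetical in-memory store of signature combinations and their counts."""

    def __init__(self, threshold: int):
        self.threshold = threshold   # the "predetermined threshold"
        self.counts = Counter()      # combination of signatures -> times reported

    def report_spam(self, properties):
        """Record the combination of signatures for a message a recipient marked as spam."""
        combo = frozenset(signature(p) for p in properties)
        if combo:  # ignore messages with no distinguishing properties
            self.counts[combo] += 1

    def is_spam(self, properties) -> bool:
        """Classify a later message: it must contain every signature of a stored
        combination, and that combination's count must exceed the threshold."""
        sigs = {signature(p) for p in properties}
        return any(
            count > self.threshold and combo <= sigs
            for combo, count in self.counts.items()
        )
```

In this sketch a message that matches only part of a stored combination, or a combination reported too few times, is not classified as spam.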
Abstract
A system and method are disclosed for classifying a message. The method includes receiving the message, identifying in the message a distinguishing property; generating a signature using the distinguishing property; and comparing the signature to a database of signatures generated by previously classified messages.
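The identify, sign, and compare flow in the abstract might look like the sketch below. Everything specific here is an assumption made for illustration: e-mail addresses stand in for distinguishing properties, `@` is taken as the required character, the forbidden-character set is arbitrary, and SHA-256 is an assumed signature function. The claims also exclude variable portions of contact information, which this sketch does not model.

```python
import hashlib

# Assumed values for illustration only; the source does not name them.
REQUIRED_CHAR = "@"
FORBIDDEN_CHARS = set('<>()[]\\,;"')


def distinguishing_properties(text: str) -> list[str]:
    """Pick out candidate contact strings, skipping tokens that lack the
    required character or contain a forbidden character."""
    props = []
    for token in text.split():
        token = token.strip(".,;:!?<>()\"'").lower()
        if REQUIRED_CHAR not in token:
            continue
        if any(ch in FORBIDDEN_CHARS for ch in token):
            continue
        props.append(token)
    return props


def signature(prop: str) -> str:
    """Generate a signature from one distinguishing property."""
    return hashlib.sha256(prop.encode("utf-8")).hexdigest()


def matches_known(text: str, known_signatures: set[str]) -> bool:
    """Compare the message's signatures to those of previously classified messages."""
    return any(signature(p) in known_signatures
               for p in distinguishing_properties(text))
```

A caller would build `known_signatures` from messages already classified, then pass each new message body to `matches_known`.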
126 Citations
16 Claims
1. A method for e-mail message classification, the method comprising:
receiving from a recipient of an e-mail message an indication that the e-mail message is spam;
extracting content from the e-mail message before deleting the e-mail message from an inbox of the recipient, wherein the extracted content excludes information that does not distinguish the e-mail message;
generating a plurality of signatures based on distinguishing properties identified in the extracted content, and wherein the distinguishing properties exclude:
variable portions of contact information related to a sender of the e-mail message, contact information lacking a required character, and contact information including a forbidden character;
storing the plurality of signatures as a combination of signatures in a database of signatures;
tracking a number of times the combination of signatures appears in the database of signatures;
updating the database of signatures based on the tracked number of times the combination of signatures appears;
determining whether the tracked number of times the combination of signatures appears exceeds a predetermined threshold; and
classifying a subsequently received e-mail message as spam when a set of signatures based on the subsequently received e-mail message matches all of the stored combination of signatures for a previously received spam message and the tracked number of times the combination of signatures appears exceeds the predetermined threshold.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14
15. A non-transitory computer-readable storage medium, having embodied thereon a program, the program being executable by a processor to perform a method for e-mail message classification, the method comprising:
receiving from a recipient of an e-mail message an indication that the e-mail message is spam;
extracting content from the e-mail message before deleting the e-mail message from an inbox of the recipient, wherein the extracted content excludes information that does not distinguish the e-mail message;
generating a plurality of signatures based on distinguishing properties identified in the extracted content, and wherein the distinguishing properties exclude:
variable portions of contact information related to a sender of the e-mail message, contact information lacking a required character, and contact information including a forbidden character;
storing the plurality of signatures as a combination of signatures in a database of signatures;
tracking a number of times a combination of signatures appears in the database of signatures;
updating the database of signatures based on the tracked number of times the combination of signatures appears;
determining whether the tracked number of times the combination of signatures appears exceeds a predetermined threshold; and
classifying a subsequently received e-mail message as spam when a set of signatures based on the subsequently received e-mail message matches all of the stored combination of signatures for a previously received spam message and the tracked number of times the combination of signatures appears exceeds the predetermined threshold.
Dependent claims: 16
Specification