Image spam filtering based on senders' intention analysis
Abstract
Systems and methods for an anti-spam detection module that can detect image spam are provided. According to one embodiment, an image spam detection process involves determining and measuring various characteristics of images that may be embedded within or otherwise associated with an electronic mail (email) message. An approximate display location of the embedded images is determined. The existence of one or more abnormal factors associated with the embedded images is identified. A quantity of text included in the one or more embedded images is determined and measured by analyzing one or more blocks of binarized representations of the one or more embedded images. Finally, the likelihood that the email message is spam is determined based on one or more of the approximate display location, the existence of one or more abnormal factors and the quantity and location of text measured.
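The final step of the abstract combines three signals into a spam likelihood: approximate display location, abnormal factors, and quantity of text. The Python sketch below shows one way to do that. The abstract does not say how the signals are weighted, so several details are hypothetical choices made only for this example: the `ImageFeatures` fields, the weights, the bias, the cap of 10 text strings, and the logistic combination.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ImageFeatures:
    # Hypothetical per-message feature record; these names are not from the patent.
    displayed_near_top: bool                                    # approximate display location signal
    abnormal_factors: list[str] = field(default_factory=list)  # e.g. "oversized", "noisy_pixels"
    text_string_count: int = 0                                  # output of a text string measurement step


# Hypothetical weights and bias, for illustration only (not tuned on any data).
WEIGHTS = {"location": 1.0, "abnormal": 0.8, "text": 0.5}
BIAS = -3.0


def spam_likelihood(f: ImageFeatures) -> float:
    """Combine location, abnormal factors and text quantity into a score in (0, 1)."""
    z = (BIAS
         + WEIGHTS["location"] * (1.0 if f.displayed_near_top else 0.0)
         + WEIGHTS["abnormal"] * len(f.abnormal_factors)
         + WEIGHTS["text"] * min(f.text_string_count, 10))  # cap so huge counts don't dominate
    return 1.0 / (1.0 + math.exp(-z))  # logistic squashing


def classify(f: ImageFeatures, threshold: float = 0.5) -> str:
    """Label the message 'spam' or 'clean' by thresholding the likelihood."""
    return "spam" if spam_likelihood(f) >= threshold else "clean"
```

With these made-up weights, consider two images. An image with no abnormal factors and no text scores about 0.05 and is labelled clean. An image shown near the top of the message, with one abnormal factor and six text strings, scores about 0.86 and is labelled spam.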
35 Citations
41 Claims
1. A computer-implemented method comprising:
converting, by an anti-spam module, an embedded image of an electronic mail (email) message to a binarized representation by performing thresholding on a grayscale representation of the embedded image;
quantifying, by the anti-spam module, a number of text strings that are included in the embedded image by analyzing one or more blocks of the binarized representation with a text string measurement algorithm;
classifying, by the anti-spam module, the email message as spam or clean based at least in part on the number of text strings;
wherein the anti-spam module is implemented in one or more processors and one or more non-transitory computer-readable storage media of one or more computer systems, the one or more non-transitory computer-readable storage media having instructions tangibly embodied therein representing the anti-spam module that are executable by the one or more processors; and
wherein the one or more blocks comprise M×N virtual blocks and wherein the text string measurement algorithm employs equations having a general form as follows:
Dependent claims: 2, 3, 4, 5.

6. A computer-implemented method comprising:
determining, by an anti-spam module, an approximate display location of one or more embedded images within an electronic mail (email) message;
identifying, by the anti-spam module, existence of one or more abnormal factors associated with the one or more embedded images;
quantifying, by the anti-spam module, a number of text strings that are included in the embedded image by analyzing one or more blocks of the binarized representation with a text string measurement algorithm;
classifying, by the anti-spam module, the email message as spam or clean based on the approximate display location, the existence of one or more abnormal factors and the number of text strings;
wherein the anti-spam module is implemented in one or more processors and one or more non-transitory computer-readable storage media of one or more computer systems, the one or more non-transitory computer-readable storage media having instructions tangibly embodied therein representing the anti-spam module that are executable by the one or more processors; and
wherein the one or more blocks comprise M×N virtual blocks and wherein the text string measurement algorithm employs equations having a general form as follows:
Dependent claims: 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18.

19. A non-transitory computer-readable storage medium having tangibly embodied thereon instructions, which when executed by one or more processors of one or more computer systems cause the one or more processors to perform a method comprising:
converting an embedded image of an electronic mail (email) message to a binarized representation by performing thresholding on a grayscale representation of the embedded image;
quantifying, by the anti-spam module, a number of text strings that are included in the embedded image by analyzing one or more blocks of the binarized representation with a text string measurement algorithm;
classifying, by the anti-spam module, the email message as spam or clean based at least in part on the number of text strings; and
wherein the one or more blocks comprise M×N virtual blocks and wherein the text string measurement algorithm employs equations having a general form as follows:
Dependent claims: 20, 21, 22, 23.

24. A non-transitory computer-readable storage medium having tangibly embodied thereon instructions, which when executed by one or more processors of one or more computer systems cause the one or more processors to perform a method comprising:
determining an approximate display location of one or more embedded images within an electronic mail (email) message;
identifying existence of one or more abnormal factors associated with the one or more embedded images;
quantifying, by the anti-spam module, a number of text strings that are included in the embedded image by analyzing one or more blocks of the binarized representation with a text string measurement algorithm;
classifying, by the anti-spam module, the email message as spam or clean based on the approximate display location, the existence of one or more abnormal factors and the number of text strings; and
wherein the one or more blocks comprise M×N virtual blocks and wherein the text string measurement algorithm employs equations having a general form as follows:
Dependent claims: 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36.

37. An electronic mail (email) security system comprising:
a non-transitory storage device having tangibly embodied thereon instructions associated with an anti-spam module; and
one or more processors coupled to the non-transitory storage device and operable to execute the instructions associated with the anti-spam module to perform a method comprising:
converting an embedded image of an email message to a binarized representation by performing thresholding on a grayscale representation of the embedded image;
quantifying a number of text strings that are included in the embedded image by analyzing one or more blocks of the binarized representation with a text string measurement algorithm;
classifying the email message as spam or clean based at least in part on the number of text strings; and
wherein the one or more blocks comprise M×N virtual blocks and wherein the text string measurement algorithm employs equations having a general form as follows:
Dependent claims: 38, 39, 40, 41.
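Claims 1, 19 and 37 recite a pipeline: threshold a grayscale image into a binarized representation, divide it into M×N virtual blocks, and count text strings. The Python sketch below illustrates that pipeline. The general-form equations of the claimed text string measurement algorithm are not given in this text, so the counting rule here is a stand-in. It uses Otsu's method to choose the threshold and a per-block horizontal projection profile, in which each run of consecutive rows containing at least `min_ink` ink pixels counts as one string. Several details are assumptions, not the patented algorithm: the function names, the `m`, `n` and `min_ink` parameters, and the dark-text-on-light-background convention.

```python
def otsu_threshold(gray: list[list[int]]) -> int:
    """Otsu's method: choose the 0-255 threshold that maximises between-class variance."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_b, w_b = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]            # pixels at or below t (dark class)
        if w_b == 0:
            continue
        w_f = total - w_b         # pixels above t (light class)
        if w_f == 0:
            break
        sum_b += t * hist[t]
        m_b, m_f = sum_b / w_b, (sum_all - sum_b) / w_f
        var_between = w_b * w_f * (m_b - m_f) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(gray: list[list[int]], threshold: int) -> list[list[int]]:
    """1 = ink (dark pixel), 0 = background; assumes dark text on a light background."""
    return [[1 if v <= threshold else 0 for v in row] for row in gray]


def split_blocks(h: int, w: int, m: int, n: int):
    """Yield (r0, r1, c0, c1) bounds for an M x N grid of virtual blocks."""
    for i in range(m):
        for j in range(n):
            yield i * h // m, (i + 1) * h // m, j * w // n, (j + 1) * w // n


def count_text_strings(binary: list[list[int]], m: int = 2, n: int = 2, min_ink: int = 2) -> int:
    """Hypothetical stand-in measure: count runs of ink-bearing rows inside each block."""
    if not binary or not binary[0]:
        return 0
    h, w = len(binary), len(binary[0])
    total = 0
    for r0, r1, c0, c1 in split_blocks(h, w, m, n):
        in_run = False
        for r in range(r0, r1):
            ink = sum(binary[r][c0:c1])     # horizontal projection for this row within the block
            if ink >= min_ink and not in_run:
                total += 1                  # a new band of text starts here
                in_run = True
            elif ink < min_ink:
                in_run = False
    return total


def text_string_count(gray: list[list[int]], m: int = 2, n: int = 2, min_ink: int = 2) -> int:
    """Grayscale -> binarized representation -> text string count over M x N blocks."""
    return count_text_strings(binarize(gray, otsu_threshold(gray)), m, n, min_ink)
```

Each virtual block is scanned independently, so a text line that crosses a block boundary is counted once for every block it touches. For instance, take an 8×8 white image with two full-width black rows. It yields 2 with `m=n=1` and 4 with `m=n=2`. Under this stand-in, the result measures text coverage across blocks rather than distinct lines.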
Specification