Filter for blocking image-based spam
First Claim
1. A method for use in managing delivery of content over a network, comprising:
receiving a message, wherein the message includes an image file;
extracting the image file from the message;
generating a signature vector from the image file, wherein the signature vector includes at least low frequency bits and intensity bits determined from the image file;
determining a weighting vector using a machine learning mechanism on a plurality of known other image files;
performing a weighted min-hash near duplicate detection (NDD) using the weighting vector to determine if the signature vector indicates that the image file is likely to be a spam image; and
based on a result of the weighted min-hash NDD, selectively blocking the image file from being delivered to a destination.
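The weighted min-hash comparison recited in this claim can be illustrated with a minimal Python sketch. It assumes one common construction (an exponential-race form of consistent weighted sampling over the set bits of the signature), which the source does not confirm as the patented method. The function names, the 64-hash sketch size, and the 0.8 threshold are hypothetical.

```python
import hashlib
import math


def _uniform(seed: int, index: int) -> float:
    """Deterministic pseudo-random value in (0, 1] for a (seed, index) pair."""
    digest = hashlib.sha256(f"{seed}:{index}".encode()).digest()
    value = int.from_bytes(digest[:8], "big")
    return (value + 1) / 2**64


def weighted_minhash(bits, weights, num_hashes=64):
    """Pick one set-bit index per hash function; heavier bits win more often.

    Returns -1 for a hash function when no bit is set with a positive weight.
    """
    sketch = []
    for seed in range(num_hashes):
        best_index, best_score = -1, math.inf
        for i, (bit, w) in enumerate(zip(bits, weights)):
            if bit and w > 0:
                # Exponential race: a larger weight gives a smaller expected score.
                score = -math.log(_uniform(seed, i)) / w
                if score < best_score:
                    best_index, best_score = i, score
        sketch.append(best_index)
    return sketch


def estimated_similarity(sketch_a, sketch_b):
    """Fraction of hash functions that selected the same bit in both sketches."""
    matches = sum(1 for a, b in zip(sketch_a, sketch_b) if a == b and a != -1)
    return matches / len(sketch_a)


def is_near_duplicate(bits, known_spam_sketches, weights,
                      threshold=0.8, num_hashes=64):
    """True if the signature's sketch is close to any known spam sketch."""
    sketch = weighted_minhash(bits, weights, num_hashes)
    return any(estimated_similarity(sketch, s) >= threshold
               for s in known_spam_sketches)
```

Under this construction the fraction of matching sketch entries estimates the weighted Jaccard similarity of the two bit sets, so a difference in a low-weight bit barely moves the score while a difference in a high-weight bit moves it substantially.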
Abstract
A network device and method are directed towards detecting and blocking image spam within a message by employing a weighted min-hash to perform a near duplicate detection (NDD) of determined features within an image as compared to known spam images. The weighting for the min-hash is determined based on employing a machine learning algorithm, such as a perceptron, to identify an importance of each bit in a signature vector of the image. The signature vector is generated by extracting a shape of text in the image using a Discrete Cosine Transform, extracting low-frequency characteristics using a high-pass filter, and then performing various morphological operations to emphasize the shape of the text and reduce noise. Selected feature bits are extracted from the lowest frequency and intensity bits of the resulting signal to generate the signature vector used in the weighted min-hash NDD.
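As a rough illustration of how low-frequency and intensity bits might be collected into a signature vector, the following Python sketch applies a naive 2D DCT to a small grayscale grid. It is a simplification under stated assumptions: the filtering and morphological steps described in the abstract are omitted, and the choice of coefficient signs as low-frequency bits, the quantized mean brightness as intensity bits, and all names and sizes are hypothetical rather than taken from the patent.

```python
import math


def dct_2d(block):
    """Naive orthonormal 2D DCT-II of a square grid (O(n^4), for small n only)."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            cu = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
            cv = math.sqrt(1 / n) if v == 0 else math.sqrt(2 / n)
            out[u][v] = cu * cv * total
    return out


def signature_vector(gray, low_freq_size=4, intensity_bits=4):
    """Build a bit list from a square grid of grayscale values in 0..255."""
    coeffs = dct_2d(gray)
    # Low-frequency bits (assumption): 1 if the coefficient is clearly positive,
    # taken from the top-left corner of the DCT and skipping the DC term.
    low = [1 if coeffs[u][v] > 1e-9 else 0
           for u in range(low_freq_size)
           for v in range(low_freq_size)
           if (u, v) != (0, 0)]
    # Intensity bits (assumption): mean brightness quantized to a few bits,
    # most significant bit first.
    n = len(gray)
    mean = sum(map(sum, gray)) / (n * n)
    levels = 1 << intensity_bits
    level = min(int(mean / 256 * levels), levels - 1)
    intensity = [(level >> k) & 1 for k in reversed(range(intensity_bits))]
    return low + intensity
```

With the default parameters an 8x8 grid yields 15 low-frequency bits followed by 4 intensity bits.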
20 Claims
1. A method for use in managing delivery of content over a network, comprising:
receiving a message, wherein the message includes an image file;
extracting the image file from the message;
generating a signature vector from the image file, wherein the signature vector includes at least low frequency bits and intensity bits determined from the image file;
determining a weighting vector using a machine learning mechanism on a plurality of known other image files;
performing a weighted min-hash near duplicate detection (NDD) using the weighting vector to determine if the signature vector indicates that the image file is likely to be a spam image; and
based on a result of the weighted min-hash NDD, selectively blocking the image file from being delivered to a destination.
Dependent claims: 2, 3, 4, 5, 6, 7
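The step of determining a weighting vector with a machine learning mechanism could look roughly like the perceptron sketch below. The training setup is an assumption, not something the source specifies: each example is a pair of signatures from known images, labeled +1 if the pair is a near duplicate and -1 otherwise, and each feature records whether the two signatures agree on a given bit. The function name and parameters are hypothetical.

```python
def train_bit_weights(pairs, labels, epochs=20, learning_rate=0.1):
    """Learn a per-bit importance weight with a classic perceptron.

    pairs:  list of (signature_a, signature_b) bit lists of equal length
    labels: +1 for a known near-duplicate pair, -1 for an unrelated pair
    """
    num_bits = len(pairs[0][0])
    weights = [0.0] * num_bits
    bias = 0.0
    for _ in range(epochs):
        for (sig_a, sig_b), label in zip(pairs, labels):
            # Feature i is 1.0 when both signatures agree on bit i.
            features = [1.0 if a == b else 0.0 for a, b in zip(sig_a, sig_b)]
            activation = bias + sum(w * f for w, f in zip(weights, features))
            predicted = 1 if activation > 0 else -1
            if predicted != label:
                # Perceptron update: applied only on misclassified examples.
                weights = [w + learning_rate * label * f
                           for w, f in zip(weights, features)]
                bias += learning_rate * label
    # Clip to non-negative values so the result can serve as min-hash weights.
    return [max(w, 0.0) for w in weights]
```

Bits whose agreement reliably separates duplicates from non-duplicates end up with larger weights, while bits that behave like noise are driven to zero.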
8. A network device for selectively managing delivery of messages over a network, comprising:
a transceiver to send and receive data over the network; and
a processor that is operative to perform actions, including;
receiving an image file associated with a message;
if the physical characteristic indicates that statistically, the image file is unlikely to be associated with image spam, enabling the image file and message to be forwarded to a destination, otherwise performing the following actions, comprising;
generating a signature vector from the image file, wherein the signature vector includes at least low frequency bits and intensity bits from the image file;
determining a weighting vector using a machine learning mechanism on a plurality of known other image files;
performing a weighted min-hash near duplicate detection (NDD) using the weighting vector to determine if the signature vector indicates that the image file is likely to be a spam image; and
based on a result of the weighted min-hash NDD selectively blocking the image file from being delivered to a destination.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A system for use in selectively enabling delivery of content over a network, comprising:
a message server that is configured and arranged to perform actions, including;
receiving a message; and
if the message includes an image file, providing the image file to an image spam detection component; and
the image spam detection component being configured to perform actions, including;
generating a signature vector from the image file, wherein the signature vector includes at least low frequency bits and intensity bits from the image file;
determining a weighting vector using a machine learning mechanism on a plurality of known other image files;
performing a weighted min-hash near duplicate detection (NDD) using the weighting vector to determine if the signature vector indicates that the image file is likely to be a spam image; and
based on a result of the weighted min-hash NDD selectively blocking the image file from being delivered to a destination.
Dependent claims: 16, 17, 18, 19, 20
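The message server's role of pulling image files out of a message and handing them to a detection component might be sketched with Python's standard `email` package as follows. This is an assumption-laden illustration: `is_spam_image` stands in for the image spam detection component and is hypothetical, and the sketch blocks the whole message for simplicity, whereas the claim recites selectively blocking the image file.

```python
from email import message_from_bytes, policy


def extract_images(raw_message: bytes):
    """Return (filename, image_bytes) for every image part of a MIME message."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    images = []
    for part in msg.walk():
        if part.get_content_maintype() == "image":
            # decode=True reverses the transfer encoding (for example base64).
            images.append((part.get_filename(), part.get_payload(decode=True)))
    return images


def deliver_or_block(raw_message: bytes, is_spam_image) -> str:
    """Route a message based on a caller-supplied detector (hypothetical).

    is_spam_image takes the raw image bytes and returns True for likely spam.
    """
    for _name, data in extract_images(raw_message):
        if is_spam_image(data):
            return "blocked"
    return "delivered"
```

A message with no image parts is delivered without ever invoking the detector, which mirrors the conditional hand-off recited in the claim.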
Specification