Systems and methods for spam detection using frequency spectra of character strings
First Claim
1. A method comprising:
employing at least one processor of a computer system to receive a target string forming a part of an electronic communication;
employing the at least one processor to process the target string into a target signal consisting of a sequence of numbers, wherein each character of the target string is mapped to a number of the sequence of numbers;
in response to receiving the target string, employing the at least one processor to determine a string eligibility criterion according to the target string;
employing the at least one processor to pre-filter a corpus of reference strings according to the string eligibility criterion, to produce a plurality of candidate strings;
in response to selecting the candidate strings, employing the at least one processor to perform a comparison between a frequency spectrum of the target signal and a frequency spectrum determined for a candidate string of the plurality of candidate strings, wherein determining the frequency spectrum of the target signal comprises applying a Fourier transform to the target signal to represent the target signal as a plurality of frequency components of the target signal, each frequency component having a distinct frequency and an amplitude determined for the distinct frequency; and
employing the at least one processor to determine whether the electronic communication is spam or non-spam according to a result of the comparison.
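As an illustration of the signal and spectrum steps recited above, the following is a minimal sketch only, not the patented implementation. The claim says each character is mapped to a number without fixing the mapping, so the use of Unicode code points here is an assumption, as are the function names `string_to_signal` and `amplitude_spectrum`. The transform is a naive discrete Fourier transform written with the standard library.

```python
import cmath

def string_to_signal(text):
    """Map each character to a number (assumption: its Unicode code point)."""
    return [float(ord(ch)) for ch in text]

def amplitude_spectrum(signal):
    """Naive O(n^2) discrete Fourier transform.

    Returns one amplitude per frequency index k = 0 .. n-1.
    """
    n = len(signal)
    spectrum = []
    for k in range(n):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        spectrum.append(abs(coeff))
    return spectrum

signal = string_to_signal("buy now")
spectrum = amplitude_spectrum(signal)
```

The amplitude at frequency index 0 equals the sum of the mapped numbers, and because the signal is real-valued the amplitudes are mirrored around the midpoint. A practical system would likely use a fast Fourier transform rather than this quadratic loop.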
Abstract
The described spam detection techniques, which include string identification, pre-filtering, and frequency spectrum and timestamp comparison steps, facilitate accurate, computationally efficient detection of rapidly changing spam arriving in short-lived waves. In some embodiments, a computer system extracts a target character string from an electronic communication such as a blog comment, transmits it to an anti-spam server, and receives from the anti-spam server an indicator of whether the respective electronic communication is spam or non-spam. The anti-spam server determines whether the electronic communication is spam or non-spam according to features of the frequency spectrum of the target string. Some embodiments also perform unsupervised clustering of incoming target strings, wherein all members of a cluster have similar spectra.
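The clustering mentioned in the abstract could be sketched as follows. This is a hypothetical illustration: the abstract does not name a clustering algorithm, so the greedy single-pass scheme, the Euclidean distance, the zero-padding to a fixed length of 32 samples, and the names `padded_spectrum` and `cluster` are all assumptions.

```python
import cmath
import math

def padded_spectrum(text, size=32):
    """Amplitude spectrum of a string, truncated or zero-padded to `size`
    samples so that spectra of different strings are comparable (assumption)."""
    signal = [float(ord(c)) for c in text[:size]]
    signal += [0.0] * (size - len(signal))
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / size)
                for t, x in enumerate(signal)))
        for k in range(size)
    ]

def spectral_distance(a, b):
    """Euclidean distance between two amplitude spectra of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster(strings, threshold):
    """Greedy clustering: a string joins the first cluster whose seed
    spectrum lies within `threshold`; otherwise it seeds a new cluster."""
    clusters = []  # list of (seed_spectrum, member_strings)
    for s in strings:
        spec = padded_spectrum(s)
        for seed, members in clusters:
            if spectral_distance(seed, spec) <= threshold:
                members.append(s)
                break
        else:
            clusters.append((spec, [s]))
    return [members for _, members in clusters]

groups = cluster(["aaaa", "zzzz", "aaaa"], threshold=1.0)
```

Here the two identical strings fall into one cluster and the dissimilar string seeds its own. The result of a greedy pass depends on arrival order, which may or may not matter for a stream of incoming strings.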
42 Citations
29 Claims
1. A method comprising:
employing at least one processor of a computer system to receive a target string forming a part of an electronic communication;
employing the at least one processor to process the target string into a target signal consisting of a sequence of numbers, wherein each character of the target string is mapped to a number of the sequence of numbers;
in response to receiving the target string, employing the at least one processor to determine a string eligibility criterion according to the target string;
employing the at least one processor to pre-filter a corpus of reference strings according to the string eligibility criterion, to produce a plurality of candidate strings;
in response to selecting the candidate strings, employing the at least one processor to perform a comparison between a frequency spectrum of the target signal and a frequency spectrum determined for a candidate string of the plurality of candidate strings, wherein determining the frequency spectrum of the target signal comprises applying a Fourier transform to the target signal to represent the target signal as a plurality of frequency components of the target signal, each frequency component having a distinct frequency and an amplitude determined for the distinct frequency; and
employing the at least one processor to determine whether the electronic communication is spam or non-spam according to a result of the comparison.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
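The pre-filtering step of claim 1 can be pictured with a short sketch. The claim only states that an eligibility criterion is determined according to the target string; the length-based criterion below, the 20% tolerance, and the names `eligibility_criterion` and `pre_filter` are assumptions chosen purely for illustration.

```python
def eligibility_criterion(target, tolerance=0.2):
    """Build a predicate from the target string.

    Assumption: a reference string is eligible when its length is within
    `tolerance` (as a fraction) of the target string's length.
    """
    low = len(target) * (1 - tolerance)
    high = len(target) * (1 + tolerance)
    return lambda reference: low <= len(reference) <= high

def pre_filter(corpus, target):
    """Keep only the reference strings that satisfy the criterion."""
    is_eligible = eligibility_criterion(target)
    return [reference for reference in corpus if is_eligible(reference)]

corpus = ["cheap meds now", "hi", "cheap m3ds n0w!"]
candidates = pre_filter(corpus, "cheap meds now!")
```

With a 15-character target, only reference strings of 12 to 18 characters survive, so the comparatively expensive spectrum comparison runs on a smaller candidate set.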
14. A computer system comprising at least one processor programmed to:
receive a target string forming a part of an electronic communication;
process the target string into a target signal consisting of a sequence of numbers, wherein each character of the target string is mapped to a number of the sequence of numbers;
in response to receiving the target string, determine a string eligibility criterion according to the target string;
pre-filter a corpus of reference strings according to the string eligibility criterion, to produce a plurality of candidate strings;
in response to selecting the candidate strings, perform a comparison between a frequency spectrum of the target signal and a frequency spectrum determined for a candidate string of the plurality of candidate strings, wherein determining the frequency spectrum of the target signal comprises applying a Fourier transform to the target signal to represent the target signal as a plurality of frequency components of the target signal, each frequency component having a distinct frequency and an amplitude determined for the distinct frequency; and
determine whether the electronic communication is spam or non-spam according to a result of the comparison.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26
27. A method comprising:
employing at least one processor of a computer system to receive an electronic communication;
in response to receiving the electronic communication, employing the at least one processor to extract a target string from the electronic communication;
employing the at least one processor to transmit the target string to an anti-spam server; and
in response to transmitting the target string, receiving a target label indicative of whether the electronic communication is spam or non-spam, wherein the target label is determined at the anti-spam server and wherein determining the target label comprises:
employing the anti-spam server to process the target string of characters into a target signal consisting of a sequence of numbers, wherein each character of the target string is mapped to a number of the sequence of numbers;
employing the anti-spam server to determine an eligibility criterion according to the target string;
employing the anti-spam server to pre-filter a corpus of reference strings according to the eligibility criterion, to produce a plurality of candidate strings;
in response to selecting the candidate strings, employing the anti-spam server to perform a comparison between a frequency spectrum of the target signal and a frequency spectrum determined for a candidate string of the plurality of candidate strings, wherein determining the frequency spectrum of the target signal comprises applying a Fourier transform to the target signal to represent the target signal as a plurality of frequency components of the target signal, each frequency component having a distinct frequency and an amplitude determined for the distinct frequency; and
employing the anti-spam server to determine whether the electronic communication is spam or non-spam according to a result of the comparison.
28. A method comprising:
employing at least one processor of a computer system to receive a target string forming a part of an electronic communication;
employing the at least one processor to process the target string into a target signal consisting of a sequence of numbers, wherein each character of the target string is mapped to a number of the sequence of numbers;
in response to receiving the target string, employing the at least one processor to determine a string eligibility criterion according to the target string;
employing the at least one processor to pre-filter a corpus of reference strings according to the string eligibility criterion, to produce a plurality of candidate strings;
in response to selecting the candidate strings, employing the at least one processor to determine an inter-string distance separating the target string from a candidate string of the plurality of candidate strings, the inter-string distance determined according to a first amplitude of a frequency spectrum of the target signal and according to a second amplitude of a frequency spectrum determined for the candidate string, wherein determining the frequency spectrum of the target signal comprises applying a Fourier transform to the target signal to represent the target signal as a plurality of frequency components of the target signal, each frequency component having a distinct frequency and an amplitude determined for the distinct frequency; and
employing the at least one processor to determine whether the target communication is spam or non-spam according to the inter-string distance.
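One way to read the inter-string distance of claim 28 is sketched below. The claim says only that the distance is determined according to amplitudes of the two spectra; the mean absolute amplitude difference, the zero-padding of both strings to a common length, the threshold-based decision, and the names `amplitudes`, `inter_string_distance` and `is_spam` are assumptions. The sketch expects non-empty strings.

```python
import cmath

def amplitudes(text, size):
    """Amplitude spectrum of `text` zero-padded to `size` samples.

    Characters are mapped to Unicode code points (an assumption).
    """
    signal = [float(ord(c)) for c in text] + [0.0] * (size - len(text))
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / size)
                for t, x in enumerate(signal)))
        for k in range(size)
    ]

def inter_string_distance(target, candidate):
    """Mean absolute difference between corresponding amplitudes."""
    size = max(len(target), len(candidate))
    a = amplitudes(target, size)
    b = amplitudes(candidate, size)
    return sum(abs(x - y) for x, y in zip(a, b)) / size

def is_spam(target, spam_candidates, threshold):
    """Label the target spam if it is close to any known-spam candidate."""
    return any(
        inter_string_distance(target, c) <= threshold
        for c in spam_candidates
    )

same = inter_string_distance("abc", "abc")
far = inter_string_distance("aaaa", "zzzz")
```

Identical strings give a distance of zero, while strings whose mapped numbers differ greatly give a large distance. The threshold would have to be tuned on labelled data, which this sketch does not attempt.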
29. A method comprising:
employing at least one processor of a computer system to receive a target string forming a part of an electronic communication;
employing the at least one processor to process the target string into a target signal consisting of a sequence of numbers, wherein each character of the target string is mapped to a number of the sequence of numbers;
employing the at least one processor to perform a comparison between a frequency spectrum of the target signal and a frequency spectrum determined for a reference string selected from a set of reference strings, wherein determining the frequency spectrum of the target signal comprises applying a Fourier transform to the target signal to represent the target signal as a plurality of frequency components of the target signal, each frequency component having a distinct frequency and an amplitude determined for the distinct frequency; and
employing the at least one processor to determine whether the target communication is spam or non-spam according to a result of the comparison.
Specification