Systems and Methods for Spam Detection Using Frequency Spectra of Character Strings
Abstract
Described spam detection techniques, including string identification, pre-filtering, and frequency spectrum and timestamp comparison steps, facilitate accurate, computationally efficient detection of rapidly changing spam arriving in short-lasting waves. In some embodiments, a computer system extracts a target character string from an electronic communication such as a blog comment, transmits it to an anti-spam server, and receives from the anti-spam server an indicator of whether the respective electronic communication is spam or non-spam. The anti-spam server determines whether the electronic communication is spam or non-spam according to features of the frequency spectrum of the target string. Some embodiments also perform an unsupervised clustering of incoming target strings into clusters, wherein all members of a cluster have similar spectra.
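The abstract's central idea, treating a character string as a signal and examining its frequency spectrum, can be illustrated with a minimal sketch. The mapping of each character to its Unicode code point and the use of a plain discrete Fourier transform are assumptions made for illustration; the text above does not specify either choice, and the function names are hypothetical.

```python
import cmath

def string_to_signal(text):
    """Map each character to one number (assumed here: its Unicode code point)."""
    return [float(ord(ch)) for ch in text]

def amplitude_spectrum(signal):
    """Return the amplitudes |X[k]| of the discrete Fourier transform of signal.

    Computed directly from the DFT definition, which is adequate for
    short strings such as blog-comment fragments.
    """
    n = len(signal)
    amplitudes = []
    for k in range(n):
        total = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal))
        amplitudes.append(abs(total))
    return amplitudes

target = "buy cheap meds"
signal = string_to_signal(target)
amps = amplitude_spectrum(signal)
```

Each character contributes exactly one sample, so the signal and its spectrum have the same length as the string. The zero-frequency amplitude equals the sum of the mapped numbers, and because the signal is real-valued the amplitudes are mirror-symmetric about the midpoint.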
29 Claims
1. A method comprising:
employing a computer system to receive a target string forming a part of an electronic communication;
employing a computer system to process the target string into a target signal consisting of a sequence of numbers, wherein each character of the target string is mapped to a number of the sequence of numbers;
in response to receiving the target string, employing the computer system to determine a string eligibility criterion according to the target string;
employing the computer system to pre-filter a corpus of reference strings according to the string eligibility criterion, to produce a plurality of candidate strings;
in response to selecting the candidate strings, employing the computer system to perform a comparison between a frequency spectrum of the target signal and a frequency spectrum determined for a candidate string of the plurality of candidate strings; and
employing the computer system to determine whether the electronic communication is spam or non-spam according to a result of the comparison.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
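The pre-filtering step of claim 1 can be sketched as follows. The claim only requires that the eligibility criterion be determined according to the target string; the specific criterion used here (reference strings whose length lies within a tolerance of the target's length) and the names `length_criterion`, `pre_filter`, and `tolerance` are assumptions chosen for illustration.

```python
def length_criterion(target, tolerance=0.2):
    """Derive an eligibility criterion from the target string.

    Assumed criterion: an inclusive (min_len, max_len) window centered
    on the target's length, widened by a fractional tolerance.
    """
    n = len(target)
    slack = int(n * tolerance)
    return n - slack, n + slack

def pre_filter(corpus, criterion):
    """Keep only reference strings that satisfy the eligibility criterion."""
    min_len, max_len = criterion
    return [ref for ref in corpus if min_len <= len(ref) <= max_len]

corpus = [
    "buy cheap meds now",
    "hello",
    "win a free phone today!!",
    "buy che4p m3ds n0w",
]
target = "buy cheap medz now"
candidates = pre_filter(corpus, length_criterion(target))
```

With an 18-character target and a 20% tolerance, the window is 15 to 21 characters, so only the two 18-character reference strings remain as candidates. The more expensive spectrum comparison then runs on this reduced set rather than on the whole corpus.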
14. A computer system comprising at least a processor programmed to:
receive a target string forming a part of an electronic communication;
process the target string into a target signal consisting of a sequence of numbers, wherein each character of the target string is mapped to a number of the sequence of numbers;
in response to receiving the target string, determine a string eligibility criterion according to the target string;
pre-filter a corpus of reference strings according to the string eligibility criterion, to produce a plurality of candidate strings;
in response to selecting the candidate strings, perform a comparison between a frequency spectrum of the target signal and a frequency spectrum determined for a candidate string of the plurality of candidate strings; and
determine whether the electronic communication is spam or non-spam according to a result of the comparison.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26
27. A method comprising:
employing a computer system to receive an electronic communication;
in response to receiving the electronic communication, employing the computer system to extract a target string from the electronic communication;
employing the computer system to transmit the target string to an anti-spam server; and
in response to transmitting the target string, receiving a target label indicative of whether the electronic communication is spam or non-spam, wherein the target label is determined at the anti-spam server and wherein determining the target label comprises:
employing the anti-spam server to process the target string of characters into a target signal consisting of a sequence of numbers, wherein each character of the target string is mapped to a number of the sequence of numbers;
employing the anti-spam server to determine an eligibility criterion according to the target string;
employing the anti-spam server to pre-filter a corpus of reference strings according to the eligibility criterion, to produce a plurality of candidate strings;
in response to selecting the candidate strings, employing the anti-spam server to perform a comparison between a frequency spectrum of the target signal and a frequency spectrum determined for a candidate string of the plurality of candidate strings; and
employing the anti-spam server to determine whether the electronic communication is spam or non-spam according to a result of the comparison.
28. A method comprising:
employing a computer system to receive a target string forming a part of an electronic communication;
employing a computer system to process the target string of characters into a target signal consisting of a sequence of numbers, wherein each character of the target string is mapped to a number of the sequence of numbers;
in response to receiving the target string, employing the computer system to determine a string eligibility criterion according to the target string;
employing the computer system to pre-filter a corpus of reference strings according to the string eligibility criterion, to produce a plurality of candidate strings;
in response to selecting the candidate strings, employing the computer system to determine an inter-string distance separating the target string from a candidate string of the plurality of candidate strings, the inter-string distance determined according to a first amplitude of a frequency spectrum of the target signal and according to a second amplitude of a frequency spectrum determined for the candidate string; and
employing the computer system to determine whether the target communication is spam or non-spam according to the inter-string distance.
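One way to picture the inter-string distance of claim 28 is as a distance between amplitude spectra. The sketch below is an assumption-laden illustration, not the claimed formula: it maps characters to Unicode code points, divides amplitudes by string length, keeps only the first few frequency bins, and uses a Euclidean distance. The parameter `bins` and both function names are hypothetical.

```python
import cmath
import math

def amplitude_spectrum(text):
    """Length-normalized DFT amplitudes of the code-point signal of text."""
    signal = [float(ord(ch)) for ch in text]
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal))) / n
        for k in range(n)
    ]

def inter_string_distance(target, candidate, bins=8):
    """Euclidean distance between the leading amplitudes of two spectra.

    Assumption: only the first `bins` amplitudes are compared, truncated
    to the shorter spectrum when the strings differ in length.
    """
    a = amplitude_spectrum(target)[:bins]
    b = amplitude_spectrum(candidate)[:bins]
    m = min(len(a), len(b))
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(m)))

original = "buy cheap meds now"
obfuscated = "buy che4p m3ds n0w"   # three characters swapped for digits
unrelated = "z" * 18

near = inter_string_distance(original, obfuscated)
far = inter_string_distance(original, unrelated)
```

In this example the obfuscated variant lies closer to the original than the unrelated string does, which is the property a threshold on the distance would exploit. Comparing bins by index is only meaningful for strings of similar length, which is one reason a length-based pre-filter pairs naturally with such a distance.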
29. A method comprising:
employing a computer system to receive a target string forming a part of an electronic communication;
employing a computer system to process the target string into a target signal consisting of a sequence of numbers, wherein each character of the target string is mapped to a number of the sequence of numbers;
employing the computer system to determine a frequency spectrum of the target signal;
employing the computer system to perform a comparison between the frequency spectrum of the target signal and a frequency spectrum determined for a reference string selected from a set of reference strings; and
employing the computer system to determine whether the target communication is spam or non-spam according to a result of the comparison.
Specification