Systems and Methods for Providing a Spam Database and Identifying Spam Communications
First Claim
1. A computer-implemented method of identifying an incoming electronic communication as spam, the method comprising:
accessing the incoming electronic communication from a memory device;
creating a first set of tokens from the incoming electronic communication;
accessing a second set of tokens, wherein the second set of tokens corresponds to an electronic communication stored in a spam database;
determining a degree of similarity based on a count of unique tokens appearing in both the first set of tokens and the second set of tokens; and
identifying the incoming electronic communication as spam if the degree of similarity exceeds a predetermined threshold.
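The steps of this claim can be illustrated with a minimal sketch. It is not the patented implementation: the word-level tokenizer, the function names (`tokenize`, `degree_of_similarity`, `is_spam`), the choice to normalize the shared-token count by the number of unique incoming tokens, and the threshold of 0.5 are all assumptions made for illustration. The claim itself only requires that the similarity be based on a count of unique tokens appearing in both sets.

```python
import re


def tokenize(text):
    """Assumed tokenizer: lowercase the text and extract word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def degree_of_similarity(incoming_tokens, spam_tokens):
    """Share of unique incoming tokens that also appear in the spam tokens.

    The normalization is an assumption; the claim only says the similarity
    is based on the count of unique tokens common to both sets.
    """
    incoming_unique = set(incoming_tokens)
    if not incoming_unique:
        return 0.0
    shared = incoming_unique & set(spam_tokens)
    return len(shared) / len(incoming_unique)


def is_spam(incoming_text, spam_text, threshold=0.5):
    """Flag the incoming text when similarity exceeds the (assumed) threshold."""
    similarity = degree_of_similarity(tokenize(incoming_text), tokenize(spam_text))
    return similarity > threshold


known_spam = "Claim your free prize now, win big"
print(is_spam("Win a free prize now", known_spam))      # shares 4 of 5 unique tokens
print(is_spam("Meeting moved to Tuesday", known_spam))  # shares no tokens
```

Using sets means a token repeated many times in either message counts only once, which matches the "unique tokens" wording of the claim.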
Abstract
Systems and methods are provided for identifying unsolicited or unwanted electronic communications, such as spam. The disclosed embodiments also encompass systems and methods for selecting content items from a content item database. Consistent with certain embodiments, computer-implemented systems and methods may use a clustering-based statistical content matching anti-spam algorithm to identify and filter spam. Such an anti-spam algorithm may be implemented to determine a degree of similarity between an incoming e-mail and a collection of one or more spam e-mails stored in a database. If the degree of similarity exceeds a predetermined threshold, the incoming e-mail may be classified as spam. Further, in accordance with other embodiments, systems and methods may be provided to determine a degree of similarity between a query or search string from a user and content items stored in a database. If the degree of similarity exceeds a predetermined threshold, the content item from the database may be identified as a content item that matches the query or search string provided by the user.
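The content-matching embodiment mentioned at the end of the abstract can be sketched in the same spirit. The abstract does not say how the degree of similarity is computed for queries, so the sketch below assumes a simple token-overlap measure; the whitespace tokenizer, the function name `match_content_items`, the dictionary standing in for the content item database, and the default threshold are all hypothetical.

```python
def tokenize(text):
    """Assumed tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def match_content_items(query, content_items, threshold=0.5):
    """Return (item_id, similarity) pairs whose similarity exceeds the threshold.

    `content_items` is a plain dict of item_id -> text, standing in for the
    content item database. Similarity here is the share of unique query
    tokens found in the item, which is an illustrative assumption.
    """
    query_tokens = set(tokenize(query))
    matches = []
    if not query_tokens:
        return matches
    for item_id, item_text in content_items.items():
        shared = query_tokens & set(tokenize(item_text))
        similarity = len(shared) / len(query_tokens)
        if similarity > threshold:
            matches.append((item_id, similarity))
    # Best matches first.
    matches.sort(key=lambda match: match[1], reverse=True)
    return matches


items = {
    "a": "cheap flights to paris",
    "b": "paris hotel deals",
    "c": "garden tools",
}
print(match_content_items("cheap paris flights", items))
```

With the default threshold only item `a` is returned, since it contains all three query tokens while item `b` contains one of the three.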
44 Citations
23 Claims
1. A computer-implemented method of identifying an incoming electronic communication as spam, the method comprising:
accessing the incoming electronic communication from a memory device;
creating a first set of tokens from the incoming electronic communication;
accessing a second set of tokens, wherein the second set of tokens corresponds to an electronic communication stored in a spam database;
determining a degree of similarity based on a count of unique tokens appearing in both the first set of tokens and the second set of tokens; and
identifying the incoming electronic communication as spam if the degree of similarity exceeds a predetermined threshold.
Dependent Claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A computer-implemented system of identifying an incoming e-mail as a spam e-mail, the system comprising:
a spam database which stores a plurality of spam e-mails;
a server which performs offline processing, the offline processing comprising:
accessing a spam e-mail from the spam database;
creating a first set of tokens from the spam e-mail;
calculating a first total as a number of tokens in the first set of tokens; and
storing the first set of tokens and the first total; and
a client which performs online processing, the online processing comprising:
receiving the incoming e-mail;
creating a second set of tokens from the incoming e-mail;
calculating a second total as a number of tokens in the second set of tokens;
accessing the first set of tokens and the first total corresponding to one of the plurality of spam e-mails in the spam database;
determining a number of common tokens based on a minimum of a first count of each unique token in the first set of tokens and a second count of each unique token in the second set of tokens;
computing an easy signature as a ratio of the number of common tokens and the sum of the first total and the second total; and
designating the incoming e-mail as spam when the easy signature exceeds a predetermined threshold.
Dependent Claims: 11, 12, 13, 14
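The easy signature computation recited in claim 10 can be sketched as follows. This is one literal reading of the claim language, not the patented implementation: the whitespace tokenizer, the names `preprocess`, `easy_signature`, and `is_spam`, the in-memory list standing in for the spam database, and the threshold of 0.3 are assumptions. The number of common tokens is taken as the sum, over each unique token, of the smaller of its two counts.

```python
from collections import Counter


def tokenize(text):
    """Assumed tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def preprocess(text):
    """Return (token counts, total number of tokens) for one e-mail.

    For stored spam this corresponds to the offline step; for the incoming
    e-mail it is repeated online.
    """
    counts = Counter(tokenize(text))
    return counts, sum(counts.values())


def easy_signature(first, second):
    """Common tokens divided by the sum of both token totals."""
    first_counts, first_total = first
    second_counts, second_total = second
    denominator = first_total + second_total
    if denominator == 0:
        return 0.0
    # Counter & Counter keeps, per token, the minimum of the two counts.
    common = sum((first_counts & second_counts).values())
    return common / denominator


def is_spam(incoming_text, spam_index, threshold=0.3):
    """Designate as spam if any stored entry exceeds the (assumed) threshold."""
    incoming = preprocess(incoming_text)
    return any(easy_signature(stored, incoming) > threshold for stored in spam_index)


# Offline: precompute and store token counts and totals for known spam.
spam_index = [preprocess("buy cheap pills buy now")]

# Online: score incoming e-mails against the stored entries.
print(is_spam("buy cheap pills now", spam_index))
print(is_spam("see you at lunch", spam_index))
```

Under this reading, two identical e-mails score exactly 0.5, because the common-token count equals each total while the denominator is the sum of both. Any threshold would therefore have to sit below 0.5 to be reachable.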
15. A computer program product comprising executable instructions tangibly embodied in a non-transitory computer-readable medium which, when executed by a processor, perform a method of identifying an electronic communication as spam, the method comprising:
accessing the electronic communication from a memory device;
creating a first set of tokens from the electronic communication;
accessing a second set of tokens, corresponding to spam communication stored in a spam database;
determining a degree of similarity based on a count of unique tokens appearing in both the first set of tokens and the second set of tokens; and
identifying the electronic communication as spam if the degree of similarity exceeds a predetermined threshold.
Dependent Claims: 16, 17, 18, 19, 20, 21, 22, 23
Specification