Phonetic Filtering of Undesired Email Messages
First Claim
1. A method comprising:
training an email system for determining spam, where training includes at least the following:
tokenizing at least a portion of a first email message to create a token;
determining, from the token, a spam probability for the first email message;
in response to a determination that a spam probability from the token indicates that the first email message is likely spam, determining whether the generated token is present in a database of tokens and, in response to a determination that the generated token is not present in the database of tokens, assigning a probability value for the generated token as spam; and
in response to a determination that the spam probability from the generated token indicates that the first email message is not likely spam, determining whether the generated token is present in a database of tokens; and
filtering a second email message according to the training.
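
The training steps recited in claim 1 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the threshold, the value assigned to new tokens, the averaging rule, and every function name are assumptions made for the example. The claim only recites a presence check for the not-likely-spam branch, so the sketch does nothing further there.

```python
# Illustrative sketch of the training flow in claim 1.
# All constants and names below are assumptions, not taken from the patent.
SPAM_THRESHOLD = 0.5          # assumed cut-off for "likely spam"
NEW_SPAM_TOKEN_VALUE = 0.99   # assumed value for unseen tokens found in likely-spam mail
UNKNOWN_TOKEN_VALUE = 0.4     # assumed value used for tokens missing from the database


def tokenize(text):
    """Split a message into lowercase word tokens (deliberately naive)."""
    return text.lower().split()


def message_spam_probability(tokens, token_db):
    """Average the per-token values; a stand-in for the real combination rule."""
    if not tokens:
        return UNKNOWN_TOKEN_VALUE
    return sum(token_db.get(t, UNKNOWN_TOKEN_VALUE) for t in tokens) / len(tokens)


def train(message, token_db):
    """Update token_db from one message and report whether it looked like spam."""
    tokens = tokenize(message)
    likely_spam = message_spam_probability(tokens, token_db) > SPAM_THRESHOLD
    for token in tokens:
        present = token in token_db
        if likely_spam and not present:
            # New token seen in likely-spam mail: assign it a spam value.
            token_db[token] = NEW_SPAM_TOKEN_VALUE
        # Not-likely-spam branch: the claim recites only the presence check,
        # so this sketch takes no further action.
    return likely_spam


def is_spam(message, token_db):
    """Filter a later message using the trained token database."""
    return message_spam_probability(tokenize(message), token_db) > SPAM_THRESHOLD
```

Starting from a database such as `{"winner": 0.95, "prize": 0.9}`, training on "winner prize lottery" adds "lottery" as a spam token, so a later message containing only "lottery" is filtered as spam.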
Abstract
Several embodiments, among others, provided in the present disclosure teach filtering of email messages for spam based on phonetic equivalents of words found in the email messages. In some embodiments, an email message having a word is received, and a phonetic equivalent of the word is generated. Thereafter, the phonetic equivalent of the word is tokenized to generate a token representative of the phonetic equivalent. The generated token is then used to determine a spam probability.
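
As a rough illustration of the flow the abstract describes (word, phonetic equivalent, token, spam probability), the sketch below uses a crude home-made phonetic key. The abstract does not say which phonetic algorithm is used, so the look-alike table, the sound substitutions, the token database, and the 0.5 default for unknown tokens are all assumptions for the example.

```python
import re

# Hypothetical table mapping look-alike characters to the letters they imitate.
LOOKALIKES = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s", "!": "i"}
)

# Hypothetical token database: phonetic token -> spam probability.
TOKEN_PROBABILITIES = {"vgr": 0.99, "mtng": 0.05}


def phonetic_equivalent(word):
    """Return a crude phonetic key (an assumed stand-in, not a standard algorithm)."""
    text = word.lower().translate(LOOKALIKES)
    text = re.sub(r"[^a-z]", "", text)                  # drop padding such as "-" or "."
    text = text.replace("ph", "f").replace("ck", "k")   # unify a few similar sounds
    if not text:
        return ""
    key = text[0] + re.sub(r"[aeiouy]", "", text[1:])   # keep first letter, drop later vowels
    return re.sub(r"(.)\1+", r"\1", key)                # collapse repeated letters


def spam_probability(word, token_db=TOKEN_PROBABILITIES, default=0.5):
    """Tokenize the phonetic equivalent and look up its spam probability."""
    token = phonetic_equivalent(word)
    return token_db.get(token, default)
```

With this key, obfuscated spellings such as "V1agra", "Vi@gra", and "V-i-a-g-r-a" all reduce to the same token as "viagra", so a single database entry covers them.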
86 Citations
20 Claims
9. A system comprising:
a memory that stores:
first tokenize logic configured to tokenize a phonetic equivalent of a word in a received email message;
second tokenize logic configured to tokenize an attachment of the received email message;
spam-determination logic configured to determine a spam probability value from the generated tokens; and
sorting logic configured to sort generated tokens in accordance with the corresponding determined spam probability value.
15. A computer-readable medium that includes a program that, when executed by a computer, causes the computer to perform at least the following:
generate a phonetic equivalent of a word from a received email message;
tokenize the phonetic equivalent of the word to create a token;
determine a spam probability from the token; and
sort the generated token in accordance with the corresponding determined spam probability value.
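
The sorting step recited in claims 9 and 15 might look something like the sketch below. The claims do not state a sort direction or how unknown tokens are scored, so the descending order, the 0.5 default, and the function name are assumptions.

```python
def sort_tokens_by_probability(tokens, token_db, default=0.5):
    """Pair each distinct token with its spam probability, most spam-like first.

    token_db is a hypothetical mapping of token -> spam probability;
    tokens missing from it receive the assumed default value.
    """
    scored = {token: token_db.get(token, default) for token in tokens}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)
```

For example, with a database of `{"vgr": 0.99, "mtng": 0.05}`, the tokens `["mtng", "vgr", "lnch"]` come back ordered as "vgr", "lnch", "mtng".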
Specification