EMAIL ANALYSIS USING FUZZY MATCHING OF TEXT
First Claim
1. A method to determine a probability that an email message is spam, the method comprising:
(a) receiving an email message;
(b) identifying one or more words and/or phrases of the email message that are likely being obfuscated;
(c) identifying one or more obfuscation techniques that are being used to obfuscate the one or more words and/or phrases that are identified as likely being obfuscated; and
(d) determining a probability that the email message is spam in dependence on at least both of the following:
which particular one or more words and/or phrases are identified as likely being obfuscated, wherein identifying some particular words and/or phrases as likely being obfuscated increases the probability that the email message is spam more than identifying other particular words and/or phrases as likely being obfuscated, and
which particular one or more obfuscation techniques are identified as being used to obfuscate the one or more words and/or phrases that are identified as likely being obfuscated, wherein identifying use of some obfuscation techniques increases the probability that the email message is spam more than identifying use of other obfuscation techniques;
wherein one or more of steps (b), (c) and (d) are performed using one or more processors.
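As an illustration only, the dependence recited in step (d) can be sketched in Python. The word weights, technique weights, default weight, prior, and the noisy-OR style combination rule below are all hypothetical assumptions chosen for this example; the claim specifies none of them, only that some words and some techniques raise the probability more than others.

```python
# Hypothetical weights (not taken from the patent). Higher means the finding
# raises the spam probability more.
WORD_WEIGHT = {"viagra": 0.9, "mortgage": 0.6, "free": 0.3}
TECHNIQUE_WEIGHT = {
    "invisible_chars": 0.9,
    "interspersed_symbols": 0.7,
    "misspelling": 0.2,
}
DEFAULT_WEIGHT = 0.1  # assumed weight for words/techniques not listed above


def spam_probability(findings, prior=0.05):
    """Combine (word, technique) findings into a spam probability.

    findings: iterable of (word, technique) pairs, i.e. the outputs of
    steps (b) and (c). Each finding is treated as independent evidence
    and combined noisy-OR style (an assumption made for this sketch).
    """
    p_not_spam = 1.0 - prior
    for word, technique in findings:
        w = WORD_WEIGHT.get(word, DEFAULT_WEIGHT)
        t = TECHNIQUE_WEIGHT.get(technique, DEFAULT_WEIGHT)
        # The evidence strength depends on both the word and the technique.
        p_not_spam *= 1.0 - w * t
    return 1.0 - p_not_spam
```

With these assumed weights, an obfuscated "viagra" scores higher than an obfuscated "free" under the same technique, and hiding a word with invisible characters scores higher than a simple misspelling of the same word.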
Abstract
Translation of text or messages provides a message that is more reliably or efficiently analyzed for purposes such as detecting spam in email messages. One translation process takes into account statistics of erroneous and intentional misspellings. Another process identifies and removes characters or character codes that do not generate visible symbols in a message displayed to a user. Another process detects symbols such as periods, commas, dashes, etc., interspersed in text such that the symbols do not unduly interfere with, or prevent, a user from perceiving a spam message. Another process can detect use of foreign language symbols and terms. Still other processes and techniques are presented to counter obfuscating spammer tactics and to provide for efficient and accurate analysis of message content. Groups of similar content items (e.g., words, phrases, images, ASCII text, etc.) are correlated and analysis can proceed after substitution of items in the group with other items in the group so that a more accurate detection of “sameness” of content can be achieved. Dictionaries are used for spam or ham words or phrases. Other features are described.
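Two of the translation processes named in the abstract, removing characters that generate no visible symbol and detecting symbols interspersed in text, can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `deobfuscate`, the technique labels, the set of interspersed symbols, and the choice to treat Unicode "Cf" (format) characters as the invisible ones are all hypothetical.

```python
import re
import unicodedata

# Assumed set of symbols a sender might intersperse between letters.
_SYMBOLS = r"[.,\-_*]"
# A run of single letters each followed by one symbol, ending in a letter,
# e.g. "v.i.a.g.r.a". At least three letters are required.
_INTERSPERSED = re.compile(
    r"(?<![A-Za-z])(?:[A-Za-z]" + _SYMBOLS + r"){2,}[A-Za-z](?![A-Za-z])"
)


def deobfuscate(text):
    """Return (cleaned_text, techniques_detected) for a message body."""
    techniques = set()

    # Drop format characters such as zero-width spaces and soft hyphens,
    # which produce no visible symbol when displayed.
    visible = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    if visible != text:
        techniques.add("invisible_chars")

    def _collapse(match):
        # Record the technique and strip the interspersed symbols.
        techniques.add("interspersed_symbols")
        return re.sub(_SYMBOLS, "", match.group(0))

    cleaned = _INTERSPERSED.sub(_collapse, visible)
    return cleaned, techniques
```

For example, `deobfuscate("Buy v.i.a.g.r.a now")` yields the cleaned text `"Buy viagra now"` together with the label `"interspersed_symbols"`. The pattern is deliberately crude and would also collapse legitimate dotted abbreviations such as "U.S.A", so a practical system would need dictionaries or statistics of the kind the abstract mentions.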
22 Claims
1. A method to determine a probability that an email message is spam, the method comprising:
(a) receiving an email message;
(b) identifying one or more words and/or phrases of the email message that are likely being obfuscated;
(c) identifying one or more obfuscation techniques that are being used to obfuscate the one or more words and/or phrases that are identified as likely being obfuscated; and
(d) determining a probability that the email message is spam in dependence on at least both of the following:
which particular one or more words and/or phrases are identified as likely being obfuscated, wherein identifying some particular words and/or phrases as likely being obfuscated increases the probability that the email message is spam more than identifying other particular words and/or phrases as likely being obfuscated, and
which particular one or more obfuscation techniques are identified as being used to obfuscate the one or more words and/or phrases that are identified as likely being obfuscated, wherein identifying use of some obfuscation techniques increases the probability that the email message is spam more than identifying use of other obfuscation techniques;
wherein one or more of steps (b), (c) and (d) are performed using one or more processors.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A system to determine a probability that an email message is spam, the system comprising:
one or more processors;
machine-readable storage medium including instructions that are executable by the one or more processors;
wherein the instructions include
instructions to receive an email message;
instructions to identify one or more words and/or phrases of the email message that are likely being obfuscated; and
instructions to identify one or more obfuscation techniques that are being used to obfuscate the one or more words and/or phrases that are identified as likely being obfuscated; and
instructions to determine a probability that the email message is spam in dependence on at least both of the following:
which particular one or more words and/or phrases are identified as likely being obfuscated, wherein identifying some particular words and/or phrases as likely being obfuscated increases the probability that the email message is spam more than identifying other particular words and/or phrases as likely being obfuscated, and
which particular one or more obfuscation techniques are being used to obfuscate the one or more words and/or phrases that are identified as likely being obfuscated, wherein detecting use of some obfuscation techniques increases the probability that the email message is spam more than detecting use of other obfuscation techniques.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A machine-readable storage medium including instructions executable by one or more processors to determine a probability that an email message is spam, the machine-readable storage medium comprising:
instructions to receive an email message;
instruction to identify one or more words and/or phrases of the email message that are likely being obfuscated; and
instructions to determine a probability that the email message is spam in dependence on at least both of the following:
which particular one or more words and/or phrases are identified as likely being obfuscated, wherein identifying some particular words and/or phrases as likely being obfuscated increases the probability that the email message is spam more than identifying other particular words and/or phrases as likely being obfuscated, and
which particular one or more obfuscation techniques are being used to obfuscate the one or more words and/or phrases that are identified as likely being obfuscated, wherein detecting use of some obfuscation techniques increases the probability that the email message is spam more than detecting use of other obfuscation techniques.
Dependent claims: 16, 17, 18, 19, 20, 21
22. A system to determine a probability that an email message is spam, the system comprising:
means for receiving an email message;
means for identifying one or more words and/or phrases of the email message that are likely being obfuscated; and
means for identifying one or more obfuscation techniques that are being used to obfuscate the one or more words and/or phrases that are identified as likely being obfuscated; and
means for determining a probability that the email message is spam in dependence on at least both of the following:
which particular one or more words and/or phrases are identified as likely being obfuscated, wherein identifying some particular words and/or phrases as likely being obfuscated increases the probability that the email message is spam more than identifying other particular words and/or phrases as likely being obfuscated, and
which particular one or more obfuscation techniques are being used to obfuscate the one or more words and/or phrases that are identified as likely being obfuscated, wherein detecting use of some obfuscation techniques increases the probability that the email message is spam more than detecting use of other obfuscation techniques.
Specification