System and method for spam filtering using insignificant shingles
Abstract
Disclosed are systems and methods for detecting spam using shingles. An example system identifies in a received message one or more insignificant text portions based on a text pattern database storing defined insignificant text patterns not containing spam; removes at least a portion of the one or more identified insignificant text portions from the message to generate an abridged and canonized message; generates a set of shingles from the abridged and canonized message; identifies in the set of shingles one or more shingles based on a shingles database storing defined insignificant shingles that occur only in messages not containing spam; removes one or more identified shingles from the set of shingles to generate a reduced set of shingles upon detecting the one or more identified shingles matching at least one of the defined insignificant shingles; and determines whether the received message contains spam based on the reduced set of shingles.
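The first three operations in the abstract (removing insignificant text, canonizing, and generating shingles) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the two regular expressions standing in for the text pattern database, the canonization rule (lowercasing and collapsing punctuation to single spaces), the three-word shingle size, and the choice to keep shingles as plain strings rather than hashes. The source does not specify any of these.

```python
import re

# Hypothetical stand-in for the text pattern database of insignificant text.
INSIGNIFICANT_PATTERNS = [
    re.compile(r"sent from my \w+", re.IGNORECASE),
    re.compile(r"best regards,?", re.IGNORECASE),
]


def abridge_and_canonize(message, patterns=INSIGNIFICANT_PATTERNS):
    """Remove insignificant text portions, then canonize what is left."""
    for pattern in patterns:
        message = pattern.sub(" ", message)
    # Assumed canonization: lowercase, replace non-alphanumeric runs with a space.
    return re.sub(r"[^a-z0-9]+", " ", message.lower()).strip()


def make_shingles(text, k=3):
    """Return the set of overlapping k-word shingles of the text."""
    words = text.split()
    # Texts shorter than k words yield an empty set.
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


message = "Hello team, buy cheap pills online now. Best regards, Sent from my iPhone"
abridged = abridge_and_canonize(message)
shingles = make_shingles(abridged)
```

In this sketch the signature and closing phrase are stripped before shingling, so they contribute no shingles to the later comparison.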
20 Claims
1. A computer-implemented method for detecting spam in a message, the method comprising:
identifying in a received message one or more insignificant text portions based on a text pattern database storing defined insignificant text patterns not containing spam;
removing at least a portion of the one or more identified insignificant text portions from the message to generate an abridged and canonized message;
generating a set of shingles from the abridged and canonized message;
identifying in the set of shingles one or more shingles based on a shingles database storing defined insignificant shingles that occur only in messages not containing spam;
removing one or more identified shingles from the set of shingles to generate a reduced set of shingles upon detecting the one or more identified shingles matching at least one of the defined insignificant shingles; and
determining whether the received message contains spam based on the reduced set of shingles.
Dependent claims: 2, 3, 4, 5, 6, 7
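The last three steps of claim 1 (matching against the shingles database, producing the reduced set, and deciding on spam) might look like the following Python sketch. The claim does not say how the decision is made from the reduced set, so the overlap-ratio test, the `spam_db` set of known spam shingles, and the 0.6 threshold are all hypothetical choices for illustration, as are the example shingles themselves.

```python
def reduce_shingles(shingles, insignificant_db):
    """Drop every shingle that matches a defined insignificant shingle."""
    matched = shingles & insignificant_db
    return shingles - matched


def contains_spam(shingles, spam_db, threshold=0.6):
    """Assumed decision rule: share of shingles found in a known-spam set."""
    if not shingles:
        return False
    return len(shingles & spam_db) / len(shingles) >= threshold


# Hypothetical shingle set of a received message.
message_shingles = {
    "dear valued customer",
    "buy cheap pills",
    "cheap pills online",
    "pills online now",
}
# Hypothetical databases.
insignificant_db = {"dear valued customer", "this message is confidential"}
spam_db = {"buy cheap pills", "cheap pills online", "win a free"}

reduced = reduce_shingles(message_shingles, insignificant_db)
verdict_without_reduction = contains_spam(message_shingles, spam_db)  # 2/4 overlap
verdict_with_reduction = contains_spam(reduced, spam_db)              # 2/3 overlap
```

Under these assumed values, the insignificant shingle dilutes the overlap to 2 of 4 and the message falls below the threshold, while the reduced set gives 2 of 3 and the message is flagged.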
8. A computer system for detecting spam, the system comprising:
a processor configured to:
identify in a received message one or more insignificant text portions based on a text pattern database storing defined insignificant text patterns not containing spam;
remove at least a portion of the one or more identified insignificant text portions from the message to generate an abridged and canonized message;
generate a set of shingles from the abridged and canonized message;
identify in the set of shingles one or more shingles based on a shingles database storing defined insignificant shingles that occur only in messages not containing spam;
remove one or more identified shingles from the set of shingles to generate a reduced set of shingles upon detecting the one or more identified shingles matching at least one of the defined insignificant shingles; and
determine whether the received message contains spam based on the reduced set of shingles.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A computer program product stored on a non-transitory computer-readable storage medium, the computer program product comprising computer-executable instructions for detecting spam, including instructions for:
identifying in a received message one or more insignificant text portions based on a text pattern database storing defined insignificant text patterns not containing spam;
removing at least a portion of the one or more identified insignificant text portions from the message to generate an abridged and canonized message;
generating a set of shingles from the abridged and canonized message;
identifying in the set of shingles one or more shingles based on a shingles database storing defined insignificant shingles that occur only in messages not containing spam;
removing one or more identified shingles from the set of shingles to generate a reduced set of shingles upon detecting the one or more identified shingles matching at least one of the defined insignificant shingles; and
determining whether the received message contains spam based on the reduced set of shingles.
Dependent claims: 16, 17, 18, 19, 20
Specification