
System and method for spam filtering using shingles

  • US 8,996,638 B2
  • Filed: 11/01/2013
  • Issued: 03/31/2015
  • Est. Priority Date: 06/06/2013
  • Status: Active Grant
First Claim

1. A computer-implemented method for detecting spam, the method comprising:

  • receiving an electronic message;

  • identifying in the received message one or more insignificant text portions based on a text pattern database storing a plurality of defined insignificant text patterns not containing spam, each defined insignificant text pattern comprising a text pattern, text identification information and a usage frequency;

  • removing the one or more identified insignificant text portions from the message to generate an abridged message upon detecting the one or more identified insignificant text portions matching at least one of the plurality of defined insignificant text patterns;

  • canonizing text of the abridged message;

  • generating a set of shingles from the abridged and canonized message;

  • identifying in the generated set of shingles one or more shingles based on a shingles database storing a plurality of defined insignificant shingles that occur only in messages not containing spam, each defined insignificant shingle comprising a hash, a shingle pattern, text identification information corresponding to the shingle pattern, and a usage frequency;

  • removing the one or more identified shingles from the generated set of shingles to generate a reduced set of shingles upon detecting the one or more identified shingles matching at least one of the plurality of defined shingles; and

  • performing spam filtering of the reduced set of shingles to determine whether the received message contains spam.
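The claimed steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The claim does not specify the pattern syntax, the canonization rules, the shingle size, the hash function, or the final filtering method, so the sketch assumes regular-expression text patterns, lower-casing with punctuation stripped, three-word shingles hashed with truncated SHA-1, and a simple overlap test against a hypothetical set of known-spam shingle hashes (spam_hashes) with a hypothetical threshold of 0.5. All record and function names are hypothetical.

```python
import hashlib
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TextPattern:
    """Hypothetical text pattern database record (pattern, identification info, usage frequency)."""
    pattern: str          # assumption: stored as a regular expression
    text_id: str
    usage_frequency: int


@dataclass(frozen=True)
class InsignificantShingle:
    """Hypothetical shingles database record (hash, shingle pattern, identification info, usage frequency)."""
    hash_value: int
    shingle: str
    text_id: str
    usage_frequency: int


SHINGLE_SIZE = 3  # assumption: words per shingle


def remove_insignificant_text(message: str, patterns: list[TextPattern]) -> str:
    """Replace every span matching a known insignificant pattern with a space (the abridged message)."""
    abridged = message
    for p in patterns:
        abridged = re.sub(p.pattern, " ", abridged)
    return abridged


def canonize(text: str) -> str:
    """One possible canonization: lower-case, turn punctuation into spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def shingle_hash(shingle: str) -> int:
    # Assumption: 64-bit hash from the first 8 bytes of SHA-1; any stable hash would do.
    return int.from_bytes(hashlib.sha1(shingle.encode("utf-8")).digest()[:8], "big")


def make_shingles(text: str, k: int = SHINGLE_SIZE) -> dict[int, str]:
    """Map hash -> shingle for every run of k consecutive words (whole text if shorter than k)."""
    words = text.split()
    if len(words) < k:
        shingles = [" ".join(words)] if words else []
    else:
        shingles = [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    return {shingle_hash(s): s for s in shingles}


def remove_insignificant_shingles(
    shingles: dict[int, str], insignificant: list[InsignificantShingle]
) -> dict[int, str]:
    """Drop shingles whose hash matches a known insignificant shingle (the reduced set)."""
    known = {s.hash_value for s in insignificant}
    return {h: s for h, s in shingles.items() if h not in known}


def is_spam(
    message: str,
    patterns: list[TextPattern],
    insignificant: list[InsignificantShingle],
    spam_hashes: set[int],
    threshold: float = 0.5,  # assumption: fraction of reduced shingles that must be known spam
) -> bool:
    abridged = remove_insignificant_text(message, patterns)
    canonical = canonize(abridged)
    reduced = remove_insignificant_shingles(make_shingles(canonical), insignificant)
    if not reduced:
        return False  # nothing significant left to compare
    overlap = sum(1 for h in reduced if h in spam_hashes)
    return overlap / len(reduced) >= threshold
```

Removing insignificant text and shingles before comparison keeps common boilerplate (signatures, greetings, disclaimers) from inflating or diluting the overlap score, which is the motivation the claim gives for the two removal steps.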
