Linguistic nonsense detection for undesirable message classification

  • US 7,809,795 B1
  • Filed: 09/26/2006
  • Issued: 10/05/2010
  • Est. Priority Date: 09/26/2006
  • Status: Expired due to Fees
First Claim

1. A computer implemented method for identifying undesirable electronic messages by a computer, the method comprising the steps of:

    identifying, by the computer, incoming electronic messages;

    normalizing, by the computer, identified electronic messages according to a plurality of rules for distinguishing non-legitimate words from legitimate words, said normalizing further comprising identifying non-legitimate words obfuscating electronic messages according to the plurality of rules, and deleting the identified non-legitimate words from the electronic messages;

    wherein said plurality of rules for distinguishing non-legitimate words from legitimate words comprises at least three rules from a group of rules consisting of:

    a rule specifying a maximum number of consecutive vowels in a legitimate word;

    a rule specifying a maximum number of consecutive consonants in a legitimate word;

    a rule specifying a maximum number of consecutive uses of any single character in a legitimate word;

    a rule specifying a maximum number of transitions between upper case letters and lower case letters in a legitimate word;

    a rule specifying a maximum length of a legitimate word containing numbers without punctuation;

    a rule specifying a maximum length of a legitimate word containing upper case letters, lower case letters and numbers;

    a rule specifying a maximum length of a legitimate word containing upper case letters, lower case letters, numbers and punctuation;

    a rule specifying a minimum number of vowels in a legitimate word;

    a rule specifying a minimum number of consonants in a legitimate word;

    a rule specifying a minimum ratio of vowels to consonants in a legitimate word; and

    a rule specifying a maximum ratio of vowels to consonants in a legitimate word; and

    analyzing, by the computer, normalized electronic message to identify undesirable electronic messages.
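
The normalizing step can be illustrated with a minimal sketch that applies four of the listed rules (consecutive vowels, consecutive consonants, consecutive repeats of one character, and case transitions) and deletes words that violate any of them. The claim does not state threshold values, so every constant below is an assumption chosen for illustration, as are the function names and the choice to treat "y" as a consonant. This is not the patented implementation.

```python
VOWELS = set("aeiou")  # assumption: "y" is counted as a consonant

# Hypothetical thresholds; the claim does not specify any values.
MAX_CONSECUTIVE_VOWELS = 4
MAX_CONSECUTIVE_CONSONANTS = 5
MAX_REPEATED_CHAR = 3
MAX_CASE_TRANSITIONS = 2


def longest_run(word, predicate):
    """Length of the longest run of consecutive characters satisfying predicate."""
    best = current = 0
    for ch in word:
        current = current + 1 if predicate(ch) else 0
        best = max(best, current)
    return best


def longest_repeat(word):
    """Length of the longest run of one repeated character."""
    best = current = 0
    prev = None
    for ch in word:
        current = current + 1 if ch == prev else 1
        best = max(best, current)
        prev = ch
    return best


def case_transitions(word):
    """Number of switches between upper and lower case among the letters."""
    letters = [ch for ch in word if ch.isalpha()]
    return sum(1 for a, b in zip(letters, letters[1:])
               if a.isupper() != b.isupper())


def is_legitimate(word):
    """True if the word passes all four illustrative rules."""
    lower = word.lower()
    return (
        longest_run(lower, lambda ch: ch in VOWELS) <= MAX_CONSECUTIVE_VOWELS
        and longest_run(lower, lambda ch: ch.isalpha() and ch not in VOWELS)
        <= MAX_CONSECUTIVE_CONSONANTS
        and longest_repeat(lower) <= MAX_REPEATED_CHAR
        and case_transitions(word) <= MAX_CASE_TRANSITIONS
    )


def normalize(message):
    """Delete words flagged as non-legitimate; keep the rest in order."""
    return " ".join(w for w in message.split() if is_legitimate(w))


print(normalize("Buy xkqzvbtr cheap pills now"))
```

In this sketch the normalized text would then be passed to whatever downstream classifier performs the final analyzing step; that classifier is outside the scope of the example.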
