
AUTOMATIC LEXICON GENERATION SYSTEM FOR DETECTION OF SUSPICIOUS E-MAILS FROM A MAIL ARCHIVE

  • US 20100057720A1
  • Filed: 03/16/2009
  • Published: 03/04/2010
  • Est. Priority Date: 08/26/2008
  • Status: Active Grant
First Claim

1. An automatic lexicon generation system to identify and construct a list of English phrases from a user specified set of example e-mails and documents written in English, said phrases being a set of relevant key phrases useful for identifying information leak in an archive of e-mails, said system comprises:

    a) means (102) to identify a set of important key phrases from a user specified set of example e-mails leaking information and documents leaking information, written in English, using frequency analysis, word stemming, and removal of common words and domain specific words (FIG. 1, FIG. 4);

    b) means (102, 506, 511) to identify a set of important key phrases from a user specified set of example e-mails not leaking information and documents not leaking information, written in English, using frequency analysis, word stemming, and removal of common words (FIG. 1, FIG. 5);

    c) means (405, 511) to identify a set of relevant phrases and to assign a label, one of “very highly sensitive”, “highly sensitive”, “sensitive”, “not sensitive” or “sensitive” to each of the phrases of said set (FIG. 4, FIG. 5);

    d) means (613) for assigning weights to each of said key phrases (FIG. 6);

    e) means (614) for building multiple key phrase lists and weights from said important key phrases (FIG. 6);

    f) means (615) for presenting said lists of key phrases to the user for simulation and for storing the final approved list as weighted category lexicon (FIG. 6);

    g) means (716, 717) for using said list of phrases on an archive of e-mails and documents written in English for identifying any e-mail leaking information (FIG. 7).
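
The pipeline recited in the claim (frequency analysis, word stemming, removal of common words, weighting of key phrases, and use of the weighted list on an archive) can be illustrated with a minimal Python sketch. This is not the patented implementation. The stop-word list, the crude suffix-stripping stemmer, the choice of two-word phrases, the weighting formula, and all function names are assumptions made for illustration. A single stop list stands in for both common words and domain specific words, and the sensitivity labels and the user review step are not modeled.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stop list; it stands in for both "common words"
# and "domain specific words" from the claim.
COMMON_WORDS = {"the", "a", "an", "of", "to", "and", "is", "are", "in",
                "for", "on", "at", "we", "our", "please"}


def stem(word):
    """Crude suffix stripping (an assumption; a real system might use Porter stemming)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokens(text):
    """Lowercase, keep alphabetic words, drop common words, then stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in COMMON_WORDS]


def phrase_counts(docs, n=2):
    """Frequency analysis: count every n-word phrase across the documents."""
    counts = Counter()
    for doc in docs:
        toks = tokens(doc)
        counts.update(" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return counts


def build_lexicon(leaking, not_leaking, n=2):
    """Weight each phrase seen in leaking examples (hypothetical formula).

    The weight is the phrase's count in leaking examples divided by its total
    count in both sets plus one, so phrases that are frequent in leaking
    examples and absent from non-leaking ones score highest.
    """
    leak = phrase_counts(leaking, n)
    clean = phrase_counts(not_leaking, n)
    return {p: c / (c + clean[p] + 1) for p, c in leak.items()}


def score(text, lexicon, n=2):
    """Sum the lexicon weights of every phrase occurrence found in the text."""
    found = phrase_counts([text], n)
    return sum(lexicon.get(p, 0.0) * c for p, c in found.items())


# Illustrative example data (invented for this sketch).
leaking = ["Attached is the confidential merger plan",
           "The confidential merger plan is ready"]
not_leaking = ["Lunch is at noon", "The meeting plan is attached"]

lexicon = build_lexicon(leaking, not_leaking)
suspicious = score("Sending the confidential merger plan now", lexicon)
harmless = score("Lunch plan for noon", lexicon)
```

In this sketch an e-mail from the archive would be flagged when its score exceeds a threshold chosen by the user; the claim itself does not state how that threshold is chosen.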
