System and method for generating vocabulary from network data

  • US 8,489,390 B2
  • Filed: 09/30/2009
  • Issued: 07/16/2013
  • Est. Priority Date: 09/30/2009
  • Status: Active Grant
First Claim

1. A method, comprising:

    receiving data propagating in a network environment;

    sorting the data into a first group and a second group, wherein the first group includes Joint Photographic Experts Group (JPEG) compressed data, and wherein the first group is ignored;

    separating the data in the second group into one or more fields;

    evaluating, using a processor, at least some of the fields in order to identify nouns and noun phrases within the fields;

    identifying selected words within the nouns and noun phrases based on a whitelist and a blacklist, wherein the whitelist includes a plurality of designated words to be tagged and the blacklist includes a plurality of rejected words that are not to be tagged;

    dropping the data if certain words in the data are included in the blacklist;

    generating a first resultant composite of selected nouns and noun phrases that are tagged;

    identifying selected words within the first resultant composite of selected nouns and noun phrases based on a list of administrator stop words;

    removing the identified selected words to create a second resultant composite of selected nouns and noun phrases;

    presenting the second resultant composite of selected nouns and noun phrases to an administrator; and

    incorporating selected nouns and noun phrases from the second resultant composite into the whitelist if the selected nouns and noun phrases are approved by the administrator.
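
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is only one possible reading of the claim language, not the patented implementation. Everything named here is hypothetical: the function names, the "name: value" field layout, the NON_NOUN word list (a toy stand-in for a real noun-phrase extractor), and the rule that a phrase is tagged when it contains a whitelisted word. JPEG data is recognized by its leading bytes FF D8 FF.

```python
import re

JPEG_MAGIC = b"\xff\xd8\xff"  # leading bytes of a JPEG stream

# Toy stand-in for a noun-phrase extractor (assumption): any run of
# words not listed here is treated as a noun phrase.
NON_NOUN = {"the", "a", "an", "is", "are", "of", "for", "and", "to", "in", "on", "with"}


def split_fields(payload: bytes) -> dict:
    """Separate text data into fields, assuming 'name: value' lines."""
    fields = {}
    for line in payload.decode("utf-8", errors="replace").splitlines():
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip().lower()] = value.strip()
    return fields


def noun_phrases(text: str) -> list:
    """Return runs of consecutive non-NON_NOUN words, broken at punctuation."""
    phrases = []
    for chunk in re.split(r"[^a-z\s]+", text.lower()):
        current = []
        for token in chunk.split():
            if token in NON_NOUN:
                if current:
                    phrases.append(" ".join(current))
                current = []
            else:
                current.append(token)
        if current:
            phrases.append(" ".join(current))
    return phrases


def build_composites(payload: bytes, whitelist: set, blacklist: set, stop_words: set):
    """Return (first_composite, second_composite) for one unit of data."""
    if payload.startswith(JPEG_MAGIC):
        return [], []  # first group: JPEG data is ignored
    candidates = []
    for value in split_fields(payload).values():
        candidates.extend(noun_phrases(value))
    words = {w for phrase in candidates for w in phrase.split()}
    if words & blacklist:
        return [], []  # drop the data if it contains blacklisted words
    # Assumed tagging rule: keep phrases that are, or contain, a whitelisted term.
    first = [p for p in candidates if p in whitelist or set(p.split()) & whitelist]
    # Remove administrator stop words from each tagged phrase.
    second = []
    for phrase in first:
        kept = " ".join(w for w in phrase.split() if w not in stop_words)
        if kept:
            second.append(kept)
    return first, second


def incorporate(second: list, approved: set, whitelist: set) -> None:
    """Add administrator-approved phrases from the second composite to the whitelist."""
    whitelist.update(p for p in second if p in approved)


if __name__ == "__main__":
    whitelist = {"optical"}
    data = b"Subject: Optical switching meeting\nBody: optical routing and lunch"
    first, second = build_composites(data, whitelist, {"confidential"}, {"meeting"})
    incorporate(second, approved={"optical switching"}, whitelist=whitelist)
    print(first, second, sorted(whitelist))
```

In this sketch the administrator's decision is passed in as the approved set; in a deployed system that decision would come from whatever review interface presents the second resultant composite.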
