System and method for generating vocabulary from network data
Abstract
A method is provided in one example and includes receiving data propagating in a network environment and separating the data into one or more fields. At least some of the fields are evaluated in order to identify nouns and noun phrases within the fields. The method also includes identifying selected words within the nouns and noun phrases based on a whitelist and a blacklist. The whitelist includes a plurality of designated words to be tagged and the blacklist includes a plurality of rejected words that are not to be tagged. A resultant composite is generated for the selected nouns and noun phrases that are tagged. The resultant composite is incorporated into the whitelist if the resultant composite is approved.
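As a rough illustration of the whitelist and blacklist step summarized above, the Python sketch below builds a resultant composite from candidate noun phrases and merges it into the whitelist on approval. The function names `build_composite` and `incorporate_if_approved`, the whitespace-based word matching, and the rule that a phrase is skipped when any of its words is blacklisted are assumptions made for illustration; the abstract does not specify them. The candidate phrases are assumed to come from an earlier noun-phrase identification step that is not shown.

```python
def build_composite(phrases, whitelist, blacklist):
    """Return the phrases that get tagged, in first-seen order without repeats.

    Assumption: a phrase is tagged when it is on the whitelist or when none
    of its words is on the blacklist; the source does not fix this rule.
    """
    composite = []
    for phrase in phrases:
        key = phrase.lower().strip()
        rejected = any(word in blacklist for word in key.split())
        if key in whitelist or not rejected:
            if key and key not in composite:
                composite.append(key)
    return composite


def incorporate_if_approved(composite, whitelist, approved):
    """Return a new whitelist that includes the composite only when approved."""
    if approved:
        return set(whitelist) | set(composite)
    return set(whitelist)


whitelist = {"optical switching"}
blacklist = {"lunch", "weather"}
phrases = ["Optical Switching", "lunch menu", "routing protocol", "routing protocol"]

composite = build_composite(phrases, whitelist, blacklist)
print(composite)
print(sorted(incorporate_if_approved(composite, whitelist, approved=True)))
```

Returning a new set rather than mutating the whitelist in place is a design choice of this sketch, so that a rejected composite leaves the original list untouched.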
29 Claims
1. A method, comprising:
receiving data propagating in a network environment;
sorting the data into a first group and a second group, wherein the first group includes Joint Photographic Experts Group (JPEG) compressed data, and wherein the first group is ignored;
separating the data in the second group into one or more fields;
evaluating, using a processor, at least some of the fields in order to identify nouns and noun phrases within the fields;
identifying selected words within the nouns and noun phrases based on a whitelist and a blacklist, wherein the whitelist includes a plurality of designated words to be tagged and the blacklist includes a plurality of rejected words that are not to be tagged;
dropping the data if certain words in the data are included in the blacklist;
generating a first resultant composite of selected nouns and noun phrases that are tagged;
identifying selected words within the first resultant composite of selected nouns and noun phrases based on a list of administrator stop words;
removing the identified selected words to create a second resultant composite of selected nouns and noun phrases;
presenting the second resultant composite of selected nouns and noun phrases to an administrator; and
incorporating selected nouns and noun phrases from the second resultant composite into the whitelist if the selected nouns and noun phrases are approved by the administrator.
Dependent claims: 2, 3, 4, 5, 6, 7
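The first steps of claim 1 (sorting out JPEG data, separating the remaining data into fields, and dropping data that contains blacklisted words) might be sketched in Python as follows. This is only an illustration under stated assumptions: JPEG payloads are recognized by their leading bytes, the remaining payloads are text with email-style "Name: value" header lines followed by a blank line and a body, and the names `sort_data`, `separate_fields`, and `should_drop` are hypothetical. The claim itself does not say how JPEG data is detected or how fields are delimited.

```python
JPEG_MAGIC = b"\xff\xd8\xff"  # JPEG streams start with the SOI marker FF D8, then another FF marker byte


def sort_data(items):
    """Split raw payloads into (first_group, second_group).

    The first group holds JPEG-compressed payloads and is ignored downstream.
    """
    first_group, second_group = [], []
    for payload in items:
        target = first_group if payload.startswith(JPEG_MAGIC) else second_group
        target.append(payload)
    return first_group, second_group


def separate_fields(payload):
    """Split a text payload into named fields (assumed email-like layout)."""
    text = payload.decode("utf-8", errors="replace")
    head, _, body = text.partition("\n\n")
    fields = {}
    for line in head.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # keep only lines that actually contain a colon
            fields[name.strip().lower()] = value.strip()
    fields["body"] = body.strip()
    return fields


def should_drop(fields, blacklist):
    """Return True when any word in any field is on the blacklist."""
    words = " ".join(fields.values()).lower().split()
    return any(word.strip(".,;:!?") in blacklist for word in words)


items = [
    b"\xff\xd8\xff\xe0\x00\x10JFIF",
    b"Subject: Routing update\n\nThe routing protocol changed.",
    b"Subject: Lunch\n\nPizza today.",
]
ignored, kept = sort_data(items)
survivors = [f for f in map(separate_fields, kept) if not should_drop(f, {"lunch"})]
print(len(ignored), [f["subject"] for f in survivors])
```

A deployment could just as well rely on a declared content type instead of leading bytes; the byte check is used here only because it keeps the example self-contained.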
8. Logic encoded in one or more non-transitory media that includes code for execution and when executed by a processor is operable to perform operations comprising:
receiving data propagating in a network environment;
sorting the data into a first group and a second group, wherein the first group includes Joint Photographic Experts Group (JPEG) compressed data, and wherein the first group is ignored;
separating the data in the second group into one or more fields;
evaluating at least some of the fields in order to identify nouns and noun phrases within the fields;
identifying selected words within the nouns and noun phrases based on a whitelist and a blacklist, wherein the whitelist includes a plurality of designated words to be tagged and the blacklist includes a plurality of rejected words that are not to be tagged;
dropping the data if certain words in the data are included in the blacklist;
generating a first resultant composite of selected nouns and noun phrases that are tagged;
identifying selected words within the first resultant composite of selected nouns and noun phrases based on a list of administrator stop words;
removing the identified selected words to create a second resultant composite of selected nouns and noun phrases;
presenting the second resultant composite of selected nouns and noun phrases to an administrator; and
incorporating selected nouns and noun phrases from the second resultant composite into the whitelist if the selected nouns and noun phrases are approved by the administrator.
Dependent claims: 9, 10, 11, 12, 13, 14
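The administrator stop-word pass and the per-phrase approval recited in this claim could be sketched as below. The sketch assumes that stop words are removed word by word, that a phrase left empty is discarded, and that the administrator is represented by a hypothetical `approve` callback; none of these details is stated in the claim, and the function names are illustrative only.

```python
def remove_stop_words(first_composite, admin_stop_words):
    """Create the second composite by removing administrator stop words.

    Assumption: removal is word by word, and a phrase that becomes empty
    is left out of the second composite entirely.
    """
    second = []
    for phrase in first_composite:
        kept = [w for w in phrase.split() if w.lower() not in admin_stop_words]
        if kept:
            second.append(" ".join(kept))
    return second


def review(second_composite, whitelist, approve):
    """Add each phrase the administrator approves to a copy of the whitelist.

    `approve` is a hypothetical callback standing in for the administrator.
    """
    updated = set(whitelist)
    for phrase in second_composite:
        if approve(phrase):
            updated.add(phrase)
    return updated


first = ["company routing protocol", "optical switching", "company"]
second = remove_stop_words(first, {"company"})
print(second)
print(sorted(review(second, {"optical switching"}, lambda p: "routing" in p)))
```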
15. An apparatus, comprising:
a memory element;
a processor operable to execute instructions; and
a noun phrase extractor module configured to interface with the memory element and the processor, the noun phrase extractor module being configured to:
receive data propagating in a network environment;
sort the data into a first group and a second group, wherein the first group includes Joint Photographic Experts Group (JPEG) compressed data, and wherein the first group is ignored;
separate the data in the second group into one or more fields;
evaluate at least some of the fields in order to identify nouns and noun phrases within the fields;
identify selected words within the nouns and noun phrases based on a whitelist and a blacklist, wherein the whitelist includes a plurality of designated words to be tagged and the blacklist includes a plurality of rejected words that are not to be tagged;
drop the data if certain words in the data are included in the blacklist;
generate a first resultant composite of selected nouns and noun phrases that are tagged;
identify selected words within the first resultant composite of selected nouns and noun phrases based on a list of administrator stop words;
remove the identified selected words to create a second resultant composite of selected nouns and noun phrases;
present the second resultant composite of selected nouns and noun phrases to an administrator; and
incorporate selected nouns and noun phrases from the second resultant composite into the whitelist if the selected nouns and noun phrases are approved by the administrator.
Dependent claims: 16, 17, 18, 19, 20
21. A system, comprising:
a network element that includes a memory element and a processor operable to execute instructions, wherein the network element is configured to:
receive data propagating in a network environment;
sort the data into a first group and a second group, wherein the first group includes Joint Photographic Experts Group (JPEG) compressed data, and wherein the first group is ignored;
separate the data in the second group into one or more fields;
evaluate at least some of the fields in order to identify nouns and noun phrases within the fields;
identify selected words within the nouns and noun phrases based on a whitelist and a blacklist, wherein the whitelist includes a plurality of designated words to be tagged and the blacklist includes a plurality of rejected words that are not to be tagged;
drop the data if certain words in the data are included in the blacklist;
generate a first resultant composite of selected nouns and noun phrases that are tagged;
identify selected words within the first resultant composite of selected nouns and noun phrases based on a list of administrator stop words;
remove the identified selected words to create a second resultant composite of selected nouns and noun phrases;
present the second resultant composite of selected nouns and noun phrases to an administrator;
incorporate selected nouns and noun phrases from the second resultant composite into the whitelist if the selected nouns and noun phrases are approved by the administrator; and
maintain a repository that includes the second resultant composite, wherein the repository is configured to receive one or more search queries associated with designated subject areas.
Dependent claims: 22, 23, 24, 25, 26, 27, 28, 29
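The repository recited at the end of claim 21 might be approximated, for illustration only, by a small in-memory word index. The class name `VocabularyRepository`, its methods, and the matching rule (a query matches a phrase when every query word appears in it) are assumptions of this sketch; the claim says only that the repository holds the second resultant composite and receives search queries associated with designated subject areas.

```python
from collections import defaultdict


class VocabularyRepository:
    """In-memory store of composite phrases, searchable by subject-area words."""

    def __init__(self):
        self._index = defaultdict(set)  # word -> phrases containing that word

    def add_composite(self, composite):
        """Index every phrase of a composite under each of its words."""
        for phrase in composite:
            for word in phrase.lower().split():
                self._index[word].add(phrase)

    def search(self, query):
        """Return the phrases that contain every word of the query, sorted."""
        words = query.lower().split()
        if not words:
            return []
        # .get avoids creating empty index entries for unknown words.
        matches = [self._index.get(word, set()) for word in words]
        return sorted(set.intersection(*matches))


repo = VocabularyRepository()
repo.add_composite(["routing protocol", "optical switching", "optical routing"])
print(repo.search("optical"))
print(repo.search("optical routing"))
```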
Specification