System and method for intelligent term grouping
First Claim
1. A method, comprising:
capturing documents propagating as part of end user traffic in a network environment, wherein the documents are captured via a capture system configured to perform firewall activities;
identifying a root word for a tree used in managing data, wherein the root word is identified in at least one of the documents that was captured;
creating a word stem for inclusion in the tree, wherein a query is initiated to determine whether a stem node exists at one or more branch points of the word, and wherein if the stem node does not exist, then the stem node is added to a branch point of the tree;
generating a concept based on the tree, wherein the concept reflects a collection of terms that have a relationship with the root word, and wherein at least some of the documents that have at least some of the terms are tagged based on the relationship with the root word; and
applying the concept to a rule to control movement and access for at least some of the documents having been tagged as having particular content associated with the concept.
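The last two steps of the claim (tagging documents that contain terms related to the root word, then applying the concept to a rule) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Concept` and `Rule` classes, the `tag_documents` and `apply_rule` helpers, whitespace tokenization, and the "block"/"allow" action names are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Hypothetical representation of a concept: terms related to a root word."""
    name: str
    root_word: str
    terms: frozenset


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: the action to take on documents tagged with a concept."""
    concept: Concept
    action: str  # illustrative action name, e.g. "block"


def tag_documents(concept, documents):
    """Return the ids of documents that contain at least one of the concept's terms."""
    tagged = set()
    for doc_id, text in documents.items():
        words = set(text.lower().split())  # naive tokenization (assumption)
        if words & concept.terms:
            tagged.add(doc_id)
    return tagged


def apply_rule(rule, documents, default="allow"):
    """Map every document id to the rule's action if tagged, else to the default."""
    tagged = tag_documents(rule.concept, documents)
    return {
        doc_id: (rule.action if doc_id in tagged else default)
        for doc_id in documents
    }
```

For example, with a concept whose terms are `credit`, `card`, and `statement`, a document containing "credit card statement" would be tagged and mapped to the rule's action, while an unrelated document would fall through to the default.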
Abstract
A method is provided in one example embodiment and it includes identifying a root word for a tree to be used in managing data and creating a word stem to be included in the tree. A query is initiated to determine whether a stem node exists at one or more branch points of the word, and if the stem node does not exist, then the stem node is added to a branch point of the tree. In more specific embodiments, if the stem node does exist, then node statistics are updated. In other embodiments, the method includes updating a branch point list after creating the word stem. In yet other embodiments, the branch point is a word or a combination of words. The tree can be used to identify locations and frequencies within a document set where one or more words are present.
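The query-then-add behavior described in the abstract can be sketched as below. This is only an illustration under stated assumptions: the `StemNode` class and the `add_stem` and `build_tree` helpers are hypothetical names, and treating a "stem" as the word that immediately follows the root word is a simplification chosen for the example, not something the abstract specifies.

```python
from dataclasses import dataclass, field


@dataclass
class StemNode:
    """One node of the tree: a word attached at a branch point (hypothetical layout)."""
    stem: str
    count: int = 0                                  # node statistics: occurrences seen
    locations: list = field(default_factory=list)   # (doc_id, word position) pairs
    children: dict = field(default_factory=dict)    # stem text -> StemNode


def add_stem(branch_point, stem, doc_id, position):
    """Query the branch point for the stem node; add it if missing, then update statistics."""
    node = branch_point.children.get(stem)          # the query step
    if node is None:                                # stem node does not exist: add it
        node = StemNode(stem)
        branch_point.children[stem] = node
    node.count += 1                                 # update node statistics
    node.locations.append((doc_id, position))
    return node


def build_tree(root_word, documents):
    """Build a one-level tree of the words that directly follow the root word."""
    root = StemNode(root_word)
    for doc_id, text in documents.items():
        words = text.lower().split()                # naive tokenization (assumption)
        for i, word in enumerate(words):
            if word == root_word:
                root.count += 1
                root.locations.append((doc_id, i))
                if i + 1 < len(words):
                    add_stem(root, words[i + 1], doc_id, i + 1)
    return root
```

Because each node records both a count and a list of locations, a tree built this way can answer where, and how often, a word appears across a document set, which is the use the abstract describes.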
20 Claims
1. A method, comprising:
capturing documents propagating as part of end user traffic in a network environment, wherein the documents are captured via a capture system configured to perform firewall activities;
identifying a root word for a tree used in managing data, wherein the root word is identified in at least one of the documents that was captured;
creating a word stem for inclusion in the tree, wherein a query is initiated to determine whether a stem node exists at one or more branch points of the word, and wherein if the stem node does not exist, then the stem node is added to a branch point of the tree;
generating a concept based on the tree, wherein the concept reflects a collection of terms that have a relationship with the root word, and wherein at least some of the documents that have at least some of the terms are tagged based on the relationship with the root word; and
applying the concept to a rule to control movement and access for at least some of the documents having been tagged as having particular content associated with the concept.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. An apparatus, comprising:
a processor and a memory, wherein the apparatus is configured to:
capture documents propagating as part of end user traffic in a network environment, wherein the documents are captured via a capture system configured to perform firewall activities;
identify a root word for a tree used in managing data, wherein the root word is identified in at least one of the documents that was captured;
create a word stem for inclusion in the tree, wherein a query is initiated to determine whether a stem node exists at one or more branch points of the word, and wherein if the stem node does not exist, then the stem node is added to a branch point of the tree;
generate a concept based on the tree, wherein the concept reflects a collection of terms that have a relationship with the root word, and wherein at least some of the documents that have at least some of the terms are tagged based on the relationship with the root word; and
apply the concept to a rule to control movement and access for at least some of the documents having been tagged as having particular content associated with the concept.
Dependent claims: 10, 11, 12, 13, 14
15. Logic encoded in one or more non-transitory tangible media for execution and when executed by a processor operable to:
capture documents propagating as part of end user traffic in a network environment, wherein the documents are captured via a capture system configured to perform firewall activities;
identify a root word for a tree used in managing data, wherein the root word is identified in at least one of the documents that was captured;
create a word stem for inclusion in the tree, wherein a query is initiated to determine whether a stem node exists at one or more branch points of the word, and wherein if the stem node does not exist, then the stem node is added to a branch point of the tree;
generate a concept based on the tree, wherein the concept reflects a collection of terms that have a relationship with the root word, and wherein at least some of the documents that have at least some of the terms are tagged based on the relationship with the root word; and
apply the concept to a rule to control movement and access for at least some of the documents having been tagged as having particular content associated with the concept.
Dependent claims: 16, 17, 18, 19, 20
Specification