Secure information classification
Abstract
In one embodiment, a method to create a system to manage documents with sensitive or classified content comprises: extracting a list of text features; enabling interaction with the user developing the system to create a rule-based classifier based on the list of text features and one or more synonymous features; applying the rule-based classifier to one or more selected documents to tag a set of documents with the sensitive or classified information they contain; training a statistical text classifier using the tagged documents as a training set; applying the trained statistical text classifier to the training set; refining the rule-based classifier; and reapplying the refined rule-based classifier to the one or more documents to tag a set of documents with the sensitive or classified information they contain. Other embodiments may be described.
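As a rough illustration of the rule-based tagging described above, the sketch below assumes that text features are literal phrases, that synonymous features are rewritten to a canonical feature before matching, and that a document is tagged with every label whose features it contains. The labels, phrases, and names (`RULES`, `SYNONYMS`, `tag_documents`) are hypothetical and are not drawn from any actual security policy guide.

```python
import re
from typing import Dict, List, Set

# Hypothetical rules: each sensitivity label maps to text features (phrases)
# that a reviewer might extract from a security policy guide.
RULES: Dict[str, Set[str]] = {
    "PROGRAM_NAME": {"project falcon"},
    "LOCATION": {"site 12"},
}

# Hypothetical synonym table: alternate phrasings mapped onto a listed feature.
SYNONYMS: Dict[str, str] = {
    "falcon program": "project falcon",
    "building twelve": "site 12",
}


def normalize(text: str) -> str:
    """Lowercase, collapse whitespace, and rewrite synonyms to canonical features."""
    text = re.sub(r"\s+", " ", text.lower())
    for synonym, canonical in SYNONYMS.items():
        text = text.replace(synonym, canonical)
    return text


def tag_document(text: str) -> Set[str]:
    """Return every label with at least one feature present in the document."""
    norm = normalize(text)
    return {label for label, feats in RULES.items() if any(f in norm for f in feats)}


def tag_documents(docs: List[str]) -> Dict[int, Set[str]]:
    """Tag a batch of documents, keeping only those that matched some rule."""
    tagged = {}
    for i, doc in enumerate(docs):
        labels = tag_document(doc)
        if labels:
            tagged[i] = labels
    return tagged


docs = [
    "Status update on the Falcon program schedule.",
    "Cafeteria menu for next week.",
    "Equipment moved to Building  Twelve.",
]
print(tag_documents(docs))  # {0: {'PROGRAM_NAME'}, 2: {'LOCATION'}}
```

Plain substring matching is deliberately naive (for example, "site 12" would also match "site 123"); a production rule set would likely need word boundaries or richer patterns.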
12 Claims
1. A method for using a system to manage documents with sensitive or classified content with a predetermined classifier threshold, comprising:
(a) extracting, from a security policy guide or other informal set of rules, a list of text features;
(b) enabling interaction with a user configuring the system to create a rule-based classifier based on the list of text features and one or more synonymous features that capture sensitive or classified information in the security policy guide or the other informal set of rules;
(c) applying the rule-based classifier to one or more selected documents to tag a set of documents with the sensitive or classified information they contain to generate tagged documents;
(d) training a statistical text classifier using the tagged documents generated in (c) as a training set;
(e) applying the statistical text classifier to the training set to suggest additional documents that should be tagged and to generate additional text features for detecting the sensitive or classified information;
(f) providing the additional documents and the additional text features to a user interface for review and comparison by the user to update the training set and the list of text features and the one or more synonymous features;
(g) refining the rule-based classifier based on the training set, the list of text features, and the one or more synonymous features generated in (f); and
(h) repeating operations (b) through (g) until a classification scheme satisfies the predetermined classifier threshold.
Dependent claims: 2, 3, 4, 5, 6.
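Steps (d) and (e) can be illustrated with a small multinomial naive Bayes model. The claims do not name a particular statistical text classifier, so naive Bayes is an assumed stand-in, and `NaiveBayes`, `suggest_documents`, and `suggest_features` are hypothetical names. Labels are assumed to be binary (sensitive or not), as produced by the rule-based tagging in step (c). The sketch ranks documents the rules left untagged by classifier score and proposes the tokens most indicative of sensitivity that are not yet in the feature list, leaving acceptance to the reviewer in step (f).

```python
import math
import re
from collections import Counter
from typing import List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric tokens; a deliberately simple preprocessing stand-in."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Binary multinomial naive Bayes with add-one (Laplace) smoothing.

    Assumes the training set contains at least one document of each class.
    """

    def fit(self, docs: List[str], labels: List[bool]) -> "NaiveBayes":
        self.counts = {True: Counter(), False: Counter()}
        self.doc_totals = Counter(labels)
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        self.token_totals = {c: sum(self.counts[c].values()) for c in (True, False)}
        return self

    def token_log_odds(self, token: str) -> float:
        """Smoothed log P(token | sensitive) - log P(token | not sensitive)."""
        v = len(self.vocab)
        p_pos = (self.counts[True][token] + 1) / (self.token_totals[True] + v)
        p_neg = (self.counts[False][token] + 1) / (self.token_totals[False] + v)
        return math.log(p_pos) - math.log(p_neg)

    def score(self, doc: str) -> float:
        """Log-odds that a document is sensitive; larger means more likely."""
        s = math.log(self.doc_totals[True]) - math.log(self.doc_totals[False])
        for tok in tokenize(doc):
            if tok in self.vocab:
                s += self.token_log_odds(tok)
        return s


def suggest_documents(model: NaiveBayes, docs: List[str], labels: List[bool], k: int) -> List[int]:
    """Step (e), part 1: rank rule-untagged documents by classifier score."""
    untagged = [i for i, lab in enumerate(labels) if not lab]
    return sorted(untagged, key=lambda i: model.score(docs[i]), reverse=True)[:k]


def suggest_features(model: NaiveBayes, existing: Set[str], k: int) -> List[str]:
    """Step (e), part 2: most sensitive-indicative tokens not already listed."""
    candidates = [t for t in model.vocab if t not in existing]
    # Descending log-odds; ties broken alphabetically so the output is stable.
    candidates.sort(key=lambda t: (-model.token_log_odds(t), t))
    return candidates[:k]


docs = [
    "Falcon program budget review",
    "Falcon program launch window",
    "Cafeteria menu for Monday",
    "Parking lot resurfacing notice",
    "Budget review for launch window",
]
labels = [True, True, False, False, False]  # rule-based tags from step (c)
model = NaiveBayes().fit(docs, labels)
print(suggest_documents(model, docs, labels, k=1))           # [4]
print(suggest_features(model, {"falcon", "program"}, k=2))   # ['budget', 'launch']
```

Here the classifier surfaces document 4, which shares vocabulary with the tagged documents but matched no rule, and proposes `budget` and `launch` as candidate features; whether either is adopted is the user's decision in step (f).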
7. A computer-based system for using the system to manage document classification with a predetermined classifier threshold, the system comprising:
a non-transitory memory module;
a computer-based processing device coupled to memory; and
logic instructions stored in the non-transitory memory module which, when executed by the processing device, configure the processing device to:
(a) extract, from a security policy guide or other informal set of rules, a list of text features;
(b) enable interaction with a user to configure the system to create a rule-based classifier based on the list of text features and one or more synonymous features that capture sensitive or classified information in the security policy guide or the other informal set of rules;
(c) apply the rule-based classifier to one or more selected documents to tag a set of documents with the sensitive or classified information they contain to generate tagged documents;
(d) train a statistical text classifier using the tagged documents generated in (c) as a training set;
(e) apply the statistical text classifier to the training set to suggest additional documents that should be tagged and to generate additional text features for detecting the sensitive or classified information;
(f) provide the additional documents and the additional text features to a user interface for review and comparison by the user to update the training set and the list of text features and the one or more synonymous features;
(g) refine the rule-based classifier based on the training set, the list of text features, and the one or more synonymous features generated in (f); and
(h) repeat operations (b) through (g) until a classification scheme satisfies the predetermined classifier threshold.
Dependent claims: 8, 9, 10.
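The stopping condition in step (h) depends on how the predetermined classifier threshold is measured, which the claims leave open. The sketch below assumes the threshold is an F1 score of the rule-based classifier against reviewer-confirmed sensitive documents, and simulates each review round with a batch of features the reviewer accepts. `refine_until_threshold`, `candidate_batches`, and the F1 criterion are all hypothetical choices for illustration.

```python
from typing import Iterable, List, Set, Tuple


def rule_tags(docs: List[str], features: Set[str]) -> Set[int]:
    """Indices of documents containing any current feature (case-insensitive substring)."""
    return {i for i, doc in enumerate(docs) if any(f in doc.lower() for f in features)}


def f1(predicted: Set[int], actual: Set[int]) -> float:
    """F1 score of predicted sensitive documents against reviewer-confirmed ones."""
    tp = len(predicted & actual)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(actual)
    return 2 * precision * recall / (precision + recall)


def refine_until_threshold(
    docs: List[str],
    confirmed: Set[int],
    features: Set[str],
    candidate_batches: Iterable[Set[str]],
    threshold: float,
) -> Tuple[Set[str], float, int]:
    """Refine the feature list round by round until F1 meets the threshold.

    Each batch in `candidate_batches` stands in for the features a reviewer
    accepts after steps (d)-(f). Returns (features, final score, rounds run).
    """
    features = set(features)
    batches = iter(candidate_batches)
    rounds = 0
    while True:
        rounds += 1
        score = f1(rule_tags(docs, features), confirmed)
        if score >= threshold:
            return features, score, rounds
        batch = next(batches, None)
        if batch is None:
            # No further suggestions: stop even though the threshold was not met.
            return features, score, rounds
        features |= batch  # refine the rule-based classifier, step (g)


docs = [
    "Falcon program budget",
    "Site 12 access roster",
    "Launch window for Falcon",
    "Cafeteria menu",
]
confirmed = {0, 1, 2}  # documents a reviewer confirmed as sensitive
result = refine_until_threshold(docs, confirmed, {"falcon"}, [{"site 12"}, {"roster"}], 0.9)
print(result)
```

In this toy run the first round tags only the two "Falcon" documents (F1 of 0.8), so one batch of suggested features is accepted; the second round tags all three confirmed documents and the loop stops. Bounding the loop by the supply of suggestions is a design choice that keeps it from spinning when the threshold is unreachable.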
11. A computer program product comprising logic instructions stored in a non-transitory memory module which, when executed by a processing device, configure the processing device to manage document classification in a document classification system with a predetermined classifier threshold by performing operations comprising:
(a) extracting, from a security policy guide or other informal set of rules, a list of text features;
(b) enabling interaction with a user to configure the document classification system to create a rule-based classifier based on the list of text features and one or more synonymous features that capture sensitive or classified information in the security policy guide or the other informal set of rules;
(c) applying the rule-based classifier to one or more selected documents to tag a set of documents with the sensitive or classified information they contain to generate tagged documents;
(d) training a statistical text classifier using the tagged documents generated in (c) as a training set;
(e) applying the statistical text classifier to the training set to suggest additional documents that should be tagged and to generate additional text features for detecting the sensitive or classified information;
(f) providing the additional documents and the additional text features to a user interface for review and comparison by the user to update the training set and the list of text features and the one or more synonymous features;
(g) refining the rule-based classifier based on the training set, the list of text features, and the one or more synonymous features generated in (f); and
(h) repeating operations (b) through (g) until a classification scheme satisfies the predetermined classifier threshold.
Dependent claims: 12.
Specification