Systems and methods for generating machine learning-based classifiers for detecting specific categories of sensitive information
Abstract
A computer-implemented method may include (1) identifying a plurality of specific categories of sensitive information to be protected by a DLP system, (2) obtaining a training data set for each specific category of sensitive information that includes a plurality of positive and a plurality of negative examples of the specific category of sensitive information, (3) using machine learning to train, based on an analysis of the training data sets, at least one machine learning-based classifier that is capable of detecting items of data that contain one or more of the plurality of specific categories of sensitive information, and then (4) deploying the machine learning-based classifier within the DLP system to enable the DLP system to detect and protect items of data that contain one or more of the plurality of specific categories of sensitive information in accordance with at least one DLP policy of the DLP system.
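Steps (1) through (3) of the abstract can be illustrated with a small sketch. The source does not name a learning algorithm, so the word-count naive Bayes model below is an assumed stand-in, and the names `tokenize`, `CategoryClassifier`, `train_classifiers`, and the sample categories and texts are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class CategoryClassifier:
    """Binary naive Bayes model for one category (assumed stand-in algorithm).

    Expects at least one positive and one negative example.
    """

    def __init__(self, positives, negatives):
        self.counts = {True: Counter(), False: Counter()}
        self.docs = {True: len(positives), False: len(negatives)}
        for label, examples in ((True, positives), (False, negatives)):
            for text in examples:
                self.counts[label].update(tokenize(text))
        self.vocab = set(self.counts[True]) | set(self.counts[False])

    def _log_score(self, tokens, label):
        # Log prior plus Laplace-smoothed log likelihood of each known token.
        total_docs = self.docs[True] + self.docs[False]
        score = math.log(self.docs[label] / total_docs)
        denom = sum(self.counts[label].values()) + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.counts[label][tok] + 1) / denom)
        return score

    def matches(self, text):
        """Return True if the text scores higher as in-category than not."""
        tokens = tokenize(text)
        return self._log_score(tokens, True) > self._log_score(tokens, False)


def train_classifiers(training_sets):
    """Train one classifier per category from (positives, negatives) pairs."""
    return {
        category: CategoryClassifier(positives, negatives)
        for category, (positives, negatives) in training_sets.items()
    }


# Hypothetical categories and examples, one customized training set each.
training_sets = {
    "financial": (
        ["quarterly revenue forecast and earnings", "wire transfer account number"],
        ["team lunch on friday", "parking garage closed today"],
    ),
    "medical": (
        ["patient diagnosis and prescription", "lab results for patient"],
        ["team lunch on friday", "quarterly revenue forecast and earnings"],
    ),
}
classifiers = train_classifiers(training_sets)
```

Note that a positive example for one category ("quarterly revenue forecast and earnings") serves as a negative example for another, which is one plausible reading of a training set "customized for each specific category".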
20 Claims
1. A computer-implemented method for generating machine learning-based classifiers for detecting specific categories of sensitive information, at least a portion of the method being performed by a computing device comprising at least one processor, the method comprising:
identifying a plurality of specific categories of sensitive information to be protected by a data loss prevention (DLP) system;
obtaining a training data set customized for each specific category of sensitive information that comprises a plurality of positive examples of data that fall within the specific category of sensitive information and a plurality of negative examples of data that do not fall within the specific category of sensitive information;
using machine learning to train, based on an analysis of the training data sets, at least one machine learning-based classifier to detect items of data that contain one or more of the plurality of specific categories of sensitive information;
deploying the machine learning-based classifier within the DLP system to enable the DLP system to detect and protect, using the machine learning-based classifier, items of data that contain one or more of the plurality of specific categories of sensitive information by performing at least one DLP action specified by at least one DLP policy of the DLP system, wherein the DLP action is selected based at least in part on whether the item of data comprises a percentage of one or more of the plurality of specific categories of sensitive information that exceeds a predetermined percentage threshold.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
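The final limitation of claim 1, selecting a DLP action according to whether the sensitive percentage of an item exceeds a predetermined percentage threshold, could look roughly like the sketch below. The claim does not say how the percentage is measured; this sketch assumes it is the share of an item's segments (for example, lines) that a classifier flags. The action names `block`, `notify`, and `allow`, the default threshold of 50, and the keyword predicate standing in for a trained classifier are all hypothetical.

```python
def sensitive_percentage(segments, is_sensitive):
    """Percentage (0-100) of segments flagged by the classifier predicate."""
    if not segments:
        return 0.0
    hits = sum(1 for segment in segments if is_sensitive(segment))
    return 100.0 * hits / len(segments)


def select_action(percentage, threshold=50.0):
    """Pick a DLP action; the stricter action applies only above the threshold."""
    if percentage > threshold:
        return "block"    # hypothetical action name
    if percentage > 0:
        return "notify"   # hypothetical action name
    return "allow"        # hypothetical action name


# Stand-in predicate; a deployed system would call the trained classifier here.
def looks_sensitive(segment):
    return "ssn" in segment.lower()


item = ["SSN 123-45-6789", "hello team", "SSN 987-65-4321", "see you soon"]
percentage = sensitive_percentage(item, looks_sensitive)
action = select_action(percentage)
```

Here two of four segments are flagged, so the percentage equals the threshold without exceeding it and the less strict action is chosen, matching the "exceeds" wording of the claim.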
12. A system for generating machine learning-based classifiers for use in detecting specific categories of sensitive information, the system comprising:
an identification module programmed to identify a plurality of specific categories of sensitive information to be protected by a data loss prevention (DLP) system;
a training module programmed to:
  obtain a training data set customized for each specific category of sensitive information that comprises a plurality of positive examples of data that fall within the specific category of sensitive information and a plurality of negative examples of data that do not fall within the specific category of sensitive information;
  use machine learning to train, based on an analysis of the training data sets, at least one machine learning-based classifier to detect items of data that contain one or more of the plurality of specific categories of sensitive information;
a deployment module programmed to deploy the machine learning-based classifier within the DLP system to enable the DLP system to detect and protect, using the machine learning-based classifier, items of data that contain one or more of the plurality of specific categories of sensitive information by performing at least one DLP action specified by at least one DLP policy of the DLP system, wherein the DLP action is selected based at least in part on whether the item of data comprises a percentage of one or more of the plurality of specific categories of sensitive information that exceeds a predetermined threshold;
at least one hardware processor configured to execute at least one of the identification module, the training module, and the deployment module.
Dependent claims: 13, 14, 15, 16, 17, 18, 19
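The three-module arrangement of claim 12 might be wired together as in the following sketch. The class and method names are hypothetical, and `toy_fit` is a deliberately simple placeholder for a real learning algorithm: it only remembers tokens that appear in positive examples and never in negative ones.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Classifier = Callable[[str], bool]
TrainingSet = Tuple[List[str], List[str]]  # (positive examples, negative examples)


class IdentificationModule:
    """Holds the categories the DLP system is meant to protect."""

    def __init__(self, categories: Iterable[str]) -> None:
        self._categories = list(categories)

    def identify(self) -> List[str]:
        return list(self._categories)


class TrainingModule:
    """Obtains a training set per category and fits one classifier for each."""

    def __init__(self, load_examples: Callable[[str], TrainingSet],
                 fit: Callable[[List[str], List[str]], Classifier]) -> None:
        self._load_examples = load_examples
        self._fit = fit

    def train(self, categories: Iterable[str]) -> Dict[str, Classifier]:
        classifiers = {}
        for category in categories:
            positives, negatives = self._load_examples(category)
            classifiers[category] = self._fit(positives, negatives)
        return classifiers


class DeploymentModule:
    """Makes trained classifiers available for detection."""

    def __init__(self) -> None:
        self.deployed: Dict[str, Classifier] = {}

    def deploy(self, classifiers: Dict[str, Classifier]) -> None:
        self.deployed.update(classifiers)

    def detect(self, item: str) -> List[str]:
        """Return the categories whose classifier flags the item."""
        return [name for name, clf in self.deployed.items() if clf(item)]


def toy_fit(positives: List[str], negatives: List[str]) -> Classifier:
    """Placeholder learner: keeps tokens seen only in positive examples."""
    negative_tokens = {tok for text in negatives for tok in text.lower().split()}
    vocab = {tok for text in positives for tok in text.lower().split()} - negative_tokens
    return lambda item: any(tok in vocab for tok in item.lower().split())


# Hypothetical training data keyed by category.
EXAMPLES: Dict[str, TrainingSet] = {
    "financial": (["invoice total due", "bank account balance"], ["lunch menu today"]),
    "medical": (["patient chart update", "patient lab results"], ["lunch menu today"]),
}

identification = IdentificationModule(EXAMPLES)
training = TrainingModule(load_examples=EXAMPLES.__getitem__, fit=toy_fit)
deployment = DeploymentModule()
deployment.deploy(training.train(identification.identify()))
```

Passing the learner in as `fit` keeps the training module independent of any particular algorithm, so a real model can replace `toy_fit` without touching the other two modules.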
20. A non-transitory computer-readable-storage medium comprising one or more computer-executable instructions that, when executed by at least one processor of a computing device, cause the computing device to:
identify a plurality of specific categories of sensitive information to be protected by a data loss prevention (DLP) system;
obtain a training data set customized for each specific category of sensitive information that comprises a plurality of positive examples of data that fall within the specific category of sensitive information and a plurality of negative examples of data that do not fall within the specific category of sensitive information;
use machine learning to train, based on an analysis of the training data sets, at least one machine learning-based classifier to detect items of data that contain one or more of the plurality of specific categories of sensitive information;
deploy the machine learning-based classifier within the DLP system to enable the DLP system to detect and protect, using the machine learning-based classifier, items of data that contain one or more of the plurality of specific categories of sensitive information by performing at least one DLP action specified by at least one policy of the DLP system, wherein the DLP action is selected based at least in part on whether the item of data comprises a specific percentage of one or more of the plurality of specific categories of sensitive information that exceeds a predetermined percentage threshold.
Specification