Incremental machine learning for data loss prevention
Abstract
A computing device receives a document that was incorrectly classified as sensitive data based on a machine learning-based detection (MLD) profile. The computing device modifies a training data set that was used to generate the MLD profile by adding the document to the training data set as a negative example of sensitive data to generate a modified training data set. The computing device then analyzes the modified training data set using machine learning to generate an updated MLD profile.
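The feedback loop in the abstract can be sketched in a few lines of Python. This is an illustration only, not the patented implementation: the abstract does not name a learning algorithm, so a small multinomial Naive Bayes classifier stands in for "machine learning", and the names `train_profile`, `is_sensitive`, and `retrain_with_false_positive`, along with the sample documents, are hypothetical.

```python
import math
from collections import Counter

def train_profile(training_set):
    """Fit a tiny multinomial Naive Bayes model.

    The returned dict plays the role of the MLD profile (an assumption;
    the abstract does not specify the profile's contents).
    """
    counts = {True: Counter(), False: Counter()}
    docs = {True: 0, False: 0}
    for text, label in training_set:
        counts[label].update(text.lower().split())
        docs[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    return {"counts": counts, "docs": docs, "vocab": vocab}

def is_sensitive(profile, text):
    """Classify text by comparing smoothed log-probabilities per label."""
    total_docs = sum(profile["docs"].values())
    vocab_size = len(profile["vocab"])
    scores = {}
    for label in (True, False):
        # Smoothed log prior for the label.
        score = math.log((profile["docs"][label] + 1) / (total_docs + 2))
        total_tokens = sum(profile["counts"][label].values())
        for token in text.lower().split():
            # Add-one smoothing, with one extra slot for unseen tokens.
            seen = profile["counts"][label][token]
            score += math.log((seen + 1) / (total_tokens + vocab_size + 1))
        scores[label] = score
    return scores[True] > scores[False]

def retrain_with_false_positive(training_set, document):
    """Add a misclassified document as a negative example and retrain."""
    modified = training_set + [(document, False)]
    return modified, train_profile(modified)

# Hypothetical sample data: True marks sensitive documents.
training_set = [
    ("patient diagnosis record", True),
    ("patient insurance number", True),
    ("lunch menu today", False),
    ("team meeting notes", False),
]
profile = train_profile(training_set)
false_positive = "patient parking notice"
flagged_before = is_sensitive(profile, false_positive)

training_set, profile = retrain_with_false_positive(training_set, false_positive)
flagged_after = is_sensitive(profile, false_positive)
```

With this sample data the notice is flagged before the update and no longer flagged afterwards, while the original sensitive examples remain flagged. The whole model is refit on the modified training set here, which mirrors the wording of the abstract; a production system might update the model in place instead.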
18 Claims
1. A method comprising:
- receiving a plurality of first documents that were incorrectly classified as sensitive data based on a machine learning-based detection (MLD) profile;
- modifying a training data set that was used to generate the MLD profile by adding the first documents to the training data set as negative examples of sensitive data to generate a modified training data set;
- determining that there are at least a threshold number of the first documents; and
- analyzing, by a processing device, the modified training data set using machine learning to generate an updated MLD profile in response to determining that there are at least the threshold number of the first documents.
Dependent claims: 2, 3, 4, 5, 6
7. A non-transitory computer-readable storage medium having instructions stored therein that, when executed by a processing device, cause the processing device to perform operations comprising:
- receiving a plurality of first documents that were incorrectly classified as sensitive data based on a machine learning-based detection (MLD) profile;
- modifying a training data set that was used to generate the MLD profile by adding the first documents to the training data set as negative examples of sensitive data to generate a modified training data set;
- determining that there are at least a threshold number of the first documents; and
- analyzing, by the processing device, the modified training data set using machine learning to generate an updated MLD profile in response to determining that there are at least the threshold number of the first documents.
Dependent claims: 8, 9, 10, 11, 12
13. A system comprising:
- a memory to store instructions; and
- a processing device, coupled to the memory, to execute the instructions to:
  - receive a plurality of first documents that were incorrectly classified as sensitive data based on a machine learning-based detection (MLD) profile;
  - modify a training data set that was used to generate the MLD profile by adding the first documents to the training data set as negative examples of sensitive data to generate a modified training data set;
  - determine that there are at least a threshold number of the first documents; and
  - analyze the modified training data set using machine learning to generate an updated MLD profile in response to the determination that there are at least the threshold number of the first documents.
Dependent claims: 14, 15, 16, 17, 18
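The threshold condition shared by the three independent claims can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the class `IncrementalTrainer`, its method names, and the stand-in trainer `count_examples` are hypothetical, and counting false positives across several reports before retraining is one possible reading of "at least a threshold number of the first documents".

```python
from typing import Any, Callable, Iterable, List, Tuple

Example = Tuple[str, bool]  # (document text, is_sensitive)

class IncrementalTrainer:
    """Adds reported false positives as negative examples and retrains
    only once at least `threshold` of them have accumulated."""

    def __init__(self, training_set: Iterable[Example],
                 train: Callable[[List[Example]], Any], threshold: int):
        self.training_set: List[Example] = list(training_set)
        self.train = train          # any function: training set -> profile
        self.threshold = threshold
        self.pending = 0            # false positives added since last retrain
        self.profile = train(self.training_set)

    def report_false_positives(self, documents: Iterable[str]) -> bool:
        """Record misclassified documents; return True if a retrain ran."""
        docs = list(documents)
        # Modify the training data set: each document becomes a negative example.
        self.training_set.extend((doc, False) for doc in docs)
        self.pending += len(docs)
        # Regenerate the profile only when the threshold is met.
        if self.pending >= self.threshold:
            self.profile = self.train(self.training_set)
            self.pending = 0
            return True
        return False

def count_examples(training_set: List[Example]) -> dict:
    """Stand-in trainer (not real machine learning): it only records how
    many examples the profile was built from, to make retrains visible."""
    return {"n_examples": len(training_set)}

trainer = IncrementalTrainer([("payroll export", True)], count_examples, threshold=3)
first = trainer.report_false_positives(["cafeteria menu", "parking notice"])
second = trainer.report_false_positives(["holiday schedule"])
```

Here the first report adds two negative examples without retraining, and the second report reaches the threshold of three and triggers a retrain. Gating on a threshold avoids the cost of regenerating the profile for every single correction; any real training function with the same signature could replace `count_examples`.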
Specification