Confidence level threshold selection assistance for a data loss prevention system using machine learning
Abstract
Machine learning-based detection (MLD) profiles can be used to identify sensitive information in documents. An MLD profile can be used to generate a confidence value for a document that expresses the degree of confidence with which the MLD profile can classify the document as sensitive or not. In one embodiment, a data loss prevention system provides or suggests to a user a confidence level threshold for the MLD profile, the confidence level threshold to be used as the boundary between sensitive data and non-sensitive data. In one embodiment, the provided confidence level threshold is determined by scanning a random data set using the MLD profile.
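The boundary role of the confidence level threshold can be illustrated with a minimal sketch. It assumes confidence values on a 0 to 100 scale, which the text does not specify, and the function name `classify`, the file names, and the scores are hypothetical. The only behavior taken from the text is that data is classified as sensitive when its confidence value is above the threshold.

```python
def classify(confidence_value: float, threshold: float) -> str:
    """Label data as sensitive only when its confidence value is above the threshold."""
    return "sensitive" if confidence_value > threshold else "non-sensitive"


# Hypothetical confidence values that an MLD profile might assign (assumed 0-100 scale).
scores = {
    "payroll.xlsx": 92.0,
    "lunch_menu.txt": 8.5,
    "draft_contract.docx": 60.0,
}

threshold = 60.0  # assumed value, for illustration only
labels = {name: classify(score, threshold) for name, score in scores.items()}
```

In this sketch a value exactly equal to the threshold is treated as non-sensitive, because the comparison is strictly "above".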
20 Claims
1. A method comprising:
training, by a processing device, a machine learning-based detection (MLD) profile, wherein the MLD profile is used to classify new data as sensitive data or as non-sensitive data; and
providing a confidence level threshold for the MLD profile to a user, wherein the confidence level threshold is used as a boundary between the sensitive data and the non-sensitive data, wherein providing the confidence level threshold comprises setting a default value of a confidence level threshold user interface element to the confidence level threshold, and wherein the confidence level threshold user interface element allows the user to change the confidence level threshold,
wherein the MLD profile is used to classify new data by using the MLD profile to assign a confidence value to the new data and classifying the new data as the sensitive data in response to determining that the confidence value of the new data is above the confidence level threshold, wherein the confidence value is from a range of confidence values, and wherein a higher value in the range of confidence values indicates a higher likelihood of being the sensitive data than a lower value in the range of confidence values.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A non-transitory computer-readable storage medium having instructions stored therein that, when executed by a processing device, cause the processing device to perform operations comprising:
training, by the processing device, a machine learning-based detection (MLD) profile, wherein the MLD profile is used to classify new data as sensitive data or as non-sensitive data;
scanning a random data set using the MLD profile to determine a confidence level threshold; and
providing the confidence level threshold for the MLD profile to a user, wherein the confidence level threshold is used as a boundary between the sensitive data and the non-sensitive data,
wherein the MLD profile is used to classify new data by using the MLD profile to assign a confidence value to the new data and classifying the new data as the sensitive data in response to determining that the confidence value of the new data is above the confidence level threshold, wherein the confidence value is from a range of confidence values, and wherein a higher value in the range of confidence values indicates a higher likelihood of being the sensitive data than a lower value in the range of confidence values.
Dependent claims: 11, 12, 13, 14, 15
16. A system comprising:
a memory to store instructions for a machine learning manager; and
a processing device to execute the instructions to:
train a machine learning-based detection (MLD) profile, wherein the MLD profile is used to classify new data as sensitive data or as non-sensitive data;
scan a random data set using the MLD profile to determine a confidence level threshold; and
provide the confidence level threshold for the MLD profile to a user, wherein the confidence level threshold is used as a boundary between the sensitive data and the non-sensitive data,
wherein the MLD profile is used to classify new data by using the MLD profile to assign a confidence value to the new data and classifying the new data as the sensitive data in response to determining that the confidence value of the new data is above the confidence level threshold, wherein the confidence value is from a range of confidence values, and wherein a higher value in the range of confidence values indicates a higher likelihood of being the sensitive data than a lower value in the range of confidence values.
Dependent claims: 17, 18, 19, 20
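The claims recite scanning a random data set to determine the threshold and offering it as the default of a user interface element, but they do not say how the scan yields a value. The sketch below shows one plausible reading, stated as an assumption: the random data set is treated as predominantly non-sensitive, and the suggested threshold is the lowest observed confidence value that keeps the fraction of flagged random items at or below a target rate. The names `suggest_threshold`, `max_flag_rate`, and `ThresholdControl` are hypothetical and do not come from the claims.

```python
import math
from dataclasses import dataclass


def suggest_threshold(random_set_scores, max_flag_rate=0.01):
    """Pick the lowest observed score t such that at most max_flag_rate of the
    random data set scores strictly above t (assumed heuristic, not from the claims)."""
    if not random_set_scores:
        raise ValueError("need at least one confidence value")
    if not 0 <= max_flag_rate < 1:
        raise ValueError("max_flag_rate must be in [0, 1)")
    ordered = sorted(random_set_scores)
    n = len(ordered)
    allowed = math.floor(max_flag_rate * n)  # items permitted above the threshold
    return ordered[n - allowed - 1]


@dataclass
class ThresholdControl:
    """Hypothetical stand-in for the confidence level threshold UI element."""
    default: float
    value: float

    @classmethod
    def from_suggestion(cls, suggested):
        # The suggested threshold becomes the default; the user may still change value.
        return cls(default=suggested, value=suggested)


# Hypothetical confidence values obtained by scanning a random data set.
random_scores = [5, 12, 7, 30, 9, 15, 3, 88, 11, 6]
suggested = suggest_threshold(random_scores, max_flag_rate=0.1)
control = ThresholdControl.from_suggestion(suggested)
control.value = 50  # the user overrides the suggested default
```

Keeping `default` separate from `value` lets an interface show the suggested threshold even after the user has moved the control.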
Specification