
Data quality assessment for vector machine learning

  • US 9,015,082 B1
  • Filed: 12/14/2011
  • Issued: 04/21/2015
  • Est. Priority Date: 12/14/2010
  • Status: Active Grant
First Claim

1. A method, implemented by a computing device, comprising:

    receiving a training data set that comprises a plurality of sensitive documents and a plurality of non-sensitive documents;

    determining, by the computing device, a quality of the training data set, wherein determining the quality of the training data set comprises performing at least one of k-fold cross validation or latent semantic indexing using the training data set;

    in response to determining that the training data set has a satisfactory quality, analyzing, by the computing device, the training data set using machine learning to generate a machine learning-based detection (MLD) profile, the MLD profile to be used by a data loss prevention (DLP) system to classify new documents as sensitive documents or as non-sensitive documents; and

    in response to determining that the training data set does not have satisfactory quality, identifying at least one document from the training data set that caused the quality of the training data set to be reduced.
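The claimed flow (check training-set quality with k-fold cross validation, train only if the quality is satisfactory, otherwise point at the documents that degrade it) can be illustrated with the minimal sketch below. It is not the patented implementation. The bag-of-words featurization, the nearest-centroid cosine classifier, the 0.8 accuracy threshold, and every function name (`vectorize`, `train`, `predict`, `assess_training_set`) are hypothetical choices made for illustration. Documents misclassified when held out are treated as the ones "that caused the quality of the training data set to be reduced". The latent semantic indexing alternative named in the claim is not shown.

```python
from collections import Counter
import math
import random


def vectorize(doc):
    # Hypothetical featurization: lowercase bag-of-words term counts.
    return Counter(doc.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse term->weight mappings.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    # Average term weights; an empty class yields an empty centroid.
    if not vectors:
        return {}
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}


def train(examples):
    # Hypothetical "profile": one centroid per class (True = sensitive).
    pos = [v for v, label in examples if label]
    neg = [v for v, label in examples if not label]
    return centroid(pos), centroid(neg)


def predict(model, vec):
    # Classify as sensitive if closer to the sensitive centroid.
    pos_c, neg_c = model
    return cosine(vec, pos_c) >= cosine(vec, neg_c)


def assess_training_set(sensitive, non_sensitive, k=5, threshold=0.8, seed=0):
    # k-fold cross validation: each document is predicted by a model
    # trained on the other k-1 folds; misclassified documents are suspects.
    docs = [(d, True) for d in sensitive] + [(d, False) for d in non_sensitive]
    vecs = [vectorize(d) for d, _ in docs]
    order = list(range(len(docs)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    suspects = []
    for fold in folds:
        held_out = set(fold)
        model = train([(vecs[j], docs[j][1]) for j in order if j not in held_out])
        for j in fold:
            if predict(model, vecs[j]) != docs[j][1]:
                suspects.append(docs[j][0])
    accuracy = 1 - len(suspects) / len(docs)
    return accuracy >= threshold, accuracy, suspects
```

If the first returned value is true, calling `train` on the full labeled set would stand in for generating the MLD profile; if it is false, the `suspects` list is where a reviewer would start, for example looking for a non-sensitive document that was filed under the sensitive set by mistake.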

  • 2 Assignments