
System and method for training data generation in predictive coding

  • US 9,607,272 B1
  • Filed: 03/15/2013
  • Issued: 03/28/2017
  • Est. Priority Date: 10/05/2012
  • Status: Active Grant
First Claim

1. A method comprising:

    determining to improve an effectiveness measure of a first trained classification model, wherein the first trained model is trained using a set of training documents;

    selecting a plurality of unlabeled documents, wherein the plurality of unlabeled documents are not part of the set of training documents used to train the first trained classification model;

    generating a support vector based on a determination that one or more of the plurality of unlabeled documents are within a margin of a decision hyperplane associated with the first trained classification model;

    calculating, by a processor in a predictive coding system, an overall score for each unlabeled document of the plurality of unlabeled documents based on a distance of a respective unlabeled document to the decision hyperplane and an angle diversity of the respective unlabeled document;

    comparing, by the processor in the predictive coding system, the overall scores of the unlabeled documents to each other to select a pre-determined number of unlabeled documents having lowest scores in the plurality of unlabeled documents;

    updating, by the processor in the predictive coding system, the set of training documents used to train the first trained classification model by adding the pre-determined number of unlabeled documents having the lowest scores in the plurality of unlabeled documents to the set of training documents;

    updating the decision hyperplane based on the support vector;

    providing, by the predictive coding system, the updated set of training documents to the first trained classification model to improve the effectiveness measure of the first trained classification model by generating a second trained classification model from the updated set of training documents;

    identifying an effectiveness measure of the second trained classification model; and

    generating a third trained classification model based on a determination that the effectiveness measure of the second trained classification model has improved from the effectiveness measure of the first trained classification model.
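The scoring and selection steps of claim 1 resemble batch active learning for support vector machines, where a document's uncertainty (its distance to the decision hyperplane) is traded off against its angle diversity. The Python sketch below is a minimal illustration under stated assumptions, not the patented implementation. It assumes a linear model (w, b), measures angle diversity as the largest absolute cosine between a candidate and the documents already picked in the current batch, combines the two terms with a hypothetical weight lam, and picks documents greedily. Training, evaluation, and human review are delegated to the hypothetical callables train_fn, evaluate_fn, and label_fn, so the hyperplane and support vectors are updated only implicitly by retraining; the loop keeps generating new models while the effectiveness measure improves.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], float]  # assumed linear model: (weights w, bias b)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def norm(a: Vector) -> float:
    return math.sqrt(dot(a, a))


def hyperplane_distance(x: Vector, w: Vector, b: float) -> float:
    """Unsigned geometric distance from x to the hyperplane w.x + b = 0."""
    return abs(dot(w, x) + b) / norm(w)


def abs_cosine(a: Vector, c: Vector) -> float:
    """|cos| of the angle between two non-zero vectors (linear kernel)."""
    return abs(dot(a, c)) / (norm(a) * norm(c))


def select_for_labeling(pool: List[Vector], w: Vector, b: float,
                        k: int, lam: float = 0.5) -> List[int]:
    """Greedily pick up to k pool indices with the lowest overall score.

    score = lam * distance to hyperplane
            + (1 - lam) * max |cos| to documents already picked this batch
    Low scores favour documents near the boundary that point in a new direction.
    """
    chosen: List[int] = []
    remaining = list(range(len(pool)))
    while remaining and len(chosen) < k:
        def score(i: int) -> float:
            diversity = max((abs_cosine(pool[i], pool[j]) for j in chosen),
                            default=0.0)
            return lam * hyperplane_distance(pool[i], w, b) + (1 - lam) * diversity
        best = min(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen


def refine_until_no_gain(
    train_set: List[Tuple[Vector, int]],
    pool: List[Vector],
    train_fn: Callable[[List[Tuple[Vector, int]]], Model],  # hypothetical trainer
    evaluate_fn: Callable[[Model], float],                   # hypothetical metric
    label_fn: Callable[[Vector], int],                       # hypothetical reviewer
    k: int,
    max_rounds: int = 10,
) -> Model:
    """Add k selected documents per round; keep retraining while the metric improves."""
    model = train_fn(train_set)
    best_score = evaluate_fn(model)
    pool = list(pool)
    for _ in range(max_rounds):
        if not pool:
            break
        w, b = model
        picked = set(select_for_labeling(pool, w, b, k))
        # Reviewed documents move from the unlabeled pool into the training set.
        train_set = train_set + [(pool[i], label_fn(pool[i])) for i in sorted(picked)]
        pool = [x for i, x in enumerate(pool) if i not in picked]
        candidate = train_fn(train_set)
        candidate_score = evaluate_fn(candidate)
        if candidate_score <= best_score:
            break  # no improvement over the previous model: stop here
        model, best_score = candidate, candidate_score
    return model
```

For example, with w = [1.0, 0.0] and b = 0.0, the pool [0.1, 1.0], [0.2, 1.0], [0.3, 0.0], [5.0, 0.0] yields picks [0, 2] at lam = 0.5: the second document is closer to the hyperplane than the third but nearly parallel to the first pick, so the diversity term pushes it out. At lam = 1.0 the score reduces to distance alone and the picks become [0, 1].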

  • 7 Assignments