Computer-implemented system and method for generating a training set for use during document review
Abstract
A computer-implemented system and method for generating a training set for use during document review is provided. Classification codes are assigned to a set of documents. Further classification codes are assigned to the same set of documents. The classification code for at least one document is compared with the further classification code for that document. A determination regarding whether a disagreement exists between the assigned classification code and the further classification code for at least one document is made. Those documents with disagreeing classification codes are identified as training set candidates. A stop threshold is applied to the training set candidates and the training set candidates are grouped as a training set when the stop threshold is satisfied.
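The steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. All function names, the dictionary representation of classification codes, and the reading of "satisfied" (disagreement at or below a caller-chosen limit) are assumptions made for illustration, since this section does not define them. The zero-defect test is omitted because it is not described here; it could be supplied as another callable of the same shape.

```python
from typing import Callable, Dict, List, Optional

# A stop threshold is modeled (as an assumption) as a predicate over
# (number of disagreeing documents, number of documents compared).
StopThreshold = Callable[[int, int], bool]


def find_disagreements(assigned: Dict[str, str], further: Dict[str, str]) -> List[str]:
    """Compare both codes per document and return IDs where they differ."""
    return [
        doc_id
        for doc_id, code in assigned.items()
        if doc_id in further and further[doc_id] != code
    ]


def percentage_threshold(max_pct: float) -> StopThreshold:
    """Hypothetical test: satisfied when the disagreement rate is <= max_pct."""
    def satisfied(n_disagree: int, n_compared: int) -> bool:
        return n_compared > 0 and 100.0 * n_disagree / n_compared <= max_pct
    return satisfied


def count_threshold(max_count: int) -> StopThreshold:
    """Hypothetical test: satisfied when at most max_count documents disagree."""
    def satisfied(n_disagree: int, n_compared: int) -> bool:
        return n_disagree <= max_count
    return satisfied


def generate_training_set(
    assigned: Dict[str, str],
    further: Dict[str, str],
    stop_threshold: StopThreshold,
) -> Optional[List[str]]:
    """Return the disagreeing documents as a training set, or None if the
    (assumed) stop threshold is not satisfied."""
    candidates = find_disagreements(assigned, further)
    n_compared = sum(1 for doc_id in assigned if doc_id in further)
    if stop_threshold(len(candidates), n_compared):
        return candidates
    return None


if __name__ == "__main__":
    assigned = {"d1": "R", "d2": "NR", "d3": "R", "d4": "P"}
    further = {"d1": "R", "d2": "R", "d3": "R", "d4": "NR"}
    print(generate_training_set(assigned, further, percentage_threshold(50.0)))
    print(generate_training_set(assigned, further, count_threshold(1)))
```

Passing the threshold as a callable keeps the comparison logic independent of which of the three tests is chosen.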
18 Claims
1. A computer-implemented method for generating a training set for use during document review, comprising:
assigning classification codes to a set of documents;
receiving further classification codes assigned to the same set of documents;
comparing the classification code for at least one document with the further classification code for that document;
determining whether a disagreement exists between the assigned classification code and the further classification code for at least one document;
identifying those documents with disagreeing classification codes as training set candidates;
applying a stop threshold to the training set candidates, wherein the stop threshold comprises one of a percentage of disagreement, a number of documents with disagreeing classifications, and a zero-defect test; and
designating the training set candidates as a training set when the stop threshold is satisfied.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
10. A computer-implemented system for generating a training set for use during document review, comprising:
an assignment module to assign classification codes to a set of documents;
a classification receipt module to receive further classification codes assigned to the same set of documents;
a comparison module to compare the classification code for at least one document with the further classification code for that document;
a determination module to determine whether a disagreement exists between the assigned classification code and the further classification code for at least one document;
an identification module to identify those documents with disagreeing classification codes as training set candidates;
a training set module to apply a stop threshold to the training set candidates and to designate the training set candidates as a training set when the stop threshold is satisfied, wherein the stop threshold comprises one of a percentage of disagreement, a number of documents with disagreeing classifications, and a zero-defect test; and
a processor to execute the modules.
Dependent claims: 11, 12, 13, 14, 15, 16, 17, 18
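The module decomposition recited in claim 10 might be organized along the following lines. This is only a sketch under stated assumptions: the class names mirror the claim's module names, but every method, the keyword-based classifier, and the use of a maximum disagreement count as the stop threshold are hypothetical choices not taken from the patent text.

```python
from typing import Callable, Dict, List, Optional, Tuple

Codes = Dict[str, str]
Pair = Tuple[str, str, str]  # (document ID, assigned code, further code)


class AssignmentModule:
    """Assigns a code to each document using a caller-supplied classifier."""
    def __init__(self, classify: Callable[[str], str]) -> None:
        self.classify = classify

    def assign(self, documents: Dict[str, str]) -> Codes:
        return {doc_id: self.classify(text) for doc_id, text in documents.items()}


class ClassificationReceiptModule:
    """Receives further codes for the same documents (e.g. from a reviewer)."""
    def receive(self, codes: Codes) -> Codes:
        return dict(codes)


class ComparisonModule:
    """Pairs the assigned and further code for each document that has both."""
    def compare(self, assigned: Codes, further: Codes) -> List[Pair]:
        return [(d, assigned[d], further[d]) for d in assigned if d in further]


class DeterminationModule:
    """Determines whether the two codes in a pair disagree."""
    def disagrees(self, pair: Pair) -> bool:
        return pair[1] != pair[2]


class IdentificationModule:
    """Identifies disagreeing documents as training set candidates."""
    def identify(self, pairs: List[Pair], determination: DeterminationModule) -> List[str]:
        return [p[0] for p in pairs if determination.disagrees(p)]


class TrainingSetModule:
    """Applies an assumed count-based stop threshold to the candidates."""
    def __init__(self, max_disagreements: int) -> None:
        self.max_disagreements = max_disagreements

    def designate(self, candidates: List[str]) -> Optional[List[str]]:
        if len(candidates) <= self.max_disagreements:
            return list(candidates)
        return None


def run_system(documents: Dict[str, str], classify: Callable[[str], str],
               reviewer_codes: Codes, max_disagreements: int) -> Optional[List[str]]:
    """Wires the modules together in the order the claim lists them."""
    assigned = AssignmentModule(classify).assign(documents)
    further = ClassificationReceiptModule().receive(reviewer_codes)
    pairs = ComparisonModule().compare(assigned, further)
    candidates = IdentificationModule().identify(pairs, DeterminationModule())
    return TrainingSetModule(max_disagreements).designate(candidates)
```

In this arrangement the Python interpreter plays the role of the processor that executes the modules, and each class can be replaced independently, for example by swapping the count-based rule for a percentage-based one.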
Specification