Systems and methods for conducting a highly autonomous technology-assisted review classification
First Claim
1. A system for classifying information, the system comprising:
- at least one computing device having a processor and physical memory, the physical memory storing instructions that cause the processor to:
receive an identification of a relevant document;
select a first set of documents from a document collection, wherein the document collection is stored on a non-transitory storage medium;
assign a first set of default classifications to documents in the first set of documents to be used as a training set along with the relevant document;
train a classifier using the training set;
score one or more documents in the document collection using the classifier;
upon determining that a stopping criteria has been reached, classify one or more documents in the document collection using the classifier;
upon determining that a stopping criteria has not been reached, select a second set of documents having a batch size for presenting to a reviewer for review prior to repeating the step of training the classifier;
present one or more documents in the second set of documents to the reviewer;
receive from the reviewer user coding decisions associated with the presented documents;
add one or more of the documents presented to the reviewer for which user coding decisions were received to the training set;
remove one or more documents in the first set of documents from the training set;
add a third set of documents from the document collection to the training set;
assign a second set of default classifications to one or more documents in the third set of documents;
update the classifier using one or more documents in the training set;
increase the batch size of documents selected for the second set of documents; and
repeat the steps of training, scoring and determining whether a stopping criteria has been reached;
wherein the first and second set of default classifications are presumptively assigned classifications used for the purpose of training the classifier in order to form a decision boundary, the presumptively assigned classifications not being based on a review; and
wherein the one or more documents in the first set of documents removed from the training set are documents previously assigned a presumptively assigned classification.
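The loop recited in claim 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the claim names no particular classifier, so `train` and `score` use a smoothed per-term log-odds model as a stand-in; the stopping criterion (no unreviewed documents left, or a batch with no relevant documents) and the growth rule (add one tenth of the batch size, rounded up) are likewise hypothetical, and all function and parameter names are invented for the example.

```python
import math
import random
from collections import Counter


def train(training_set):
    """Fit a stand-in classifier: smoothed log-odds weight per term."""
    pos, neg = Counter(), Counter()
    n_pos = n_neg = 0
    for text, label in training_set:
        terms = set(text.lower().split())
        if label:
            pos.update(terms)
            n_pos += 1
        else:
            neg.update(terms)
            n_neg += 1
    return {
        t: math.log((pos[t] + 1) / (n_pos + 2)) - math.log((neg[t] + 1) / (n_neg + 2))
        for t in set(pos) | set(neg)
    }


def score(model, text):
    """Sum the weights of the distinct terms in a document."""
    return sum(model.get(t, 0.0) for t in set(text.lower().split()))


def auto_tar(collection, seed_text, reviewer, n_default=4, batch_size=1, seed=0):
    rng = random.Random(seed)
    coded = {}        # document index -> reviewer's coding decision
    batch_sizes = []  # batch size used in each review round
    last_hits = None  # relevant documents found in the previous batch
    while True:
        unreviewed = [i for i in range(len(collection)) if i not in coded]
        # Presumptively label a fresh sample "not relevant" without review.
        # Resampling every round drops the previous default-labeled documents.
        defaults = rng.sample(unreviewed, min(n_default, len(unreviewed)))
        training = [(seed_text, True)]
        training += [(collection[i], False) for i in defaults]
        training += [(collection[i], label) for i, label in coded.items()]
        model = train(training)
        scores = {i: score(model, collection[i]) for i in unreviewed}
        # Hypothetical stopping criterion (the claim leaves it open).
        if not unreviewed or last_hits == 0:
            predicted = {i: s > 0 for i, s in scores.items()}
            return coded, predicted, batch_sizes
        # Present the highest-scoring unreviewed documents to the reviewer.
        batch = sorted(unreviewed, key=scores.get, reverse=True)[:batch_size]
        batch_sizes.append(batch_size)
        last_hits = 0
        for i in batch:
            coded[i] = bool(reviewer(collection[i]))
            last_hits += coded[i]
        # Assumed growth rule: enlarge the batch by a tenth, rounded up.
        batch_size += math.ceil(batch_size / 10)
```

Calling `auto_tar(docs, "merger contract", reviewer)` with a list of document strings and a callable that returns the reviewer's decision yields the coded documents, the classifier's decisions for everything left unreviewed, and the batch sizes used in each round.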
Abstract
Systems and methods for classifying electronic information are provided by way of a Technology-Assisted Review (“TAR”) process, specifically an “Auto-TAR” process that limits discretionary choices in an information classification effort, while still achieving superior results. In certain embodiments, Auto-TAR selects an initial relevant document from a document collection, selects a number of other documents from the document collection and assigns them a default classification, trains a classifier using a training set made up of the selected relevant document and the documents assigned a default classification, scores documents in the document collection and determines if a stopping criteria is met. If a stopping criteria has not been met, the process sorts the documents according to scores, selects a batch of documents from the collection for further review, receives user coding decisions for them, and re-trains a classifier using the received user coding decisions and an adjusted training set.
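Because the batch presented for review grows between rounds, the classifier is retrained often while few documents have been coded and less often later. The sketch below illustrates that effect. The growth rule (add one tenth of the current batch size, rounded up) is an assumption chosen for the example; the text here says only that the batch size is increased, and `batch_schedule` is a hypothetical helper name.

```python
import math


def batch_schedule(collection_size, initial_batch=1):
    """Yield successive batch sizes until every document has been presented."""
    remaining, batch = collection_size, initial_batch
    while remaining > 0:
        size = min(batch, remaining)  # the final batch may be cut short
        yield size
        remaining -= size
        batch += math.ceil(batch / 10)  # assumed growth rule


# Six review rounds cover a 20-document collection.
print(list(batch_schedule(20)))  # [1, 2, 3, 4, 5, 5]
```

Under this rule the batch grows by one document per round while it holds ten or fewer, and by roughly ten percent per round after that.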
165 Citations
26 Claims
1. A system for classifying information, the system comprising:
- at least one computing device having a processor and physical memory, the physical memory storing instructions that cause the processor to:
receive an identification of a relevant document;
select a first set of documents from a document collection, wherein the document collection is stored on a non-transitory storage medium;
assign a first set of default classifications to documents in the first set of documents to be used as a training set along with the relevant document;
train a classifier using the training set;
score one or more documents in the document collection using the classifier;
upon determining that a stopping criteria has been reached, classify one or more documents in the document collection using the classifier;
upon determining that a stopping criteria has not been reached, select a second set of documents having a batch size for presenting to a reviewer for review prior to repeating the step of training the classifier;
present one or more documents in the second set of documents to the reviewer;
receive from the reviewer user coding decisions associated with the presented documents;
add one or more of the documents presented to the reviewer for which user coding decisions were received to the training set;
remove one or more documents in the first set of documents from the training set;
add a third set of documents from the document collection to the training set;
assign a second set of default classifications to one or more documents in the third set of documents;
update the classifier using one or more documents in the training set;
increase the batch size of documents selected for the second set of documents; and
repeat the steps of training, scoring and determining whether a stopping criteria has been reached;
wherein the first and second set of default classifications are presumptively assigned classifications used for the purpose of training the classifier in order to form a decision boundary, the presumptively assigned classifications not being based on a review; and
wherein the one or more documents in the first set of documents removed from the training set are documents previously assigned a presumptively assigned classification.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13
14. A computerized method for classifying information, the method comprising:
receiving an identification of a relevant document;
selecting a first set of documents from a document collection, wherein the document collection is stored on a non-transitory storage medium;
assigning a first set of default classifications to documents in the first set of documents to be used as a training set along with the relevant document;
training a classifier using the training set;
scoring one or more documents in the document collection using the classifier;
upon determining that a stopping criteria has been reached, classifying one or more documents in the document collection using the classifier;
upon determining that a stopping criteria has not been reached, selecting a second set of documents having a batch size for presenting to a reviewer for review prior to repeating the step of training the classifier;
presenting one or more documents in the second set of documents to the reviewer;
receiving from the reviewer user coding decisions associated with the presented documents;
adding one or more of the documents presented to the reviewer for which user coding decisions were received to the training set;
removing one or more documents in the first set of documents from the training set;
adding a third set of documents from the document collection to the training set;
assigning a second set of default classifications to one or more documents in the third set of documents;
updating the classifier using one or more documents in the training set;
increasing the batch size of documents selected for the second set of documents; and
repeating the steps of training, scoring and determining whether a stopping criteria has been reached;
wherein the first and second set of default classifications are presumptively assigned classifications used for the purpose of training the classifier in order to form a decision boundary, the presumptively assigned classifications not being based on a review; and
wherein the one or more documents in the first set of documents removed from the training set are documents previously assigned a presumptively assigned classification.
Dependent claims: 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26
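The two wherein clauses turn on where a label came from: a presumptively assigned classification is never the product of review, and only such labels are removed from the training set. One way to keep that distinction explicit is to store the two kinds of label separately. The layout below is an illustrative sketch, not the patent's data model; `TrainingSet` and its methods are hypothetical names, and it drops every default-labeled document each round although the claims require removing only one or more.

```python
from dataclasses import dataclass, field


@dataclass
class TrainingSet:
    """Training labels keyed by document id, split by provenance."""
    reviewed: dict = field(default_factory=dict)     # id -> reviewer's coding decision
    presumptive: dict = field(default_factory=dict)  # id -> default label, never reviewed

    def add_reviewed(self, doc_id, label):
        # A reviewer's decision supersedes a default label on the same document.
        self.presumptive.pop(doc_id, None)
        self.reviewed[doc_id] = label

    def replace_defaults(self, doc_ids, default_label=False):
        """Drop all default-labeled documents, then default-label a new set."""
        removed = set(self.presumptive)
        self.presumptive = {
            d: default_label for d in doc_ids if d not in self.reviewed
        }
        return removed

    def examples(self):
        """Return every (id, label) pair the classifier would be trained on."""
        return {**self.presumptive, **self.reviewed}
```

Reviewed decisions persist across rounds, while `replace_defaults` swaps one presumptively labeled sample for the next and reports which documents it removed.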
Specification