Systems and methods for conducting and terminating a technology-assisted review
First Claim
1. A system for terminating a classification process, the system comprising:
- at least one computing device having a processor and physical memory, the physical memory storing instructions that cause the processor to:
execute the classification process, wherein the classification process utilizes an iterative search strategy that presents documents to a human reviewer for training a classifier to classify documents in a document collection and the documents are stored on a non-transitory storage medium;
receive a user coding decision from the human reviewer and train the classifier using the received user coding decision;
select a gain curve slope ratio threshold;
compute points on a gain curve using a selected set of documents in the document collection and results from the classification process, the points on the gain curve relating a ranking of the selected set of documents to the number of relevant documents retrieved at one or more ranks of the ranking, wherein the ranking relates to an order in which the documents were presented to the human reviewer;
detect an inflection point in the gain curve, wherein to detect the inflection point in the gain curve, the instructions further cause the processor to:
solve for parameters of a line running from an origin of the gain curve to a first point on the gain curve corresponding to a level of recall achieved at a rank of one document in the selected set of documents; and
determine the inflection point as a point on the gain curve from where a perpendicular line of suitable length extends to the line for which the parameters were solved, wherein the perpendicular line of suitable length is a longest perpendicular line;
determine a candidate rank associated with the detected inflection point, wherein the candidate rank is a projection of the intersection of the perpendicular line of suitable length from the gain curve and the gain curve onto an axis of the gain curve;
determine a slope ratio for the detected inflection point using a slope of the gain curve before the detected inflection point, and a slope of the gain curve after the detected inflection point; and
terminate the presentation of documents to the human reviewer in the classification process and classify one or more documents in the document collection using the received user coding decision or scores generated by the classifier based upon a determination that the slope ratio for the detected inflection point exceeds the selected slope ratio threshold;
continue the classification process based upon a determination that the slope ratio for the detected inflection point does not exceed the selected slope ratio threshold by selecting and presenting one or more documents to the human reviewer for additional user coding decisions, the selection of the presented document being based on the trained classifier.
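The inflection-point and slope-ratio steps recited above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names (`gain_curve`, `detect_knee`, `slope_ratio`) are hypothetical, the reference line is assumed to end at the last reviewed rank, and the add-one smoothing in the post-inflection slope is an assumption made only to avoid division by zero.

```python
from math import hypot
from typing import List, Sequence, Tuple

Point = Tuple[int, int]  # (rank, relevant documents retrieved up to that rank)


def gain_curve(decisions: Sequence[bool]) -> List[Point]:
    """Build gain-curve points from coding decisions in presentation order.

    The origin (0, 0) is the first point, so points[r] is the point at rank r.
    """
    points = [(0, 0)]
    found = 0
    for rank, relevant in enumerate(decisions, start=1):
        found += int(relevant)
        points.append((rank, found))
    return points


def detect_knee(points: Sequence[Point]) -> int:
    """Return the rank whose point has the longest perpendicular to the chord.

    Assumption: the chord runs from the origin to the last point on the curve.
    """
    x_end, y_end = points[-1]
    chord_len = hypot(x_end, y_end)
    if chord_len == 0:
        return 0
    best_rank, best_dist = 0, -1.0
    for x, y in points:
        # Perpendicular distance from (x, y) to the line through
        # (0, 0) and (x_end, y_end).
        dist = abs(y_end * x - x_end * y) / chord_len
        if dist > best_dist:
            best_rank, best_dist = x, dist
    return best_rank


def slope_ratio(points: Sequence[Point], knee_rank: int) -> float:
    """Slope of the curve before the knee divided by the slope after it."""
    last_rank, last_found = points[-1]
    if knee_rank <= 0 or knee_rank >= last_rank:
        return 0.0  # no usable inflection point
    found_at_knee = points[knee_rank][1]
    before = found_at_knee / knee_rank
    # Add-one smoothing (an assumption) keeps the denominator non-zero.
    after = (1 + last_found - found_at_knee) / (last_rank - knee_rank)
    return before / after


def should_terminate(decisions: Sequence[bool], threshold: float) -> bool:
    """True when the slope ratio at the detected knee exceeds the threshold."""
    points = gain_curve(decisions)
    return slope_ratio(points, detect_knee(points)) > threshold
```

For example, if forty documents have been presented and only the first eight were coded relevant, `detect_knee` returns rank 8 and the slope ratio is 32, so a hypothetical threshold of 6 would end the review.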
Abstract
Systems and methods are provided for classifying electronic information and terminating a classification process which utilizes Technology-Assisted Review (“TAR”) techniques. In certain embodiments, the TAR process, which is an iterative process, is terminated based upon one or more stopping criteria. In certain embodiments, use of the stopping criteria ensures that the TAR process will reliably achieve a level of quality (e.g., recall) with a certain probability. In certain embodiments, the TAR process is terminated when it independently identifies a target set of documents. In certain embodiments, the TAR process is terminated based upon whether the ratio of the slope of the TAR process's gain curve before an inflection point to the slope of the TAR process's gain curve after the inflection point exceeds a threshold. In certain embodiments, the TAR process is terminated when a review budget and slope ratio of the gain curve each exceed a respective threshold.
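The target-set embodiment mentioned in the abstract can be illustrated with a short Python sketch. The abstract does not say how the target set is drawn or how termination is checked, so everything here is an assumption: the target set is taken to be a set of document identifiers chosen separately from the TAR process, the review stops once every one of them has been presented, and the name `rank_when_target_found` is hypothetical.

```python
from typing import Hashable, Iterable, Optional, Set


def rank_when_target_found(
    review_order: Iterable[Hashable],
    target_set: Set[Hashable],
) -> Optional[int]:
    """Return the 1-based rank at which every target document has been
    presented, or None if the review order never covers the target set."""
    missing = set(target_set)
    if not missing:
        return 0  # nothing to find, so the criterion is met immediately
    for rank, doc_id in enumerate(review_order, start=1):
        missing.discard(doc_id)
        if not missing:
            return rank
    return None
```

With a review order of `d3, d7, d1, d9, d2` and a target set of `{d7, d9}`, the criterion is met at rank 4, when the last target document is presented.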
30 Claims
1. A system for terminating a classification process, the system comprising the elements recited in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
16. A computerized method for terminating a classification process, the method comprising:
executing the classification process, wherein the classification process utilizes an iterative search strategy that presents documents to a human reviewer for training a classifier to classify documents in a document collection and the documents are stored on a non-transitory storage medium;
receiving a user coding decision from the human reviewer and training the classifier using the received user coding decision;
selecting a gain curve slope ratio threshold;
computing points on a gain curve using a selected set of documents in the document collection and results from the classification process, the points on the gain curve relating a ranking of the selected set of documents to the number of relevant documents retrieved at one or more ranks of the ranking, wherein the ranking relates to an order in which the documents were presented to the human reviewer;
detecting an inflection point in the gain curve, wherein the inflection point is detected by:
solving for parameters of a line running from an origin of the gain curve to a first point on the gain curve corresponding to a level of recall achieved at a rank of one document in the selected set of documents; and
determining the inflection point as a point on the gain curve from where a perpendicular line of suitable length extends to the line for which the parameters were solved, wherein the perpendicular line of suitable length is a longest perpendicular line;
determining a candidate rank associated with the detected inflection point, wherein the candidate rank is a projection of the intersection of the perpendicular line of suitable length from the gain curve and the gain curve onto an axis of the gain curve;
determining a slope ratio for the detected inflection point using a slope of the gain curve before the detected inflection point, and a slope of the gain curve after the detected inflection point; and
terminating the presentation of documents to the human reviewer in the classification process and classifying one or more documents in the document collection using the received user coding decision or scores generated by the classifier based upon a determination that the slope ratio for the detected inflection point exceeds the selected slope ratio threshold;
continuing the classification process based upon a determination that the slope ratio for the detected inflection point does not exceed the selected slope ratio threshold by selecting and presenting one or more documents to the human reviewer for additional user coding decisions, the selection of the presented document being based on the trained classifier.
Dependent claims: 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
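The overall control flow of the method (present, code, train, test the stopping rule, then terminate or continue) might look like the following Python sketch. It is an illustration only: `run_review` and all of its parameters are hypothetical names, `score` stands in for whatever the trained classifier produces, and `should_stop` stands in for the slope-ratio test against the selected threshold.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Decision = Tuple[Hashable, bool]  # (document id, coded relevant?)


def run_review(
    doc_ids: Sequence[Hashable],
    score: Callable[[Hashable], float],
    ask_reviewer: Callable[[Hashable], bool],
    train: Callable[[List[Decision]], None],
    should_stop: Callable[[List[bool]], bool],
    batch_size: int = 5,
) -> Tuple[List[Decision], Dict[Hashable, float]]:
    """Iterate until the stopping rule fires or no documents remain.

    Returns the coding decisions in presentation order, plus classifier
    scores for the documents that were never shown to the reviewer.
    """
    remaining = list(doc_ids)
    decisions: List[Decision] = []
    while remaining:
        # Present the highest-scoring unreviewed documents first.
        remaining.sort(key=score, reverse=True)
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        for doc_id in batch:
            decisions.append((doc_id, ask_reviewer(doc_id)))
        # Retrain on every coding decision received so far.
        train(decisions)
        # Terminate when the stopping rule fires; otherwise keep presenting.
        if should_stop([relevant for _, relevant in decisions]):
            break
    # Unreviewed documents are classified from the classifier's scores.
    return decisions, {doc_id: score(doc_id) for doc_id in remaining}
```

Passing the stopping rule in as a callable keeps the loop independent of any particular criterion, so a slope-ratio test, a budget test, or a combination of the two can be swapped in without changing the review loop.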
Specification