System for enhancing expert-based computerized analysis of a set of digital documents and methods useful in conjunction therewith
First Claim
1. A system comprising:
one or more processors; and
memory that stores instructions that are executable by the one or more processors to cause the system to perform operations comprising:
receiving a first output of a categorization process applied to a training subset of documents of a plurality of documents, the first output including a first indication and a second indication for each document in the training set of documents, the first indication indicating a relevance between a document in the training set of documents and an issue in a set of issues and the second indication indicating a lack of relevance between the document and the issue;
generating a classifier based at least in part on the first output;
executing the classifier on the plurality of documents to determine a second output, the second output indicating an extent of relevance of each document in the plurality of documents to the issue;
partitioning individual documents in the plurality of documents into subsets of documents based at least in part on the second output;
adding additional documents from at least one subset of the subsets of documents into the training subset of documents to generate a control subset of documents;
executing, as part of a first iteration, the classifier on the control subset of documents to determine a third output;
determining, based at least in part on the third output, a threshold associated with the classifier, the threshold being associated with a cutoff point for binarizing a ranking of individual documents in the control subset of documents;
computing, based at least in part on the cutoff point, a quality criterion associated with the classifier;
determining, based at least in part on the quality criterion, a first quality of performance of the classifier as applied to the control subset of documents;
receiving an input;
determining, based at least in part on the input, to initiate a second iteration;
determining a second quality of performance of the classifier for the second iteration; and
displaying a comparison of the first quality of performance and the second quality of performance.
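The threshold and quality-criterion steps of the claim can be illustrated with a minimal sketch. The claim does not name a particular quality criterion, so the F-measure (harmonic mean of precision and recall) is assumed here purely for illustration, and the function names `f_measure` and `best_cutoff` are hypothetical. The sketch scans each candidate cutoff in a ranked control subset, binarizes the ranking at that cutoff, and keeps the cutoff with the highest criterion value.

```python
from typing import List, Tuple


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (assumed quality criterion)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_cutoff(scores: List[float], labels: List[bool]) -> Tuple[float, float]:
    """Return (threshold, quality) for the cutoff that maximizes the criterion.

    Documents scoring >= threshold are treated as relevant; `labels` holds the
    expert's relevant / not-relevant indication for each control document.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_relevant = sum(labels)
    best_threshold, best_quality = float("inf"), 0.0
    tp = fp = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Consume every document tied at this score before evaluating the cutoff.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        quality = f_measure(tp, fp, total_relevant - tp)
        if quality > best_quality:
            best_threshold, best_quality = score, quality
    return best_threshold, best_quality


if __name__ == "__main__":
    control_scores = [0.9, 0.8, 0.7, 0.4, 0.2]
    control_labels = [True, True, False, True, False]
    print(best_cutoff(control_scores, control_labels))
```

In this toy control subset the cutoff lands at the fourth-ranked document, where all three relevant documents are captured at the cost of one false positive.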
Abstract
An electronic document analysis method receiving N electronic documents pertaining to a case encompassing a set of issues including at least one issue and establishing relevance of at least the N documents to at least one individual issue in the set of issues, the method comprising, for at least one individual issue from among the set of issues, receiving an output of a categorization process applied to each document in training and control subsets of the at least N documents, the output including, for each document in the subsets, one of a relevant-to-the-individual issue indication and a non-relevant-to-the-individual issue indication; building a text classifier simulating the categorization process using the output for all documents in the training subset of documents; and running the text classifier on the at least N documents thereby to obtain a ranking of the extent of relevance of each of the at least N documents to the individual issue. The method may also comprise evaluating the text classifier's quality using the output for all documents in the control subset.
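As a rough illustration of "building a text classifier simulating the categorization process" and then ranking all N documents, the following sketch learns per-term log-odds weights from expert-labelled training documents. The abstract does not specify a learning algorithm; the smoothed log-odds model, the whitespace tokenizer, and the names `train`, `score`, and `rank` are all assumptions made for this example.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Very simple tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def train(training: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Learn a log-odds weight per term from (text, is_relevant) pairs."""
    relevant, non_relevant = Counter(), Counter()
    for text, is_relevant in training:
        (relevant if is_relevant else non_relevant).update(tokenize(text))
    vocab = set(relevant) | set(non_relevant)
    # Add-one smoothing so unseen terms in one class do not produce log(0).
    rel_total = sum(relevant.values()) + len(vocab)
    non_total = sum(non_relevant.values()) + len(vocab)
    return {
        term: math.log((relevant[term] + 1) / rel_total)
        - math.log((non_relevant[term] + 1) / non_total)
        for term in vocab
    }


def score(weights: Dict[str, float], text: str) -> float:
    """Sum the weights of known terms; unknown terms contribute nothing."""
    return sum(weights.get(term, 0.0) for term in tokenize(text))


def rank(weights: Dict[str, float], documents: List[str]) -> List[Tuple[float, str]]:
    """Return (score, document) pairs, most relevant first."""
    scored = [(score(weights, doc), doc) for doc in documents]
    return sorted(scored, key=lambda pair: -pair[0])


if __name__ == "__main__":
    training_subset = [
        ("merger price negotiation", True),
        ("merger board approval", True),
        ("lunch menu friday", False),
        ("parking lot notice", False),
    ]
    weights = train(training_subset)
    for s, doc in rank(weights, ["board merger vote", "friday parking update", "quarterly newsletter"]):
        print(f"{s:+.2f}  {doc}")
```

The same `score` function could be run on a control subset whose expert indications are known, which is one way to read the abstract's final sentence about evaluating the classifier's quality.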
15 Claims
1. A system comprising:
one or more processors; and
memory that stores instructions that are executable by the one or more processors to cause the system to perform operations comprising:
receiving a first output of a categorization process applied to a training subset of documents of a plurality of documents, the first output including a first indication and a second indication for each document in the training set of documents, the first indication indicating a relevance between a document in the training set of documents and an issue in a set of issues and the second indication indicating a lack of relevance between the document and the issue;
generating a classifier based at least in part on the first output;
executing the classifier on the plurality of documents to determine a second output, the second output indicating an extent of relevance of each document in the plurality of documents to the issue;
partitioning individual documents in the plurality of documents into subsets of documents based at least in part on the second output;
adding additional documents from at least one subset of the subsets of documents into the training subset of documents to generate a control subset of documents;
executing, as part of a first iteration, the classifier on the control subset of documents to determine a third output;
determining, based at least in part on the third output, a threshold associated with the classifier, the threshold being associated with a cutoff point for binarizing a ranking of individual documents in the control subset of documents;
computing, based at least in part on the cutoff point, a quality criterion associated with the classifier;
determining, based at least in part on the quality criterion, a first quality of performance of the classifier as applied to the control subset of documents;
receiving an input;
determining, based at least in part on the input, to initiate a second iteration;
determining a second quality of performance of the classifier for the second iteration; and
displaying a comparison of the first quality of performance and the second quality of performance.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A method comprising:
receiving a first output of a categorization process applied to a training subset of documents of a plurality of documents, the first output including a first indication and a second indication for each document in the training set of documents, the first indication indicating a relevance between a document in the training set of documents and an issue in a set of issues and the second indication indicating a lack of relevance between the document and the issue;
generating a classifier based at least in part on the first output;
executing the classifier on the plurality of documents to determine a second output indicating an extent of relevance of each document in the plurality of documents to the issue;
partitioning, based at least in part on the second output, individual documents in the plurality of documents into subsets of documents;
adding additional documents from at least one subset of the subsets of documents into the training subset of documents to generate a control subset of documents;
executing, as part of a first iteration, the classifier on the control subset of documents to determine a third output;
determining, based at least in part on the third output, a threshold associated with the classifier, the threshold being associated with a cutoff point for binarizing a ranking of individual documents in the control subset of documents;
computing, based at least in part on the cutoff point, a quality criterion associated with the classifier;
determining, based at least in part on the quality criterion, a first quality of performance of the classifier as applied to the control subset of documents;
receiving an input;
determining, based at least in part on the input, to initiate a second iteration;
determining a second quality of performance of the classifier for the second iteration; and
displaying a comparison of the first quality of performance and the second quality of performance.
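The partitioning and control-subset steps of the method might be sketched as follows. The claim says only that documents are partitioned "based at least in part on the second output" and that additional documents from at least one subset are added to the training subset; the equal-width score bands, the fixed per-band sample size, and the names `partition_by_score` and `build_control_subset` are assumptions for illustration.

```python
import random
from typing import Dict, List, Sequence


def partition_by_score(scores: Dict[str, float], n_bins: int) -> List[List[str]]:
    """Split document ids into n_bins equal-width score bands, lowest band first."""
    lo, hi = min(scores.values()), max(scores.values())
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # all scores identical: everything falls in the first band
    bins: List[List[str]] = [[] for _ in range(n_bins)]
    for doc_id, s in scores.items():
        index = min(int((s - lo) / width), n_bins - 1)  # clamp the top score
        bins[index].append(doc_id)
    return bins


def build_control_subset(
    training_ids: Sequence[str],
    bins: List[List[str]],
    per_bin: int,
    seed: int = 0,
) -> List[str]:
    """Start from the training ids and add up to per_bin new ids from each band."""
    rng = random.Random(seed)
    control = list(training_ids)
    seen = set(control)
    for band in bins:
        candidates = [d for d in band if d not in seen]
        picked = rng.sample(candidates, min(per_bin, len(candidates)))
        control.extend(picked)
        seen.update(picked)
    return control


if __name__ == "__main__":
    second_output = {"d1": 0.05, "d2": 0.15, "d3": 0.5, "d4": 0.55, "d5": 0.9, "d6": 1.0}
    bands = partition_by_score(second_output, n_bins=3)
    print(bands)
    print(build_control_subset(["d1"], bands, per_bin=1))
```

Sampling from every score band is one design choice among several; sampling only from bands near the expected cutoff would also fit the claim's "at least one subset" wording.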
10. A computer storage device storing instructions that, when executed by one or more processors, cause a device to perform operations comprising:
receiving a first output of a categorization process applied to a training subset of documents of a plurality of documents, the first output including a first indication and a second indication for each document in the training set of documents, the first indication indicating a relevance between a document in the training set of documents and an issue in a set of issues and the second indication indicating a lack of relevance between the document and the issue;
generating a classifier based at least in part on the first output;
executing the classifier on the plurality of documents to determine a second output, the second output indicating an extent of relevance of each document in the plurality of documents to the issue;
partitioning individual documents in the plurality of documents into subsets of documents based at least in part on the second output;
adding additional documents from at least one subset of the subsets of documents into the training subset of documents to generate a control subset of documents;
executing, as part of a first iteration, the classifier on the control subset of documents to determine a third output;
determining, based at least in part on the third output, a threshold associated with the classifier, the threshold being associated with a cutoff point for binarizing a ranking of individual documents in the control subset of documents;
computing, based at least in part on the cutoff point, a quality criterion associated with the classifier;
determining, based at least in part on the quality criterion, a first quality of performance of the classifier as applied to the control subset of documents;
receiving an input;
determining, based at least in part on the input, to initiate a second iteration;
determining a second quality of performance of the classifier for the second iteration; and
displaying a comparison of the first quality of performance and the second quality of performance.
Dependent claims: 11, 12, 13, 14, 15
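The final operations (receiving an input, deciding whether to start a second iteration, and displaying a comparison of the two quality values) could look roughly like the sketch below. The claim does not say what form the input or the display takes, so the yes/no text input, the one-line text report, and the names `should_iterate` and `format_comparison` are hypothetical.

```python
def should_iterate(user_input: str) -> bool:
    """Treat an affirmative text input as a request for another iteration."""
    return user_input.strip().lower() in {"y", "yes", "continue"}


def format_comparison(first_quality: float, second_quality: float) -> str:
    """Render a one-line comparison of the quality measured in two iterations."""
    delta = second_quality - first_quality
    if delta > 0:
        trend = "improved"
    elif delta < 0:
        trend = "declined"
    else:
        trend = "unchanged"
    return (
        f"iteration 1: {first_quality:.3f} | "
        f"iteration 2: {second_quality:.3f} | {trend} ({delta:+.3f})"
    )


if __name__ == "__main__":
    first = 0.75
    if should_iterate("yes"):
        second = 0.8  # placeholder value; a real run would re-evaluate the classifier
        print(format_comparison(first, second))
```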
Specification