Integration and combination of random sampling and document batching
First Claim
1. A method of integrated batching and random sampling of documents for enhanced functionality within document review processes, comprising:
receiving a batching request, the batching request including:
a population size that corresponds to a number of a total amount of documents available for sampling; and
an acceptable margin of error;
computing a random sample size from the batching request;
randomly selecting a subset of documents from the total amount of documents available for sampling, a number of the randomly selected subset of documents corresponding to the random sample size, a set of excluded documents being documents in the total amount of documents available for sampling that are not included in the randomly selected subset of documents;
determining a range of relevant documents within the set of excluded documents by determining a population of relevant documents within the set of excluded documents, the determination performed by:
receiving a query regarding the total amount of documents,
applying a hypothesis test to the randomly selected subset of documents to calculate a first response to the query for the randomly selected subset of documents, and
utilizing the first response to calculate a second response to the query for the population of excluded but relevant documents within the total amount of documents;
randomly grouping the randomly selected subset of documents into a plurality of batches for assignment to a plurality of review nodes, at least one review node being a machine review node and at least one node being a human review node;
assigning each of the randomly grouped batches to a review node of the plurality of review nodes for review of the respective batch;
determining a range of excluded but relevant documents for both a batch of machine reviewed documents and a batch of human reviewed documents;
comparing the ranges together to determine a difference between machine reviewed documents and human reviewed documents; and
utilizing machine document review if the difference is less than a threshold amount.
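The sampling and estimation steps of claim 1 can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the claim names a hypothesis test without specifying one, so the sketch swaps in the Wilson score interval (obtained by inverting the binomial score test) at an assumed 95% confidence level (z = 1.96), and it projects that interval onto the excluded set by simple scaling. The names `select_sample`, `wilson_interval`, and `excluded_relevant_range` are hypothetical.

```python
import math
import random


def select_sample(doc_ids, sample_size, seed=None):
    """Draw a simple random sample; every unsampled document is 'excluded'."""
    rng = random.Random(seed)
    doc_ids = list(doc_ids)
    sample = rng.sample(doc_ids, sample_size)
    chosen = set(sample)
    excluded = [d for d in doc_ids if d not in chosen]
    return sample, excluded


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (inverts the score test).

    Assumption: used here as a stand-in for the unspecified hypothesis test.
    """
    if n <= 0:
        raise ValueError("sample must contain at least one document")
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def excluded_relevant_range(relevant_in_sample, sample_size, excluded_count, z=1.96):
    """Scale the sample's relevance interval up to the excluded documents.

    Assumes the excluded set shares the sample's relevance rate, which is
    what simple random sampling is meant to justify.
    """
    low, high = wilson_interval(relevant_in_sample, sample_size, z)
    return math.floor(low * excluded_count), math.ceil(high * excluded_count)
```

For example, if reviewers mark 10 of 100 sampled documents relevant and 900 documents were excluded, `excluded_relevant_range(10, 100, 900)` returns a range of 49 to 157 relevant documents in the excluded set. The Wilson interval is chosen over the simpler normal approximation because it stays within [0, 1] and behaves better when few relevant documents are found.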
6 Assignments
Abstract
Methods and systems of integrated batching and random sampling of documents for enhanced functionality and quality control, such as validation, within a document review process are provided herein. According to various embodiments, a batching request may be received and may include a population size that corresponds to a total amount of documents available for sampling. The batching request may also include an acceptable margin of error. A random sample size may be calculated based on the batching request, and then a subset of documents corresponding to the random sample size may be selected from the total amount of documents available for sampling. The subset of documents may be grouped into one or more batches, and the one or more batches may be assigned to one or more review nodes.
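The abstract leaves open how the random sample size is derived from the population size and the margin of error. One common choice, assumed here and not confirmed by the source, is Cochran's formula with a finite population correction: n0 = z^2 p(1 - p) / e^2, then n = n0 / (1 + (n0 - 1) / N), using a 95% confidence level and the conservative proportion p = 0.5. The function name `random_sample_size` and its defaults are hypothetical.

```python
import math
from statistics import NormalDist


def random_sample_size(population_size, margin_of_error, confidence=0.95, p=0.5):
    """Cochran's sample size with a finite population correction (assumed formula).

    population_size: number of documents available for sampling (N).
    margin_of_error: acceptable margin of error as a fraction, e.g. 0.05 for 5%.
    confidence: two-sided confidence level (assumed default of 95%).
    p: anticipated proportion; 0.5 gives the most conservative (largest) sample.
    """
    if population_size < 1:
        raise ValueError("population_size must be at least 1")
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    n0 = z ** 2 * p * (1 - p) / margin_of_error ** 2    # infinite-population size
    n = n0 / (1 + (n0 - 1) / population_size)           # finite population correction
    return min(population_size, math.ceil(n))
```

With these defaults, a population of 10,000 documents and a 5% margin of error yields a sample of 370 documents, while very large populations approach 385.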
47 Citations
14 Claims
1. A method of integrated batching and random sampling of documents for enhanced functionality within document review processes, as set out in full under First Claim above.
Dependent claims: 2, 3, 4, 5, 6, 7.
8. A system of integrated batching and random sampling of documents for enhanced functionality within document review processes, the system comprising:
a memory for storing executable instructions for batching and random sampling of documents for quality control within document review processes; and
a processor for executing the instructions stored in memory, the executable instructions comprising:
a query module that receives a batching request, the batching request including a population size that corresponds to a total amount of documents available for sampling and an acceptable margin of error;
an analysis module communicatively coupled to the query module that computes a random sample size from the batching request and randomly selects a subset of documents from the total amount of documents available for sampling, the subset of documents corresponding to the random sample size, wherein the analysis module determines a range of excluded but relevant documents within the total amount of documents by determining a population of excluded but relevant documents within the subset of documents, the determination performed by:
receiving a query regarding the total amount of documents,
applying a hypothesis test to the subset of documents to calculate a first response to the query for the subset of documents, and
utilizing the first response to calculate a second response to the query for the population of excluded but relevant documents within the subset of documents;
a batching module communicatively coupled to the analysis module that groups the subset of documents into a plurality of batches, and assigns one or more of the batches to a human review node and one or more batches to a machine review node; and
a communications module coupled to the batching module and review nodes that:
determines a range of excluded but relevant documents for both a subset of machine reviewed documents and a subset of human reviewed documents,
compares the ranges together to determine a difference between machine reviewed documents and human reviewed documents, the difference being expressed as a percentage, and
transmits the batches to machine document review nodes if the difference is less than a threshold amount.
Dependent claims: 9, 10.
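Claim 8 requires the difference between the machine and human ranges to be expressed as a percentage but does not say how it is computed. The sketch below assumes one plausible reading: compare the midpoints of the two ranges and express the gap relative to the human estimate. It routes batches to machine review only when that percentage is strictly below the threshold and, as a further assumption, leaves them with human review otherwise. `RelevantRange`, `percent_difference`, and `route_batches` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelevantRange:
    """Estimated range of excluded-but-relevant documents for one review track."""
    low: float
    high: float

    @property
    def midpoint(self) -> float:
        return (self.low + self.high) / 2


def percent_difference(machine: RelevantRange, human: RelevantRange) -> float:
    """Gap between range midpoints, as a percentage of the human estimate (assumed metric)."""
    if human.midpoint == 0:
        # Avoid dividing by zero: two zero estimates mean no difference at all.
        return 0.0 if machine.midpoint == 0 else float("inf")
    return abs(machine.midpoint - human.midpoint) * 100 / human.midpoint


def route_batches(batch_ids, machine, human, threshold_pct):
    """Send every batch to machine review if the difference is below the threshold."""
    diff = percent_difference(machine, human)
    # Assumption: batches stay with human review when the threshold is not met.
    target = "machine" if diff < threshold_pct else "human"
    return {batch_id: target for batch_id in batch_ids}, diff
```

For instance, a human range of 90 to 110 and a machine range of 95 to 115 differ by 5%, so a 10% threshold sends the batches to machine review while a 5% threshold does not.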
11. A method for reviewing documents, comprising:
computing a random sample size from a batching request including a number of documents and a margin of error;
randomly selecting a subset of documents from the total amount of documents, the randomly selected subset corresponding to the random sample size;
determining a range of excluded but relevant documents within the total amount of documents by:
receiving a query regarding the total amount of documents,
applying a hypothesis test to the randomly selected subset of documents to calculate a first response to the query for the randomly selected subset of documents, and
utilizing the first response to calculate a second response to the query for the population of excluded but relevant documents within the subset of documents;
randomly grouping the randomly selected subset of documents into a plurality of batches based on at least one of type of document, names mentioned in the documents, and key words;
assigning each of the randomly grouped batches to a plurality of review nodes based on at least one of an expertise of a reviewer of a respective node in a certain area and a level of experience of the reviewer, at least one review node being a machine review node and at least one node being a human review node;
determining a range of excluded but relevant documents for both a batch of machine reviewed documents and a batch of human reviewed documents;
comparing the ranges together to determine a difference between machine reviewed documents and human reviewed documents; and
utilizing machine document review if the difference is less than a threshold amount.
Dependent claims: 12, 13, 14.
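Claim 11 adds criteria for grouping (document type, names mentioned, key words) and for assignment (reviewer expertise and level of experience). The sketch below is one hypothetical way to combine them: it groups by document type only, shuffles within each type so batch membership stays random, and assigns each batch to the least-loaded node whose expertise covers that type, breaking ties in favour of the more experienced reviewer. The `Document` and `ReviewNode` classes, their fields, and the load-balancing rule are assumptions for illustration.

```python
import random
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    doc_type: str  # hypothetical grouping attribute, e.g. "contract" or "email"


@dataclass
class ReviewNode:
    name: str
    is_machine: bool
    expertise: set = field(default_factory=set)  # document types this node handles
    years_experience: int = 0


def group_into_batches(docs, batch_size, seed=None):
    """Group documents by type, shuffle each group, then cut it into batches."""
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for doc in docs:
        by_type[doc.doc_type].append(doc)
    batches = []
    for doc_type in sorted(by_type):  # sorted for a reproducible batch order
        group = by_type[doc_type]
        rng.shuffle(group)  # random membership within each document type
        for start in range(0, len(group), batch_size):
            batches.append((doc_type, group[start:start + batch_size]))
    return batches


def assign_batches(batches, nodes):
    """Assign each batch to the least-loaded node with matching expertise."""
    load = {node.name: 0 for node in nodes}
    assignments = []
    for doc_type, batch in batches:
        # Fall back to every node when no node lists this type as expertise.
        candidates = [n for n in nodes if doc_type in n.expertise] or nodes
        # Fewest batches first; among equals, prefer more experience.
        chosen = min(candidates, key=lambda n: (load[n.name], -n.years_experience))
        load[chosen.name] += 1
        assignments.append((chosen.name, doc_type, [d.doc_id for d in batch]))
    return assignments
```

Balancing by load keeps a single highly experienced reviewer from receiving every batch of a given type, while the experience tie-break still favours seniority when workloads are equal.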
Specification