METHODS AND SYSTEMS FOR AUTOMATIC EVALUATION OF ELECTRONIC DISCOVERY REVIEW AND PRODUCTIONS
Abstract
Techniques are provided for automatic sampling evaluation. An automatic sampling evaluation system enables users to evaluate convergence of one or more search processes. For example, given a set of searches that were validated by human review, a system can implement a retrieval process that samples one or more non-retrieved collections. Each individual document in the one or more non-retrieved collections is automatically evaluated for similarity to other documents in any retrieved sets. Given a goal of achieving a high recall, documents with high similarity can then be analyzed for additional noun phrases that may be used for a next iteration of a search. Convergence can be expected if the information gain in the new feedback loop is less than previous iterations, and if the additional documents identified are below a certain threshold document count.
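The convergence condition at the end of the abstract can be illustrated with a minimal sketch. The function name `has_converged`, its parameters, and the reading of "less than previous iterations" as "less than every earlier iteration" are assumptions made for illustration; the abstract does not specify how information gain is measured or what the threshold document count is.

```python
def has_converged(gain_history, new_doc_count, max_new_docs):
    """Hypothetical convergence test for an iterative search process.

    gain_history  -- information gain per feedback iteration, oldest first
                     (how gain is computed is not specified in the abstract)
    new_doc_count -- additional documents identified in the latest iteration
    max_new_docs  -- assumed threshold document count
    """
    if len(gain_history) < 2:
        # With no earlier iteration there is nothing to compare against.
        return False
    latest = gain_history[-1]
    earlier = gain_history[:-1]
    # Assumption: the latest gain must be below every earlier iteration's gain.
    gain_is_lower = all(latest < g for g in earlier)
    return gain_is_lower and new_doc_count < max_new_docs
```

Under these assumptions, a call such as `has_converged([0.4, 0.2, 0.05], new_doc_count=3, max_new_docs=10)` would report convergence, while a rebound in gain or a large batch of newly identified documents would not.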
344 Citations
20 Claims
1. A computer-implemented method for evaluating a search process, the method comprising:
receiving, at the one or more computer systems, information identifying in a collection of documents a first set of documents that satisfy search criteria associated with a first search;
determining, with one or more processors associated with the one or more computer systems, a document feature vector for each document in the first set of documents;
receiving, with the one or more processors associated with the one or more computer systems, information identifying in the documents in the collection of documents that do not satisfy the search criteria associated with the first search a second set of documents that satisfy first sampling criteria;
determining, with the one or more processors associated with the one or more computer systems, a document feature vector for each document in the second set of documents;
determining, with the one or more processors associated with the one or more computer systems, whether a second search of the collection results in new document gain based on the document feature vector for each document in the first set of documents and the document feature vector for at least one document in the second set of documents; and
generating, with the one or more processors associated with the one or more computer systems, information indicative of whether the second search of the collection results in new document gain.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A non-transitory computer-readable medium storing computer-executable code for evaluating a search process, the non-transitory computer-readable medium comprising:
code for receiving information identifying in a collection of documents a first set of documents that satisfy search criteria associated with a first search;
code for determining a document feature vector for each document in the first set of documents;
code for receiving information identifying in the documents in the collection of documents that do not satisfy the search criteria associated with the first search a second set of documents that satisfy first sampling criteria;
code for determining a document feature vector for each document in the second set of documents;
code for determining whether a second search of the collection results in new document gain based on the document feature vector for each document in the first set of documents and the document feature vector for at least one document in the second set of documents; and
code for generating information indicative of whether the second search of the collection results in new document gain.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A system for evaluating search process of electronic discovery investigations, the system comprising:
a processor; and
a memory configured to store a set of instructions which when executed by the processor configure the processor to:
receive information identifying in a collection of documents a first set of documents that satisfy search criteria associated with a first search;
determine a document feature vector for each document in the first set of documents;
receive information identifying in the documents in the collection of documents that do not satisfy the search criteria associated with the first search a second set of documents that satisfy first sampling criteria;
determine a document feature vector for each document in the second set of documents;
determine whether a second search of the collection results in new document gain based on the document feature vector for each document in the first set of documents and the document feature vector for at least one document in the second set of documents; and
generate information indicative of whether the second search of the collection results in new document gain.
Dependent claims: 16, 17, 18, 19, 20
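The steps shared by the three independent claims can be sketched roughly as follows. This is an illustrative reading, not the claimed implementation: the term-count feature vector, cosine similarity, random sampling as the "sampling criteria", the similarity `threshold`, and every function name here are assumptions that the claims do not specify.

```python
import math
import random
from collections import Counter


def feature_vector(text):
    """Assumed feature vector: lower-cased, whitespace-split term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def evaluate_search(collection, matches, sample_size, threshold, seed=0):
    """Estimate whether a second search is likely to yield new document gain.

    matches     -- predicate standing in for the first search's criteria
    sample_size -- assumed sampling criterion: a random sample of this size
    threshold   -- assumed similarity cutoff for a "near miss" document
    """
    # First set: documents that satisfy the first search.
    first_set = [d for d in collection if matches(d)]
    # Second set: a sample drawn from the documents the search did not retrieve.
    non_retrieved = [d for d in collection if not matches(d)]
    rng = random.Random(seed)
    second_set = rng.sample(non_retrieved, min(sample_size, len(non_retrieved)))

    first_vecs = [feature_vector(d) for d in first_set]
    candidates = []
    for doc in second_set:
        vec = feature_vector(doc)
        # Highest similarity between this sampled document and any retrieved one.
        best = max((cosine(vec, fv) for fv in first_vecs), default=0.0)
        if best >= threshold:
            candidates.append((doc, best))

    # Generated information: a flag plus the sampled documents that triggered it.
    return {"new_document_gain": bool(candidates), "candidates": candidates}
```

In this sketch, a sampled non-retrieved document that closely resembles a retrieved one is treated as evidence that a revised second search would pick up additional documents. A production system would likely use richer features than raw term counts, such as the noun phrases mentioned in the abstract.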
Specification