Methods for enhancing efficiency and cost effectiveness of first pass review of documents
Abstract
Methods for reviewing a collection of documents to identify relevant documents from the collection are provided. A search of the collection can be run based on query terms, to return a subset of responsive documents. A probability of relevancy can be determined for a document in the returned subset, and the document is removed from the subset if it does not reach a threshold probability of relevancy. Documents in a thread of a correspondence (for example, an e-mail) in the responsive documents subset can be added to the responsive documents subset. Further, an attachment to a document in the responsive documents subset can be added to the responsive documents subset. A statistical technique can be applied to determine whether remaining documents in the collection meet a predetermined acceptance level.
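As an illustration of the filtering and attachment steps summarized above, the following is a minimal sketch only. The `Document` class, its `relevance` score, and the `attachment_ids` field are hypothetical names introduced for this example; the source does not say how the probability of relevancy is computed, so it is treated here as a given number between 0 and 1.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    doc_id: str
    relevance: float  # hypothetical, precomputed probability of relevancy
    attachment_ids: List[str] = field(default_factory=list)


def filter_and_add_attachments(
    responsive: List[Document],
    collection: Dict[str, Document],
    threshold: float,
) -> Dict[str, Document]:
    """Drop documents below the threshold, then add attachments of the rest."""
    # Keep only documents that reach the threshold probability of relevancy.
    kept = {d.doc_id: d for d in responsive if d.relevance >= threshold}
    # Add each surviving document's attachments, whatever their own score.
    for doc in list(kept.values()):
        for att_id in doc.attachment_ids:
            if att_id in collection:
                kept.setdefault(att_id, collection[att_id])
    return kept


collection = {
    "m1": Document("m1", 0.9, ["a1"]),
    "a1": Document("a1", 0.2),
    "m2": Document("m2", 0.4),
}
responsive = [collection["m1"], collection["m2"]]
result = filter_and_add_attachments(responsive, collection, threshold=0.5)
print(sorted(result))
```

In this sketch the attachment `a1` is added even though its own score is below the threshold, because it travels with `m1`; whether attachments should bypass the threshold is an assumption of the example, not something the abstract states.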
23 Claims
1. A method utilizing a computer system including a computer processor, the method for reviewing a collection of documents in electronic form stored in one or more memories wherein the computer system is configured to communicate with the one or more memories to identify relevant documents from the collection of documents, the method comprising the steps of:
receiving a plurality of query terms in a relevant language string first defined and indicated by a reviewer, via an interface controller of the computer system;
automatically creating a Boolean query by the system and without user interaction, based on the plurality of query terms in the relevant language string, by applying criteria to refine the relevant language string;
running a search of the collection of documents in the one or more memories, the search being based on the plurality of query terms of the relevant language string first defined and indicated by the reviewer, the computer processor using the Boolean query to isolate substantially all documents from the collection of documents stored in memory that are relevant to the Boolean query and to return the substantially all documents in a subset of responsive documents from the collection of documents stored in the memory;
establishing a threshold probability of relevancy for a particular reviewing operation in relation to the collection of documents;
determining a corresponding probability of relevancy for each document in the subset of responsive documents;
removing from the subset of responsive documents, documents that do not reach a threshold probability of relevancy established for the particular reviewing operation;
randomly selecting a predetermined number of documents from a remaining subset of the collection of documents not in the subset of responsive documents, wherein if the randomly selected documents includes one or more additional relevant documents, the query terms are expanded and the search is re-run with the expanded query terms; and
determining whether the randomly selected documents include additional relevant documents by further refining the Boolean query; and
comparing a ratio of the additional relevant documents and the randomly selected documents to a predetermined acceptance level, to determine whether to apply a refined set of query terms.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9
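The sampling and acceptance steps at the end of claim 1 can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the function names are hypothetical, the `is_relevant` callback stands in for whatever review determines relevance, and the sketch assumes that a ratio at or below the acceptance level counts as acceptable, which the claim does not spell out.

```python
import random
from typing import Callable, Iterable, List, Optional


def sample_remainder(
    collection_ids: Iterable[str],
    responsive_ids: Iterable[str],
    sample_size: int,
    seed: Optional[int] = None,
) -> List[str]:
    """Randomly pick documents that the search did NOT return."""
    remainder = sorted(set(collection_ids) - set(responsive_ids))
    rng = random.Random(seed)
    return rng.sample(remainder, min(sample_size, len(remainder)))


def missed_ratio(sample: List[str], is_relevant: Callable[[str], bool]) -> float:
    """Ratio of additional relevant documents to randomly selected documents."""
    if not sample:
        return 0.0
    hits = sum(1 for doc_id in sample if is_relevant(doc_id))
    return hits / len(sample)


def meets_acceptance(ratio: float, acceptance_level: float) -> bool:
    # Assumption: fewer missed relevant documents is better, so the ratio
    # passes when it is at or below the acceptance level.
    return ratio <= acceptance_level


collection_ids = [f"d{i}" for i in range(10)]
responsive_ids = ["d0", "d1"]
truly_relevant = {"d0", "d1", "d5"}  # hypothetical reviewer judgments

# Sample the whole remainder (8 documents) so the result is deterministic.
sample = sample_remainder(collection_ids, responsive_ids, sample_size=8, seed=1)
ratio = missed_ratio(sample, lambda doc_id: doc_id in truly_relevant)
expand_query = not meets_acceptance(ratio, acceptance_level=0.05)
print(ratio, expand_query)
```

Here one of the eight sampled documents turns out to be relevant, so the ratio exceeds the example acceptance level and the sketch signals that the query terms should be expanded and the search re-run.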
10. A method utilizing a computer system including a computer processor and a memory for reviewing a collection of documents in electronic form stored in the memory to identify relevant documents from the collection, the method comprising the steps of:
running a search of the collection of documents, the search being based on a plurality of query terms of a relevant language string first defined and indicated by a reviewer to the computer system, and subsequently utilized by the computer processor to automatically create a Boolean query by the system and without user interaction, based on applying criteria to refine the relevant language string, the computer processor using the Boolean query to isolate substantially all documents from the collection of documents stored in memory that are relevant to the Boolean query and to return the substantially all documents in a subset of responsive documents from the collection of documents stored in the memory;
establishing a threshold probability of relevancy for a particular reviewing operation;
determining a corresponding probability of relevancy for each document in the subset of responsive documents; and
removing from the subset of responsive documents, documents that do not reach a threshold probability of relevancy established for the particular reviewing operation,
wherein the search includes (a) a precision Boolean search of the collection of documents based on the Boolean query utilizing the plurality of query terms, the Boolean search returning a first subset of responsive documents from the collection, and (b) a second search by applying a recall query based on the plurality of query terms to remaining ones of the collection of documents which were not returned by the Boolean search, the second search returning a second subset of responsive documents in the collection, and wherein the responsive documents subset is composed of the first and second subsets.
Dependent claims: 11, 12, 13, 14
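The two-pass search recited in claim 10 can be illustrated with a small sketch. The claim does not define how the precision query or the recall query is built, so this example assumes, purely for illustration, that the precision pass requires every query term (AND) and the recall pass accepts any query term (OR) among the documents the first pass did not return. All function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase, whitespace-split tokens; a deliberately simple stand-in."""
    return set(text.lower().split())


def precision_search(docs: Dict[str, str], terms: Iterable[str]) -> Set[str]:
    """Assumed precision pass: a document must contain every term (AND)."""
    wanted = {t.lower() for t in terms}
    return {doc_id for doc_id, text in docs.items() if wanted <= tokenize(text)}


def recall_search(
    docs: Dict[str, str], terms: Iterable[str], exclude: Set[str]
) -> Set[str]:
    """Assumed recall pass: any term (OR), only over documents not yet returned."""
    wanted = {t.lower() for t in terms}
    return {
        doc_id
        for doc_id, text in docs.items()
        if doc_id not in exclude and wanted & tokenize(text)
    }


def two_pass_search(
    docs: Dict[str, str], terms: Iterable[str]
) -> Tuple[Set[str], Set[str], Set[str]]:
    terms = list(terms)
    first = precision_search(docs, terms)
    second = recall_search(docs, terms, exclude=first)
    # The responsive subset is composed of the first and second subsets.
    return first, second, first | second


docs = {
    "d1": "merger agreement draft",
    "d2": "lunch menu",
    "d3": "agreement signed",
}
first, second, responsive = two_pass_search(docs, ["merger", "agreement"])
print(sorted(first), sorted(second), sorted(responsive))
```

Running the recall pass only over the remainder keeps the two subsets disjoint, so their union can be formed without de-duplication.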
15. A method utilizing a computer system including a computer processor and one or more memories, the method for reviewing a collection of documents stored in electronic form in the one or more memories to identify substantially all relevant documents from the collection, the method comprising the steps of:
running a first search of the collection of documents, based on a plurality of query terms of a relevant language string first formulated by a reviewer and indicated to the computer system and used by the computer processor of the computer system to automatically create an automatic Boolean query without user interaction based on applying criteria to refine the relevant language string, the computer processor using the automatic Boolean query to isolate substantially all documents from the collection of documents stored in memory that are relevant to the Boolean query, the search returning a subset of responsive documents from the collection stored in memory including all documents relevant to the Boolean query;
utilizing the computer processor to automatically identify a correspondence between a sender of a particular document in the collection of documents stored in memory and a recipient of a particular document in the collection of documents stored in the memory, which are included in the subset of responsive documents;
automatically determining additional documents which are determined to be in a thread of the correspondence between the sender and the recipient, but not included in the subset of responsive documents;
adding the additional documents in the thread of correspondence to the subset of responsive documents;
randomly selecting a predetermined number of documents from a remainder of the collection of documents not in the responsive documents subset;
determining whether the randomly selected documents include additional relevant documents;
comparing a ratio of the additional relevant documents and the randomly selected documents to a predetermined acceptance level; and
expanding the query terms and rerunning the search with the expanded query terms, if the ratio does not meet the predetermined acceptance level.
Dependent claims: 16, 17, 18, 19, 20, 21, 22, 23
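The thread-expansion steps of claim 15 might look something like the sketch below. It is a simplified illustration, not the claimed method: the `Message` record and its `thread_id` field are assumptions made for this example, and the claim does not say how a thread of correspondence is actually detected.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set


class Message(NamedTuple):
    doc_id: str
    sender: str
    recipient: str
    thread_id: str  # hypothetical identifier grouping one correspondence


def expand_with_threads(
    messages: List[Message], responsive_ids: Iterable[str]
) -> Set[str]:
    """Add every message sharing a thread with a responsive message."""
    responsive = set(responsive_ids)

    # Index all documents in the collection by thread.
    by_thread: Dict[str, Set[str]] = defaultdict(set)
    for msg in messages:
        by_thread[msg.thread_id].add(msg.doc_id)

    # Threads touched by at least one responsive document.
    threads_hit = {m.thread_id for m in messages if m.doc_id in responsive}

    expanded = set(responsive)
    for thread_id in threads_hit:
        expanded |= by_thread[thread_id]
    return expanded


messages = [
    Message("e1", "alice@example.com", "bob@example.com", "t1"),
    Message("e2", "bob@example.com", "alice@example.com", "t1"),
    Message("e3", "carol@example.com", "dave@example.com", "t2"),
]
responsive_ids = {"e1"}
expanded = expand_with_threads(messages, responsive_ids)
added = expanded - responsive_ids
print(sorted(expanded), sorted(added))
```

In the example, `e2` is pulled in because it belongs to the same thread as the responsive message `e1`, while `e3` stays out because its thread has no responsive document.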
Specification