Systems and methods for classifying electronic information using advanced active learning techniques

  • US 8,713,023 B1
  • Filed: 06/18/2013
  • Issued: 04/29/2014
  • Est. Priority Date: 03/15/2013
  • Status: Active Grant
First Claim

1. A system for classifying documents in a document collection as relevant or non-relevant in connection with conducting e-discovery in a legal proceeding, the system comprising:

  • a memory configured to store the document collection;

    a computing device coupled to the memory, the computing device comprising:

    a display;

    a physical input interface;

    a processor coupled to the display and the input interface, the processor being configured to:

    generate a document information profile for the documents in the collection, each document information profile corresponding to a particular document and representing features and related metadata of that document and no other document;

    select a document from the collection to present to a human reviewer;

    display a portion of the selected document on the display;

    receive, through the input interface, one or more user coding decisions associated with the selected document;

    update a classifier using at least one received user coding decision and the document information profile for the document associated with the at least one received user coding decision, wherein the classifier is updated using an incremental learning technique;

    compute a set of scores for the documents in the collection by applying the updated classifier to the document information profile associated with each document to be scored;

    estimate a number of relevant documents in the document collection by (i) fitting scores computed for documents for which user coding decisions were received to a standard distribution curve, and (ii) calculating an area beneath the curve in order to determine whether review is complete by comparing the estimate to a number of documents in the document collection that the user coded as relevant and that were used to update the classifier;

    indicate on the display statistics pertaining to the extent to which review is complete;

    in response to determining that review is not complete, repeat the steps of selecting a document, displaying a portion of the selected document, receiving one or more user coding decisions associated with the selected document, updating a classifier, computing a set of scores, and estimating a number of relevant documents; and

    classify documents in the document collection as relevant or non-relevant to the legal proceeding using the computed scores or the received user coding decisions.
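
The review loop recited in the claim can be sketched in code. The following is a minimal illustration only, not the patented implementation: the names `OnlineLogistic`, `estimate_relevant`, and `review` are hypothetical, the classifier is a plain online logistic regression standing in for "an incremental learning technique", documents are selected by uncertainty, and the "standard distribution curve" is assumed to be a normal curve whose area above a 0.5 score threshold is read as the fraction of relevant documents. The stopping rule (coded-relevant count reaching the estimate) is likewise an assumption.

```python
import math
from statistics import NormalDist, mean, stdev


class OnlineLogistic:
    """Hypothetical incremental classifier: logistic regression updated one example at a time."""

    def __init__(self, n_features, lr=0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def score(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, label):
        # One stochastic-gradient step on the log loss for a single coded document.
        err = label - self.score(x)
        self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b += self.lr * err


def estimate_relevant(coded_scores, collection_size, threshold=0.5):
    """Assumed reading of the estimate step: fit a normal curve to the scores of
    coded documents and scale the area above `threshold` by the collection size."""
    if len(coded_scores) < 2 or stdev(coded_scores) == 0:
        # Too little data to fit a curve: fall back to the observed fraction.
        above = sum(s >= threshold for s in coded_scores)
        return collection_size * above / max(len(coded_scores), 1)
    curve = NormalDist(mean(coded_scores), stdev(coded_scores))
    return collection_size * (1.0 - curve.cdf(threshold))


def review(profiles, reviewer, min_coded=5):
    """profiles: non-empty list of feature vectors; reviewer(i) returns 1 (relevant) or 0."""
    clf = OnlineLogistic(len(profiles[0]))
    coded = {}  # document index -> reviewer coding decision
    scores = [clf.score(p) for p in profiles]
    while len(coded) < len(profiles):
        uncoded = [i for i in range(len(profiles)) if i not in coded]
        # Select the document the classifier is least sure about (assumed strategy).
        pick = min(uncoded, key=lambda i: abs(scores[i] - 0.5))
        coded[pick] = reviewer(pick)
        clf.update(profiles[pick], coded[pick])
        scores = [clf.score(p) for p in profiles]
        estimate = estimate_relevant([scores[i] for i in coded], len(profiles))
        # Assumed completeness test: enough relevant documents already found.
        if len(coded) >= min_coded and sum(coded.values()) >= estimate:
            break
    # Classify with the reviewer's decision where available, else the score.
    labels = [coded.get(i, int(scores[i] >= 0.5)) for i in range(len(profiles))]
    return labels, coded
```

In this sketch a reviewer decision always overrides the computed score in the final classification, which is one way to read "using the computed scores or the received user coding decisions".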
