Systems and methods for classifying electronic information using advanced active learning techniques

  • US 9,122,681 B2
  • Filed: 03/15/2013
  • Issued: 09/01/2015
  • Est. Priority Date: 03/15/2013
  • Status: Active Grant
First Claim

1. A system for classifying documents in a document collection into one or more classes or subclasses using a continuous active learning process for the purpose of conducting e-discovery in legal proceedings, the system comprising:

  • a memory adapted to store the document collection;

    a computing device coupled to the memory, the computing device comprising:

    a display;

    a physical input interface;

    a processor coupled to the display and the input interface, the processor being adapted to:

    generate a document information profile for the documents in the collection, each document information profile corresponding to a particular document and representing features of that document;

    select a document from the collection to present to a human reviewer;

    display a portion of the selected document on the display;

    receive, through the input interface, one or more user coding decisions associated with the selected document;

    for at least one class or subclass, incrementally update a classifier using at least one received user coding decision and the document information profile for the document associated with the at least one received user coding decision;

    for at least one classifier, compute a set of scores for the documents in the collection by applying the at least one classifier to the document information profile associated with each document to be scored;

    for at least one class or subclass, estimate the number of documents in that class or subclass by fitting the scores calculated using the classifier that corresponds to that class or subclass to a standard distribution;

    validate at least one of the estimates using the received user coding decisions;

    in response to determining that one of the estimates is valid, indicate, on the display or the input interface, that the review is complete for the class or subclass associated with that estimate;

    classify documents in the document collection into the classes or subclasses using the scores and the received user coding decisions; and

    repeat the steps of selecting a document, receiving user coding decisions associated with the selected document, calculating a classifier, computing a set of scores, estimating the number of documents in at least one class or subclass, and validating at least one estimate.
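
The processor steps recited above can be read as a loop. The following is a minimal illustrative sketch of such a loop for a single class, not the patented implementation. Every name in it (`profile`, `score`, `update`, `estimate_count`, `review`, `ask_reviewer`, `quantile`, `target_recall`) is hypothetical. The claim does not specify the features, the classifier, the selection rule, the distribution, or the validation test, so the sketch assumes a bag-of-words profile, an online logistic-regression update, highest-score selection, a normal distribution fitted to the scores of documents coded positive, and a recall-style validation check.

```python
import math
from statistics import NormalDist, mean, pstdev

def profile(text):
    """Document information profile: here, simply the set of lowercase tokens."""
    return set(text.lower().split())

def score(weights, feats):
    """Apply the linear classifier to one profile."""
    return sum(weights.get(f, 0.0) for f in feats)

def update(weights, feats, label, lr=0.5):
    """One incremental logistic-regression step for a single coding decision."""
    p = 1.0 / (1.0 + math.exp(-score(weights, feats)))
    for f in feats:
        weights[f] = weights.get(f, 0.0) + lr * (label - p)

def estimate_count(scores, positive_ids, quantile=0.05):
    """Assumed estimator: fit a normal distribution to the scores of documents
    coded positive, then count documents scoring above its lower cutoff."""
    pos = [scores[i] for i in positive_ids]
    if len(pos) < 2 or pstdev(pos) == 0:
        return None  # not enough evidence to fit a distribution yet
    cutoff = NormalDist(mean(pos), pstdev(pos)).inv_cdf(quantile)
    return sum(1 for s in scores if s >= cutoff)

def review(texts, ask_reviewer, target_recall=0.8):
    """Continuous active learning loop; returns {doc_id: 0 or 1}."""
    profiles = [profile(t) for t in texts]
    weights, coded = {}, {}
    while len(coded) < len(texts):
        scores = [score(weights, p) for p in profiles]
        # Select the highest-scoring document not yet reviewed (assumed rule).
        doc = max((i for i in range(len(texts)) if i not in coded),
                  key=lambda i: scores[i])
        coded[doc] = ask_reviewer(texts[doc])        # user coding decision
        update(weights, profiles[doc], coded[doc])   # incremental update
        scores = [score(weights, p) for p in profiles]
        positives = [i for i, y in coded.items() if y == 1]
        est = estimate_count(scores, positives)
        # Assumed validation: reviewers have found enough of the estimated positives.
        if est is not None and len(positives) >= target_recall * est:
            break  # a real system would indicate "review complete" here
    # Classify: human decisions take precedence; otherwise use the score's sign.
    scores = [score(weights, p) for p in profiles]
    return {i: coded.get(i, 1 if scores[i] > 0 else 0)
            for i in range(len(texts))}
```

A caller would supply the collection and a callback standing in for the display and input interface, for example `review(texts, lambda t: 1 if "merger" in t else 0)`. The estimator and the stopping test are deliberately crude placeholders. A production system would need a statistically defensible fit and validation step, along with one classifier per class or subclass.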
