Systems and methods for classifying electronic information using advanced active learning techniques

  • US 9,678,957 B2
  • Filed: 07/22/2015
  • Issued: 06/13/2017
  • Est. Priority Date: 03/15/2013
  • Status: Active Grant
First Claim

1. An active learning system for classifying documents in a document collection as a member of one or more classes or subclasses, the system comprising:

  • a processor being adapted to:

    select a document from the document collection;

    calculate at least two predicted classifiers for at least one of the one or more classes or subclasses, each predicted classifier being calculated using a document information profile for the selected document, a current classifier associated with at least one of the one or more classes or subclasses, and a different coding decision selected from a set of possible user coding decisions to be received from a user, thereby resulting in a plurality of predicted classifiers each one corresponding to a different user coding decision;

    determine a processing order for a subset of documents in the document collection that indicates an order in which the documents of the subset are to be scored;

    for each one of the predicted classifiers, calculate a set of scores for one or more documents in the document collection, at least in part, according to the processing order, wherein each score is generated for a document by utilizing the corresponding predicted classifier and a document information profile of the document to be scored;

    receive a user coding decision;

    determine whether one or more stopping criteria have been met using a subset of the set of scores based on the predicted classifier that corresponds to the received user coding decision, wherein determining whether one or more stopping criteria have been met includes selecting and presenting documents from the document collection to a user and calculating an estimate of system effectiveness using the user coding decisions for the selected documents;

    so long as the one or more stopping criteria have not been met, select a further document from the document collection and repeat the steps of calculating predicted classifiers, determining a processing order, calculating a set of scores, and classifying a set of documents based on the selected further documents; and

    in response to determining whether one or more stopping criteria have been met, classify a set of documents in the document collection into one or more of the one or more classes or subclasses using a subset of the set of scores based on the predicted classifier that corresponds to the received user coding decision.
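
The speculative part of the claim (computing one predicted classifier per possible coding decision, then scoring documents under each before the user has answered) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the linear classifier, the perceptron-style update in `predict_classifier`, the two-valued decision set `CODING_DECISIONS`, the boundary-first ordering in `processing_order`, and the zero threshold in `classify`. The stopping-criteria step, which the claim ties to user-coded samples and an effectiveness estimate, is omitted.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]

# Hypothetical decision set: +1 = relevant, -1 = not relevant.
CODING_DECISIONS = (+1, -1)


def dot(w: Vector, x: Vector) -> float:
    """Score of a document information profile x under classifier w."""
    return sum(wi * xi for wi, xi in zip(w, x))


def predict_classifier(w: Vector, x: Vector, decision: int, lr: float = 1.0) -> List[float]:
    """Assumed perceptron-style update: the classifier that would result
    if the user coded the selected document with `decision`."""
    return [wi + lr * decision * xi for wi, xi in zip(w, x)]


def processing_order(w: Vector, profiles: Dict[str, Vector], skip: str) -> List[str]:
    """Assumed ordering: documents nearest the current decision boundary first,
    with ties broken by document id."""
    ids = [d for d in profiles if d != skip]
    return sorted(ids, key=lambda d: (abs(dot(w, profiles[d])), d))


def speculative_scores(
    w: Vector, profiles: Dict[str, Vector], selected: str
) -> Dict[int, Tuple[List[float], Dict[str, float]]]:
    """For each possible coding decision, precompute the predicted classifier
    and the scores it assigns, following the processing order."""
    x = profiles[selected]
    order = processing_order(w, profiles, selected)
    result = {}
    for decision in CODING_DECISIONS:
        w_pred = predict_classifier(w, x, decision)
        result[decision] = (w_pred, {d: dot(w_pred, profiles[d]) for d in order})
    return result


def classify(scores: Dict[str, float], threshold: float = 0.0) -> Dict[str, bool]:
    """Assumed rule: a document is in the class if its score exceeds the threshold."""
    return {d: s > threshold for d, s in scores.items()}


# Toy collection of three documents with two-feature profiles.
profiles = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
current = [0.0, 0.0]

# Work done before the user responds.
spec = speculative_scores(current, profiles, "a")

# The user codes document "a" as relevant; the matching results are already available.
new_classifier, scores = spec[+1]
labels = classify(scores)
```

The point of the design is latency: once the coding decision arrives, the system looks up the branch that matches it instead of retraining and rescoring, and the scores from the other branch are discarded.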
