Unified text analytics annotator development life cycle combining rule-based and machine learning based techniques

  • US 10,289,963 B2
  • Filed: 02/27/2017
  • Issued: 05/14/2019
  • Est. Priority Date: 02/27/2017
  • Status: Active Grant
First Claim

1. A method for developing a text analytics program for extracting at least one target concept comprising:

  • utilizing at least one processor to execute computer code that performs the steps of:

    initiating a development tool that accepts user input to develop rules for extraction of features of the at least one target concept within a dataset comprising textual information;

    developing, using the rules for feature extraction, an evaluation dataset comprising at least one document annotated with the at least one target concept to be extracted by the text analytics program;

    creating, using the rules for feature extraction, a rule-based annotator to extract the at least one target concept;

    training, using the evaluation dataset, a machine-learning annotator to extract the at least one target concept within the dataset;

    evaluating each of the rule-based annotator and the machine-learning annotator against the evaluation dataset and comparing the extraction results, of each of the rule-based annotator and the machine-learning annotator, from the evaluation against a threshold for accuracy;

    combining, responsive to determining each of the rule-based annotator and the machine-learning annotator meet the threshold for accuracy, the rule-based annotator and the machine-learning annotator to form a combined annotator having features from both of the rule-based annotator and the machine-learning annotator;

    evaluating, using the evaluation dataset, extraction performance of the combined annotator against a predetermined threshold; and

    publishing, when the extraction performance of the combined annotator exceeds the predetermined threshold, the combined annotator for use in an application that extracts the at least one target concept from a plurality of datasets.
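The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: every name (`make_rule_annotator`, `train_ml_annotator`, `f1`, `combine`, `develop`), the choice of F1 as the accuracy measure, the union-based combination, and the token-counting "learner" are assumptions made only for illustration. The sketch also takes an already annotated evaluation dataset as input, whereas the claim recites developing that dataset using the feature-extraction rules.

```python
import re
from collections import Counter
from typing import Callable, List, Optional, Set, Tuple

Doc = Tuple[str, Set[str]]             # (text, gold mentions of the target concept)
Annotator = Callable[[str], Set[str]]  # text -> extracted mentions


def make_rule_annotator(patterns: List[str]) -> Annotator:
    """Rule-based annotator: here the user-developed rules are plain regexes (assumption)."""
    compiled = [re.compile(p) for p in patterns]
    return lambda text: {m.group(0) for rx in compiled for m in rx.finditer(text)}


def train_ml_annotator(eval_set: List[Doc]) -> Annotator:
    """Stand-in learner: a token is extracted if it was seen more often as a
    gold mention than as a non-mention. A real system would use a trained model."""
    pos: Counter = Counter()
    neg: Counter = Counter()
    for text, gold in eval_set:
        for tok in text.split():
            (pos if tok in gold else neg)[tok] += 1
    return lambda text: {t for t in text.split() if pos[t] > neg[t]}


def f1(annotator: Annotator, eval_set: List[Doc]) -> float:
    """Micro-averaged F1 over the evaluation dataset (assumed accuracy measure)."""
    tp = fp = fn = 0
    for text, gold in eval_set:
        pred = annotator(text)
        tp += len(pred & gold)
        fp += len(pred - gold)
        fn += len(gold - pred)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def combine(a: Annotator, b: Annotator) -> Annotator:
    """Combined annotator: union of both outputs (one possible combination)."""
    return lambda text: a(text) | b(text)


def develop(patterns: List[str], eval_set: List[Doc],
            threshold: float, publish_threshold: float) -> Optional[Annotator]:
    """Return the combined annotator if it may be published, else None."""
    rule = make_rule_annotator(patterns)
    ml = train_ml_annotator(eval_set)
    # Combine only when each annotator meets the accuracy threshold.
    if f1(rule, eval_set) < threshold or f1(ml, eval_set) < threshold:
        return None
    combined = combine(rule, ml)
    # Publish only when the combined annotator exceeds the predetermined threshold.
    return combined if f1(combined, eval_set) > publish_threshold else None


# Hypothetical evaluation dataset: the target concept is a dosage such as "50mg".
eval_set: List[Doc] = [
    ("take 50mg daily", {"50mg"}),
    ("dose was 200mg today", {"200mg"}),
    ("no dose given", set()),
]
published = develop([r"\b\d+mg\b"], eval_set, threshold=0.8, publish_threshold=0.9)
rejected = develop([r"\bxyz\b"], eval_set, threshold=0.8, publish_threshold=0.9)
```

In this sketch `published` is a callable that can be applied to new text, while `rejected` is `None` because the rule-based annotator built from the second pattern fails the per-annotator threshold, so no combination or publication takes place.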
