
Methods and apparatus for asynchronous and interactive machine learning using word embedding within text-based documents and multimodal documents

  • US 10,062,039 B1
  • Filed: 06/28/2017
  • Issued: 08/28/2018
  • Est. Priority Date: 06/28/2017
  • Status: Active Grant
First Claim

1. A non-transitory medium storing code representing a plurality of processor-executable instructions, the code comprising code to cause the processor to:

  • execute a machine-assisted iterative search over a data corpus via an asynchronous and interactive machine learning system;

    receive, via a user interface, a first series of tag signals, each tag signal from the first series indicating a membership relation between at least one data object from the data corpus and at least one tag target from a non-empty set of tag targets;

    the code to execute includes:

    select a seed set from a first set of data objects upon a determination that a number of data objects from the first set of data objects having a membership relation with a single tag target from the non-empty set of tag targets has reached a predetermined threshold corresponding to a number of elements of a training set; and

    train a machine learning model based on the seed set to identify further data objects from the data corpus predicted to have a membership relation with the single tag target;

    receive, via the user interface, a second series of tag signals, each tag signal from the second series indicating a membership relation between at least one data object from a second set of data objects and at least one tag target from the non-empty set of tag targets, the second set of data objects includes at least one data object predicted by the machine learning model as having a membership relation with the single tag target;

    the code to execute includes:

    calculate a membership score for each data object from the second set of data objects, the membership score corresponding to a predicted membership degree with respect to the single tag target;

    divide a membership scale of the single tag target into a number of 2b non-overlapping intervals of equal length with b positive non-overlapping intervals defined by a pair of positive endpoint numbers and b negative non-overlapping intervals defined by a pair of negative endpoint numbers, b corresponding to a number of score buckets of a histogram distribution;

    partition the second set of data objects into a number of training subsets equal to 2b+1, the training subsets including:

    (1) a training subset having all data objects from the second set of data objects whose membership relation with respect to the single tag target is undefined, (2) a first set of training subsets with b training subsets, each training subset from the first set of training subsets having data objects with membership scores within a positive non-overlapping interval from the b positive non-overlapping intervals, (3) a second set of training subsets with b training subsets, each training subset from the second set of training subsets having data objects with membership scores within a negative non-overlapping interval from the b negative non-overlapping intervals; and

    re-train the machine learning model based on data objects included in the training subset, the first set of training subsets, and the second set of training subsets;

    display at the user interface, via the asynchronous and interactive machine learning system and based on the re-trained machine learning model, a document object from the data corpus with a magnitude value corresponding to a membership degree between the document object and at least one tag target from the non-empty set of tag targets; and

    enable a user, via the asynchronous and interactive machine learning system, to provide feedback to the machine learning model via an accept input, a dismiss input, an input to modify sections in the document object or an input to modify magnitude values corresponding to membership degrees causing the machine learning model to improve based on the feedback.
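
The seed-set threshold test and the 2b+1 partitioning recited in the claim can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the dictionary-based inputs, the membership scale of [-1, 1], the half-open bucket boundaries, and the treatment of a score of exactly 0 as undefined are all assumptions made for illustration; the claim fixes none of them.

```python
from typing import Dict, List, Optional, Tuple


def seed_set_if_ready(
    tags: Dict[str, str], target: str, threshold: int
) -> Optional[List[str]]:
    """Return a seed set once enough objects carry the given tag target.

    Assumption: `tags` maps object id -> tag target, and the seed set is
    simply the objects tagged with `target`. Returns None below threshold.
    """
    members = [obj_id for obj_id, tag in tags.items() if tag == target]
    return members if len(members) >= threshold else None


def partition_by_score(
    scores: Dict[str, Optional[float]], b: int
) -> Tuple[List[str], List[List[str]], List[List[str]]]:
    """Split objects into 2b+1 subsets: 1 undefined, b positive, b negative.

    Assumptions: scores lie in [-1, 1]; None (or exactly 0) means the
    membership relation is undefined; bucket k covers magnitudes in
    [k/b, (k+1)/b), with magnitude 1.0 folded into the last bucket.
    """
    if b < 1:
        raise ValueError("b must be a positive number of score buckets")

    undefined: List[str] = []
    positive: List[List[str]] = [[] for _ in range(b)]
    negative: List[List[str]] = [[] for _ in range(b)]

    for obj_id, score in scores.items():
        if score is None or score == 0.0:
            undefined.append(obj_id)
            continue
        if not -1.0 <= score <= 1.0:
            raise ValueError(f"score out of range for {obj_id!r}: {score}")
        # Equal-length intervals: index by magnitude, clamp the top endpoint.
        k = min(int(abs(score) * b), b - 1)
        (positive if score > 0 else negative)[k].append(obj_id)

    return undefined, positive, negative
```

Under these assumptions, calling `partition_by_score(scores, b)` always yields one undefined subset plus two lists of `b` subsets each, matching the 2b+1 count in the claim. Some of the subsets may be empty.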
