Methods and systems for providing universal portability in machine learning

  • US 9,836,450 B2
  • Filed: 12/09/2015
  • Issued: 12/05/2017
  • Est. Priority Date: 12/09/2014
  • Status: Active Grant
First Claim

1. A method for classifying a document in natural language processing using a natural language model stored in one or more data files, the method comprising:

    accessing, by a processor in a natural language processing platform using the natural language model, one or more feature types from the one or more data files, the one or more feature types each defining a data structure configured to access a tokenized sequence of the document and generate linguistic features from content within the tokenized sequence;

    performing, by the processor in the natural language processing platform, a tokenizing operation of the document, the tokenizing operation configured to generate one or more tokenized sequences from the content within the document;

    generating, by the processor in the natural language processing platform, a plurality of features for the document from the one or more tokenized sequences, based on parameters defined by the one or more feature types and on parameters defined in task configuration data in the one or more data files, the task configuration data associated with a type of task analysis that the natural language model is configured to classify the document into;

    accessing, by the processor in the natural language processing platform, a plurality of probabilities stored in the one or more data files, each probability among the plurality of probabilities associated with a feature among the plurality of features and defining a pre-computed likelihood that said feature predicts a presence or absence of a label that the document is to be classified into;

    wherein:

    the plurality of probabilities are pre-computed during a model training process configured to train the natural language model to classify documents according to at least said label and said task analysis;

    the one or more data files is configured to store each probability in a logarithmic scale that is converted to said probability by the processor;

    the one or more data files is configured to store a table of rows and columns, wherein a first column comprises the plurality of features, a second column comprises a first category of probabilities among the plurality of probabilities that describes a first likelihood that a feature in the first column belonging to the same row satisfies a first attribute of said label, and a third column comprises a second category of probabilities among the plurality of probabilities that describes a second likelihood that said feature in the first column belonging to the same row satisfies a second attribute of said label; and

    the first attribute of said label represents a likelihood that said feature in the same row appears at a beginning of a span of the document, the second attribute of said label represents a likelihood that said feature in the same row appears inside said span of the document, and a fourth column comprises a third category of probabilities among the plurality of probabilities that represents a third likelihood that said feature in the same row appears outside said span of the document;

    computing, by the processor in the natural language processing platform, a prediction score indicating how likely the document is to be classified into said label, based on the plurality of probabilities;

    classifying, by the processor in the natural language processing platform, the document into said label based on comparing the prediction score to a threshold; and

    training the natural language model at least based on the classified document.
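
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. The table contents, the feature naming scheme (`tok=`, `bigram=`), the whitespace tokenizer, the `use_bigrams` parameter, and in particular the scoring rule (averaging the begin-plus-inside probability over known features) are all assumptions made for illustration; the claim only requires that a score be computed from the stored probabilities and compared to a threshold. The final retraining step is omitted.

```python
import math

# Hypothetical contents of a model data file. Each row maps a feature to
# log-scale probabilities for three columns: the feature appears at the
# beginning of a span (B), inside a span (I), or outside a span (O).
MODEL_TABLE = {
    "tok=refund": (math.log(0.70), math.log(0.20), math.log(0.10)),
    "tok=please": (math.log(0.10), math.log(0.30), math.log(0.60)),
    "bigram=refund_please": (math.log(0.60), math.log(0.30), math.log(0.10)),
}


def tokenize(document):
    """Assumed tokenizer: lowercase, then split on whitespace."""
    return document.lower().split()


def generate_features(tokens, use_bigrams=True):
    """Build unigram (and optionally bigram) features from a token sequence.

    `use_bigrams` stands in for a parameter that task configuration data
    might supply; it is a hypothetical name.
    """
    features = [f"tok={t}" for t in tokens]
    if use_bigrams:
        features += [f"bigram={a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return features


def prediction_score(features, table):
    """Assumed scoring rule: mean of P(B) + P(I) over features found in the table."""
    probs = []
    for feature in features:
        row = table.get(feature)
        if row is None:
            continue  # feature was not seen during training
        log_b, log_i, _log_o = row
        # Stored values are logarithmic; convert back to probabilities.
        probs.append(math.exp(log_b) + math.exp(log_i))
    return sum(probs) / len(probs) if probs else 0.0


def classify(document, table, threshold=0.5):
    """Return (label_assigned, score) by comparing the score to a threshold."""
    features = generate_features(tokenize(document))
    score = prediction_score(features, table)
    return score >= threshold, score
```

Calling `classify("Refund please", MODEL_TABLE)` looks up three features in the table, converts each stored log value with `math.exp`, and assigns the label only if the averaged score reaches the threshold. A document whose features are all absent from the table receives a score of 0.0 under this assumed rule.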
