Method and apparatus for predictive modeling & analysis for knowledge discovery

  • US 20080133434A1
  • Filed: 11/12/2004
  • Published: 06/05/2008
  • Est. Priority Date: 11/12/2004
  • Status: Abandoned Application
First Claim

1. A method and apparatus for predictive modeling & analysis for knowledge discovery comprising:

    selecting a specific target for which predictive modeling and analysis is to be performed;

    importing the dataset into learning and testing data sets;

    dividing the learning dataset further into training and validation datasets;

    normalizing and cleaning the dataset;

    systematic dimensionality reduction of features from the learning dataset in order to improve the performance of creating models without sacrificing speed;

    configuring the apparatus for either a single-class or multi-class classification modeling or a regression modeling or optionally both;

    optionally selecting an appropriate linear or non-linear kernel for modeling;

    selecting an auto-tuning parameter for automatically optimizing and selecting the best model with the highest accuracy for correct predictions of activity including selecting a linear or non-linear kernel that yields the best model with the highest accuracy;

    creating models using support vector machines and other algorithms such as Naive Bayes, Random Forest, Ridge Regression with the learning dataset and auto-selecting the best model with the best accuracy for correct predictions of activity;

    testing the test dataset against the auto-selected best model to determine over-fitting;

    discovering dominant features and characteristics as in the learning dataset for the given target and the selected model;

    performing cluster analysis on the learning dataset to discover different classes and series of similar data-points and discovering dominant features and characteristics of each cluster;

    further systematic dimensionality reduction of features from the learning dataset in order to further improve accuracy based on the selected auto-tuning parameter;

    iteratively re-creating models using support vector machines or other algorithms including Naïve Bayes, Random Forest and Ridge Regression with the learning dataset with reduced features and then auto-selecting the best model with the best accuracy for correct predictions of activity;

    discovering noise in the training dataset by performing a Noise Discovery Cross Validation Algorithm;

    predicting activity and level of activity of data-points with unknown ground truth using the selected best model;

    discovering dominant features and characteristics of the data-points in the prediction dataset for the given target;

    performing similarity discovery to discover if the prediction dataset and training dataset come from similar distribution and series;

    packaging and exporting models to be integrated and used with other third party applications;

    recreating the best model by only training on the support vectors in case the algorithm used for training is Support Vector Machines;

    allowing users to add additional data to the original training dataset for retraining and generating local models that are more specific to the user's problem domain;

    performing incremental learning by adding new training data to improve the model without having to re-run and re-generate the model.
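
The data-handling and model-selection steps of the claim (splitting into learning and testing sets, dividing the learning set into training and validation sets, normalizing, auto-selecting the most accurate model, and checking the test set for over-fitting) can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The function names, the 80/20 split ratios, the synthetic two-cluster dataset, and the two candidate classifiers (a nearest-centroid rule and a majority-class baseline, standing in for the support vector machine, Naïve Bayes, Random Forest and Ridge Regression models named in the claim) are all assumptions made for illustration.

```python
import random
from typing import Callable, Dict, List, Tuple

Row = Tuple[List[float], int]          # (feature vector, class label)
Model = Callable[[List[float]], int]   # maps a feature vector to a label


def split(rows: List[Row], fraction: float, seed: int) -> Tuple[List[Row], List[Row]]:
    """Shuffle a copy of rows and cut it into two parts at the given fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_min_max(rows: List[Row]) -> List[Tuple[float, float]]:
    """Learn per-feature (min, max) bounds from the given rows."""
    columns = zip(*(features for features, _ in rows))
    return [(min(col), max(col)) for col in columns]


def scale(rows: List[Row], bounds: List[Tuple[float, float]]) -> List[Row]:
    """Min-max scale every feature using previously learned bounds."""
    return [
        ([(v - lo) / (hi - lo) if hi > lo else 0.0
          for v, (lo, hi) in zip(features, bounds)], label)
        for features, label in rows
    ]


def train_nearest_centroid(rows: List[Row]) -> Model:
    """Stand-in classifier: predict the label whose mean feature vector is closest."""
    grouped: Dict[int, List[List[float]]] = {}
    for features, label in rows:
        grouped.setdefault(label, []).append(features)
    centroids = {
        label: [sum(col) / len(col) for col in zip(*members)]
        for label, members in grouped.items()
    }

    def predict(features: List[float]) -> int:
        return min(
            centroids,
            key=lambda label: sum(
                (a - b) ** 2 for a, b in zip(features, centroids[label])
            ),
        )

    return predict


def train_majority(rows: List[Row]) -> Model:
    """Baseline: always predict the most frequent training label."""
    labels = [label for _, label in rows]
    majority = max(set(labels), key=labels.count)
    return lambda features: majority


def accuracy(model: Model, rows: List[Row]) -> float:
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(features) == label for features, label in rows) / len(rows)


def select_best(trainers: Dict[str, Callable[[List[Row]], Model]],
                training: List[Row], validation: List[Row]) -> Tuple[str, Model]:
    """Train every candidate and keep the one with the highest validation accuracy."""
    fitted = [(name, train(training)) for name, train in trainers.items()]
    return max(fitted, key=lambda item: accuracy(item[1], validation))


# Synthetic data (an assumption): two well-separated clusters of 50 points each.
rng = random.Random(0)
dataset: List[Row] = [
    ([center + rng.uniform(-1, 1), center + rng.uniform(-1, 1)], label)
    for label, center in ((0, 0.0), (1, 10.0))
    for _ in range(50)
]

# Learning/testing split, then training/validation split of the learning set.
learning, testing = split(dataset, 0.8, seed=1)
training, validation = split(learning, 0.8, seed=2)

# Normalization bounds are learned from the training set only, then reused.
bounds = fit_min_max(training)
training_s = scale(training, bounds)
validation_s = scale(validation, bounds)
testing_s = scale(testing, bounds)

trainers = {"nearest_centroid": train_nearest_centroid, "majority": train_majority}
best_name, best_model = select_best(trainers, training_s, validation_s)

# A large gap between validation and test accuracy would suggest over-fitting.
validation_accuracy = accuracy(best_model, validation_s)
test_accuracy = accuracy(best_model, testing_s)
overfit_gap = validation_accuracy - test_accuracy
```

Learning the normalization bounds from the training portion alone keeps information from the validation and test sets out of model fitting, so the held-out accuracy remains a fair check for over-fitting.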
