
METHOD FOR FEATURE SELECTION AND FOR EVALUATING FEATURES IDENTIFIED AS SIGNIFICANT FOR CLASSIFYING DATA

  • US 20110078099A1
  • Filed: 09/26/2010
  • Published: 03/31/2011
  • Est. Priority Date: 05/18/2001
  • Status: Active Grant
First Claim

1. A method for estimating a subset of features falsely labeled “significant” within a group of features which appear able to separate a dataset comprising multiple examples into two or more classes, the method comprising:

    inputting the dataset into a computer adapted for implementing a support vector machine;

    separately for each feature of the group of features, assigning a value to the feature by:

    processing the dataset using the support vector machine to separate the examples into classes according to known outcomes, wherein the classes comprise one class having one set of feature values and at least one other class having another set of feature values;

    calculating an extremal margin value between a lowest feature value in the one class and the highest feature value in the at least one other class;

    generating a list of the group of features and their calculated extremal margin values;

    before or after assigning a value to the feature, determining a probability of obtaining an extremal margin value that exceeds a normal distribution of extremal margin values by:

    drawing a set of examples from each class at random according to a normal distribution;

    processing the randomly drawn example set using the support vector machine for each feature of the group of features to separate the randomly drawn example set into classes;

    computing the extremal margin value within the randomly drawn example set;

    repeating the steps of drawing, processing and computing for a large number of randomly drawn sets;

    generating a table comprising estimated p-values, wherein the estimated p-value is a fraction of the large number of randomly drawn sets in which the computed extremal margin value exceeds a specified extremal margin value;

    selecting a desired p-value;

    determining from the table the specified extremal margin value corresponding to the desired p-value;

    identifying as falsely significant features the features on the list of the group of features that have an extremal margin value of less than the specified extremal margin value corresponding to the desired p-value;

    generating an output comprising a listing of the falsely significant features; and

    transferring the output to a media.
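The per-feature scoring steps of the claim (separating the examples by class, calculating the extremal margin, and listing features with their margins) can be illustrated with a short Python sketch. It is a simplification, not the patented implementation. Rather than training a support vector machine, it computes the extremal margin directly as the lowest value of a feature in one class minus the highest value in the other class. For a single feature with separable classes, this is the width of the gap in which a hard-margin linear SVM would place its boundary. The function names, the 0/1 label encoding, and the choice of class 1 as "the one class" are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def extremal_margin(one_class: Sequence[float], other_class: Sequence[float]) -> float:
    """Lowest value in `one_class` minus highest value in `other_class`.

    Positive: the feature separates the classes with a gap of that size.
    Zero or negative: the extreme values touch or overlap.
    """
    return min(one_class) - max(other_class)


def margin_list(X: List[List[float]], y: List[int]) -> List[Tuple[int, float]]:
    """Score every feature (column of X) and return (index, margin), largest first.

    Hypothetical convention: labels are 0/1 and class 1 is "the one class".
    """
    scores = []
    for j in range(len(X[0])):
        ones = [row[j] for row, label in zip(X, y) if label == 1]
        others = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((j, extremal_margin(ones, others)))
    # Sort so the most cleanly separating features come first.
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores


if __name__ == "__main__":
    X = [[2.0, 0.25], [3.0, 1.5], [0.5, 0.5], [1.0, 0.75]]
    y = [1, 1, 0, 0]
    print(margin_list(X, y))  # [(0, 1.0), (1, -0.5)]
```

In the example, feature 0 separates the two classes with a gap of 1.0. Feature 1 has overlapping class ranges, so its margin is negative.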
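The null-distribution steps of the claim can be sketched the same way: drawing random examples for each class from a normal distribution, computing the extremal margin, repeating many times, and tabulating estimated p-values. As a hypothetical simplification, this sketch assumes each feature has been standardized, so "no real class difference" is modelled by drawing both classes from the same standard normal distribution. It again computes the margin directly instead of running an SVM. `null_margins` and `p_value_table` are illustrative names, not taken from the patent.

```python
import random
from typing import List, Sequence, Tuple


def null_margins(n_one: int, n_other: int, n_trials: int = 10_000, seed: int = 0) -> List[float]:
    """Simulate extremal margins when both classes come from one standard normal.

    Assumption: features are standardized, so the absence of a class
    difference is modelled by drawing both classes from N(0, 1).
    """
    rng = random.Random(seed)
    margins = []
    for _ in range(n_trials):
        one = [rng.gauss(0.0, 1.0) for _ in range(n_one)]
        other = [rng.gauss(0.0, 1.0) for _ in range(n_other)]
        # Same statistic as for real data: lowest of one class minus highest of the other.
        margins.append(min(one) - max(other))
    return margins


def p_value_table(margins: Sequence[float], thresholds: Sequence[float]) -> List[Tuple[float, float]]:
    """(threshold, estimated p) pairs; p is the fraction of trials whose margin exceeds it."""
    n = len(margins)
    return [(t, sum(m > t for m in margins) / n) for t in thresholds]


if __name__ == "__main__":
    sims = null_margins(n_one=5, n_other=5, n_trials=5_000, seed=1)
    for threshold, p in p_value_table(sims, [-1.0, 0.0, 0.5, 1.0]):
        print(f"margin > {threshold:+.1f}: p ~ {p:.4f}")
```

Here is a useful sanity check. When both classes are drawn from the same continuous distribution, every ordering of the pooled sample is equally likely. The probability that the margin exceeds 0 is therefore exactly 1/C(n_one + n_other, n_one). For five examples per class, that is 1/252, or about 0.004.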
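The final steps of the claim are choosing a desired p-value, reading the corresponding extremal margin from the table, and flagging listed features whose margin falls below it. Together, these amount to taking an upper quantile of the simulated null margins. The sketch below is an illustration under assumptions. It picks the threshold so that at most the desired fraction of null margins lies strictly above it, and the helper names and feature names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def margin_threshold(null_margins: Sequence[float], p_desired: float) -> float:
    """Return a margin t such that at most p_desired of the null margins exceed t.

    Hypothetical helper: equivalent to reading the table of estimated
    p-values backwards, using the simulated null margins directly.
    """
    if not 0.0 < p_desired < 1.0:
        raise ValueError("p_desired must lie strictly between 0 and 1")
    if not null_margins:
        raise ValueError("need at least one simulated null margin")
    ordered = sorted(null_margins)
    n = len(ordered)
    # Allow at most floor(p * n) null margins to lie strictly above t.
    allowed_above = math.floor(p_desired * n)
    return ordered[n - 1 - allowed_above]


def falsely_significant(feature_margins: Dict[str, float], threshold: float) -> List[str]:
    """List features whose observed margin is below the threshold, sorted by name."""
    return sorted(name for name, m in feature_margins.items() if m < threshold)


if __name__ == "__main__":
    null = [float(i) for i in range(100)]
    t = margin_threshold(null, 0.05)
    print(t)  # 94.0 (only 95..99, i.e. 5%, exceed it)
    print(falsely_significant({"f1": 96.0, "f2": 10.0}, t))  # ['f2']
```

The resulting list of flagged names is what the claim's output step would write to a storage medium.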

  • 3 Assignments