System and method of feature selection for text classification using subspace sampling

  • US 8,046,317 B2
  • Filed: 12/31/2007
  • Issued: 10/25/2011
  • Est. Priority Date: 12/31/2007
  • Status: Active Grant
First Claim

1. A computer system for classification, comprising:

  • a processor device configured to operate as a text classifier using a plurality of features selected by subspace sampling from a corpus of training data for classification of a document;

    selecting a subset from the plurality of features by subspace sampling comprises using a probability distribution over a plurality of features from a corpus of training texts, the probability distribution having a probability assigned to each of the plurality of features that is proportional to a square of Euclidean norms of a plurality of rows of a plurality of left singular vectors of a matrix of the plurality of features representing the corpus of training texts; and

    a storage operably coupled to the text classifier for storing a plurality of texts classified using the plurality of features selected by subspace sampling into a plurality of classes.
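The probability distribution recited in the claim can be illustrated with a short Python sketch. It assumes NumPy and a feature-by-document matrix `A` whose rows correspond to features. The function names, the rank parameter `k` (how many left singular vectors to keep), and the choice to draw features without replacement are all assumptions made for illustration; the claim itself does not specify them.

```python
import numpy as np

def subspace_sampling_probabilities(A, k):
    """Return one sampling probability per feature (row of A).

    Each probability is proportional to the squared Euclidean norm of the
    corresponding row of the top-k left singular vectors of A.
    The truncation rank k is an illustrative assumption.
    """
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]                       # top-k left singular vectors
    scores = np.sum(U_k ** 2, axis=1)    # squared row norms
    return scores / scores.sum()         # normalize to a distribution

def select_features(A, k, r, seed=0):
    """Draw up to r distinct feature indices according to the distribution.

    Sampling without replacement is a simplification chosen for this sketch.
    """
    rng = np.random.default_rng(seed)
    p = subspace_sampling_probabilities(A, k)
    r = min(r, int(np.count_nonzero(p)))  # cannot draw more than the support
    return np.sort(rng.choice(A.shape[0], size=r, replace=False, p=p))

if __name__ == "__main__":
    # Hypothetical toy corpus: 6 features (rows) by 4 documents (columns).
    A = np.random.default_rng(42).random((6, 4))
    print(subspace_sampling_probabilities(A, k=2))
    print(select_features(A, k=2, r=3))
```

The selected row indices would then identify the features on which a text classifier is trained; the classifier itself is outside the scope of this sketch.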
