New differential LSI space-based probabilistic document classifier

  • US 20030037073A1
  • Filed: 05/08/2001
  • Published: 02/20/2003
  • Est. Priority Date: 05/08/2001
  • Status: Active Grant
First Claim

1. A method of setting up a DLSI space-based classifier for document classification comprising the steps of:

  • preprocessing documents to distinguish terms of a word and a noun phrase from stop words;

    constructing system terms by setting up a term list as well as global weights;

    normalizing document vectors of collected documents, as well as centroid vectors of each cluster;

    constructing a differential term by intra-document matrix $D_I$ with $n_I$ columns, such that each column in said matrix is a differential intra-document vector;

    decomposing the differential term by intra-document matrix $D_I$, by an SVD algorithm, into $D_I = U_I S_I V_I^T$ ($S_I = \mathrm{diag}(\delta_{I,1}, \delta_{I,2}, \ldots)$), followed by a composition of $D_{I,k_I} = U_{k_I} S_{k_I} V_{k_I}^T$ giving an approximate $D_I$ in terms of an appropriate $k_I$;

    setting up a likelihood function of intra-differential document vector;

    constructing a term by extra-document matrix $D_E$ with $n_E$ columns, such that each column of said extra-document matrix is an extra-differential document vector;

    decomposing $D_E$, by exploiting the SVD algorithm, into $D_E = U_E S_E V_E^T$ ($S_E = \mathrm{diag}(\delta_{E,1}, \delta_{E,2}, \ldots)$), then with a proper $k_E$, defining $D_{E,k_E} = U_{k_E} S_{k_E} V_{k_E}^T$ to approximate $D_E$;

    setting up a likelihood function of extra-differential document vector;

    setting up a posteriori function; and

    using the DLSI space-based classifier to automatically classify a document.
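
The steps of the claim can be illustrated with a small NumPy sketch. It is a toy under stated assumptions, not the patented implementation: the claim does not spell out the likelihood or a posteriori functions, so the sketch substitutes a Gaussian-style density (singular-value-scaled coordinates plus an isotropic residual term) and a two-hypothesis Bayes rule with equal priors. All function names, the toy term-by-document matrix, and the choices $k_I = 2$ and $k_E = 1$ are hypothetical, and preprocessing and global weighting are skipped.

```python
import numpy as np

def unit(v):
    """Scale a vector to unit Euclidean length."""
    return v / np.linalg.norm(v)

def differential_matrices(docs, labels):
    """Return (D_I, D_E, centroids) from a term-by-document matrix."""
    docs = docs.astype(float)
    docs = docs / np.linalg.norm(docs, axis=0)  # normalize document vectors
    centroids = {}
    for c in sorted(set(labels)):
        members = [j for j, lab in enumerate(labels) if lab == c]
        centroids[c] = unit(docs[:, members].mean(axis=1))  # normalized centroid
    # Intra: document minus the centroid of its own cluster.
    d_i = [docs[:, j] - centroids[lab] for j, lab in enumerate(labels)]
    # Extra (assumption): document minus the centroid of every other cluster.
    d_e = [docs[:, j] - centroids[c]
           for j, lab in enumerate(labels) for c in centroids if c != lab]
    return np.column_stack(d_i), np.column_stack(d_e), centroids

def fit_model(d, k):
    """Rank-k SVD basis, its singular values, and the mean discarded variance."""
    u, s, _ = np.linalg.svd(d, full_matrices=False)
    rho = max(float(np.mean(s[k:] ** 2)), 1e-12) if k < len(s) else 1e-12
    return u[:, :k], s[:k], rho

def log_likelihood(x, model):
    """Assumed Gaussian-style log density; NOT the formula from the patent."""
    u_k, s_k, rho = model
    y = u_k.T @ x                                  # coordinates in the rank-k space
    residual = max(float(x @ x - y @ y), 0.0)      # energy outside that space
    n_rest = len(x) - len(s_k)
    return (-0.5 * np.sum((y / s_k) ** 2) - np.sum(np.log(s_k))
            - 0.5 * residual / rho - 0.5 * n_rest * np.log(rho))

def posterior_intra(x, intra, extra, prior_intra=0.5):
    """Bayes rule: probability that x is an intra-differential vector."""
    a = log_likelihood(x, intra) + np.log(prior_intra)
    b = log_likelihood(x, extra) + np.log(1.0 - prior_intra)
    return float(np.exp(a - np.logaddexp(a, b)))

def classify(doc, centroids, intra, extra):
    """Pick the cluster whose centroid yields the highest intra posterior."""
    x = unit(np.asarray(doc, dtype=float))
    scores = {c: posterior_intra(x - cen, intra, extra)
              for c, cen in centroids.items()}
    return max(scores, key=scores.get), scores

# Toy data: 4 terms (rows), 6 documents (columns), two clusters.
docs = np.array([[1, 1, 0.5, 0, 0, 0],
                 [1, 0.5, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 0.5],
                 [0, 0, 0, 1, 0.5, 1]])
labels = ["A", "A", "A", "B", "B", "B"]

d_i, d_e, centroids = differential_matrices(docs, labels)
intra = fit_model(d_i, k=2)   # k_I chosen by hand for this toy
extra = fit_model(d_e, k=1)   # k_E chosen by hand for this toy
label, scores = classify([1, 0.8, 0, 0.1], centroids, intra, extra)
print(label, scores)
```

The query document shares its dominant terms with cluster A, so its difference from the A centroid is small and resembles the intra-differential vectors, while its difference from the B centroid resembles the extra-differential vectors. In practice $k_I$ and $k_E$ would be chosen from the singular value spectra rather than fixed by hand.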
