Differential LSI space-based probabilistic document classifier

  • US 7,024,400 B2
  • Filed: 05/08/2001
  • Issued: 04/04/2006
  • Est. Priority Date: 05/08/2001
  • Status: Expired due to Term
First Claim

1. A method of setting up a DLSI space-based classifier to be stored in a computer storage device for document classification using a computer, and using said classifier by said computer to classify a document according to a plurality of clusters within a database, comprising the steps of:

    preprocessing documents using said computer to distinguish terms of a word and a noun phrase from stop words;

    constructing system terms by setting up a term list as well as global weights using said computer;

    normalizing document vectors of collected documents, as well as centroid vectors of each cluster using said computer;

    constructing a differential term by intra-document matrix $D_I^{m \times n_I}$ using said computer, such that each column in said matrix is a differential intra-document vector;

    decomposing the differential term by intra-document matrix $D_I$, by an SVD algorithm using said computer, into $D_I = U_I S_I V_I^T$ ($S_I = \mathrm{diag}(\delta_{I,1}, \delta_{I,2}, \ldots)$), followed by a composition of $D_{I,k_I} = U_{k_I} S_{k_I} V_{k_I}^T$ giving an approximate $D_I$ in terms of an appropriate $k_I$;

    setting up a likelihood function of intra-differential document vector using said computer;

    constructing a term by extra-document matrix $D_E^{m \times n_E}$ using said computer, such that each column of said extra-document matrix is an extra-differential document vector;

    decomposing $D_E$, by exploiting the SVD algorithm using said computer, into $D_E = U_E S_E V_E^T$ ($S_E = \mathrm{diag}(\delta_{E,1}, \delta_{E,2}, \ldots)$), then with a proper $k_E$, defining $D_{E,k_E} = U_{k_E} S_{k_E} V_{k_E}^T$ to approximate $D_E$;

    setting up a likelihood function of extra-differential document vector using said computer;

    setting up a posteriori function using said computer; and

    said computer using said DLSI space-based classifier, as set up in the foregoing steps, to classify the document as belonging to one of the plurality of clusters within the database.
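
The steps recited in the claim can be illustrated with a minimal Python sketch using NumPy. It is not the patented implementation and rests on several assumptions that do not come from the claim text. Documents are assumed to arrive as already-weighted term vectors, so stop-word removal, noun-phrase detection and global weighting are skipped. The likelihood is assumed to be a zero-mean Gaussian whose variances come from the retained singular values, with one shared residual variance for the discarded directions. The prior is assumed to be a fixed number. All function names (`unit`, `build_differentials`, `fit_subspace`, `log_likelihood`, `posterior_intra`, `classify`) are hypothetical.

```python
import numpy as np

def unit(v):
    """Scale a vector to unit Euclidean length; zero vectors are returned unchanged."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def build_differentials(docs, labels):
    """Return normalized centroids plus the intra- and extra-differential matrices."""
    X = np.array([unit(d) for d in docs])              # normalized document vectors
    clusters = sorted(set(labels))
    centroids = {}
    for c in clusters:
        mask = np.array([l == c for l in labels])
        centroids[c] = unit(X[mask].mean(axis=0))       # normalized centroid vectors
    # Intra: document minus the centroid of its own cluster.
    intra = [x - centroids[l] for x, l in zip(X, labels)]
    # Extra: document minus the centroid of every other cluster.
    extra = [x - centroids[c] for x, l in zip(X, labels)
             for c in clusters if c != l]
    return centroids, np.array(intra).T, np.array(extra).T   # one vector per column

def fit_subspace(D, k):
    """Rank-k truncated SVD of D; returns (U_k, kept variances, residual variance)."""
    U, s, _ = np.linalg.svd(D, full_matrices=False)
    n = D.shape[1]
    r = int(np.sum(s > 1e-10))                          # numerical rank
    k = min(k, r)
    rho = float(np.mean(s[k:r] ** 2)) if r > k else 0.0
    return U[:, :k], s[:k] ** 2 / n, max(rho / n, 1e-9)  # floor avoids division by zero

def log_likelihood(x, model):
    """Log density of x under the ASSUMED zero-mean Gaussian model."""
    U, var, res_var = model
    y = U.T @ x                                          # coordinates in the kept subspace
    eps2 = max(float(x @ x - y @ y), 0.0)                # squared distance to the subspace
    quad = float(np.sum(y ** 2 / var)) + eps2 / res_var
    logdet = float(np.sum(np.log(2 * np.pi * var)))
    logdet += (len(x) - len(var)) * np.log(2 * np.pi * res_var)
    return -0.5 * (quad + logdet)

def posterior_intra(x, intra, extra, prior=0.5):
    """Bayes posterior that x is an intra-differential vector (prior is an assumption)."""
    li = log_likelihood(x, intra) + np.log(prior)
    le = log_likelihood(x, extra) + np.log(1.0 - prior)
    # Numerically stable form of exp(li) / (exp(li) + exp(le)).
    return float(np.exp(-np.logaddexp(0.0, le - li)))

def classify(doc, centroids, intra, extra, prior=0.5):
    """Assign doc to the cluster whose centroid yields the highest intra posterior."""
    d = unit(doc)
    scores = {c: posterior_intra(d - mu, intra, extra, prior)
              for c, mu in centroids.items()}
    return max(scores, key=scores.get), scores

# Toy data: four term-frequency vectors in two clusters (illustrative only).
docs = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 2, 1], [0, 0, 1, 2]]
labels = ["A", "A", "B", "B"]
centroids, D_I, D_E = build_differentials(docs, labels)
intra_model = fit_subspace(D_I, k=2)    # k_I chosen by hand for the toy data
extra_model = fit_subspace(D_E, k=1)    # k_E chosen by hand for the toy data
label, scores = classify([3, 2, 0, 0], centroids, intra_model, extra_model)
print(label, scores)
```

On this toy data the query vector is assigned to cluster A, because its difference from the A centroid is small and resembles the intra-differential vectors, while its difference from the B centroid resembles the extra-differential vectors. The ranks `k=2` and `k=1` are hand-picked for four documents; how an "appropriate" $k_I$ and a "proper" $k_E$ are chosen is not stated in the claim.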
