Probabilistic information retrieval based on differential latent semantic space

  • US 20030050921A1
  • Filed: 05/08/2001
  • Published: 03/13/2003
  • Est. Priority Date: 05/08/2001
  • Status: Active Grant
First Claim

1. A method for setting up an information retrieval system and retrieving text information, comprising the steps of:

  • preprocessing text including word, noun phrase and stop word identification;

    constructing system terms including setting up a term list and global weights;

    setting up and normalizing document vectors of all collected documents;

    constructing an interior differential term-document matrix $D_I^{m \times n_1}$ such that each column in said interior differential term-document matrix is an interior differential document vector;

    decomposing, using SVD algorithm, $D_I$, such that $D_I = U S V^T$, then with a proper $k_1$, defining the $D_{I,k_1} = U_{k_1} S_{k_1} V_{k_1}^T$ to approximate $D_I$;

    defining an interior document likelihood function, $P(x \mid D_I)$;

    constructing an exterior differential term-document matrix $D_E^{m \times n_1}$, such that each column in said exterior differential term-document matrix is an exterior differential document vector;

    decomposing, using SVD algorithm, $D_E$, such that $D_E = U S V^T$, then with a proper value of $k_2$, defining the $D_{E,k_2} = U_{k_2} S_{k_2} V_{k_2}^T$ to approximate $D_E$;

    defining an exterior document likelihood function, $P(x \mid D_E)$; and

    defining a posteriori function

    $$P(D_I \mid x) = \frac{P(x \mid D_I)\,P(D_I)}{P(x \mid D_I)\,P(D_I) + P(x \mid D_E)\,P(D_E)},$$

    where $P(D_I)$ is set to be an average number of recalls divided by the number of documents in the database and $P(D_E)$ is set to be $1 - P(D_I)$.
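
The final step of the claim can be illustrated with a minimal Python sketch. It only shows how the posterior combines two likelihood values with the priors described in the claim; it is not the patent's implementation. The function names (`prior_interior`, `posterior_interior`), the parameter names, and the numeric inputs are all hypothetical, and the likelihood values are passed in as plain numbers because the claim does not state their functional form.

```python
def prior_interior(avg_recalls: float, n_documents: int) -> float:
    """P(D_I): average number of recalls divided by the number of documents."""
    if n_documents <= 0:
        raise ValueError("n_documents must be positive")
    return avg_recalls / n_documents


def posterior_interior(lik_interior: float, lik_exterior: float,
                       p_interior: float) -> float:
    """P(D_I | x) by Bayes' rule, with P(D_E) set to 1 - P(D_I)."""
    p_exterior = 1.0 - p_interior
    numerator = lik_interior * p_interior
    denominator = numerator + lik_exterior * p_exterior
    if denominator == 0.0:
        raise ZeroDivisionError("both weighted likelihoods are zero")
    return numerator / denominator


# Hypothetical inputs, chosen only for illustration:
# 5 recalls on average over a 100-document collection, and assumed
# likelihood values P(x | D_I) = 0.8 and P(x | D_E) = 0.1.
p_di = prior_interior(avg_recalls=5, n_documents=100)
score = posterior_interior(lik_interior=0.8, lik_exterior=0.1, p_interior=p_di)
print(round(score, 4))
```

With a small prior such as the one assumed here, even a much larger interior likelihood yields a posterior well below 1, which is why the choice of P(D_I) matters when ranking documents by this score.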
