Probabilistic information retrieval based on differential latent semantic space
Abstract
A computer-based information search and retrieval system and method for retrieving textual digital objects makes full use of the projections of the documents onto both the reduced document space, characterized by the singular value decomposition-based latent semantic structure, and its orthogonal space. The resulting system and method are more robust, reducing the instability that synonymy and/or polysemy of a natural language cause in traditional keyword search engines, and are therefore particularly suitable for web document searching over a distributed computer network such as the Internet.
11 Claims
1. A method for setting up an information retrieval system and retrieving text information, comprising the steps of:
preprocessing text including word, noun phrase and stop word identification;
constructing system terms including setting up a term list and global weights;
setting up and normalizing document vectors of all collected documents;
constructing an interior differential term-document matrix $D_I^{m \times n_1}$ such that each column in said interior differential term-document matrix is an interior differential document vector;
decomposing, using SVD algorithm, $D_I$, such that $D_I = USV^T$, then with a proper $k_1$, defining the $D_{I,k_1} = U_{k_1} S_{k_1} V_{k_1}^T$ to approximate $D_I$;
defining an interior document likelihood function, $P(x \mid D_I)$;
constructing an exterior differential term-document matrix $D_E^{m \times n_2}$, such that each column in said exterior differential term-document matrix is an exterior differential document vector;
decomposing, using SVD algorithm, $D_E$, such that $D_E = USV^T$, then with a proper value of $k_2$, defining the $D_{E,k_2} = U_{k_2} S_{k_2} V_{k_2}^T$ to approximate $D_E$;
defining an exterior document likelihood function, $P(x \mid D_E)$; and
defining a posteriori function where $P(D_I)$ is set to be an average number of recalls divided by the number of documents in the data base and $P(D_E)$ is set to be $1 - P(D_I)$.
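The matrix-construction and SVD steps of claim 1 can be sketched in NumPy as follows. This is a minimal illustration, not the patented implementation. It assumes that a differential document vector is the difference of two normalized document vectors, interior when both documents share a cluster and exterior otherwise; the claim itself does not spell this out. The toy counts, the cluster labels, and the helper names (`normalize_columns`, `differential_matrix`, `truncated_svd`) are hypothetical.

```python
import numpy as np

def normalize_columns(A):
    """Scale each column (document vector) to unit Euclidean length."""
    norms = np.linalg.norm(A, axis=0)
    norms[norms == 0] = 1.0  # leave all-zero columns unchanged
    return A / norms

def differential_matrix(docs, pairs):
    """Stack the differences docs[:, i] - docs[:, j] as columns, one per pair."""
    return np.column_stack([docs[:, i] - docs[:, j] for i, j in pairs])

def truncated_svd(D, k):
    """Return U_k, the k largest singular values, and V_k^T of D."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

# Toy term-document matrix: 4 terms x 4 documents (hypothetical counts).
A = normalize_columns(np.array([[2.0, 1.0, 0.0, 0.0],
                                [1.0, 2.0, 0.0, 1.0],
                                [0.0, 0.0, 3.0, 1.0],
                                [0.0, 1.0, 1.0, 2.0]]))
labels = [0, 0, 1, 1]  # assumed cluster label of each document

n = A.shape[1]
all_pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
interior_pairs = [(i, j) for i, j in all_pairs if labels[i] == labels[j]]
exterior_pairs = [(i, j) for i, j in all_pairs if labels[i] != labels[j]]

D_I = differential_matrix(A, interior_pairs)  # m x n_1
D_E = differential_matrix(A, exterior_pairs)  # m x n_2

# Rank-k_1 approximation D_{I,k_1} = U_{k_1} S_{k_1} V_{k_1}^T (k_1 = 1 here).
U1, s1, V1t = truncated_svd(D_I, k=1)
D_I_k1 = U1 @ np.diag(s1) @ V1t
```

The same `truncated_svd` call applied to `D_E` with a chosen `k_2` would give the exterior approximation.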
where $r_1$ is a rank of matrix $D_I$.
3. The method as set forth in claim 2, wherein $\rho_1$ is chosen as $\delta_{k_1+1}^2/2$, and $r_1$ is $n_1$.
4. The method as set forth in claim 1, wherein the exterior document likelihood function, $P(x \mid D_E)$, is

$$P(x \mid D_E) = \frac{n_2^{1/2} \exp\left(-\frac{n_2}{2} \sum_{i=1}^{k_2} \frac{y_i^2}{\delta_i^2}\right) \cdot \exp\left(-\frac{n_2\, \varepsilon^2(x)}{2 \rho_2}\right)}{(2\pi)^{n_2/2} \prod_{i=1}^{k_2} \delta_i \cdot \rho_2^{(r_2 - k_2)/2}},$$

where $r_2$ is a rank of matrix $D_E$.
5. The method as set forth in claim 4, wherein $\rho_2$ is chosen as $\delta_{k_2+1}^2/2$, and $r_2$ is $n_2$.
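The likelihood of claim 4, with the parameter choices of claim 5, can be evaluated in the log domain to avoid overflow and underflow. The sketch below makes assumptions the claims do not state: that $y = U_{k}^T x$ is the projection of $x$ onto the leading left singular vectors, that $\varepsilon^2(x) = \|x\|^2 - \|y\|^2$ is the squared residual, and that the $\delta_i$ are the singular values. The function name `log_likelihood` and the random toy matrix are hypothetical.

```python
import numpy as np

def log_likelihood(x, U_k, delta, n, rho, r):
    """Log of the claim-4 style likelihood for a differential vector x.

    U_k   : m x k leading left singular vectors (assumed source of y)
    delta : the k leading singular values
    n     : number of columns of the differential matrix
    rho   : residual-space parameter; r : rank used in the exponent
    """
    k = len(delta)
    y = U_k.T @ x                          # assumed: projection onto the reduced space
    eps2 = max(float(x @ x - y @ y), 0.0)  # assumed: squared residual, clamped at 0
    return (0.5 * np.log(n)
            - 0.5 * n * np.sum(y ** 2 / delta ** 2)
            - n * eps2 / (2.0 * rho)
            - 0.5 * n * np.log(2.0 * np.pi)
            - np.sum(np.log(delta))
            - 0.5 * (r - k) * np.log(rho))

# Hypothetical exterior differential matrix with m = 5 terms and n_2 = 4 columns.
D_E = np.random.default_rng(0).standard_normal((5, 4))
U, s, _ = np.linalg.svd(D_E, full_matrices=False)

k2 = 2
n2 = D_E.shape[1]
U_k, delta = U[:, :k2], s[:k2]
rho2 = s[k2] ** 2 / 2.0  # claim 5: rho_2 = delta_{k_2+1}^2 / 2 (s is 0-indexed)
r2 = n2                  # claim 5: r_2 = n_2

x = D_E[:, 0]            # any differential vector; a column is used for illustration
ll = log_likelihood(x, U_k, delta, n2, rho2, r2)
```

Calling the same function with $U_{k_1}$, the singular values of $D_I$, $n_1$, $\rho_1$ and $r_1$ would give the interior counterpart under the same assumptions.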
6. The method as set forth in claim 1, further comprising the steps of:
setting up a document vector for a query by generating terms as well as frequency of term occurrence, and thereby obtaining a normalized document vector for the query;
given the query, constructing a differential document vector $x$;
calculating the interior document likelihood function $P(x \mid D_I)$ and the exterior document likelihood function $P(x \mid D_E)$ for the document;
calculating the posteriori probability function $P(D_I \mid x)$; and
selecting documents according to one of $P(D_I \mid x)$ exceeding a given threshold or N best documents with largest $P(D_I \mid x)$, those values of $P(D_I \mid x)$ being shown as scores to rank a match.
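The posterior and selection steps of claim 6 can be sketched as below. The sketch assumes the posterior follows Bayes' rule, $P(D_I \mid x) = P(x \mid D_I)P(D_I) / [P(x \mid D_I)P(D_I) + P(x \mid D_E)P(D_E)]$ with $P(D_E) = 1 - P(D_I)$, and works from log-likelihoods for numerical stability. The function names, the toy log-likelihood values, and the prior of 5 recalls per 100 documents are hypothetical.

```python
import numpy as np

def posterior_interior(log_p_int, log_p_ext, prior_int):
    """P(D_I | x) by Bayes' rule, computed stably from log-likelihoods."""
    a = np.asarray(log_p_int, dtype=float) + np.log(prior_int)
    b = np.asarray(log_p_ext, dtype=float) + np.log1p(-prior_int)
    return np.exp(a - np.logaddexp(a, b))

def select(scores, threshold=None, top_n=None):
    """Rank documents by score; keep those above threshold and/or the N best."""
    order = np.argsort(-scores, kind="stable")
    if threshold is not None:
        order = [i for i in order if scores[i] > threshold]
    if top_n is not None:
        order = order[:top_n]
    return [(int(i), float(scores[i])) for i in order]

# Hypothetical prior: an average of 5 recalls over a 100-document database.
prior = 5 / 100

# Hypothetical log-likelihoods for three candidate documents.
log_p_int = [-10.0, -12.0, -1000.0]
log_p_ext = [-12.0, -10.0, -1010.0]

scores = posterior_interior(log_p_int, log_p_ext, prior)
best_two = select(scores, top_n=2)
above_half = select(scores, threshold=0.5)
```

Working with `np.logaddexp` keeps the third document usable even though its raw likelihoods would underflow to zero in double precision.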
7. A method for setting up an information retrieval system and retrieving text information, comprising the steps of:
preprocessing text;
constructing system terms;
setting up and normalizing document vectors of all collected documents;
constructing an interior differential term-document matrix $D_I^{m \times n_1}$ such that each column in said interior differential term-document matrix is an interior differential document vector;
decomposing $D_I$, such that $D_I = USV^T$, then with a proper $k_1$, defining the $D_{I,k_1} = U_{k_1} S_{k_1} V_{k_1}^T$ to approximate $D_I$;
defining an interior document likelihood function, $P(x \mid D_I)$;
constructing an exterior differential term-document matrix $D_E^{m \times n_2}$, such that each column in said exterior differential term-document matrix is an exterior differential document vector;
decomposing $D_E$, such that $D_E = USV^T$, then with a proper value of $k_2$, defining the $D_{E,k_2} = U_{k_2} S_{k_2} V_{k_2}^T$ to approximate $D_E$;
defining an exterior document likelihood function, $P(x \mid D_E)$;
defining a posteriori function where $P(D_I)$ is set to be an average number of recalls divided by the number of documents in the data base and $P(D_E)$ is set to be $1 - P(D_I)$;
setting up a document vector for a query by generating terms as well as frequency of term occurrence, and thereby obtaining a normalized document vector for the query;
given the query, constructing a differential document vector $x$;
calculating the interior document likelihood function $P(x \mid D_I)$ and the exterior document likelihood function $P(x \mid D_E)$ for the document;
calculating the posteriori probability function $P(D_I \mid x)$; and
selecting documents according to one of $P(D_I \mid x)$ exceeding a given threshold or N best documents with largest $P(D_I \mid x)$, those values of $P(D_I \mid x)$ being shown as scores to rank a match.
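The query-side steps of claim 7 (building a normalized query vector from term frequencies, then forming a differential vector) might look like the following. The sketch assumes the local weight is the raw term frequency multiplied by the global weight, and that the differential vector is the query vector minus a stored normalized document vector; neither detail is fixed by the claim. The term list, weights, document vector, and the name `query_vector` are hypothetical, and preprocessing (stop word and noun phrase handling) is assumed to have already produced the token list.

```python
from collections import Counter

import numpy as np

def query_vector(query_terms, term_list, global_weights):
    """Weighted term-frequency vector for a query, scaled to unit length."""
    counts = Counter(query_terms)  # terms outside term_list are simply ignored
    v = np.array([counts[t] * w for t, w in zip(term_list, global_weights)],
                 dtype=float)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

# Hypothetical system terms and global weights.
term_list = ["latent", "semantic", "retrieval", "matrix"]
global_weights = np.array([1.0, 0.8, 0.5, 1.2])

# Hypothetical preprocessed query tokens; "web" is not a system term.
q = query_vector(["semantic", "retrieval", "retrieval", "web"],
                 term_list, global_weights)

# Hypothetical normalized document vector from the collection.
doc = np.array([0.0, 0.6, 0.8, 0.0])

# Assumed form of the differential document vector for this query-document pair.
x = q - doc
```

One such `x` would be formed per candidate document and then scored with the interior and exterior likelihood functions.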
where $r_1$ is a rank of matrix $D_I$.
9. The method as set forth in claim 8, wherein $\rho_1$ is chosen as $\delta_{k_1+1}^2/2$, and $r_1$ is $n_1$.
10. The method as set forth in claim 7, wherein the exterior document likelihood function, $P(x \mid D_E)$, is

$$P(x \mid D_E) = \frac{n_2^{1/2} \exp\left(-\frac{n_2}{2} \sum_{i=1}^{k_2} \frac{y_i^2}{\delta_i^2}\right) \cdot \exp\left(-\frac{n_2\, \varepsilon^2(x)}{2 \rho_2}\right)}{(2\pi)^{n_2/2} \prod_{i=1}^{k_2} \delta_i \cdot \rho_2^{(r_2 - k_2)/2}},$$

where $r_2$ is a rank of matrix $D_E$.
11. The method as set forth in claim 10, wherein $\rho_2$ is chosen as $\delta_{k_2+1}^2/2$, and $r_2$ is $n_2$.
Specification