Selective latent semantic indexing method for information retrieval applications
First Claim
1. A method for generating a reduced rank approximation for information retrieval, comprising:
- forming on a computer a term-by-document matrix A, wherein the elements of the matrix A represent a plurality of terms within a plurality of documents, the documents related to a plurality of topics;
estimating via the computer a plurality of singular values corresponding to at least one of the topics;
identifying via the computer a plurality of actual singular values each having a corresponding singular vector associated with the matrix A;
selecting via the computer a subset of the actual singular values based on actual singular values that correspond to at least one of the estimated singular values; and
determining via the computer a set of singular vectors based on the selected singular values, wherein the singular vectors provide an index for use during information retrieval.
1 Assignment
0 Petitions
Accused Products
Abstract
A term-by-document (or part-by-collection) matrix can be used to index documents (or collections) for information retrieval applications. Reducing the rank of the indexing matrix can further reduce the complexity of information retrieval. A method for index matrix rank reduction can involve computing a singular value decomposition and then retaining singular values based on the singular values corresponding to singular values of multiple topics. The expected singular values corresponding to a topic can be determined using the roots of a specially formed characteristic polynomial. The coefficients of the special characteristic polynomial can be based on computing the determinants of a Gram matrix of term (or part) probabilities, a method of recursion, or a method of recursion further weighted by the probability of document (or collection) lengths.
19 Citations
29 Claims
-
1. A method for generating a reduced rank approximation for information retrieval, comprising:
-
forming on a computer a term-by-document matrix A, wherein the elements of the matrix A represent a plurality of terms within a plurality of documents, the documents related to a plurality of topics; estimating via the computer a plurality of singular values corresponding to at least one of the topics; identifying via the computer a plurality of actual singular values each having a corresponding singular vector associated with the matrix A; selecting via the computer a subset of the actual singular values based on actual singular values that correspond to at least one of the estimated singular values; and determining via the computer a set of singular vectors based on the selected singular values, wherein the singular vectors provide an index for use during information retrieval. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
-
-
10. A method for generating a reduced rank approximation for information retrieval, comprising:
-
forming on a computer a part-by-collection matrix A, wherein the elements of the matrix represent the existence of one or more parts within one or more collections; estimating via the computer a plurality of singular values corresponding to at least one of a plurality of probabilistic distributions on parts; identifying via the computer a plurality of actual singular values each having a corresponding singular vector; selecting via the computer a subset of the actual singular values based on actual singular values that correspond to at least one of the estimated singular values; and determining via the computer a set of singular vectors based on the selected singular values, wherein the singular vectors provide an index for use during information retrieval. - View Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18)
-
-
19. A method for reducing the rank of a matrix A for information retrieval, comprising:
-
determining via a computer a plurality of probabilistic distributions on parts; estimating via the computer a plurality of singular values corresponding to at least one of a plurality of probabilistic distributions on parts; identifying via the computer a plurality of actual singular values each having a corresponding singular vector associated with the matrix A; grouping via the computer the singular values of matrix A based on the singular values of matrix A that correspond to at least one of the estimated singular values; and computing via the computer the reduced rank approximation based on the grouping of the singular values, wherein the reduced rank approximation provides an index for use during information retrieval. - View Dependent Claims (20, 21, 22, 23, 24, 25, 26, 27)
-
-
28. A method for identifying on a computer singular values to generate a reduced rank approximation of a term-by-document matrix for use in information retrieval, comprising the steps of estimating via a computer a plurality of singular values of the matrix, and generating via the computer the reduced rank matrix based on the estimated singular values.
-
29. A method for generating a reduced rank approximation for information retrieval, comprising:
-
forming on a computer a term-by-document matrix A, wherein the elements of the matrix A represent a plurality of terms within a plurality of documents, each of the documents being related to at least one of plurality of topics; identifying via the computer a plurality of actual singular values each having a corresponding singular vector; estimating via the computer a plurality of estimated singular values each corresponding to at least one of the topics; selecting via the computer at least one of the actual singular values based on the estimated singular values; and generating via the computer a reduced rank approximation based on the selected singular values, wherein the reduced rank approximation provides an index for use during information retrieval.
-
Specification