Selective latent semantic indexing method for information retrieval applications
First Claim
1. A method for generating a reduced rank matrix approximation for information retrieval, comprising:
- forming a term-by-document matrix A, wherein the elements of the matrix A represent a plurality of terms within a plurality of documents, the documents related to a plurality of topics;
performing a singular value decomposition of the matrix A to identify a plurality of actual singular values each having a corresponding singular vector;
selecting a subset of the actual singular values based on singular values that correspond to at least one desired topic; and
determining a set of singular vectors based on the selected singular values, wherein the singular vectors provide an index for use during information retrieval.
0 Assignments
0 Petitions
Accused Products
Abstract
A term-by-document (or part-by-collection) matrix can be used to index documents (or collections) for information retrieval applications. Reducing the rank of the indexing matrix can further reduce the complexity of information retrieval. A method for index matrix rank reduction can involve computing a singular value decomposition and then retaining singular values based on the singular values corresponding to singular values of multiple topics. The expected singular values corresponding to a topic can be determined using the roots of a specially formed characteristic polynomial. The coefficients of the special characteristic polynomial can be based on computing the determinants of a Gram matrix of term (or part) probabilities, a method of recursion, or a method of recursion further weighted by the probability of document (or collection) lengths.
-
Citations
31 Claims
-
1. A method for generating a reduced rank matrix approximation for information retrieval, comprising:
-
forming a term-by-document matrix A, wherein the elements of the matrix A represent a plurality of terms within a plurality of documents, the documents related to a plurality of topics;
performing a singular value decomposition of the matrix A to identify a plurality of actual singular values each having a corresponding singular vector;
selecting a subset of the actual singular values based on singular values that correspond to at least one desired topic; and
determining a set of singular vectors based on the selected singular values, wherein the singular vectors provide an index for use during information retrieval. - View Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9, 10)
-
-
11. A method for generating a reduced rank matrix approximation for information retrieval, comprising:
-
forming a part-by-collection matrix A, wherein the elements of the matrix represent the existence of one or more parts within one or more collections;
performing a singular value decomposition of the matrix A to identify a plurality of actual singular values each having a corresponding singular vector;
selecting a subset of the actual singular values based on singular values that correspond to at least one desired probabilistic distribution on parts; and
determining a set of singular vectors based on the selected singular values, wherein the singular vectors provide an index for use during information retrieval. - View Dependent Claims (12, 13, 14, 15, 16, 17, 18, 19, 20)
-
-
21. A method for reducing the rank of indexing matrix A for information retrieval, comprising:
-
determining the singular values of matrix A;
grouping the singular values of matrix A based on their correspondence to each of a plurality of topics; and
computing the reduced rank matrix based on the grouping of the singular values, wherein the reduced rank matrix provides an index for use during information retrieval. - View Dependent Claims (22, 23, 24, 25, 26, 27, 28, 29)
-
-
30. A method for identifying singular values to generate a reduced rank approximation of a term-by-document matrix for use in information retrieval, comprising the step of estimating a plurality of singular values of the matrix, the estimated singular values each corresponding to at least one topic.
-
31. A method for generating a reduced rank matrix approximation for information retrieval, comprising:
-
forming a term-by-document matrix A, wherein the elements of the matrix A represent a plurality of terms within a plurality of documents, each of the documents being related to at least one of plurality of topics;
performing a singular value decomposition of the matrix A to identify a plurality of actual singular values each having a corresponding singular vector;
estimating a plurality of estimated singular values each corresponding to at least one of the topics;
selecting at least one of the actual singular values based on the estimated singular values; and
computing a reduced rank matrix based on the selected singular values, wherein the reduced rank matrix provides an index for use during information retrieval.
-
Specification