Selective latent semantic indexing method for information retrieval applications

US 7,630,992 B2
Filed: 08/17/2006
Issued: 12/08/2009
Est. Priority Date: 11/30/2005
Status: Expired due to Fees

1 Assignment

0 Petitions

Abstract

A term-by-document (or part-by-collection) matrix can be used to index documents (or collections) for information retrieval applications. Reducing the rank of the indexing matrix can further reduce the complexity of information retrieval. A method for index matrix rank reduction can involve computing a singular value decomposition and then retaining those singular values that correspond to the expected singular values of multiple topics. The expected singular values corresponding to a topic can be determined using the roots of a specially formed characteristic polynomial. The coefficients of this polynomial can be obtained by computing the determinants of a Gram matrix of term (or part) probabilities, by a recursion, or by a recursion further weighted by the probability of document (or collection) lengths.
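
The starting point described in the abstract, a term-by-document matrix followed by a singular value decomposition, might look like the minimal sketch below. It assumes raw term counts as matrix entries and NumPy's SVD routine; neither choice is stated in this record, and the toy documents and the helper name `build_term_document_matrix` are hypothetical.

```python
import numpy as np

def build_term_document_matrix(docs):
    """Return (A, vocab) where A[i, j] counts term i in document j."""
    vocab = sorted({term for doc in docs for term in doc.split()})
    index = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for term in doc.split():
            A[index[term], j] += 1.0
    return A, vocab

# Hypothetical toy corpus; whitespace tokenization is an assumption.
docs = ["matrix rank index", "matrix index retrieval", "topic model topic"]
A, vocab = build_term_document_matrix(docs)

# Actual singular values (s) with their left (U) and right (Vt) singular vectors.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
```

The arrays `U`, `s`, and `Vt` are the "actual" singular vectors and values from which a subset would later be retained.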

19 Citations

29 Claims

1. A method for generating a reduced rank approximation for information retrieval, comprising:
- forming on a computer a term-by-document matrix A, wherein the elements of the matrix A represent a plurality of terms within a plurality of documents, the documents related to a plurality of topics;
  
  estimating via the computer a plurality of singular values corresponding to at least one of the topics;
  
  identifying via the computer a plurality of actual singular values each having a corresponding singular vector associated with the matrix A;
  
  selecting via the computer a subset of the actual singular values based on actual singular values that correspond to at least one of the estimated singular values; and
  
  determining via the computer a set of singular vectors based on the selected singular values, wherein the singular vectors provide an index for use during information retrieval.
- Dependent Claims (2, 3, 4, 5, 6, 7, 8, 9)
  - 2. The method of claim 1, further comprising the step of computing a reduced rank approximation based on the selected singular values and the selected singular vectors, the reduced rank approximation providing an index for use during information retrieval.
  - 3. The method of claim 1, further comprising the step of identifying a probabilistic matrix model, the model comprising at least one of a plurality of probabilities representing the number of occurrences of the terms within the documents.
  - 4. The method of claim 1, wherein the selecting step further comprises matching the estimated singular values corresponding to one or more topics with the actual singular values of matrix A.
  - 5. The method of claim 1, wherein the selecting step further comprises selecting a plurality of actual singular values of matrix A that each correspond to at least one of the estimated singular values, the actual singular values corresponding to at least one of the topics.
  - 6. The method of claim 1, wherein the step of estimating singular values corresponding to a topic comprises the steps of:
    - generating characteristic coefficients;
      
      forming a special characteristic polynomial based on the characteristic coefficients; and
      
      solving for the roots of the characteristic polynomial, wherein multiplying the roots by the number of documents related to the topic and then taking a square-root yields the estimated singular values.
  - 7. The method of claim 6, wherein the step of generating characteristic coefficients comprises, for each coefficient:
    - forming a vector representing probabilities of the terms in the documents;
      
      forming a matrix B with copies of the vector as its columns; and
      
      computing the determinant of the Gram matrix given by |B^TB| to generate each coefficient.
  - 8. The method of claim 6, wherein the step of generating characteristic coefficients comprises computing a recursion such that each coefficient is based on those coefficients already computed.
  - 9. The method of claim 8, wherein the step of generating characteristic coefficients further comprises computing a probabilistically weighted average of coefficients based on the probability of document length, the weighted averaging allowing the method to function with documents of non-uniform lengths.
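
The last step of claim 6 (roots of the characteristic polynomial, scaled by the number of documents and square-rooted) can be illustrated as follows. This is a sketch under stated assumptions: the coefficients are hypothetical inputs rather than values produced by the Gram-determinant or recursion steps of claims 7 and 8, the coefficient order (highest degree first) is a convention chosen here, and discarding complex or negative roots is an added safeguard, not something the claims describe.

```python
import numpy as np

def estimate_singular_values(char_coeffs, num_docs):
    """Estimate a topic's singular values from characteristic coefficients.

    char_coeffs: polynomial coefficients, highest degree first (assumed order).
    num_docs: number of documents related to the topic.
    """
    roots = np.roots(char_coeffs)
    # Assumption: only real, non-negative roots are meaningful; drop the rest.
    real = roots.real[np.abs(roots.imag) < 1e-9]
    real = np.clip(real[real > -1e-12], 0.0, None)
    # Claim 6: multiply each root by the document count, then take the square root.
    return np.sort(np.sqrt(num_docs * real))[::-1]

# Hypothetical coefficients of (x - 0.5)(x - 0.02) = x^2 - 0.52x + 0.01
estimates = estimate_singular_values([1.0, -0.52, 0.01], num_docs=200)
```

With these made-up inputs the roots 0.5 and 0.02 become sqrt(200 * 0.5) = 10 and sqrt(200 * 0.02) = 2.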

10. A method for generating a reduced rank approximation for information retrieval, comprising:
- forming on a computer a part-by-collection matrix A, wherein the elements of the matrix represent the existence of one or more parts within one or more collections;
  
  estimating via the computer a plurality of singular values corresponding to at least one of a plurality of probabilistic distributions on parts;
  
  identifying via the computer a plurality of actual singular values each having a corresponding singular vector;
  
  selecting via the computer a subset of the actual singular values based on actual singular values that correspond to at least one of the estimated singular values; and
  
  determining via the computer a set of singular vectors based on the selected singular values, wherein the singular vectors provide an index for use during information retrieval.
- Dependent Claims (11, 12, 13, 14, 15, 16, 17, 18)
  - 11. The method of claim 10, further comprising the step of computing a reduced rank approximation based on the selected singular values and the selected singular vectors, the reduced rank approximation providing an index for use during information retrieval.
  - 12. The method of claim 10, further comprising the step of identifying a probabilistic matrix model, the model comprising one or more probabilities representing the number of occurrences of one or more parts within one or more collections, the collections relating to a probabilistic distribution on parts.
  - 13. The method of claim 10, wherein the selecting step comprises matching the estimated singular values corresponding to at least one of a plurality of probabilistic distributions on parts with the actual singular values of matrix A.
  - 14. The method of claim 11, wherein the selecting step comprises selecting at least one of the actual singular values of matrix A that substantially equal at least one of the estimated singular values corresponding to at least one of the probabilistic distributions on parts.
  - 15. The method of claim 11, wherein the step of estimating singular values corresponding to a probabilistic distribution on parts comprises the steps of:
    - generating characteristic coefficients;
      
      forming a characteristic polynomial; and
      
      solving for the roots of the characteristic polynomial, wherein multiplying the roots by the number of collections related to the probabilistic distribution on parts and then taking a square-root yields the estimated singular values.
  - 16. The method of claim 15, wherein the step of generating characteristic coefficients comprises, for each coefficient:
    - forming a vector representing probabilities of the parts;
      
      forming a matrix B with copies of the vector as its columns; and
      
      computing the determinant of the Gram matrix given by |B^TB| to generate each coefficient.
  - 17. The method of claim 16, wherein the step of generating characteristic coefficients comprises computing a recursion such that each coefficient is based on those coefficients already computed.
  - 18. The method of claim 17, wherein the step of generating characteristic coefficients further comprises computing a probabilistically weighted average of coefficients based on the probability of collection size, the weighted averaging allowing the method to function with collections of non-uniform sizes.
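
The weighted averaging in claim 18 can be pictured with a short sketch. It assumes one coefficient vector has already been computed for each possible collection size; how those vectors are obtained (the recursion of claim 17) is not detailed in this record, so the values below are hypothetical, as is the function name `average_coefficients`.

```python
import numpy as np

def average_coefficients(coeffs_by_size, size_probs):
    """Average per-size coefficient vectors, weighted by P(collection size)."""
    coeffs = np.asarray(coeffs_by_size, dtype=float)
    probs = np.asarray(size_probs, dtype=float)
    if coeffs.shape[0] != probs.shape[0]:
        raise ValueError("one coefficient vector is needed per size")
    if not np.isclose(probs.sum(), 1.0):
        raise ValueError("size probabilities must sum to 1")
    # Row k of coeffs is weighted by the probability of size k.
    return probs @ coeffs

# Hypothetical coefficient vectors for two collection sizes and their probabilities.
coeffs_by_size = [[1.0, -0.50, 0.010], [1.0, -0.60, 0.020]]
size_probs = [0.75, 0.25]
avg = average_coefficients(coeffs_by_size, size_probs)
```

The averaged vector would then play the role of the characteristic coefficients when collections do not all have the same size.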

19. A method for reducing the rank of a matrix A for information retrieval, comprising:
- determining via a computer a plurality of probabilistic distributions on parts;
  
  estimating via the computer a plurality of singular values corresponding to at least one of a plurality of probabilistic distributions on parts;
  
  identifying via the computer a plurality of actual singular values each having a corresponding singular vector associated with the matrix A;
  
  grouping via the computer the singular values of matrix A based on the singular values of matrix A that correspond to at least one of the estimated singular values; and
  
  computing via the computer the reduced rank approximation based on the grouping of the singular values, wherein the reduced rank approximation provides an index for use during information retrieval.
- Dependent Claims (20, 21, 22, 23, 24, 25, 26, 27)
  - 20. The method of claim 19, wherein the grouping step comprises estimating singular values corresponding to at least one of the probabilistic distributions on parts and selecting actual singular values of matrix A that correspond to the estimated singular values.
  - 21. The method of claim 20, wherein the selecting step comprises matching the estimated singular values corresponding to at least one of the probabilistic distributions on parts with the actual singular values of matrix A.
  - 22. The method of claim 20, wherein the step of estimating singular values corresponding to a probabilistic distribution on parts comprises the steps of:
    - generating characteristic coefficients;
      
      forming a special characteristic polynomial; and
      
      solving for the roots of the characteristic polynomial, wherein multiplying the roots by the number of samples related to corresponding probabilistic distributions on parts and then taking a square-root yields the estimated singular values.
  - 23. The method of claim 22, wherein the step of generating characteristic coefficients comprises, for each coefficient:
    - forming a vector representing probabilities of the parts;
      
      forming a matrix B with copies of the vector as its columns; and
      
      computing the determinant of the Gram matrix given by |B^TB| to generate each coefficient.
  - 24. The method of claim 22, wherein the step of generating characteristic coefficients comprises computing a recursion such that each subsequent coefficient is based on at least one coefficient already computed.
  - 25. The method of claim 24, wherein the step of generating characteristic coefficients further comprises computing a probabilistically weighted average of coefficients based on the probability of sample length, the weighted averaging allowing the method to function with samples of non-uniform lengths.
  - 26. The method of claim 21, wherein the identifying step comprises performing a singular value decomposition of the matrix A to identify a plurality of actual singular values each having a corresponding singular vector.
  - 27. The method of claim 19, wherein the computing step further comprises selecting at least one of the singular values grouped to correspond to a probabilistic distribution on parts, wherein the selecting step selects the singular values to be computed into the reduced rank approximation to maintain or remove the indexing of the probabilistic distributions on parts for use in information retrieval.
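
The grouping of claim 19 and the maintain-or-remove choice of claim 27 can be sketched as below. The matching rule (a relative tolerance `rel_tol`) is an assumption standing in for "correspond to" in the claims, and the distribution names `sports` and `finance`, the estimated values, and the diagonal test matrix are all hypothetical.

```python
import numpy as np

def group_singular_values(actual, estimates_by_dist, rel_tol=0.1):
    """Map each distribution name to indices of actual singular values
    lying within rel_tol (relative) of one of its estimated values."""
    actual = np.asarray(actual, dtype=float)
    groups = {}
    for name, estimates in estimates_by_dist.items():
        est = np.asarray(estimates, dtype=float)
        # close[i, k] is True when actual[i] is near estimates[k]
        close = np.abs(actual[:, None] - est[None, :]) <= rel_tol * est[None, :]
        groups[name] = np.flatnonzero(close.any(axis=1))
    return groups

def reduced_rank(U, s, Vt, groups, keep):
    """Rebuild A from only the triplets grouped under the kept distributions."""
    idx = sorted({int(i) for name in keep for i in groups[name]})
    return U[:, idx] @ np.diag(s[idx]) @ Vt[idx, :]

# Hypothetical matrix whose singular values are 9, 5, and 1.
A = np.diag([9.0, 5.0, 1.0])
U, s, Vt = np.linalg.svd(A)
groups = group_singular_values(s, {"sports": [9.2, 1.05], "finance": [4.8]})

# Keep the indexing of "sports" and remove "finance".
A_k = reduced_rank(U, s, Vt, groups, keep=["sports"])
```

Passing a different `keep` list retains or drops a distribution's contribution without recomputing the decomposition.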

28. A method for identifying on a computer singular values to generate a reduced rank approximation of a term-by-document matrix for use in information retrieval, comprising the steps of estimating via a computer a plurality of singular values of the matrix, and generating via the computer the reduced rank matrix based on the estimated singular values.

29. A method for generating a reduced rank approximation for information retrieval, comprising:
- forming on a computer a term-by-document matrix A, wherein the elements of the matrix A represent a plurality of terms within a plurality of documents, each of the documents being related to at least one of a plurality of topics;
  
  identifying via the computer a plurality of actual singular values each having a corresponding singular vector;
  
  estimating via the computer a plurality of estimated singular values each corresponding to at least one of the topics;
  
  selecting via the computer at least one of the actual singular values based on the estimated singular values; and
  
  generating via the computer a reduced rank approximation based on the selected singular values, wherein the reduced rank approximation provides an index for use during information retrieval.
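
The claims state that the selected singular vectors, or the reduced rank approximation built from them, serve as an index during retrieval, but this record does not say how a query is scored. One conventional possibility, borrowed from standard latent semantic indexing rather than from the claims, is to project both the query and the documents onto the selected left singular vectors and rank by cosine similarity. The function name `rank_documents`, the matrix, the query, and the choice of selected indices are hypothetical.

```python
import numpy as np

def rank_documents(U, s, Vt, selected, query_vec):
    """Rank documents by cosine similarity to a query in the selected subspace."""
    idx = np.asarray(selected, dtype=int)
    U_k, s_k, Vt_k = U[:, idx], s[idx], Vt[idx, :]
    doc_coords = (s_k[:, None] * Vt_k).T   # one row per document
    query_coords = U_k.T @ query_vec        # project query onto selected vectors
    norms = np.linalg.norm(doc_coords, axis=1) * np.linalg.norm(query_coords)
    scores = doc_coords @ query_coords / np.where(norms == 0, 1.0, norms)
    return np.argsort(-scores), scores

# Hypothetical 4-term, 3-document matrix: terms 0-1 occur in docs 0-1, terms 2-3 in doc 2.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Assume the two largest singular values were the ones selected.
query = np.array([1.0, 0.0, 0.0, 0.0])      # query containing only term 0
order, scores = rank_documents(U, s, Vt, selected=[0, 1], query_vec=query)
```

In this toy case the two documents that share vocabulary with the query score about 1, and the unrelated document scores about 0.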

Current Assignee: Selective, Inc.
Original Assignee: Selective, Inc.
Inventors: Canfield, Earl Rodney; Martin, Jacob Gilmore
Primary Examiner(s): Trujillo, James
Assistant Examiner(s): Conyers, Dawaune A
Application Number: US11/505,654
Publication Number: US 20070124299A1
Time in Patent Office: 1,209 Days
Field of Search: None
US Class Current: 1/1
CPC Class Codes: G06F 16/31 Indexing; Data structures t...
