Generalized latent semantic analysis
Abstract
One embodiment of the present invention provides a system that builds an association tensor (such as a matrix) to facilitate document and word-level processing operations. During operation, the system uses terms from a collection of documents to build an association tensor, which contains values representing pair-wise similarities between terms in the collection of documents. During this process, if a given value in the association tensor is calculated based on an insufficient number of samples, the system determines a corresponding value from a reference document collection, and then substitutes the corresponding value for the given value in the association tensor. After the association tensor is obtained, a dimensionality reduction method is applied to compute a low-dimensional vector space representation for the vocabulary terms. Document vectors are computed as linear combinations of term vectors.
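The pipeline described in the abstract can be sketched in a few functions. This is a minimal illustration under stated assumptions, not the patented implementation: the similarity measure (document-level pointwise mutual information), the `min_count` threshold used to decide that a pair has too few samples, the use of a truncated SVD for dimensionality reduction, and every function name below are hypothetical choices made for the example.

```python
from collections import Counter
from itertools import combinations

import numpy as np


def cooccurrence_counts(docs):
    """Count, per document, which terms and which unordered term pairs occur."""
    term_counts, pair_counts = Counter(), Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        term_counts.update(terms)
        pair_counts.update(combinations(terms, 2))
    return term_counts, pair_counts, len(docs)


def pmi(a, b, stats):
    """Pointwise mutual information (assumed measure); 0.0 if no co-occurrence."""
    term_counts, pair_counts, n_docs = stats
    joint = pair_counts[tuple(sorted((a, b)))]
    if joint == 0:
        return 0.0
    return float(np.log(joint * n_docs / (term_counts[a] * term_counts[b])))


def build_association_matrix(docs, reference_docs, min_count=2):
    """Build a symmetric term-term matrix, falling back to the reference
    collection when a pair co-occurs fewer than min_count times in docs."""
    stats = cooccurrence_counts(docs)
    ref_stats = cooccurrence_counts(reference_docs)
    vocab = sorted(stats[0])
    index = {t: i for i, t in enumerate(vocab)}
    S = np.zeros((len(vocab), len(vocab)))  # diagonal left at zero for simplicity
    for a, b in combinations(vocab, 2):
        enough = stats[1][(a, b)] >= min_count
        value = pmi(a, b, stats if enough else ref_stats)
        S[index[a], index[b]] = S[index[b], index[a]] = value
    return vocab, S


def term_vectors(S, k):
    """Reduce the association matrix to k dimensions with a truncated SVD."""
    U, s, _ = np.linalg.svd(S)
    return U[:, :k] * np.sqrt(s[:k])


def document_vector(doc, vocab, vectors):
    """Represent a document as a count-weighted linear combination of term vectors."""
    index = {t: i for i, t in enumerate(vocab)}
    counts = Counter(t for t in doc.lower().split() if t in index)
    vec = np.zeros(vectors.shape[1])
    for term, count in counts.items():
        vec += count * vectors[index[term]]
    return vec
```

Calling `build_association_matrix(docs, reference_docs)` yields the vocabulary and the matrix; `term_vectors(S, k)` then gives one row per term, and `document_vector` combines those rows for any document. A tensor of higher order than a matrix, which the abstract allows for, is outside the scope of this sketch.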
24 Claims
1. A method for building an association tensor to facilitate document and word-level processing operations, comprising:
receiving a collection of documents containing textual information;
using terms from the collection of documents to build the association tensor, which contains values representing pair-wise similarities between terms in the collection of documents; and
wherein if a given value for a pair-wise similarity is calculated based on an insufficient number of samples, the method further comprises, determining a corresponding value for the pair-wise similarity from a reference document collection, and substituting the corresponding value for the given value into the association tensor.
Dependent claims: 2, 3, 4, 5, 6, 7, 8
9. A computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for building an association tensor to facilitate document and word-level processing operations, the method comprising:
receiving a collection of documents containing textual information;
using terms from the collection of documents to build the association tensor, which contains values representing pair-wise similarities between terms in the collection of documents; and
wherein if a given value for a pair-wise similarity is calculated based on an insufficient number of samples, the method further comprises, determining a corresponding value for the pair-wise similarity from a reference document collection, and substituting the corresponding value for the given value into the association tensor.
Dependent claims: 10, 11, 12, 13, 14, 15, 16
17. An apparatus that builds an association tensor to facilitate document and word-level processing operations, comprising:
a receiving mechanism configured to receive a collection of documents containing textual information;
a tensor-building mechanism configured to use terms from the collection of documents to build the association tensor, which contains values representing pair-wise similarities between terms in the collection of documents; and
wherein if a given value for a pair-wise similarity is calculated based on an insufficient number of samples, the tensor-building mechanism is configured to, determine a corresponding value for the pair-wise similarity from a reference document collection, and to substitute the corresponding value for the given value into the association tensor.
Dependent claims: 18, 19, 20, 21, 22, 23, 24
Specification