Generalized latent semantic analysis
Abstract
One embodiment of the present invention provides a system that builds an association tensor (such as a matrix) to facilitate document and word-level processing operations. During operation, the system uses terms from a collection of documents to build an association tensor, which contains values representing pair-wise similarities between terms in the collection of documents. During this process, if a given value in the association tensor is calculated based on an insufficient number of samples, the system determines a corresponding value from a reference document collection, and then substitutes the corresponding value for the given value in the association tensor. After the association tensor is obtained, a dimensionality reduction method is applied to compute a low-dimensional vector space representation for the vocabulary terms. Document vectors are computed as linear combinations of term vectors.
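The construction step described in the abstract, including the fallback for under-sampled values, can be illustrated with a minimal Python/NumPy sketch for the matrix case. Several details are assumptions, not taken from the patent. The association measure is the Ochiai (cosine) coefficient over document co-occurrence. A value counts as under-sampled when either term appears in fewer than `min_samples` documents. The `reference` dictionary of values computed beforehand on a larger collection is also assumed. `build_association_matrix` is a hypothetical name.

```python
import numpy as np


def build_association_matrix(docs, vocab, min_samples=5, reference=None):
    """Build a symmetric term-by-term association matrix (a 2-D association tensor).

    Assumptions for this sketch:
      * association = document co-occurrence normalized by document frequency
        (Ochiai/cosine coefficient); the abstract does not fix a measure;
      * a value is "under-sampled" when either term occurs in fewer than
        `min_samples` documents;
      * `reference` is a hypothetical dict {(term_a, term_b): value} computed
        beforehand on a larger reference document collection.
    """
    index = {t: i for i, t in enumerate(vocab)}
    n = len(vocab)

    # Binary term-document incidence matrix X (terms x documents).
    X = np.zeros((n, len(docs)))
    for j, doc in enumerate(docs):
        for tok in set(doc):
            if tok in index:
                X[index[tok], j] = 1.0

    co = X @ X.T             # co[a, b] = number of documents containing both terms
    df = np.diag(co).copy()  # document frequency of each term

    # Ochiai coefficient co[a, b] / sqrt(df[a] * df[b]); 0 where a term never occurs.
    denom = np.sqrt(np.outer(df, df))
    A = np.divide(co, denom, out=np.zeros_like(co), where=denom > 0)

    # Substitute reference-collection values for under-sampled entries
    # (looked up in either key order, since the matrix is symmetric).
    if reference is not None:
        for a in range(n):
            for b in range(n):
                if min(df[a], df[b]) < min_samples:
                    value = reference.get((vocab[a], vocab[b]),
                                          reference.get((vocab[b], vocab[a])))
                    if value is not None:
                        A[a, b] = value
    return A


# Usage with a toy collection: "mouse" occurs in only one document, so its
# pairs fall back to the reference value when one is available.
docs = [["cat", "dog"], ["cat", "mouse"], ["dog", "bone"], ["cat", "dog", "bone"]]
vocab = ["cat", "dog", "mouse", "bone"]
A = build_association_matrix(docs, vocab, min_samples=2,
                             reference={("cat", "mouse"): 0.3})
print(np.round(A, 3))
```

Pairs with no reference entry keep their locally estimated value, which is one possible policy. Another would be to drop such pairs or to shrink them toward a prior.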
Claims
1. A computer-implemented method for building an association tensor to facilitate document and word-level processing operations, the method comprising:
receiving, at a computer, a collection of reference documents containing textual information;
building, at the computer, the association tensor representing pair-wise similarities corresponding to a co-occurrence of term pairs in the collection of reference documents;
computing a lower-dimensional vector-space representation of terms that preserves the pair-wise similarities in the association tensor based on a singular value decomposition of the association tensor;
deriving a lower-dimensional vector-space representation of a document based on a weighted linear combination of computed term vectors in the lower-dimensional vector-space representation of the terms in the collection of reference documents; and
performing semantic analysis upon the collection of reference documents using the lower-dimensional vector-space representation of the document.
Dependent claims: 2, 3, 4.
5. A non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method for building an association tensor to facilitate document and word-level processing operations, the method comprising:
receiving a collection of reference documents containing textual information;
building the association tensor representing pair-wise similarities corresponding to a co-occurrence of term pairs in the collection of reference documents;
computing a lower-dimensional vector-space representation of terms that preserves the pair-wise similarities in the association tensor based on a singular value decomposition of the association tensor;
deriving a lower-dimensional vector-space representation of a document based on a weighted linear combination of computed term vectors in the lower-dimensional vector-space representation of the terms in the collection of reference documents; and
performing semantic analysis upon the collection of reference documents using the lower-dimensional vector-space representation of the document.
Dependent claims: 6, 7, 8.
9. An apparatus, implemented on a computer system, that builds an association tensor to facilitate document and word-level processing operations, comprising:
a processor;
a memory;
a receiving mechanism configured to receive a collection of reference documents containing textual information;
an association-tensor-building mechanism configured to build the association tensor representing pair-wise similarities corresponding to a co-occurrence of term pairs in the collection of reference documents;
a lower-dimensionality-association-tensor computing mechanism configured to compute a lower-dimensional vector-space representation of terms that preserves the pair-wise similarities in the association tensor based on a singular value decomposition of the association tensor;
a lower-dimensionality-matrix-deriving mechanism configured to derive a lower-dimensional vector-space representation of a document based on a weighted linear combination of computed term vectors in the lower-dimensional vector-space representation of the terms in the collection of reference documents; and
a performing mechanism configured to perform semantic analysis upon the collection of reference documents using the lower-dimensional vector-space representation of the document.
Dependent claims: 10, 11, 12.
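The processing chain recited in claims 1, 5, and 9 can be sketched in Python with NumPy for a symmetric association matrix. The chain runs from the SVD of the association tensor to term vectors, then to weighted linear combinations forming document vectors, and finally to semantic analysis. Three details are assumptions for illustration, not details stated in the claims. Term vectors are taken as the rows of U_k * sqrt(S_k). The weights are applied per term occurrence. Cosine similarity stands in for the semantic-analysis step. `term_vectors`, `document_vector`, and `cosine` are hypothetical helper names.

```python
import numpy as np


def term_vectors(A, k):
    """Rank-k term vectors from a symmetric association matrix A via SVD.

    Row i of U_k * sqrt(S_k) is the vector for term i. Pairwise dot products
    of these rows equal U_k S_k U_k^T, which is the best rank-k approximation
    of A (Eckart-Young) when A is positive semidefinite.
    """
    U, S, _ = np.linalg.svd(A)  # singular values are returned in descending order
    return U[:, :k] * np.sqrt(S[:k])


def document_vector(doc, index, T, weights=None):
    """Weighted linear combination of the term vectors of the terms in `doc`.

    Each occurrence adds weights[term] * T[term]; with weights=None every
    occurrence counts 1 (raw term frequency). The weighting scheme is an assumption.
    """
    v = np.zeros(T.shape[1])
    for tok in doc:
        if tok in index:
            w = 1.0 if weights is None else weights.get(tok, 0.0)
            v += w * T[index[tok]]
    return v


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return 0.0 if nu == 0 or nv == 0 else float(u @ v / (nu * nv))


# Usage: a hand-made positive-definite association matrix for three terms.
vocab = ["car", "auto", "banana"]
index = {t: i for i, t in enumerate(vocab)}
A = np.array([[1.0, 0.8, 0.0],
              [0.8, 1.0, 0.1],
              [0.0, 0.1, 1.0]])
T = term_vectors(A, k=2)  # keep 2 of 3 dimensions

query = document_vector(["car"], index, T)
docs = {"d1": ["auto", "auto"], "d2": ["banana"]}
scores = {name: cosine(query, document_vector(d, index, T)) for name, d in docs.items()}
print(scores)
```

When a term is repeated, as in `d1`, the document vector gets longer but keeps its direction, so the cosine score is unchanged. A `weights` dictionary, such as idf values, is one way to downweight common terms before summing.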
Specification