Multi-Concept Latent Semantic Analysis Queries
First Claim
1. A system, comprising:
one or more memory units; and
one or more processing units operable to:
access text;
identify a plurality of terms from the text;
determine a plurality of term vectors associated with the identified plurality of terms;
cluster the determined plurality of term vectors into a plurality of clusters, the plurality of clusters comprising a first and a second cluster, the first and second clusters each comprising two or more of the determined term vectors;
create a first pseudo-document according to the first cluster;
create a second pseudo-document according to the second cluster;
identify, using latent semantic analysis (LSA) of the first pseudo-document, a first set of terms associated with the first cluster;
identify, using LSA of the second pseudo-document, a second set of terms associated with the second cluster; and
combine the first and second sets of terms into a list of output terms.
Abstract
A method includes accessing text, identifying a plurality of terms from the text, determining a plurality of term vectors associated with the identified plurality of terms, and clustering the determined plurality of term vectors into a plurality of clusters, the plurality of clusters comprising a first and a second cluster, the first and second clusters each comprising two or more of the determined term vectors. The method further includes creating a first pseudo-document according to the first cluster, creating a second pseudo-document according to the second cluster, identifying a first set of terms associated with the first cluster using latent semantic analysis (LSA) of the first pseudo-document, identifying a second set of terms associated with the second cluster using LSA of the second pseudo-document, and combining the first and second sets of terms into a list of output terms.
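The steps in the abstract can be sketched end to end on a toy corpus. This is only an illustrative sketch, not the patented implementation: the corpus `DOCS`, the function names, the two-way cosine k-means in `cluster_two`, and the choice to represent each pseudo-document as the centroid of its cluster's term vectors are all assumptions made here, since the claims do not fix those details. The LSA space is built with a plain truncated SVD of a term-document count matrix.

```python
import numpy as np

# Hypothetical toy corpus with two loose topics (finance, rivers).
DOCS = [
    "bank loan interest money",
    "money loan credit bank",
    "interest credit money rate",
    "river water bank shore",
    "water river stream fish",
    "fish stream shore water",
]

def build_lsa_space(docs, k=2):
    """Return (vocab, term_vectors) from a rank-k SVD of the term-document matrix."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            counts[index[w], j] += 1.0
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return vocab, u[:, :k] * s[:k]  # one k-dimensional row per term

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def cluster_two(vectors, iters=10):
    """Assumed clustering step: a tiny two-cluster k-means under cosine similarity."""
    # Seeds: the first vector and the vector least similar to it.
    far = min(range(len(vectors)), key=lambda i: cosine(vectors[0], vectors[i]))
    centroids = [vectors[0].copy(), vectors[far].copy()]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [0 if cosine(v, centroids[0]) >= cosine(v, centroids[1]) else 1
                  for v in vectors]
        for c in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = np.mean(members, axis=0)
    return labels

def related_terms(pseudo_doc, vocab, term_vectors, top_n=3):
    """Rank vocabulary terms by cosine similarity to a pseudo-document vector."""
    ranked = sorted(range(len(vocab)),
                    key=lambda i: cosine(pseudo_doc, term_vectors[i]),
                    reverse=True)
    return [vocab[i] for i in ranked[:top_n]]

def multi_concept_query(text, docs=DOCS, top_n=3):
    vocab, term_vectors = build_lsa_space(docs)
    index = {w: i for i, w in enumerate(vocab)}
    terms = [w for w in text.lower().split() if w in index]      # identify terms
    if not terms:
        return []
    vectors = np.array([term_vectors[index[w]] for w in terms])  # term vectors
    labels = np.array(cluster_two(vectors))                      # cluster
    output = []
    for c in (0, 1):
        members = vectors[labels == c]
        if len(members) == 0:
            continue
        pseudo_doc = members.mean(axis=0)  # assumed pseudo-document: cluster centroid
        for term in related_terms(pseudo_doc, vocab, term_vectors, top_n):
            if term not in output:         # combine per-cluster sets into one list
                output.append(term)
    return output

print(multi_concept_query("loan credit river stream"))
```

Querying each cluster separately, rather than averaging all term vectors into a single pseudo-document, is what keeps a mixed input such as "loan credit river stream" from collapsing into one blended concept. Note that this sketch does not enforce the claim 1 condition that each cluster contain two or more term vectors.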
20 Claims
1. A system, comprising:
one or more memory units; and
one or more processing units operable to:
access text;
identify a plurality of terms from the text;
determine a plurality of term vectors associated with the identified plurality of terms;
cluster the determined plurality of term vectors into a plurality of clusters, the plurality of clusters comprising a first and a second cluster, the first and second clusters each comprising two or more of the determined term vectors;
create a first pseudo-document according to the first cluster;
create a second pseudo-document according to the second cluster;
identify, using latent semantic analysis (LSA) of the first pseudo-document, a first set of terms associated with the first cluster;
identify, using LSA of the second pseudo-document, a second set of terms associated with the second cluster; and
combine the first and second sets of terms into a list of output terms.
Dependent claims: 2, 3, 4, 5, 6, 7
8. A computer-implemented method, comprising:
accessing text by a processing system;
identifying, by the processing system, one or more terms from the text;
determining, by the processing system, one or more term vectors associated with the identified terms;
clustering, by the processing system, the determined term vectors into one or more clusters, each cluster comprising at least one of the determined term vectors;
creating, by the processing system, one or more pseudo-documents, each pseudo-document created according to a particular one of the one or more clusters;
identifying, by the processing system using latent semantic analysis (LSA), a set of terms for each of the created pseudo-documents; and
creating, by the processing system, a list of output terms using the identified sets of terms.
Dependent claims: 9, 10, 11, 12, 13, 14
15. A non-transitory computer-readable medium comprising software, the software when executed by one or more processing units operable to perform operations comprising:
accessing text;
identifying a plurality of terms from the text;
determining a plurality of term vectors associated with the identified plurality of terms;
clustering the determined plurality of term vectors into a plurality of clusters, the plurality of clusters comprising a first and a second cluster, the first and second clusters each comprising two or more of the determined term vectors;
creating a first pseudo-document according to the first cluster;
creating a second pseudo-document according to the second cluster;
identifying, using latent semantic analysis (LSA) of the first pseudo-document, a first set of terms associated with the first cluster;
identifying, using LSA of the second pseudo-document, a second set of terms associated with the second cluster; and
combining the first and second sets of terms into a list of output terms.
Dependent claims: 16, 17, 18, 19, 20
Specification