Technique for Information Retrieval Using Enhanced Latent Semantic Analysis
First Claim
1. A computer implemented method of information retrieval, comprising:
- parsing a corpus to identify a number of wordform instances within each document of the corpus;
- generating a morpheme-by-document matrix based at least in part on the number of wordform instances within each document of the corpus, wherein the morpheme-by-document matrix separately enumerates instances of stems and affixes;
- applying a weighting function to attribute-values within the morpheme-by-document matrix to generate a weighted morpheme-by-document matrix; and
- generating at least one lower rank approximation matrix by factorizing the weighted morpheme-by-document matrix; and
- retrieving information with reference to the at least one lower rank approximation matrix.
3 Assignments
0 Petitions
Abstract
A technique for information retrieval includes parsing a corpus to identify a number of wordform instances within each document of the corpus. A weighted morpheme-by-document matrix is generated based at least in part on the number of wordform instances within each document of the corpus and based at least in part on a weighting function. The weighted morpheme-by-document matrix separately enumerates instances of stems and affixes. Additionally or alternatively, a term-by-term alignment matrix may be generated based at least in part on the number of wordform instances within each document of the corpus. At least one lower rank approximation matrix is generated by factorizing the weighted morpheme-by-document matrix and/or the term-by-term alignment matrix.
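The pipeline summarized in the abstract (parse, count stems and affixes separately, weight, factorize, retrieve) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the suffix list and `split_morphemes` are hypothetical stand-ins for a real morphological analyzer, log-entropy is only one possible choice for the unspecified weighting function, and truncated SVD is assumed as the factorization. The query is folded in with raw counts for simplicity.

```python
import numpy as np

# Hypothetical suffix inventory: the source does not say how stems and
# affixes are identified, so this toy splitter is purely illustrative.
SUFFIXES = ("ing", "ed", "s")


def split_morphemes(wordform):
    """Split a wordform into [stem, "-affix"], or return it whole."""
    for suffix in SUFFIXES:
        if wordform.endswith(suffix) and len(wordform) > len(suffix) + 2:
            return [wordform[: -len(suffix)], "-" + suffix]
    return [wordform]


def morpheme_by_document(docs):
    """Count stems and affixes as separate rows, one column per document."""
    counts = [{} for _ in docs]
    for j, doc in enumerate(docs):
        for wordform in doc.lower().split():
            for morpheme in split_morphemes(wordform):
                counts[j][morpheme] = counts[j].get(morpheme, 0) + 1
    vocab = sorted(set().union(*counts))
    matrix = np.array([[c.get(m, 0) for c in counts] for m in vocab], dtype=float)
    return vocab, matrix


def log_entropy(matrix):
    """Log-entropy weighting (an assumed weighting function; needs >1 document)."""
    n_docs = matrix.shape[1]
    row_sums = matrix.sum(axis=1, keepdims=True)
    p = np.divide(matrix, row_sums, out=np.zeros_like(matrix), where=row_sums > 0)
    plogp = p * np.log(np.where(p > 0, p, 1.0))  # zero wherever p == 0
    global_weight = 1.0 + plogp.sum(axis=1, keepdims=True) / np.log(n_docs)
    return np.log1p(matrix) * global_weight


def low_rank(weighted, k):
    """Rank-k factors of the weighted matrix via truncated SVD."""
    u, s, vt = np.linalg.svd(weighted, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]


def rank_documents(query, vocab, u_k, s_k, vt_k):
    """Project query and documents onto the k factors; rank by cosine."""
    index = {m: i for i, m in enumerate(vocab)}
    q = np.zeros(len(vocab))
    for wordform in query.lower().split():
        for morpheme in split_morphemes(wordform):
            if morpheme in index:
                q[index[morpheme]] += 1.0
    q_k = q @ u_k                      # query coordinates, U_k^T q
    docs_k = (s_k[:, None] * vt_k).T   # document coordinates, one row per doc
    norms = np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q_k)
    scores = np.divide(docs_k @ q_k, norms,
                       out=np.zeros(len(docs_k)), where=norms > 0)
    return np.argsort(-scores), scores


if __name__ == "__main__":
    corpus = ["the cat jumped", "cats are jumping",
              "stocks traded lower", "traders sold stocks"]
    vocab, counts = morpheme_by_document(corpus)
    factors = low_rank(log_entropy(counts), k=2)
    order, scores = rank_documents("jumping cat", vocab, *factors)
    print(order, scores)
```

Because "jumped" and "jumping" both contribute to the shared row `jump` while `-ed` and `-ing` get rows of their own, documents can match on the stem even when their inflections differ, which is the point of enumerating stems and affixes separately.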
78 Citations
26 Claims
1. A computer implemented method of information retrieval, comprising:
- parsing a corpus to identify a number of wordform instances within each document of the corpus;
- generating a morpheme-by-document matrix based at least in part on the number of wordform instances within each document of the corpus, wherein the morpheme-by-document matrix separately enumerates instances of stems and affixes;
- applying a weighting function to attribute-values within the morpheme-by-document matrix to generate a weighted morpheme-by-document matrix; and
- generating at least one lower rank approximation matrix by factorizing the weighted morpheme-by-document matrix; and
- retrieving information with reference to the at least one lower rank approximation matrix.
Dependent claims: 2, 3, 4, 5, 6, 7, 8, 9, 10
11. A computer implemented method of information retrieval, comprising:
- parsing a corpus to identify a number of wordform instances within each document of the corpus;
- generating a term-by-term alignment matrix based at least in part on the number of wordform instances within each document of the corpus; and
- generating at least one lower rank approximation matrix by factorizing the term-by-term alignment matrix.
Dependent claims: 12, 13, 14, 15, 16, 17, 18, 19
20. A computer-readable storage medium that provides instructions that, when executed by a computer, will cause the computer to perform operations comprising:
- parsing a corpus to identify a number of wordform instances within each document of the corpus;
- generating a weighted morpheme-by-document matrix based at least in part on the number of wordform instances within each document of the corpus and based at least in part on a weighting function, wherein the weighted morpheme-by-document matrix separately enumerates instances of stems and affixes; and
- generating at least one lower rank approximation matrix by factorizing the weighted morpheme-by-document matrix.
Dependent claims: 21, 22, 23, 24, 25, 26
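Claim 11 factorizes a term-by-term alignment matrix rather than a morpheme-by-document matrix. The claims listed here do not say how alignment scores are computed, so the sketch below makes a loud assumption: two terms are treated as "aligned" in proportion to the number of documents in which they co-occur. The function names are hypothetical, and a truncated eigendecomposition is assumed as the factorization because the resulting matrix is symmetric.

```python
import numpy as np


def term_by_document(docs):
    """Count wordform instances: one row per term, one column per document."""
    vocab = sorted({w for doc in docs for w in doc.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc.lower().split():
            counts[index[w], j] += 1.0
    return vocab, counts


def term_alignment(counts):
    """Assumed alignment score: number of documents where two terms co-occur."""
    presence = (counts > 0).astype(float)
    return presence @ presence.T  # symmetric, terms x terms


def low_rank_symmetric(matrix, k):
    """Keep the k eigenpairs of largest magnitude of a symmetric matrix."""
    values, vectors = np.linalg.eigh(matrix)
    keep = np.argsort(-np.abs(values))[:k]
    return values[keep], vectors[:, keep]


if __name__ == "__main__":
    vocab, counts = term_by_document(["red car", "red bus", "blue car"])
    alignment = term_alignment(counts)
    values, vectors = low_rank_symmetric(alignment, k=2)
    approx = vectors @ np.diag(values) @ vectors.T  # rank-2 approximation
    print(vocab)
    print(np.round(approx, 2))
```

Any other pairwise score could be substituted in `term_alignment` without changing the factorization step, provided the matrix stays symmetric.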
Specification