Technique for information retrieval using enhanced latent semantic analysis generating rank approximation matrix by factorizing the weighted morpheme-by-document matrix

  • US 8,290,961 B2
  • Filed: 01/13/2009
  • Issued: 10/16/2012
  • Est. Priority Date: 01/13/2009
  • Status: Active Grant
First Claim

1. A computer implemented method of information retrieval, comprising:

  • parsing a corpus to identify instances of wordforms within each document of the corpus;

  • performing a morphological tokenization on the wordforms to convert the wordforms into morphemes by separating affixes from their stems;

  • generating a morpheme-by-document matrix based at least in part on a number of instances of the affixes and the stems within each document of the corpus, wherein the morpheme-by-document matrix accounts for information related to past tense and present tense inflections by separately enumerating the instances of the affixes from the instances of the stems;

  • applying a weighting function to attribute-values within the morpheme-by-document matrix to generate a weighted morpheme-by-document matrix, wherein applying the weighting function includes:

      • applying the weighting function to the attribute-values of the stems; and

      • applying the weighting function to the attribute-values of the affixes separately from the attribute-values of the stems to separately account for relative importance of the affixes;

  • generating at least one lower rank approximation matrix by factorizing the weighted morpheme-by-document matrix; and

  • retrieving information with reference to the at least one lower rank approximation matrix.
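
The steps recited in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation. Several pieces are assumptions made only for illustration: the toy suffix list `SUFFIXES`, the naive suffix-stripping rule in `morphemes`, the choice of log-entropy as the weighting function, the hypothetical `affix_scale` knob that down-weights affix rows relative to stem rows, the use of a truncated SVD as the factorization, and cosine similarity in the reduced space as the retrieval step. The claim itself does not name any of these.

```python
import math
import numpy as np

SUFFIXES = ("ed", "ing", "s")  # hypothetical toy affix list, not from the patent

def morphemes(wordform):
    """Naively split a wordform into [stem, affix]; affixes are marked with '-'."""
    for suf in SUFFIXES:
        if wordform.endswith(suf) and len(wordform) > len(suf) + 2:
            return [wordform[:-len(suf)], "-" + suf]
    return [wordform]

def build_matrix(docs):
    """Morpheme-by-document counts; stems and affixes get separate rows."""
    tokens = [[m for w in d.lower().split() for m in morphemes(w)] for d in docs]
    vocab = sorted({m for t in tokens for m in t})
    index = {m: i for i, m in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, t in enumerate(tokens):
        for m in t:
            X[index[m], j] += 1
    return X, vocab

def log_entropy(X):
    """Log-entropy weighting (an assumed choice of weighting function)."""
    row_sums = X.sum(axis=1, keepdims=True)
    p = np.divide(X, row_sums, out=np.zeros_like(X), where=row_sums > 0)
    plogp = p * np.log(np.where(p > 0, p, 1.0))  # 0 wherever p == 0
    g = 1.0 + plogp.sum(axis=1, keepdims=True) / math.log(X.shape[1])
    return g * np.log1p(X)

def weight(X, vocab, affix_scale=0.5):
    """Weight stem rows and affix rows separately; affix_scale is hypothetical."""
    is_affix = np.array([m.startswith("-") for m in vocab])
    W = np.zeros_like(X)
    W[~is_affix] = log_entropy(X[~is_affix])
    W[is_affix] = affix_scale * log_entropy(X[is_affix])
    return W

def factorize(W, k):
    """Rank-k factors of W from a truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def retrieve(query, vocab, U, s, Vt):
    """Rank documents by cosine similarity to the query in the reduced space."""
    index = {m: i for i, m in enumerate(vocab)}
    q = np.zeros(len(vocab))
    for w in query.lower().split():
        for m in morphemes(w):
            if m in index:
                q[index[m]] += 1  # raw counts for the query, for simplicity
    q_hat = U.T @ q                      # query folded into the latent space
    doc_coords = (s[:, None] * Vt).T     # one row of latent coordinates per document
    norms = np.linalg.norm(doc_coords, axis=1) * np.linalg.norm(q_hat) + 1e-12
    sims = doc_coords @ q_hat / norms
    return np.argsort(-sims), sims

docs = [
    "dogs walked home",
    "dog walking home",
    "stocks traded higher",
    "stock trading higher",
]
X, vocab = build_matrix(docs)
W = weight(X, vocab)
U, s, Vt = factorize(W, k=2)
ranking, sims = retrieve("walking dogs", vocab, U, s, Vt)
print(vocab)
print(ranking, sims)
```

Because the affixes occupy their own rows, "walked" and "walking" share the stem row `walk` while their tense information is kept in the `-ed` and `-ing` rows, which is the separation the claim describes. In this toy corpus the query "walking dogs" scores the two dog documents above the two stock documents.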
