Lemmatizing, stemming, and query expansion method and system

  • US 8,473,279 B2
  • Filed: 06/01/2009
  • Issued: 06/25/2013
  • Est. Priority Date: 05/30/2008
  • Status: Active Grant
First Claim

1. A computer-implemented method of stemming Arabic text comprising:

  • removing stop words from a document comprising Arabic text based on at least one stop word entry in an array of stop words;

    flagging as nouns words determined to be attached to definite articles and preceded by a noun array entry in an array of stop words preceding at least one noun;

    adding flagged nouns to a noun dictionary;

    flagging as verbs words determined to be preceded by a verb array entry in an array of stop words preceding at least one verb;

    adding flagged verbs to a verb dictionary;

    searching the document for nouns and verbs based on the flagged nouns and the flagged verbs;

    removing remaining stop words subsequent to searching the document;

    applying light stemming on the flagged nouns only;

    applying a root-based stemming on the flagged verbs only; and

    storing the stemmed document.
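
The steps of claim 1 can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration only: the three word lists (`GENERAL_STOP_WORDS`, `NOUN_PRECEDERS`, `VERB_PRECEDERS`), the affix lists, and the functions `light_stem`, `root_stem`, and `stem_document` do not come from the patent. In particular, `root_stem` is a simple affix-stripping stand-in, not a real Arabic root extractor, and splitting the stop words into a "general" set removed first and "preceder" sets removed later is one possible reading of the claim's two stop-word removal steps.

```python
# Hypothetical word lists; the claim does not enumerate their contents.
DEFINITE_ARTICLE = "ال"
GENERAL_STOP_WORDS = {"ثم", "و", "أو"}            # removed in the first pass
NOUN_PRECEDERS = {"في", "من", "على", "إلى"}        # stop words that precede nouns
VERB_PRECEDERS = {"لم", "لن", "قد", "سوف"}         # stop words that precede verbs


def light_stem(word):
    """Light stemming (assumed affix lists): strip the article and one suffix."""
    if word.startswith(DEFINITE_ARTICLE) and len(word) > 4:
        word = word[len(DEFINITE_ARTICLE):]
    for suffix in ("ات", "ون", "ين", "ة"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def root_stem(word):
    """Stand-in for root-based stemming: strip one verbal prefix and suffix."""
    if len(word) > 3 and word[0] in "يتنأ":
        word = word[1:]
    for suffix in ("ون", "وا"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_document(text):
    all_stop = GENERAL_STOP_WORDS | NOUN_PRECEDERS | VERB_PRECEDERS

    # Remove stop words based on the stop word array (first pass).
    tokens = [t for t in text.split() if t not in GENERAL_STOP_WORDS]

    # Flag nouns and verbs from the preceding stop word; fill the dictionaries.
    noun_dict, verb_dict = set(), set()
    for prev, word in zip(tokens, tokens[1:]):
        if word in all_stop:
            continue
        if prev in NOUN_PRECEDERS and word.startswith(DEFINITE_ARTICLE):
            noun_dict.add(word)
        elif prev in VERB_PRECEDERS:
            verb_dict.add(word)

    # Search the document for flagged words, drop the remaining stop words,
    # and stem: light stemming for nouns only, root-based for verbs only.
    stemmed = []
    for token in tokens:
        if token in all_stop:
            continue
        if token in noun_dict:
            stemmed.append(light_stem(token))
        elif token in verb_dict:
            stemmed.append(root_stem(token))
        else:
            stemmed.append(token)  # unflagged words are left untouched

    # "Storing" is reduced here to returning the stemmed text.
    return " ".join(stemmed), noun_dict, verb_dict
```

Calling `stem_document("ذهب الولد إلى المدرسة ثم لم يكتب الدرس في المدرسة")` flags المدرسة as a noun (definite article, preceded by إلى or في) and يكتب as a verb (preceded by لم). Words that are never flagged, such as الولد, pass through unstemmed, which mirrors the "nouns only" and "verbs only" wording of the claim.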
