Lemmatizing, stemming, and query expansion method and system
First Claim
1. A computer-implemented method of stemming Arabic text comprising:
removing stop words from a document comprising Arabic text based on at least one stop word entry in an array of stop words;
flagging as nouns words determined to be attached to definite articles and preceded by a noun array entry in an array of stop words preceding at least one noun;
adding flagged nouns to a noun dictionary;
flagging as verbs words determined to be preceded by a verb array entry in an array of stop words preceding at least one verb;
adding flagged verbs to a verb dictionary;
searching the document for nouns and verbs based on the flagged nouns and the flagged verbs;
removing remaining stop words subsequent to searching the document;
applying light stemming on the flagged nouns only;
applying a root-based stemming on the flagged verbs only; and
storing the stemmed document.
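The two flagging steps of the claim can be illustrated with a minimal Python sketch. The stop word entries, the constant names, and the function `flag_nouns_and_verbs` are hypothetical and chosen only for illustration; the claim does not list the actual array contents. The sketch reads the noun condition literally (attached to a definite article and preceded by a noun array entry).

```python
# Hypothetical arrays: the claim does not enumerate the actual entries.
NOUN_STOP_WORDS = {"في", "من", "على", "إلى"}   # stop words assumed to precede nouns
VERB_STOP_WORDS = {"لم", "لن", "قد", "سوف"}    # stop words assumed to precede verbs
DEFINITE_ARTICLE = "ال"


def flag_nouns_and_verbs(tokens):
    """Return (noun_dictionary, verb_dictionary) built from adjacent token pairs."""
    noun_dictionary, verb_dictionary = set(), set()
    for previous, word in zip(tokens, tokens[1:]):
        has_article = (
            word.startswith(DEFINITE_ARTICLE) and len(word) > len(DEFINITE_ARTICLE)
        )
        if previous in NOUN_STOP_WORDS and has_article:
            # Attached to a definite article and preceded by a noun array entry.
            noun_dictionary.add(word)
        elif previous in VERB_STOP_WORDS:
            # Preceded by a verb array entry.
            verb_dictionary.add(word)
    return noun_dictionary, verb_dictionary


tokens = ["ذهب", "الولد", "إلى", "المدرسة", "لم", "يكتب"]
nouns, verbs = flag_nouns_and_verbs(tokens)
```

In this example only the word following "إلى" is flagged as a noun, because "الولد" carries the definite article but is not preceded by a noun array entry.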
Abstract
A method of stemming text and a system therefor are described. The method comprises removing stop words from a document based on at least one stop word entry in an array of stop words; flagging as nouns words determined to be attached to definite articles and preceded by a noun array entry in an array of stop words preceding at least one noun; adding flagged nouns to a noun dictionary; flagging as verbs words determined to be preceded by a verb array entry in an array of stop words preceding at least one verb; adding flagged verbs to a verb dictionary; searching the document for nouns and verbs based on the flagged nouns and the flagged verbs; removing remaining stop words subsequent to searching the document; applying light stemming on the flagged nouns; applying a root-based stemming on the flagged verbs; and storing the stemmed document.
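The split between light stemming for nouns and root-based stemming for verbs can be sketched as follows. The affix lists and the functions `light_stem`, `crude_root`, and `stem_tokens` are assumptions made for illustration. In particular, `crude_root` is only an affix-stripping stand-in: a real root-based stemmer matches words against morphological patterns and a root lexicon, which is not attempted here.

```python
# Illustrative affix lists; these are assumptions, not taken from the patent.
NOUN_PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل")
NOUN_SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي")
VERB_PREFIXES = ("سي", "ست", "ي", "ت", "ن", "أ")
VERB_SUFFIXES = ("ون", "ين", "ان", "وا", "ت", "ن")
MIN_STEM_LENGTH = 3  # never strip below three letters


def _strip_one(word, affixes, from_start):
    """Strip the first matching affix, provided enough letters remain."""
    for affix in affixes:
        matches = word.startswith(affix) if from_start else word.endswith(affix)
        if matches and len(word) - len(affix) >= MIN_STEM_LENGTH:
            return word[len(affix):] if from_start else word[:-len(affix)]
    return word


def light_stem(word):
    """Light stemming: remove one common prefix and one common suffix."""
    word = _strip_one(word, NOUN_PREFIXES, from_start=True)
    return _strip_one(word, NOUN_SUFFIXES, from_start=False)


def crude_root(word):
    """Stand-in for root-based stemming: strip verbal affixes only."""
    word = _strip_one(word, VERB_SUFFIXES, from_start=False)
    return _strip_one(word, VERB_PREFIXES, from_start=True)


def stem_tokens(tokens, noun_dictionary, verb_dictionary):
    """Light-stem flagged nouns only, root-stem flagged verbs only."""
    stemmed = []
    for token in tokens:
        if token in noun_dictionary:
            stemmed.append(light_stem(token))
        elif token in verb_dictionary:
            stemmed.append(crude_root(token))
        else:
            stemmed.append(token)  # unflagged words pass through unchanged
    return stemmed
```

Dispatching on dictionary membership keeps the two stemmers independent, so either one could be swapped for a more complete implementation without touching the other.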
7 Claims
1. A computer-implemented method of stemming Arabic text comprising:
removing stop words from a document comprising Arabic text based on at least one stop word entry in an array of stop words;
flagging as nouns words determined to be attached to definite articles and preceded by a noun array entry in an array of stop words preceding at least one noun;
adding flagged nouns to a noun dictionary;
flagging as verbs words determined to be preceded by a verb array entry in an array of stop words preceding at least one verb;
adding flagged verbs to a verb dictionary;
searching the document for nouns and verbs based on the flagged nouns and the flagged verbs;
removing remaining stop words subsequent to searching the document;
applying light stemming on the flagged nouns only;
applying a root-based stemming on the flagged verbs only; and
storing the stemmed document.
4. A system for stemming Arabic text comprising:
a processor;
a memory, storing an input document comprising Arabic text, a stemming process, a noun stop word array of stop words preceding at least one noun, and a verb stop word array of stop words preceding at least one verb, and coupled with the processor, wherein the stemming process comprises a sequence of one or more instructions which, when executed by the processor, causes the processor to:
remove stop words from the input document based on at least one noun stop word entry in the noun stop word array;
flag as nouns words determined to be attached to definite articles and preceded by a noun stop word array entry;
flag as verbs words determined to be preceded by a verb stop word array entry;
search the document for nouns and verbs based on the flagged nouns and the flagged verbs;
remove remaining stop words subsequent to searching the document to create a stemmed document;
apply light stemming on the flagged nouns only;
apply a root-based stemming on the flagged verbs only; and
store the stemmed document to the memory.
7. A non-transitory computer-readable medium storing instructions which, when executed by a processor, cause the processor to perform a method comprising:
removing stop words from a document comprising Arabic text based on at least one stop word entry in an array of stop words;
flagging as nouns words determined to be attached to definite articles and preceded by a noun array entry in an array of stop words preceding at least one noun;
adding flagged nouns to a noun dictionary;
flagging as verbs words determined to be preceded by a verb array entry in an array of stop words preceding at least one verb;
adding flagged verbs to a verb dictionary;
searching the document for nouns and verbs based on the flagged nouns and the flagged verbs;
removing remaining stop words subsequent to searching the document;
applying light stemming on the flagged nouns only;
applying a root-based stemming on the flagged verbs only; and
storing the stemmed document.
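The searching step and the second stop word removal recited in the claims can be sketched in Python as shown here. The function `tag_and_clean` and its parameters are hypothetical. The sketch assumes that "searching the document" means tagging every occurrence of a word already present in the noun or verb dictionary, including occurrences not preceded by a stop word; that reading is an interpretation of the claim language.

```python
def tag_and_clean(tokens, noun_dictionary, verb_dictionary, remaining_stop_words):
    """Tag every occurrence of a flagged word, then drop remaining stop words.

    Returns a list of (token, tag) pairs, where tag is "noun", "verb", or None.
    """
    # Search pass: a word flagged once is tagged wherever it appears.
    tagged = []
    for token in tokens:
        if token in noun_dictionary:
            tagged.append((token, "noun"))
        elif token in verb_dictionary:
            tagged.append((token, "verb"))
        else:
            tagged.append((token, None))

    # Removal pass: the context stop words are no longer needed after tagging.
    return [(token, tag) for token, tag in tagged if token not in remaining_stop_words]


result = tag_and_clean(
    ["الولد", "في", "المدرسة", "لم", "يكتب", "المدرسة"],
    noun_dictionary={"المدرسة"},
    verb_dictionary={"يكتب"},
    remaining_stop_words={"في", "لم"},
)
```

Keeping the removal as a separate pass mirrors the claimed order, in which the stop words that signal nouns and verbs stay in the document until the search has finished.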
Specification