ARLSTem2 class documentation

Return a stemmed Arabic word after removing affixes. This is an improved version of the previous algorithm (ARLSTem) that reduces under-stemming errors. It is typically used in Arabic search engines, information retrieval, and NLP.

>>> from nltk.stem.arlstem2 import ARLSTem2
>>> stemmer = ARLSTem2()
>>> word = stemmer.stem('يعمل')
>>> print(word)
Parameters
token: The input Arabic word (unicode) to be stemmed
Returns
A unicode Arabic word
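As a rough illustration of the information-retrieval use mentioned above, the sketch below builds an inverted index keyed by stems, so that different surface forms of a word point to the same documents. The names `build_index` and `toy_stem` are hypothetical; `toy_stem` is only a stand-in so the example runs without NLTK, and in practice one would presumably pass `ARLSTem2().stem` instead.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set


def build_index(docs: Dict[str, Iterable[str]],
                stem: Callable[[str], str]) -> Dict[str, Set[str]]:
    """Map each stem to the set of document ids whose tokens reduce to it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            index[stem(token)].add(doc_id)
    return dict(index)


def toy_stem(token: str) -> str:
    # Hypothetical stand-in for ARLSTem2().stem: it only drops a leading
    # Yaa (U+064A) when more than three letters are present.
    if token.startswith("\u064A") and len(token) > 3:
        return token[1:]
    return token


# "d1" holds the inflected verb (Yaa + Ayn + Miim + Laam),
# "d2" holds the bare three-letter form (Ayn + Miim + Laam).
docs = {
    "d1": ["\u064A\u0639\u0645\u0644"],
    "d2": ["\u0639\u0645\u0644"],
}
index = build_index(docs, toy_stem)
```

Because the stemmer is passed in as a plain callable, the index code does not depend on any particular stemming algorithm.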
Method __init__ Undocumented
Method adjective remove the infixes from adjectives
Method fem2masc transform the word from the feminine form to the masculine form.
Method norm normalize the word by removing diacritics, replacing hamzated Alif with bare Alif, replacing AlifMaqsura with Yaa, and removing Waaw at the beginning.
Method plur2sing transform the word from the plural form to the singular form.
Method pref remove prefixes from the word's beginning.
Method stem Strip affixes from the token and return the stem.
Method stem1 call this function to get the first stem
Method suff remove the suffixes from the word's ending.
Method verb stem the verb prefixes, suffixes, or both
Method verb_t1 stem the co-occurring present tense prefixes and suffixes
Method verb_t2 stem the co-occurring future tense prefixes and suffixes
Method verb_t3 stem the present tense suffixes
Method verb_t4 stem the present tense prefixes
Method verb_t5 stem the future tense prefixes
Method verb_t6 stem the imperative tense prefixes
Instance Variable is_verb Undocumented
Instance Variable pl_si2 Undocumented
Instance Variable pl_si3 Undocumented
Instance Variable pr2 Undocumented
Instance Variable pr3 Undocumented
Instance Variable pr32 Undocumented
Instance Variable pr4 Undocumented
Instance Variable re_alifMaqsura Undocumented
Instance Variable re_diacritics Undocumented
Instance Variable re_hamzated_alif Undocumented
Instance Variable su2 Undocumented
Instance Variable su22 Undocumented
Instance Variable su3 Undocumented
Instance Variable su32 Undocumented
Instance Variable verb_pr2 Undocumented
Instance Variable verb_pr22 Undocumented
Instance Variable verb_pr33 Undocumented
Instance Variable verb_su2 Undocumented
Instance Variable verb_suf1 Undocumented
Instance Variable verb_suf2 Undocumented
Instance Variable verb_suf3 Undocumented
def __init__(self):

Undocumented

def adjective(self, token):

remove the infixes from adjectives

def fem2masc(self, token):

transform the word from the feminine form to the masculine form.
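One common marker of a feminine form is a final Taa Marbuta (U+0629). The sketch below shows only that single rule; the function name `feminine_to_masculine` and the minimum-length guard are assumptions for illustration, and the actual `fem2masc` method may apply further rules.

```python
TAA_MARBUTA = "\u0629"


def feminine_to_masculine(token: str) -> str:
    """Drop a final Taa Marbuta when enough letters remain (assumed guard)."""
    if token.endswith(TAA_MARBUTA) and len(token) > 3:
        return token[:-1]
    return token


# Miim + Ayn + Laam + Miim + Taa Marbuta -> Miim + Ayn + Laam + Miim
masculine = feminine_to_masculine("\u0645\u0639\u0644\u0645\u0629")
```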

def norm(self, token):

normalize the word by removing diacritics, replacing hamzated Alif with bare Alif, replacing AlifMaqsura with Yaa, and removing Waaw at the beginning.
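The four normalization steps can be sketched with the standard `re` module. This is a minimal illustration, not the library's code: the function name `normalize`, the exact character ranges, and the length guard on the leading Waaw are assumptions.

```python
import re

# Assumed character classes (not taken from the library source):
DIACRITICS = re.compile(r"[\u064B-\u065F]")        # Arabic combining marks
HAMZATED_ALIF = re.compile(r"[\u0622\u0623\u0625]")  # Alif with Madda/Hamza
ALIF_MAQSURA = re.compile(r"\u0649")

BARE_ALIF = "\u0627"
YAA = "\u064A"
WAAW = "\u0648"


def normalize(token: str) -> str:
    token = DIACRITICS.sub("", token)            # 1. remove diacritics
    token = HAMZATED_ALIF.sub(BARE_ALIF, token)  # 2. hamzated Alif -> bare Alif
    token = ALIF_MAQSURA.sub(YAA, token)         # 3. Alif Maqsura -> Yaa
    # 4. drop a leading Waaw, assuming at least three letters must remain
    if token.startswith(WAAW) and len(token) > 3:
        token = token[1:]
    return token
```

The steps run in a fixed order so that the length check on the leading Waaw sees the word with diacritics already removed.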

def plur2sing(self, token):

transform the word from the plural form to the singular form.
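For regular (sound) plurals and duals, the transformation can be approximated by removing a known ending. The sketch below assumes a small hypothetical suffix list and a minimum remaining length of three letters; it does not handle broken plurals and is not the rule set used by `plur2sing`.

```python
# Hypothetical endings: Alif+Taa, Waaw+Nuun, Yaa+Nuun, Alif+Nuun.
PLURAL_SUFFIXES = ["\u0627\u062A", "\u0648\u0646", "\u064A\u0646", "\u0627\u0646"]
MIN_REMAINING = 3  # assumed guard against over-stemming short words


def plural_to_singular(token: str) -> str:
    for suffix in PLURAL_SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= MIN_REMAINING:
            return token[: -len(suffix)]
    return token
```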

def pref(self, token):

remove prefixes from the word's beginning.
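Prefix removal of this kind is often done by trying longer prefixes before shorter ones. The sketch below assumes prefix lists grouped by length, loosely mirroring attribute names such as `pr3` and `pr2`; the list contents, the name `strip_prefix`, and the minimum stem length are hypothetical.

```python
# Hypothetical prefix lists, grouped by length.
PREFIXES_3 = [
    "\u0648\u0627\u0644",  # Waaw + Alif + Laam
    "\u0628\u0627\u0644",  # Baa + Alif + Laam
    "\u0643\u0627\u0644",  # Kaaf + Alif + Laam
    "\u0641\u0627\u0644",  # Faa + Alif + Laam
]
PREFIXES_2 = [
    "\u0627\u0644",  # Alif + Laam (definite article)
    "\u0644\u0644",  # Laam + Laam
]
MIN_STEM_LEN = 3  # assumed lower bound on what must remain


def strip_prefix(token: str) -> str:
    # Longest prefixes first, so a three-letter match wins over a two-letter one.
    for prefixes in (PREFIXES_3, PREFIXES_2):
        for prefix in prefixes:
            if token.startswith(prefix) and len(token) - len(prefix) >= MIN_STEM_LEN:
                return token[len(prefix):]
    return token
```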

def stem(self, token):

Strip affixes from the token and return the stem.

Parameters
token (str): The token that should be stemmed.
def stem1(self, token):

call this function to get the first stem

def suff(self, token):

remove the suffixes from the word's ending.

def verb(self, token):

stem the verb prefixes, suffixes, or both
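One plausible way to combine several tense-specific rules is to try them in order and return the first result. The sketch below assumes each rule returns `None` when its pattern does not match; that convention, the helper `apply_first`, and the two toy rules are assumptions for illustration and are not the library's `verb_t1` to `verb_t6` methods.

```python
from typing import Callable, List, Optional

Rule = Callable[[str], Optional[str]]

PRESENT_PREFIXES = "\u064A\u062A\u0646\u0627"  # Yaa, Taa, Nuun, Alif
SIIN = "\u0633"


def future_prefix(token: str) -> Optional[str]:
    # Hypothetical rule: Siin followed by a present tense prefix.
    if len(token) > 4 and token[0] == SIIN and token[1] in PRESENT_PREFIXES:
        return token[2:]
    return None


def present_prefix(token: str) -> Optional[str]:
    # Hypothetical rule: a single present tense prefix.
    if len(token) > 3 and token[0] in PRESENT_PREFIXES:
        return token[1:]
    return None


def apply_first(token: str, rules: List[Rule]) -> Optional[str]:
    """Return the output of the first rule that matches, else None."""
    for rule in rules:
        result = rule(token)
        if result is not None:
            return result
    return None


rules = [future_prefix, present_prefix]
```

Ordering matters here: the more specific two-letter future pattern is tried before the one-letter present pattern.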

def verb_t1(self, token):

stem the co-occurring present tense prefixes and suffixes

def verb_t2(self, token):

stem the co-occurring future tense prefixes and suffixes

def verb_t3(self, token):

stem the present tense suffixes

def verb_t4(self, token):

stem the present tense prefixes

def verb_t5(self, token):

stem the future tense prefixes

def verb_t6(self, token):

stem the imperative tense prefixes

is_verb: bool

Undocumented

pl_si2: list[str]

Undocumented

pl_si3: list[str]

Undocumented

pr2: list[str]

Undocumented

pr3: list[str]

Undocumented

pr32: list[str]

Undocumented

pr4: list[str]

Undocumented

re_alifMaqsura

Undocumented

re_diacritics

Undocumented

re_hamzated_alif

Undocumented

su2: list[str]

Undocumented

su22: list[str]

Undocumented

su3: list[str]

Undocumented

su32: list[str]

Undocumented

verb_pr2: list[str]

Undocumented

verb_pr22: list[str]

Undocumented

verb_pr33: list[str]

Undocumented

verb_su2: list[str]

Undocumented

verb_suf1: list[str]

Undocumented

verb_suf2: list[str]

Undocumented

verb_suf3: list[str]

Undocumented