class documentation
Return a stemmed Arabic word after removing affixes. This is an improved version of the previous algorithm, which reduces under-stemming errors. Typically used in Arabic search engines, information retrieval, and NLP.
>>> from nltk.stem.arlstem2 import ARLSTem2
>>> stemmer = ARLSTem2()
>>> word = stemmer.stem('يعمل')
>>> print(word)
Parameters | |
token | The input Arabic word (unicode) to be stemmed |
Returns | |
A unicode Arabic word |
Method | __init__ |
Undocumented |
Method | adjective |
remove the infixes from adjectives |
Method | fem2masc |
transform the word from the feminine form to the masculine form. |
Method | norm |
normalize the word by removing diacritics, replacing hamzated Alif with bare Alif, replacing Alif Maqsura with Yaa, and removing a leading Waaw. |
Method | plur2sing |
transform the word from the plural form to the singular form. |
Method | pref |
remove prefixes from the beginning of the word. |
Method | stem |
Strip affixes from the token and return the stem. |
Method | stem1 |
call this function to get the first stem |
Method | suff |
remove the suffixes from the word's ending. |
Method | verb |
strip the verb's prefixes, suffixes, or both |
Method | verb |
stem the co-occurring present tense prefixes and suffixes |
Method | verb |
stem the co-occurring future tense prefixes and suffixes |
Method | verb |
stem the present tense suffixes |
Method | verb |
stem the present tense prefixes |
Method | verb |
stem the future tense prefixes |
Method | verb |
stem the imperative tense prefixes |
Instance Variable | is |
Undocumented |
Instance Variable | pl |
Undocumented |
Instance Variable | pl |
Undocumented |
Instance Variable | pr2 |
Undocumented |
Instance Variable | pr3 |
Undocumented |
Instance Variable | pr32 |
Undocumented |
Instance Variable | pr4 |
Undocumented |
Instance Variable | re_alif |
Undocumented |
Instance Variable | re |
Undocumented |
Instance Variable | re |
Undocumented |
Instance Variable | su2 |
Undocumented |
Instance Variable | su22 |
Undocumented |
Instance Variable | su3 |
Undocumented |
Instance Variable | su32 |
Undocumented |
Instance Variable | verb |
Undocumented |
Instance Variable | verb |
Undocumented |
Instance Variable | verb |
Undocumented |
Instance Variable | verb |
Undocumented |
Instance Variable | verb |
Undocumented |
Instance Variable | verb |
Undocumented |
Instance Variable | verb |
Undocumented |
normalize the word by removing diacritics, replacing hamzated Alif with bare Alif, replacing Alif Maqsura with Yaa, and removing a leading Waaw.
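The four normalization steps described above can be illustrated with a minimal standalone sketch. It is not the class's own code: the function name `normalize`, the constant names, the Unicode ranges, and the rule that a leading Waaw is dropped only from tokens longer than three characters are all assumptions made for illustration.

```python
import re

# All code points below are assumptions of this sketch, not values quoted from the class.
DIACRITICS = re.compile(r"[\u064B-\u065F]")          # tanween, harakat, shadda, sukun
HAMZATED_ALIF = re.compile(r"[\u0622\u0623\u0625]")  # Alif with madda, hamza above, hamza below
ALIF_MAQSURA = re.compile(r"\u0649")

BARE_ALIF = "\u0627"
YAA = "\u064A"
WAAW = "\u0648"


def normalize(token: str) -> str:
    """Apply the normalization steps in the order the description lists them."""
    token = DIACRITICS.sub("", token)            # 1. remove diacritics
    token = HAMZATED_ALIF.sub(BARE_ALIF, token)  # 2. hamzated Alif -> bare Alif
    token = ALIF_MAQSURA.sub(YAA, token)         # 3. Alif Maqsura -> Yaa
    # 4. Assumption: drop a leading Waaw only when the token is longer than
    #    three characters, so short words that begin with Waaw are left alone.
    if token.startswith(WAAW) and len(token) > 3:
        token = token[1:]
    return token


# A vocalized name with a hamzated Alif: diacritics are stripped and the Alif is bared.
print(normalize("\u0623\u064E\u062D\u0652\u0645\u064E\u062F"))
```

Normalizing before affix removal means the later prefix and suffix checks only need to match one spelling of each letter.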
overrides
nltk.stem.api.StemmerI.stem
Strip affixes from the token and return the stem.
Parameters | |
token:str | The token that should be stemmed. |
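As a hedged usage sketch, `stem` can be applied to a list of tokens with one shared stemmer instance. The helper name `stem_tokens` and the sample words are hypothetical and chosen only for illustration. The example assumes NLTK is installed and that the class is importable from `nltk.stem.arlstem2`.

```python
from nltk.stem.arlstem2 import ARLSTem2


def stem_tokens(tokens):
    """Stem each Arabic token with a single shared ARLSTem2 instance."""
    stemmer = ARLSTem2()
    return [stemmer.stem(token) for token in tokens]


# Sample words are illustrative only; print each word next to its stem.
words = ["يعمل", "المدرسة", "والكتاب"]
for word, stem in zip(words, stem_tokens(words)):
    print(word, "->", stem)
```

Creating the stemmer once and reusing it avoids rebuilding its affix lists and compiled patterns for every token.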