nltk.stem.snowball.HungarianStemmer

class documentation

class HungarianStemmer(_LanguageSpecificStemmer):

Constructor: HungarianStemmer(ignore_stopwords)

The Hungarian Snowball stemmer.

Note
A detailed description of the Hungarian stemming algorithm is available at http://snowball.tartarus.org/algorithms/hungarian/stemmer.html

Method	`stem`	Stem a Hungarian word and return the stemmed form.
Method	`__r1_hungarian`	Return the region R1 that is used by the Hungarian stemmer.
Class Variable	`__digraphs`	The Hungarian digraphs.
Class Variable	`__double_consonants`	The Hungarian double consonants.
Class Variable	`__step1_suffixes`	Suffixes to be deleted in step 1 of the algorithm.
Class Variable	`__step2_suffixes`	Suffixes to be deleted in step 2 of the algorithm.
Class Variable	`__step3_suffixes`	Suffixes to be deleted in step 3 of the algorithm.
Class Variable	`__step4_suffixes`	Suffixes to be deleted in step 4 of the algorithm.
Class Variable	`__step5_suffixes`	Suffixes to be deleted in step 5 of the algorithm.
Class Variable	`__step6_suffixes`	Suffixes to be deleted in step 6 of the algorithm.
Class Variable	`__step7_suffixes`	Suffixes to be deleted in step 7 of the algorithm.
Class Variable	`__step8_suffixes`	Suffixes to be deleted in step 8 of the algorithm.
Class Variable	`__step9_suffixes`	Suffixes to be deleted in step 9 of the algorithm.
Class Variable	`__vowels`	The Hungarian vowels.

Inherited from _LanguageSpecificStemmer:

Method	`__init__`	Undocumented
Method	`__repr__`	Print out the string representation of the respective class.
Instance Variable	`stopwords`	Undocumented

def stem(self, word):

overrides nltk.stem.api.StemmerI.stem

Stem a Hungarian word and return the stemmed form.

Parameters
word:str or unicode	The word to stem.
Returns
unicode	The stemmed form.
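
A minimal usage sketch, assuming NLTK is installed. The sample words are arbitrary illustrations and are not taken from the NLTK documentation. `ignore_stopwords=False` is passed explicitly so that no stopword corpus should be needed. No expected output is shown, because the exact stems depend on the installed NLTK version.

```python
from nltk.stem.snowball import HungarianStemmer

# Construct the stemmer without stopword filtering, so no corpus
# download should be required for this sketch.
stemmer = HungarianStemmer(ignore_stopwords=False)

# Sample words chosen for illustration only.
words = ["házakban", "könyveket", "almák"]

# stem() takes one word at a time and returns its stemmed form.
stems = [stemmer.stem(word) for word in words]

for word, stemmed in zip(words, stems):
    print(word, "->", stemmed)
```

Passing `ignore_stopwords=True` instead would presumably require the NLTK stopwords corpus to be available locally.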

def __r1_hungarian(self, word, vowels, digraphs):

Return the region R1 that is used by the Hungarian stemmer.

If the word begins with a vowel, R1 is the region after the first consonant or digraph (two letters that stand for one phoneme) in the word. If the word begins with a consonant, R1 is the region after the first vowel in the word. If the word does not contain both a vowel and a consonant, R1 is the null region at the end of the word.

Parameters
word:str or unicode	The Hungarian word whose region R1 is determined.
vowels:unicode	The Hungarian vowels that are used to determine the region R1.
digraphs:tuple	The digraphs that are used to determine the region R1.
Returns
unicode	The region R1 of the given word.
Note
This helper method is invoked by the `stem` method of HungarianStemmer. It is not meant to be invoked directly.
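
The R1 rule described above can be illustrated with a standalone sketch. This is not NLTK's implementation: the function name `r1_region` is hypothetical, and the `VOWELS` and `DIGRAPHS` values are assumptions made for illustration rather than copies of the private class variables. The sketch also assumes the word is already lowercased.

```python
# Illustrative values only; the class's own __vowels and __digraphs may differ.
VOWELS = "aeiouöüáéíóőúű"
DIGRAPHS = ("cs", "dz", "dzs", "gy", "ly", "ny", "sz", "ty", "zs")


def r1_region(word, vowels=VOWELS, digraphs=DIGRAPHS):
    """Return R1 for a lowercased word, following the rule stated above."""
    if not word:
        return ""

    if word[0] in vowels:
        # Vowel-initial: R1 starts after the first consonant or digraph.
        for i in range(1, len(word)):
            if word[i] not in vowels:
                # Prefer the longest matching digraph at this position.
                for digraph in sorted(digraphs, key=len, reverse=True):
                    if word.startswith(digraph, i):
                        return word[i + len(digraph):]
                return word[i + 1:]
        return ""  # No consonant found: null region.

    # Consonant-initial: R1 starts after the first vowel.
    for i in range(1, len(word)):
        if word[i] in vowels:
            return word[i + 1:]
    return ""  # No vowel found: null region.


print(r1_region("alma"))  # ma
print(r1_region("ház"))   # z
print(repr(r1_region("egy")))  # '' (the digraph "gy" ends the word)
```

Checking digraphs longest-first is a design choice of this sketch, so that a three-letter sequence such as "dzs" is not split after its first two letters.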