class documentation

This subclass encapsulates two methods for defining the standard versions of the string regions R1, R2, and RV.

Method _r1r2_standard: Return the standard interpretations of the string regions R1 and R2.
Method _rv_standard: Return the standard interpretation of the string region RV.

Inherited from _LanguageSpecificStemmer:

Method __init__: Undocumented
Method __repr__: Print out the string representation of the respective class.
Instance Variable stopwords: Undocumented

Inherited from StemmerI (via _LanguageSpecificStemmer):

Method stem: Strip affixes from the token and return the stem.

def _r1r2_standard(self, word, vowels):

Return the standard interpretations of the string regions R1 and R2.

R1 is the region after the first non-vowel following a vowel, or is the null region at the end of the word if there is no such non-vowel.

R2 is the region after the first non-vowel following a vowel in R1, or is the null region at the end of the word if there is no such non-vowel.

Parameters
word (str or unicode): The word whose regions R1 and R2 are determined.
vowels (unicode): The vowels of the respective language that are used to determine the regions R1 and R2.
Returns
tuple: (r1, r2), the regions R1 and R2 for the respective word.
Notes
This helper method is invoked by the respective stem method of the subclasses DutchStemmer, FinnishStemmer, FrenchStemmer, GermanStemmer, ItalianStemmer, PortugueseStemmer, RomanianStemmer, and SpanishStemmer. It is not to be invoked directly!
A detailed description of how to define R1 and R2 can be found at http://snowball.tartarus.org/texts/r1r2.html
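
The definition of R1 and R2 given above can be illustrated with a small standalone sketch. This is not the library's own code: the function name r1r2_sketch, its inner helper, and the English-like vowel string "aeiouy" are assumptions chosen only for illustration.

```python
def r1r2_sketch(word, vowels):
    """Return (r1, r2) for word, following the standard definition.

    Hypothetical helper for illustration; not the library's own method.
    """

    def region_after_first_vowel_nonvowel(text):
        # Find the first non-vowel that directly follows a vowel and
        # return everything after that non-vowel.
        for i in range(1, len(text)):
            if text[i] not in vowels and text[i - 1] in vowels:
                return text[i + 1:]
        # No such non-vowel: the null region at the end of the word.
        return ""

    r1 = region_after_first_vowel_nonvowel(word)
    # R2 applies the same rule again, this time inside R1.
    r2 = region_after_first_vowel_nonvowel(r1)
    return r1, r2


# Assumed vowel set for an English-like example.
print(r1r2_sketch("beautiful", "aeiouy"))  # ('iful', 'ul')
print(r1r2_sketch("beau", "aeiouy"))       # ('', '')
```

Because R2 is computed by applying the R1 rule to R1 itself, a single helper covers both regions, and a word without a vowel followed by a non-vowel yields two empty strings.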

def _rv_standard(self, word, vowels):

Return the standard interpretation of the string region RV.

If the second letter is a consonant, RV is the region after the next following vowel. If the first two letters are vowels, RV is the region after the next following consonant. Otherwise, RV is the region after the third letter.

Parameters
word (str or unicode): The word whose region RV is determined.
vowels (unicode): The vowels of the respective language that are used to determine the region RV.
Returns
unicode: The region RV for the respective word.
Note
This helper method is invoked by the respective stem method of the subclasses ItalianStemmer, PortugueseStemmer, RomanianStemmer, and SpanishStemmer. It is not to be invoked directly!
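
The three cases of the RV definition can be sketched as a standalone function. As before, this is a minimal illustration rather than the library's own code: the name rv_sketch and the Spanish-like vowel string are assumptions, and a word that is too short, or in which no matching letter is found, is assumed to yield the empty string.

```python
def rv_sketch(word, vowels):
    """Return the region RV for word, following the standard definition.

    Hypothetical helper for illustration; not the library's own method.
    """
    if len(word) >= 2:
        if word[1] not in vowels:
            # Second letter is a consonant: RV starts after the next vowel.
            for i in range(2, len(word)):
                if word[i] in vowels:
                    return word[i + 1:]
        elif word[0] in vowels:
            # First two letters are vowels: RV starts after the next consonant.
            for i in range(2, len(word)):
                if word[i] not in vowels:
                    return word[i + 1:]
        else:
            # Consonant followed by vowel: RV is the region after the third letter.
            return word[3:]
    # Assumed fallback: no RV region could be determined.
    return ""


# Assumed vowel set for a Spanish-like example.
spanish_vowels = "aeiouáéíóúü"
print(rv_sketch("macho", spanish_vowels))    # ho
print(rv_sketch("oliva", spanish_vowels))    # va
print(rv_sketch("trabajo", spanish_vowels))  # bajo
```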