nltk.stem.snowball.SnowballStemmer

class documentation

class SnowballStemmer(StemmerI):

Constructor: SnowballStemmer(language, ignore_stopwords)

Snowball Stemmer

The following languages are supported: Arabic, Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish and Swedish.

The algorithm for English is documented here:

Porter, M. "An algorithm for suffix stripping." Program 14.3 (1980): 130-137.

The algorithms were developed by Martin Porter. The stemmers are called Snowball because Porter created a programming language of that name for writing stemming algorithms. More information is available at http://snowball.tartarus.org/

The stemmer is invoked as shown below:

>>> from nltk.stem import SnowballStemmer
>>> print(" ".join(SnowballStemmer.languages)) # See which languages are supported
arabic danish dutch english finnish french german hungarian
italian norwegian porter portuguese romanian russian
spanish swedish
>>> stemmer = SnowballStemmer("german") # Choose a language
>>> stemmer.stem("Autobahnen") # Stem a word
'autobahn'

Invoking the stemmer this way is useful when the language is not known until runtime. If you already know the language, you can invoke the language-specific stemmer directly:

>>> from nltk.stem.snowball import GermanStemmer
>>> stemmer = GermanStemmer()
>>> stemmer.stem("Autobahnen")
'autobahn'
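
The two invocation styles above should give the same result. A minimal sketch, assuming the language name arrives as a runtime value (the `language` variable is illustrative and not part of NLTK):

```python
from nltk.stem import SnowballStemmer
from nltk.stem.snowball import GermanStemmer

# Illustrative runtime value, e.g. read from a config file or a request.
language = "german"

# Generic entry point: the language is passed as a string.
by_name = SnowballStemmer(language).stem("Autobahnen")

# Language-specific class: the language is fixed in the code.
direct = GermanStemmer().stem("Autobahnen")

print(by_name, direct)  # autobahn autobahn
```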

Parameters
language	The language whose subclass is instantiated.
ignore_stopwords	If set to True, stopwords are returned unchanged instead of being stemmed. Defaults to False.
Raises
`ValueError`	Raised if there is no stemmer for the specified language.
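
When the language name comes from user input, the documented `ValueError` can be caught at construction time. A minimal sketch; `try_make_stemmer` is a hypothetical helper, not part of NLTK, and returning `None` for an unsupported language is only one possible policy:

```python
from nltk.stem import SnowballStemmer


def try_make_stemmer(language):
    """Return a SnowballStemmer for `language`, or None if it is unsupported.

    Hypothetical helper for illustration; not part of NLTK.
    """
    try:
        return SnowballStemmer(language)
    except ValueError:
        # Raised by the constructor when no stemmer exists for `language`.
        return None


german = try_make_stemmer("german")
unknown = try_make_stemmer("klingon")  # not a supported language name

print(german is not None, unknown is None)
```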

Method	`__init__`	Undocumented
Method	`stem`	Strip affixes from the token and return the stem.
Class Variable	`languages`	Undocumented
Instance Variable	`stemmer`	Undocumented
Instance Variable	`stopwords`	Undocumented

def __init__(self, language, ignore_stopwords=False):

Undocumented

def stem(self, token):

overrides nltk.stem.api.StemmerI.stem

Strip affixes from the token and return the stem.

Parameters
token:str	The token that should be stemmed.
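
`stem` takes one token at a time, so longer inputs are usually processed in a loop. A minimal sketch that groups tokens sharing a stem; `group_by_stem` is a hypothetical helper, and the input is assumed to be already tokenized:

```python
from collections import defaultdict

from nltk.stem import SnowballStemmer


def group_by_stem(tokens, language="english"):
    """Map each stem to the list of input tokens that reduce to it.

    Hypothetical helper for illustration; not part of NLTK.
    """
    stemmer = SnowballStemmer(language)
    groups = defaultdict(list)
    for token in tokens:
        # stem() is called once per token and returns the stem as a string.
        groups[stemmer.stem(token)].append(token)
    return dict(groups)


groups = group_by_stem(["running", "runs", "run", "cats", "cat"])
print(groups)
```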

languages: tuple[str, ...]

Undocumented
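
As the class-level example shows, `languages` holds lowercase names, so free-form input may need normalizing before it is passed to the constructor. A minimal sketch; `normalize_language` is a hypothetical helper, and the assumption that matching is case-sensitive is based on the lowercase entries, not on anything stated on this page:

```python
from nltk.stem import SnowballStemmer


def normalize_language(name):
    """Return `name` as a supported lowercase language name, or None.

    Hypothetical helper for illustration; not part of NLTK.
    """
    candidate = name.strip().lower()
    return candidate if candidate in SnowballStemmer.languages else None


print(normalize_language(" German "))
print(normalize_language("klingon"))
```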

stemmer

Undocumented

stopwords

Undocumented
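
Since this page leaves both instance variables undocumented, the quickest way to learn what they hold is to inspect them. A minimal sketch, assuming `stemmer` is the language-specific instance that does the work and `stopwords` is the collection of words left unstemmed (both are assumptions, not statements from this page):

```python
from nltk.stem import SnowballStemmer
from nltk.stem.snowball import GermanStemmer

wrapper = SnowballStemmer("german")  # ignore_stopwords defaults to False

# Assumption: the wrapper delegates to a language-specific stemmer object.
print(type(wrapper.stemmer).__name__)

# Assumption: with ignore_stopwords=False no stopword list is loaded,
# so the collection is expected to be empty.
print(len(wrapper.stopwords))
```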