Snowball stemmers

This module provides a port of the Snowball stemmers developed by Martin Porter.

There is also a demo function: snowball.demo().

Class ArabicStemmer The Snowball Arabic light stemmer algorithm by Assem Chelli (original algorithm: https://github.com/snowballstem/snowball/blob/master/algorithms/arabic/stem_Unicode.sbl).
Class DanishStemmer The Danish Snowball stemmer.
Class DutchStemmer The Dutch Snowball stemmer.
Class EnglishStemmer The English Snowball stemmer.
Class FinnishStemmer The Finnish Snowball stemmer.
Class FrenchStemmer The French Snowball stemmer.
Class GermanStemmer The German Snowball stemmer.
Class HungarianStemmer The Hungarian Snowball stemmer.
Class ItalianStemmer The Italian Snowball stemmer.
Class NorwegianStemmer The Norwegian Snowball stemmer.
Class PorterStemmer A word stemmer based on the original Porter stemming algorithm.
Class PortugueseStemmer The Portuguese Snowball stemmer.
Class RomanianStemmer The Romanian Snowball stemmer.
Class RussianStemmer The Russian Snowball stemmer.
Class SnowballStemmer The Snowball stemmer, which delegates to the language-specific stemmer selected by its language argument.
Class SpanishStemmer The Spanish Snowball stemmer.
Class SwedishStemmer The Swedish Snowball stemmer.
Function demo This function provides a demonstration of the Snowball stemmers.
Class _LanguageSpecificStemmer This helper subclass offers the possibility to invoke a specific stemmer directly. This is useful if you already know the language to be stemmed at runtime.
Class _ScandinavianStemmer This subclass encapsulates a method for defining the string region R1. It is used by the Danish, Norwegian, and Swedish stemmers.
Class _StandardStemmer This subclass encapsulates two methods for defining the standard versions of the string regions R1, R2, and RV.
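The helper classes above revolve around the string regions R1, R2 and RV. The sketch below follows the region definitions published for the Snowball algorithms: R1 is the part of the word after the first non-vowel that follows a vowel, R2 applies the same rule inside R1, RV has a three-case rule, and the Scandinavian R1 is the standard R1 shifted right so that at least three letters precede it. The function names and vowel sets are illustrative assumptions, not the module's own helpers, which may differ in detail.

```python
# Sketch of the Snowball string regions (R1, R2, RV and the Scandinavian R1).
# Function names and vowel sets are illustrative assumptions; each language
# stemmer defines its own vowel set.
ENGLISH_VOWELS = "aeiouy"
SPANISH_VOWELS = "aeiouáéíóúü"
DANISH_VOWELS = "aeiouyæåø"


def _after_first_vc(s, vowels):
    """Return the part of s after the first non-vowel that follows a vowel,
    or "" (the null region) if there is no such non-vowel."""
    for i in range(1, len(s)):
        if s[i] not in vowels and s[i - 1] in vowels:
            return s[i + 1:]
    return ""


def r1r2_standard(word, vowels):
    """Standard R1 and R2: R2 applies the R1 rule again inside R1."""
    r1 = _after_first_vc(word, vowels)
    r2 = _after_first_vc(r1, vowels)
    return r1, r2


def rv_standard(word, vowels):
    """Standard RV (the variant used by e.g. Spanish and Italian)."""
    if len(word) < 2:
        return ""
    if word[1] not in vowels:
        # Second letter is a consonant: RV starts after the next vowel.
        for i in range(2, len(word)):
            if word[i] in vowels:
                return word[i + 1:]
        return ""
    if word[0] in vowels:
        # First two letters are vowels: RV starts after the next consonant.
        for i in range(2, len(word)):
            if word[i] not in vowels:
                return word[i + 1:]
        return ""
    # Consonant followed by vowel: RV starts after the third letter.
    return word[3:]


def r1_scandinavian(word, vowels):
    """Danish/Norwegian/Swedish R1: the standard R1, moved right if needed
    so that the region before it contains at least three letters."""
    for i in range(1, len(word)):
        if word[i] not in vowels and word[i - 1] in vowels:
            return word[max(i + 1, 3):]
    return ""


if __name__ == "__main__":
    print(r1r2_standard("beautiful", ENGLISH_VOWELS))  # ('iful', 'ul')
    print(rv_standard("trabajo", SPANISH_VOWELS))      # bajo
    print(r1_scandinavian("ikke", DANISH_VOWELS))      # e
```

For "ikke" the standard R1 would be "ke"; the Scandinavian adjustment pushes the start to the fourth letter, leaving "e".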
def demo():

This function provides a demonstration of the Snowball stemmers.

After invoking this function and specifying a language, it stems an excerpt of the Universal Declaration of Human Rights (which is part of the NLTK corpus collection) and then prints the original and the stemmed text.
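The flow the demo describes (choose a stemmer by language name, stem an excerpt word by word, print the original next to the stemmed text) can be sketched without the corpus. This is a hypothetical illustration: `toy_english_stem` is a crude made-up suffix stripper standing in for a real Snowball stemmer, `STEMMERS` and `stem_excerpt` are invented names, and the excerpt is passed in as a string instead of being read from the NLTK corpus.

```python
import textwrap


# Hypothetical stand-in for a real stemmer: a crude suffix stripper,
# NOT the English Snowball algorithm.
def toy_english_stem(word):
    for suffix in ("ing", "ed", "s"):
        # Keep at least three letters of stem (an arbitrary assumption).
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical registry mapping language names to stem functions.
STEMMERS = {"english": toy_english_stem}


def stem_excerpt(language, text, width=70):
    """Stem every word of `text` and return (original, stemmed),
    each wrapped to `width` columns for printing."""
    try:
        stem = STEMMERS[language.lower()]
    except KeyError:
        raise ValueError(f"The language '{language}' is not supported.") from None
    stemmed = " ".join(stem(word.lower()) for word in text.split())
    return textwrap.fill(text, width), textwrap.fill(stemmed, width)


if __name__ == "__main__":
    original, stemmed = stem_excerpt(
        "English", "Everyone has the right to rest including working hours"
    )
    print(original)
    print(stemmed)
```

Keeping the lookup in a dictionary means an unsupported language fails early with a clear error instead of partway through stemming.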