class documentation

Lancaster Stemmer

>>> from nltk.stem.lancaster import LancasterStemmer
>>> st = LancasterStemmer()
>>> st.stem('maximum')     # Remove "-um" when word is intact
'maxim'
>>> st.stem('presumably')  # Don't remove "-um" when word is not intact
'presum'
>>> st.stem('multiply')    # No action taken if word ends with "-ply"
'multiply'
>>> st.stem('provision')   # Replace "-sion" with "-j" to trigger "j" set of rules
'provid'
>>> st.stem('owed')        # Word starting with vowel must contain at least 2 letters
'ow'
>>> st.stem('ear')         # ditto
'ear'
>>> st.stem('saying')      # Words starting with consonant must contain at least 3
'say'
>>> st.stem('crying')      #     letters and one of those letters must be a vowel
'cry'
>>> st.stem('string')      # ditto
'string'
>>> st.stem('meant')       # ditto
'meant'
>>> st.stem('cement')      # ditto
'cem'
>>> st_pre = LancasterStemmer(strip_prefix_flag=True)
>>> st_pre.stem('kilometer') # Test Prefix
'met'
>>> st_custom = LancasterStemmer(rule_tuple=("ssen4>", "s1t."))
>>> st_custom.stem("ness") # Change s to t
'nest'
Method __init__ Create an instance of the Lancaster stemmer.
Method __repr__ Undocumented
Method parseRules Validate the set of rules used in this stemmer.
Method stem Stem a word using the Lancaster stemmer.
Class Variable default_rule_tuple Undocumented
Instance Variable rule_dictionary Undocumented
Method __applyRule Apply the stemming rule to the word
Method __doStemming Perform the actual word stemming
Method __getLastLetter Get the zero-based index of the last alphabetic character in this string
Method __isAcceptable Determine if the word is acceptable for stemming.
Method __stripPrefix Remove prefix from a word.
Instance Variable _rule_tuple Undocumented
Instance Variable _strip_prefix Undocumented
def __init__(self, rule_tuple=None, strip_prefix_flag=False):

Create an instance of the Lancaster stemmer.

def __repr__(self):

Undocumented

def parseRules(self, rule_tuple=None):

Validate the set of rules used in this stemmer.

When this method is called directly, outside of stem, the rule_tuple argument is compiled into self.rule_dictionary. When it is called from within stem, self._rule_tuple is used instead.
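
The validation and grouping step can be sketched as follows. This is a minimal, standalone illustration and not NLTK's actual code: the helper name `parse_rules`, the regular expression, and the choice to key the dictionary on each rule's first character are assumptions based on the rule strings shown in the examples above (a reversed ending, an optional `*` intact flag, one digit giving the number of letters to remove, optional letters to append, and a `>` or `.` terminator).

```python
import re
from collections import defaultdict

# Assumed shape of one rule: reversed ending, optional "*" (intact-only),
# one digit (letters to remove), optional letters to append, and an
# optional terminator (">" = keep stemming, "." = stop).
RULE_PATTERN = re.compile(r"^([a-z]+)(\*?)(\d)([a-z]*)([>.]?)$")


def parse_rules(rule_tuple):
    """Validate each rule and group the rules by their first character."""
    rule_dictionary = defaultdict(list)
    for rule in rule_tuple:
        if not RULE_PATTERN.match(rule):
            raise ValueError(f"The rule {rule!r} is invalid")
        # The ending is stored reversed, so the first character of a rule
        # is the last letter of the word ending it applies to.
        rule_dictionary[rule[0]].append(rule)
    return dict(rule_dictionary)


rules = parse_rules(("ssen4>", "s1t."))
print(rules)  # {'s': ['ssen4>', 's1t.']}
```

Grouping by the first character means a stemmer only has to scan the rules whose ending finishes with the word's last letter.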

def stem(self, word):

Stem a word using the Lancaster stemmer.

default_rule_tuple: tuple[str, ...]

Undocumented

rule_dictionary: dict

Undocumented

def __applyRule(self, word, remove_total, append_string):

Apply the stemming rule to the word.
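
As a rough sketch of what applying a rule involves, judging from the parameter names: drop `remove_total` characters from the end of the word, then add `append_string`. The function `apply_rule` below is a hypothetical standalone helper, not the private NLTK method itself.

```python
def apply_rule(word, remove_total, append_string):
    """Remove `remove_total` trailing characters, then append `append_string`."""
    # Slice by explicit length so that remove_total == 0 keeps the whole word.
    stem = word[: len(word) - remove_total]
    return stem + append_string


# Mirrors the custom rule "s1t." from the examples: drop one letter, add "t".
print(apply_rule("ness", 1, "t"))  # nest
```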

def __doStemming(self, word, intact_word):

Perform the actual word stemming.
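
The overall loop can be sketched in a few lines. This is a simplified illustration under stated assumptions and not the NLTK implementation: `do_stemming` and `is_acceptable` are hypothetical names, the rules are scanned as a flat sequence instead of being looked up by last letter, and the acceptability test is a simplified version of the conditions listed in the examples above (with "y" counted as a vowel).

```python
import re

# Assumed rule shape: reversed ending, optional "*", digit, append, terminator.
RULE_PATTERN = re.compile(r"^([a-z]+)(\*?)(\d)([a-z]*)([>.]?)$")
VOWELS = "aeiouy"  # assumption: "y" counts as a vowel


def is_acceptable(stem):
    """Simplified check: vowel-initial stems need 2+ letters,
    consonant-initial stems need 3+ letters including a vowel."""
    if not stem:
        return False
    if stem[0] in VOWELS:
        return len(stem) >= 2
    return len(stem) >= 3 and any(ch in VOWELS for ch in stem)


def do_stemming(word, rules):
    """Repeatedly apply the first matching, acceptable rule."""
    intact = True  # becomes False once any rule has changed the word
    proceed = True
    while proceed:
        proceed = False
        for rule in rules:
            ending, intact_flag, remove_total, append_string, terminator = (
                RULE_PATTERN.match(rule).groups()
            )
            if not word.endswith(ending[::-1]):
                continue  # the ending is stored reversed in the rule
            if intact_flag and not intact:
                continue  # "*" rules only fire on an unmodified word
            candidate = word[: len(word) - int(remove_total)]
            if not is_acceptable(candidate):
                continue
            word = candidate + append_string
            intact = False
            # This sketch stops only when the rule ends with ".".
            proceed = terminator != "."
            break
    return word


print(do_stemming("ness", ("ssen4>", "s1t.")))  # nest
```

In the custom-rule example, "ssen4>" matches "ness" but would leave an empty stem, so it is skipped and "s1t." produces "nest".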

def __getLastLetter(self, word):

Get the zero-based index of the last alphabetic character in this string.
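
One straightforward reading of this description is sketched below. The helper `get_last_letter` is hypothetical, and returning -1 when the string contains no letters is an assumption; the private NLTK method may treat embedded non-letters differently.

```python
def get_last_letter(word):
    """Return the zero-based index of the last alphabetic character, or -1."""
    # Walk backwards so the first alphabetic character found is the last one.
    for index in range(len(word) - 1, -1, -1):
        if word[index].isalpha():
            return index
    return -1  # assumed convention for "no letters at all"


print(get_last_letter("saying"))  # 5
print(get_last_letter("say!"))    # 2
```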

def __isAcceptable(self, word, remove_total):

Determine if the word is acceptable for stemming.
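
The acceptability conditions quoted in the examples above (a stem starting with a vowel needs at least 2 letters; a stem starting with a consonant needs at least 3 letters, one of them a vowel) might be expressed as in this sketch. `is_acceptable` is a hypothetical standalone helper, and counting "y" as a vowel is an assumption suggested by the 'crying' example.

```python
VOWELS = "aeiouy"  # assumption: "y" counts as a vowel


def is_acceptable(word, remove_total):
    """Check whether removing `remove_total` letters leaves a usable stem."""
    stem = word[: len(word) - remove_total]
    if not stem:
        return False
    if stem[0] in VOWELS:
        # Vowel-initial stems must keep at least 2 letters.
        return len(stem) >= 2
    # Consonant-initial stems must keep at least 3 letters, one a vowel.
    return len(stem) >= 3 and any(ch in VOWELS for ch in stem)


print(is_acceptable("owed", 2))    # True:  "ow" has 2 letters
print(is_acceptable("crying", 3))  # True:  "cry" contains "y"
print(is_acceptable("string", 3))  # False: "str" has no vowel
```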

def __stripPrefix(self, word):

Remove prefix from a word.

This function was originally taken from Whoosh.
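
Prefix stripping of this kind can be sketched as a simple `startswith` check. The helper `strip_prefix` and the prefix list below are illustrative assumptions: only "kilo" is implied by the 'kilometer' example above, and the list used by NLTK may differ.

```python
# Illustrative prefix list; only "kilo" is implied by the documented example.
PREFIXES = ("kilo", "micro", "milli")


def strip_prefix(word, prefixes=PREFIXES):
    """Remove the first matching prefix from the word, if any."""
    for prefix in prefixes:
        if word.startswith(prefix):
            return word[len(prefix):]
    return word


print(strip_prefix("kilometer"))  # meter
```

The stemming rules would then run on the remainder, which is how 'kilometer' can end up as 'met' in the example above.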

_rule_tuple

Undocumented

_strip_prefix

Undocumented