nltk.stem.regexp.RegexpStemmer

class documentation

class RegexpStemmer(StemmerI):

Constructor: RegexpStemmer(regexp, min)

A stemmer that uses regular expressions to identify morphological affixes. Any substrings that match the regular expressions will be removed.

>>> from nltk.stem import RegexpStemmer
>>> st = RegexpStemmer('ing$|s$|e$|able$', min=4)
>>> st.stem('cars')
'car'
>>> st.stem('mass')
'mas'
>>> st.stem('was')
'was'
>>> st.stem('bee')
'bee'
>>> st.stem('compute')
'comput'
>>> st.stem('advisable')
'advis'

Parameters
regexp	The regular expression that should be used to identify morphological affixes.
min	The minimum length a string must have to be stemmed; shorter strings are returned unchanged. Defaults to 0.
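
The interaction between the two parameters can be sketched with the standard `re` module. This is an illustrative stand-in, not NLTK's implementation: `sketch_regexp_stem` and `min_len` are hypothetical names. It assumes that tokens shorter than the minimum length are returned unchanged, which is what the 'was' and 'bee' results in the example above suggest.

```python
import re


def sketch_regexp_stem(word, regexp, min_len=0):
    """Hypothetical stand-in for the behavior described above."""
    # Assumption: tokens shorter than min_len are returned untouched.
    if len(word) < min_len:
        return word
    # Remove every substring that matches the pattern.
    return re.sub(regexp, "", word)


pattern = "ing$|s$|e$|able$"
words = ["cars", "mass", "was", "bee", "compute", "advisable"]
results = {w: sketch_regexp_stem(w, pattern, min_len=4) for w in words}
print(results)
```

With `min_len=4`, the three-letter tokens pass through unmodified, while the longer ones lose whichever alternative of the pattern matches at the end of the string.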

Method	`__init__`	Undocumented
Method	`__repr__`	Undocumented
Method	`stem`	Strip affixes from the token and return the stem.
Instance Variable	`_min`	Undocumented
Instance Variable	`_regexp`	Undocumented

def __init__(self, regexp, min=0):

Undocumented

def __repr__(self):

Undocumented

def stem(self, word):

overrides nltk.stem.api.StemmerI.stem

Strip affixes from the token and return the stem.

Parameters
word:str	The token that should be stemmed.
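
Because the class description says that any matching substring is removed, not only a trailing one, whether the pattern is anchored with `$` matters. A minimal sketch of that difference follows, using plain `re.sub` as an assumed stand-in for the stemmer's substitution step; `strip_matches` is a hypothetical helper, not part of NLTK.

```python
import re


def strip_matches(token, regexp):
    # Hypothetical helper: delete every substring of token matching regexp.
    return re.sub(regexp, "", token)


# Unanchored: both occurrences of "ing" are removed.
unanchored = strip_matches("singing", "ing")
# Anchored with "$": only the trailing "ing" is removed.
anchored = strip_matches("singing", "ing$")
print(unanchored, anchored)
```

Anchoring each alternative, as the `'ing$|s$|e$|able$'` pattern in the class example does, keeps the removal limited to suffixes.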

_min

Undocumented

_regexp

Undocumented