class documentation
A stemmer that uses regular expressions to identify morphological affixes. Any substrings that match the regular expressions will be removed.
>>> from nltk.stem import RegexpStemmer
>>> st = RegexpStemmer('ing$|s$|e$|able$', min=4)
>>> st.stem('cars')
'car'
>>> st.stem('mass')
'mas'
>>> st.stem('was')
'was'
>>> st.stem('bee')
'bee'
>>> st.stem('compute')
'comput'
>>> st.stem('advisable')
'advis'
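The patterns in the example above all end in `$`. The following sketch, which uses only Python's standard `re` module, suggests why that matters. It assumes that "removing every match" behaves like `re.sub` with an empty replacement, which is consistent with the description above but is an assumption, not a statement about the library's internals. The input word `singers` is purely illustrative.

```python
import re

# Assumption: "removing every match" behaves like re.sub(pattern, "", word).
anchored = re.compile(r"ing$|s$")    # can only match at the end of the word
unanchored = re.compile(r"ing|s")    # can match anywhere in the word

word = "singers"  # illustrative input, not taken from the documentation

anchored_result = anchored.sub("", word)
unanchored_result = unanchored.sub("", word)

print(anchored_result)    # only the trailing "s" is removed
print(unanchored_result)  # the leading "s" and the inner "ing" are removed too
```

Under this assumption, an unanchored pattern strips material from the middle and start of a word as well as the end, so suffix patterns should normally be anchored with `$`.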
| Parameters | |
| regexp | The regular expression that should be used to identify morphological affixes. |
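| min | The minimum length a string must have to be stemmed. Shorter strings are returned unchanged, as with 'was' and 'bee' in the example above. |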
| Method | __init__ | Undocumented |
| Method | __repr__ | Undocumented |
| Method | stem | Strip affixes from the token and return the stem. |
| Instance Variable | _min | Undocumented |
| Instance Variable | _regexp | Undocumented |
overrides nltk.stem.api.StemmerI.stem
Strip affixes from the token and return the stem.
| Parameters | |
| word | Undocumented |
| token:str | The token that should be stemmed. |
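The behaviour documented for `stem` can be approximated with a short standalone sketch. The helper `sketch_stem` and its `min_length` parameter are hypothetical names introduced here for illustration; this is not NLTK's code. It assumes that tokens shorter than the minimum are returned unchanged and that every match is removed with `re.sub`, which is inferred from the class example, where 'mass' (four characters) is stemmed with `min=4` and 'was' is not.

```python
import re


def sketch_stem(word, regexp, min_length=0):
    """Hypothetical helper approximating the documented behaviour (not NLTK code)."""
    # Assumption: tokens shorter than the threshold are returned untouched.
    if len(word) < min_length:
        return word
    # Assumption: every match of the pattern is replaced with the empty string.
    return re.sub(regexp, "", word)


pattern = r"ing$|s$|e$|able$"
words = ["cars", "mass", "was", "bee", "compute", "advisable"]
stems = [sketch_stem(w, pattern, min_length=4) for w in words]

print(stems)
# ['car', 'mas', 'was', 'bee', 'comput', 'advis']
```

The sketch reproduces the outputs shown in the class example, including the over-stemming of 'mass' to 'mas', a reminder that a purely pattern-based stemmer has no notion of which endings are real suffixes.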