class documentation
class StringTokenizer(TokenizerI):
Known subclasses: nltk.tokenize.simple.CharTokenizer, nltk.tokenize.simple.SpaceTokenizer, nltk.tokenize.simple.TabTokenizer
A tokenizer that divides a string into substrings by splitting on the specified string (defined in subclasses).
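As a rough illustration of the pattern described above, the sketch below reimplements it with the standard library only. The class names `SketchStringTokenizer` and `SketchSpaceTokenizer` are hypothetical stand-ins, not NLTK classes, and the use of `str.split` is an assumption about how such a tokenizer could work, not a statement of NLTK's implementation.

```python
from abc import ABC, abstractmethod


class SketchStringTokenizer(ABC):
    """Hypothetical stand-in: splits on a separator supplied by subclasses."""

    @property
    @abstractmethod
    def _string(self):
        """The separator string; each concrete subclass must define it."""

    def tokenize(self, s):
        # Assumption: a plain str.split on the subclass-defined separator.
        return s.split(self._string)


class SketchSpaceTokenizer(SketchStringTokenizer):
    """Hypothetical subclass that splits on a single space."""

    _string = " "


tokens = SketchSpaceTokenizer().tokenize("Good muffins cost $3.88")
print(tokens)
```

Because the separator is matched literally in this sketch, two adjacent separators produce an empty token between them.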
| Method | span_tokenize | Identify the tokens using integer offsets (start_i, end_i), where s[start_i:end_i] is the corresponding token. |
| Method | tokenize | Return a tokenized copy of s. |
| Property | _string | Undocumented |
Inherited from TokenizerI:
| Method | span_tokenize_sents | Apply self.span_tokenize() to each element of strings. |
| Method | tokenize_sents | Apply self.tokenize() to each element of strings. |
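The inherited helpers apply the single-string methods to every element of a list of strings. A minimal sketch of that idea follows; the free functions `tokenize` and `tokenize_each` are hypothetical names used for illustration, and the space separator is an assumption.

```python
def tokenize(s, sep=" "):
    # Hypothetical single-string tokenizer: split on the separator.
    return s.split(sep)


def tokenize_each(strings, sep=" "):
    # Apply the single-string tokenizer to every element of `strings`.
    return [tokenize(s, sep) for s in strings]


result = tokenize_each(["a b", "c d e"])
print(result)
```

The result is one token list per input string, in the same order as the input.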
def span_tokenize(self, s):
overridden in nltk.tokenize.simple.CharTokenizer
Identify the tokens using integer offsets (start_i, end_i), where s[start_i:end_i] is the corresponding token.
| Returns | |
| iter(tuple(int, int)) | Undocumented |
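To make the offset convention concrete, the sketch below computes (start_i, end_i) pairs for a separator-based split and checks that slicing recovers each token. The helper `span_tokenize` here is a hypothetical standalone function written for illustration; it is not NLTK's implementation, and its handling of edge cases (such as trailing separators) is an assumption.

```python
def span_tokenize(s, sep):
    # Hypothetical helper: yield (start, end) offsets of the substrings
    # between occurrences of `sep`, so that s[start:end] is each token.
    if not sep:
        raise ValueError("separator must be a non-empty string")
    start = 0
    while True:
        end = s.find(sep, start)
        if end == -1:
            # No further separator: the remainder is the last token.
            yield (start, len(s))
            return
        yield (start, end)
        # Resume the search just past the separator.
        start = end + len(sep)


text = "Good muffins cost"
spans = list(span_tokenize(text, " "))
tokens = [text[start:end] for start, end in spans]
print(spans)
print(tokens)
```

Returning offsets instead of substrings lets a caller map each token back to its position in the original string.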
def tokenize(self, s):
overrides nltk.tokenize.api.TokenizerI.tokenize
overridden in nltk.tokenize.simple.CharTokenizer
Return a tokenized copy of s.
| Returns | |
| list of str | Undocumented |