class documentation
class StringTokenizer(TokenizerI):
Known subclasses: nltk.tokenize.simple.CharTokenizer, nltk.tokenize.simple.SpaceTokenizer, nltk.tokenize.simple.TabTokenizer
A tokenizer that divides a string into substrings by splitting on the specified string (defined in subclasses).
Method | span_tokenize |
Identify the tokens using integer offsets (start_i, end_i), where s[start_i:end_i] is the corresponding token. |
Method | tokenize |
Return a tokenized copy of s. |
Property | _string |
Undocumented |
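The relationship between the class and its subclasses can be sketched as follows. This is a minimal illustration, not NLTK's source: the `Sketch*` class names are hypothetical, and it assumes that `_string` holds the separator and that `tokenize` behaves like `str.split` on that separator.

```python
from abc import ABC, abstractmethod


class SketchStringTokenizer(ABC):
    """Hypothetical stand-in for StringTokenizer (illustration only)."""

    @property
    @abstractmethod
    def _string(self):
        """The separator string; each concrete subclass supplies it."""

    def tokenize(self, s):
        # Return a tokenized copy of s by splitting on the separator.
        return s.split(self._string)


class SketchSpaceTokenizer(SketchStringTokenizer):
    _string = " "  # split on a single space character


class SketchTabTokenizer(SketchStringTokenizer):
    _string = "\t"  # split on a tab character


space_tokens = SketchSpaceTokenizer().tokenize("Good muffins cost $3.88")
tab_tokens = SketchTabTokenizer().tokenize("a\tb c\td")
print(space_tokens)
print(tab_tokens)
```

Under these assumptions the base class cannot be instantiated directly, because `_string` is abstract; a subclass only has to assign the separator.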
Inherited from TokenizerI:
Method | span_tokenize_sents |
Apply self.span_tokenize() to each element of strings. |
Method | tokenize_sents |
Apply self.tokenize() to each element of strings. |
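The two inherited batch helpers described above can be sketched like this. The mixin and tokenizer names are hypothetical, and the sketch returns plain lists for readability; NLTK's own helpers may differ in detail (for example, by yielding results lazily).

```python
import re


class SketchSentsMixin:
    """Hypothetical mixin mirroring the two inherited batch helpers."""

    def tokenize_sents(self, strings):
        # One token list per input string.
        return [self.tokenize(s) for s in strings]

    def span_tokenize_sents(self, strings):
        # One list of (start, end) offsets per input string.
        return [list(self.span_tokenize(s)) for s in strings]


class SketchWordTokenizer(SketchSentsMixin):
    """Toy tokenizer used only to exercise the mixin."""

    def tokenize(self, s):
        return s.split(" ")

    def span_tokenize(self, s):
        # Offsets of each run of non-space characters.
        return ((m.start(), m.end()) for m in re.finditer(r"[^ ]+", s))


tok = SketchWordTokenizer()
strings = ["a b", "c d e"]
batch_tokens = tok.tokenize_sents(strings)
batch_spans = tok.span_tokenize_sents(strings)
print(batch_tokens)
print(batch_spans)
```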
def span_tokenize(self, s):
overridden in nltk.tokenize.simple.CharTokenizer
Identify the tokens using integer offsets (start_i, end_i), where s[start_i:end_i] is the corresponding token.
Returns | |
iter(tuple(int, int)) | Undocumented |
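The offset contract, where `s[start_i:end_i]` recovers each token, can be illustrated with a small generator. `split_spans` is a hypothetical helper written for this example, not an NLTK function. It assumes the spans should line up exactly with `s.split(sep)`, including empty tokens; NLTK's own handling of empty tokens at the edges of the string may differ.

```python
def split_spans(s, sep):
    """Yield (start, end) offsets of the pieces that s.split(sep) returns."""
    if not sep:
        raise ValueError("separator must not be empty")
    start = 0
    while True:
        end = s.find(sep, start)
        if end == -1:
            # No further separator: the final token runs to the end of s.
            yield start, len(s)
            return
        yield start, end
        start = end + len(sep)  # resume just past the separator


s = "Good muffins cost $3.88"
spans = list(split_spans(s, " "))
tokens_from_spans = [s[start:end] for start, end in spans]
print(spans)
print(tokens_from_spans)
```

Slicing `s` with each pair reproduces the tokens, so callers can map tokens back to their positions in the original string.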
def tokenize(self, s):
overrides nltk.tokenize.api.TokenizerI.tokenize
overridden in nltk.tokenize.simple.CharTokenizer
Return a tokenized copy of s.
Returns | |
list of str | Undocumented |