class documentation
class TabTokenizer(StringTokenizer):
Tokenize a string using the tab character as the delimiter, the same as s.split('\t').
>>> from nltk.tokenize import TabTokenizer
>>> TabTokenizer().tokenize('a\tb c\n\t d')
['a', 'b c\n', ' d']
Class Variable | _string |
Undocumented |
Inherited from StringTokenizer:
Method | span_tokenize |
Identify the tokens using integer offsets (start_i, end_i), where s[start_i:end_i] is the corresponding token. |
Method | tokenize |
Return a tokenized copy of s. |
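The relationship between offsets and tokens described above can be illustrated without the library. The following is a minimal pure-Python sketch, not the library's implementation: `tab_spans` is a hypothetical helper that mirrors `s.split('\t')`, and the library's own handling of edge cases such as empty tokens may differ.

```python
def tab_spans(s, sep="\t"):
    """Yield (start_i, end_i) offsets such that s[start_i:end_i] is a token.

    Hypothetical helper for illustration; it mirrors str.split(sep).
    """
    start = 0
    while True:
        end = s.find(sep, start)
        if end == -1:
            # No further separator: the last token runs to the end of s.
            yield start, len(s)
            return
        yield start, end
        # Resume scanning just past the separator.
        start = end + len(sep)


s = "a\tb c\n\t d"
spans = list(tab_spans(s))
# Slicing s with each span recovers the corresponding token.
tokens = [s[start_i:end_i] for start_i, end_i in spans]
print(spans)
print(tokens)
```

Slicing the original string with each offset pair reproduces the token list, which is the invariant the method summary states.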
Inherited from TokenizerI (via StringTokenizer):
Method | span_tokenize_sents |
Apply self.span_tokenize() to each element of strings. |
Method | tokenize_sents |
Apply self.tokenize() to each element of strings. |
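Applying a tokenizer to each element of a list of strings amounts to mapping the single-string method over that list. The sketch below shows the pattern in plain Python; the standalone functions `tokenize` and `tokenize_sents` are illustrative stand-ins built on `str.split`, and are assumptions rather than the library's code.

```python
def tokenize(s):
    """Stand-in for a tab tokenizer: split one string on tab characters."""
    return s.split("\t")


def tokenize_sents(strings):
    """Apply tokenize() to each element of strings, preserving order."""
    return [tokenize(s) for s in strings]


# Example input: two tab-separated rows (made-up sample data).
rows = ["id\tname", "1\tAda Lovelace"]
table = tokenize_sents(rows)
print(table)
```

The result is one token list per input string, so the output has the same length and order as the input.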