nltk.tokenize.simple.TabTokenizer


class TabTokenizer(StringTokenizer):

Tokenize a string using the tab character as the delimiter; this is the same as `s.split('\t')`.

>>> from nltk.tokenize import TabTokenizer
>>> TabTokenizer().tokenize('a\tb c\n\t d')
['a', 'b c\n', ' d']
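The doctest states that `tokenize` behaves like `s.split('\t')`. The following minimal sketch checks that equivalence in plain Python without importing NLTK. `tab_tokenize` is a hypothetical stand-in, not part of NLTK's API.

```python
from typing import List


def tab_tokenize(s: str) -> List[str]:
    """Hypothetical stand-in for TabTokenizer().tokenize: split on every tab."""
    return s.split("\t")


text = "a\tb c\n\t d"
# Spaces and newlines are ordinary characters here; only "\t" separates tokens.
print(tab_tokenize(text))     # ['a', 'b c\n', ' d']
# Because this is plain str.split, two adjacent tabs produce an empty token.
print(tab_tokenize("a\t\tb"))  # ['a', '', 'b']
```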

Class Variable	`_string`	Undocumented

Inherited from StringTokenizer:

Method	`span_tokenize`	Identify the tokens using integer offsets `(start_i, end_i)`, where `s[start_i:end_i]` is the corresponding token.
Method	`tokenize`	Return a tokenized copy of `s`.
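The `span_tokenize` contract is that `s[start_i:end_i]` recovers each token. The following sketch computes such offsets by scanning with `str.find`. It mirrors `str.split` semantics exactly. `tab_span_tokenize` is a hypothetical name, and NLTK's own implementation may treat edge cases, such as leading or trailing tabs, differently.

```python
from typing import Iterator, Tuple


def tab_span_tokenize(s: str, sep: str = "\t") -> Iterator[Tuple[int, int]]:
    """Hypothetical sketch: yield (start, end) offsets of the tokens s.split(sep) returns."""
    if not sep:
        # An empty delimiter would make str.find match everywhere and loop forever.
        raise ValueError("delimiter must not be empty")
    start = 0
    while True:
        end = s.find(sep, start)
        if end == -1:
            # No delimiter left: the rest of the string is the final token.
            yield start, len(s)
            return
        yield start, end
        # Jump past the delimiter to the start of the next token.
        start = end + len(sep)


text = "a\tb c\n\t d"
spans = list(tab_span_tokenize(text))
print(spans)  # [(0, 1), (2, 6), (7, 9)]
# Slicing by each span recovers exactly the tokens that split() returns.
print([text[i:j] for i, j in spans] == text.split("\t"))  # True
```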

Inherited from TokenizerI (via StringTokenizer):

Method	`span_tokenize_sents`	Apply `self.span_tokenize()` to each element of `strings`, i.e. `[self.span_tokenize(s) for s in strings]`.
Method	`tokenize_sents`	Apply `self.tokenize()` to each element of `strings`, i.e. `[self.tokenize(s) for s in strings]`.
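The `*_sents` methods apply the single-string method to every element of a list. The following sketch shows that pattern on tab-separated rows. All function names are hypothetical. Offsets are derived from token lengths, which works because the tab delimiter is exactly one character long. NLTK's `span_tokenize_sents` may return a generator rather than a list.

```python
from typing import List, Tuple


def tab_tokenize(s: str) -> List[str]:
    # Hypothetical stand-in for TabTokenizer().tokenize.
    return s.split("\t")


def tab_span_tokenize(s: str) -> List[Tuple[int, int]]:
    # Hypothetical: derive (start, end) offsets from the split tokens.
    spans, start = [], 0
    for tok in s.split("\t"):
        spans.append((start, start + len(tok)))
        start += len(tok) + 1  # skip the one-character tab delimiter
    return spans


def tab_tokenize_sents(strings: List[str]) -> List[List[str]]:
    # One token list per input string.
    return [tab_tokenize(s) for s in strings]


def tab_span_tokenize_sents(strings: List[str]) -> List[List[Tuple[int, int]]]:
    # One span list per input string.
    return [tab_span_tokenize(s) for s in strings]


rows = ["id\tname", "1\tAda", "2\tAlan Turing"]
print(tab_tokenize_sents(rows))
# [['id', 'name'], ['1', 'Ada'], ['2', 'Alan Turing']]
print(tab_span_tokenize_sents(rows))
# [[(0, 2), (3, 7)], [(0, 1), (2, 5)], [(0, 1), (2, 13)]]
```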

Class Variable `_string`

Overrides `nltk.tokenize.api.StringTokenizer._string`.

Undocumented
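`_string` is the hook that a `StringTokenizer` subclass overrides to choose its delimiter. For `TabTokenizer`, it is presumably the tab character, given the `s.split('\t')` equivalence above. The following sketch illustrates the override pattern with a hypothetical base class, `DelimiterTokenizer`, and hypothetical subclasses. It is not NLTK's actual class hierarchy.

```python
from typing import List


class DelimiterTokenizer:
    """Hypothetical base class: subclasses set the class variable `_string` to their delimiter."""

    _string: str = ""  # placeholder; each subclass overrides this

    def tokenize(self, s: str) -> List[str]:
        if not self._string:
            raise NotImplementedError("subclasses must set _string to a non-empty delimiter")
        return s.split(self._string)


class TabSplitter(DelimiterTokenizer):
    _string = "\t"  # the only change needed: override the class variable


class PipeSplitter(DelimiterTokenizer):
    _string = "|"


print(TabSplitter().tokenize("a\tb|c"))   # ['a', 'b|c']
print(PipeSplitter().tokenize("a\tb|c"))  # ['a\tb', 'c']
```

Keeping the delimiter in a class variable means that each subclass is a one-line declaration, while the splitting logic lives only in the base class.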