nltk.tokenize.api.StringTokenizer

class documentation

class StringTokenizer(TokenizerI):

A tokenizer that divides a string into substrings by splitting on the specified string (defined in subclasses).

Method	`span_tokenize`	Identify the tokens using integer offsets `(start_i, end_i)`, where `s[start_i:end_i]` is the corresponding token.
Method	`tokenize`	Return a tokenized copy of s.
Property	`_string`	The string to split on. Abstract in this class; defined by subclasses.

Inherited from TokenizerI:

Method	`span_tokenize_sents`	Apply `self.span_tokenize()` to each element of `strings`.
Method	`tokenize_sents`	Apply `self.tokenize()` to each element of `strings`.
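
The per-element behaviour of `tokenize_sents` can be sketched as a list comprehension. This is a minimal illustration, not NLTK's source: the free functions below are hypothetical stand-ins for the methods, and the stand-in `tokenize` assumes a single space as the delimiter.

```python
def tokenize(s):
    # Hypothetical stand-in for self.tokenize(): split on a single space.
    return s.split(" ")


def tokenize_sents(strings):
    # Apply the tokenizer to each element of `strings`, keeping one
    # token list per input string.
    return [tokenize(s) for s in strings]


result = tokenize_sents(["good muffins", "cost a lot"])
print(result)
```

The result holds one inner list per input string, in the same order as `strings`.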

def span_tokenize(self, s):

overrides nltk.tokenize.api.TokenizerI.span_tokenize

overridden in nltk.tokenize.simple.CharTokenizer

Identify the tokens using integer offsets `(start_i, end_i)`, where `s[start_i:end_i]` is the corresponding token.

Returns
iter(tuple(int, int))	An iterator over the `(start_i, end_i)` offsets of the tokens in `s`.
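
To make the offset convention concrete, here is a minimal sketch of computing `(start_i, end_i)` spans by scanning for a fixed separator. The helper name `split_spans` is hypothetical, and the sketch mirrors `str.split(sep)` exactly; NLTK's own handling of leading, trailing, or repeated separators may differ.

```python
def split_spans(s, sep):
    """Yield (start, end) offsets of the pieces of s separated by sep."""
    if not sep:
        raise ValueError("separator must not be empty")
    start = 0
    while True:
        end = s.find(sep, start)
        if end == -1:
            # No further separator: the rest of the string is the last piece.
            yield start, len(s)
            return
        yield start, end
        start = end + len(sep)


text = "good muffins cost"
spans = list(split_spans(text, " "))
# Slicing with each span recovers the corresponding token.
tokens = [text[a:b] for a, b in spans]
print(spans)
print(tokens)
```

Each span satisfies `text[start_i:end_i] == token`, so offsets and tokens can be converted into one another without re-scanning the string.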

def tokenize(self, s):

overrides nltk.tokenize.api.TokenizerI.tokenize

overridden in nltk.tokenize.simple.CharTokenizer

Return a tokenized copy of s.

Returns
list of str	The tokens of `s`: the substrings obtained by splitting on `_string`.
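
Splitting on one fixed string differs from whitespace splitting in a few edge cases. The sketch below uses plain `str.split` and assumes that `tokenize` behaves like `s.split(sep)` for the subclass's delimiter; the variable names are illustrative only.

```python
sep = " "  # assumed delimiter, as a space-based subclass would use

simple = "a b c".split(sep)
doubled = "a  b".split(sep)    # two consecutive separators
leading = " a".split(sep)      # separator at the start
whitespace = "a  b".split()    # no argument: runs of whitespace collapse

print(simple, doubled, leading, whitespace)
```

With an explicit separator, adjacent or leading separators produce empty-string tokens, whereas argument-free `split()` discards them.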

@property
@abstractmethod
_string

overridden in nltk.tokenize.simple.SpaceTokenizer, nltk.tokenize.simple.TabTokenizer

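The string on which the input is split. It is abstract in this class; subclasses such as `SpaceTokenizer` and `TabTokenizer` supply the value.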
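
The abstract-property pattern used for `_string` can be sketched with the standard `abc` module. The class names `DelimiterSplitter` and `CommaSplitter` are hypothetical stand-ins, not NLTK classes, and the `tokenize` body is an assumption made for illustration.

```python
from abc import ABC, abstractmethod


class DelimiterSplitter(ABC):  # hypothetical stand-in for StringTokenizer
    @property
    @abstractmethod
    def _string(self):
        """The delimiter to split on; subclasses must supply it."""
        raise NotImplementedError

    def tokenize(self, s):
        # Assumed behaviour: split on the subclass-defined delimiter.
        return s.split(self._string)


class CommaSplitter(DelimiterSplitter):  # hypothetical subclass
    # A plain class attribute is enough to override the abstract property.
    _string = ","


tokens = CommaSplitter().tokenize("a,b,c")

# The base class cannot be instantiated while _string is still abstract.
try:
    DelimiterSplitter()
    base_instantiable = True
except TypeError:
    base_instantiable = False

print(tokens, base_instantiable)
```

Declaring the delimiter as an abstract property means a subclass only has to define one attribute, and a subclass that omits it fails at instantiation rather than at the first call to `tokenize`.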