class documentation

A tokenizer that divides a string into substrings by splitting on the specified string (defined in subclasses).

Method span_tokenize Identify the tokens using integer offsets (start_i, end_i), where s[start_i:end_i] is the corresponding token.
Method tokenize Return a tokenized copy of s.
Property _string The string to split on (abstract; defined in subclasses).

Inherited from TokenizerI:

Method span_tokenize_sents Apply self.span_tokenize() to each element of strings.
Method tokenize_sents Apply self.tokenize() to each element of strings.
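
The per-element behaviour of tokenize_sents can be sketched as a list comprehension. This is a stand-alone illustration, not the library's source: tokenize_sents_sketch is a hypothetical helper, and the lambda stands in for a tokenizer's tokenize method.

```python
from typing import Callable, Iterable, List


def tokenize_sents_sketch(
    tokenize: Callable[[str], List[str]], strings: Iterable[str]
) -> List[List[str]]:
    """Apply a tokenize callable to each element of strings (hypothetical helper)."""
    return [tokenize(s) for s in strings]


# Assumption: a space-splitting callable stands in for self.tokenize.
result = tokenize_sents_sketch(lambda s: s.split(" "), ["a b", "c d e"])
```
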
def span_tokenize(self, s):

Identify the tokens using integer offsets (start_i, end_i), where s[start_i:end_i] is the corresponding token.

Returns
iter(tuple(int, int)) The (start_i, end_i) offsets of each token in s.
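
A minimal sketch of offset-based tokenization follows. It assumes a non-empty separator and reports every substring between separators, including empty ones. span_tokenize_sketch is a hypothetical stand-alone function, so edge-case behaviour may differ from the library's own span_tokenize.

```python
from typing import Iterator, Tuple


def span_tokenize_sketch(s: str, sep: str) -> Iterator[Tuple[int, int]]:
    """Yield (start_i, end_i) pairs such that s[start_i:end_i] is a token."""
    if not sep:
        raise ValueError("separator must not be empty")
    start = 0
    while True:
        end = s.find(sep, start)
        if end == -1:
            # No further separator: the rest of the string is the last token.
            yield start, len(s)
            return
        yield start, end
        # Resume the search just past the separator that was found.
        start = end + len(sep)


s = "good muffins cost"
spans = list(span_tokenize_sketch(s, " "))
tokens = [s[start_i:end_i] for start_i, end_i in spans]
```

Slicing s with each pair recovers the same tokens that s.split(" ") would return, which is the relationship between the offsets and the tokens described above.
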
def tokenize(self, s):

Return a tokenized copy of s.

Returns
list of str The tokens of s.
@property
@abstractmethod
_string

The string on which to split; each subclass must define it.
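
The abstract-property pattern can be sketched as below. StringTokenizerSketch and CommaTokenizerSketch are hypothetical names, and delegating to str.split is an assumption made for illustration rather than a statement about the library's source.

```python
from abc import ABC, abstractmethod
from typing import List


class StringTokenizerSketch(ABC):
    """Hypothetical stand-in for the documented class."""

    @property
    @abstractmethod
    def _string(self) -> str:
        """The separator; concrete subclasses must provide it."""
        raise NotImplementedError

    def tokenize(self, s: str) -> List[str]:
        # Assumption: splitting is delegated to str.split on the separator.
        return s.split(self._string)


class CommaTokenizerSketch(StringTokenizerSketch):
    # A plain class attribute is enough to override the abstract property.
    _string = ","


tokens = CommaTokenizerSketch().tokenize("a,b,c")
```

Because _string is abstract, the base class cannot be instantiated directly; only a subclass that supplies a separator can.
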