class documentation

A processing interface for tokenizing a string. Subclasses must define tokenize() or tokenize_sents() (or both).

Method span_tokenize Identify the tokens using integer offsets (start_i, end_i), where s[start_i:end_i] is the corresponding token.
Method span_tokenize_sents Apply self.span_tokenize() to each element of strings. I.e.:
Method tokenize Return a tokenized copy of s.
Method tokenize_sents Apply self.tokenize() to each element of strings. I.e.:
def span_tokenize(self, s): (source)

Identify the tokens using integer offsets (start_i, end_i), where s[start_i:end_i] is the corresponding token.

Returns
iter(tuple(int, int))Undocumented
def span_tokenize_sents(self, strings): (source)

Apply self.span_tokenize() to each element of strings. I.e.:

return [self.span_tokenize(s) for s in strings]
Returns
iter(list(tuple(int, int)))Undocumented
def tokenize_sents(self, strings): (source)

Apply self.tokenize() to each element of strings. I.e.:

return [self.tokenize(s) for s in strings]
Returns
list(list(str))Undocumented