nltk.tokenize.simple.SpaceTokenizer

class documentation

class SpaceTokenizer(StringTokenizer): (source)

Tokenize a string using the space character as a delimiter, which is the same as s.split(' ').

>>> from nltk.tokenize import SpaceTokenizer
>>> s = "Good muffins cost $3.88\nin New York.  Please buy me\ntwo of them.\n\nThanks."
>>> SpaceTokenizer().tokenize(s)
['Good', 'muffins', 'cost', '$3.88\nin', 'New', 'York.', '',
'Please', 'buy', 'me\ntwo', 'of', 'them.\n\nThanks.']

Class Variable _string Undocumented

Inherited from StringTokenizer:

Method	`span_tokenize`	Identify the tokens using integer offsets `(start_i, end_i)`, where `s[start_i:end_i]` is the corresponding token.
Method	`tokenize`	Return a tokenized copy of s.

Inherited from TokenizerI (via StringTokenizer):

Method	`span_tokenize_sents`	Apply `self.span_tokenize()` to each element of `strings`. I.e.:
Method	`tokenize_sents`	Apply `self.tokenize()` to each element of `strings`. I.e.:

_string: str = (source) ¶

overrides nltk.tokenize.api.StringTokenizer._string

Undocumented