class documentation
class SpaceTokenizer(StringTokenizer):
Tokenize a string using the space character as a delimiter, which is the same as s.split(' ').
>>> from nltk.tokenize import SpaceTokenizer
>>> s = "Good muffins cost $3.88\nin New York.  Please buy me\ntwo of them.\n\nThanks."
>>> SpaceTokenizer().tokenize(s)
['Good', 'muffins', 'cost', '$3.88\nin', 'New', 'York.', '', 'Please', 'buy', 'me\ntwo', 'of', 'them.\n\nThanks.']
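The empty string in the output above arises because splitting on a single space produces an empty token wherever two spaces are adjacent. The following is a minimal sketch in plain Python (it does not import NLTK) that contrasts `s.split(' ')` with whitespace splitting via `s.split()`. It assumes the input contains two spaces after "York.", which is what the empty token in the documented output implies.

```python
# Assumption: two spaces follow "York.", matching the '' token in the output.
s = "Good muffins cost $3.88\nin New York.  Please buy me\ntwo of them.\n\nThanks."

# Splitting on a single space keeps newlines inside tokens and yields an
# empty string for each pair of adjacent spaces.
by_space = s.split(' ')

# Splitting with no argument treats any run of whitespace (spaces, newlines)
# as one delimiter, so no empty tokens appear.
by_whitespace = s.split()

print(by_space)
print(by_whitespace)
```

If newlines or repeated spaces should not end up inside or between tokens, whitespace splitting is likely the better fit than space-only splitting.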
| Class Variable | _string | Undocumented |
Inherited from StringTokenizer:
| Method | span_tokenize | Identify the tokens using integer offsets (start_i, end_i), where s[start_i:end_i] is the corresponding token. |
| Method | tokenize | Return a tokenized copy of s. |
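To illustrate the (start_i, end_i) offset convention, here is a minimal sketch in plain Python. The helper `space_spans` is hypothetical and is not NLTK's implementation; it only shows how offsets map back to the pieces produced by `s.split(' ')`, and NLTK's handling of edge cases such as leading or trailing spaces may differ.

```python
def space_spans(s):
    """Yield (start, end) offsets for each piece of s.split(' ').

    Hypothetical helper for illustration; not part of NLTK.
    """
    start = 0
    for token in s.split(' '):
        end = start + len(token)
        yield start, end
        start = end + 1  # step over the single-space delimiter


s = "New York.  Please"
spans = list(space_spans(s))

# Slicing the original string with each span recovers the token.
tokens = [s[i:j] for i, j in spans]

print(spans)
print(tokens)
```

Working with offsets rather than substrings keeps each token tied to its position in the original string, which helps when annotations need to be aligned with the source text.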
Inherited from TokenizerI (via StringTokenizer):
| Method | span_tokenize_sents | Apply self.span_tokenize() to each element of strings. |
| Method | tokenize_sents | Apply self.tokenize() to each element of strings. |
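The "apply to each element of strings" behavior can be sketched in plain Python as a list comprehension over the inputs. The function `tokenize_each` and its `tokenize` parameter are hypothetical names used for illustration only; the sketch assumes space-based splitting and does not reproduce NLTK's code.

```python
def tokenize_each(strings, tokenize=lambda s: s.split(' ')):
    """Apply a tokenizer function to every string in a sequence.

    Hypothetical helper for illustration; not part of NLTK.
    """
    return [tokenize(s) for s in strings]


sentences = ["Good muffins", "New York.  Please"]
result = tokenize_each(sentences)

print(result)
```

The result has one token list per input string, in the same order as the inputs.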