class documentation

class TestTokenize: (source)

Unit tests for the tokenizers in nltk.tokenize: word_tokenize, TweetTokenizer, the Punkt sentence tokenizer, the syllable tokenizers, TreebankWordTokenizer span tokenization, and the Stanford Word Segmenter.

Method test_legality_principle_syllable_tokenizer Test the LegalitySyllableTokenizer.
Method test_pad_asterisk Test padding of asterisks for word tokenization.
Method test_pad_dotdot Test padding of dot-dot sequences ('..', '...', etc.) for word tokenization.
Method test_phone_tokenizer Test a string that resembles a phone number but contains a newline.
Method test_punkt_pair_iter Test the pair-iteration helper used by the Punkt tokenizer.
Method test_punkt_pair_iter_handles_stop_iteration_exception Test that the Punkt pair-iteration helper handles a StopIteration raised by its underlying iterator.
Method test_punkt_tokenize_custom_lang_vars Test PunktSentenceTokenizer with custom language variables.
Method test_punkt_tokenize_no_custom_lang_vars Test PunktSentenceTokenizer with the default language variables.
Method test_punkt_tokenize_words_handles_stop_iteration_exception Test that Punkt word tokenization handles a StopIteration raised by its underlying iterator.
Method test_remove_handle Test remove_handle() from casual.py with specially crafted edge cases.
Method test_sonority_sequencing_syllable_tokenizer Test the sonority-sequencing SyllableTokenizer.
Method test_stanford_segmenter_arabic Test the Stanford Word Segmenter for Arabic (default config).
Method test_stanford_segmenter_chinese Test the Stanford Word Segmenter for Chinese (default config).
Method test_treebank_span_tokenizer Test the TreebankWordTokenizer.span_tokenize() method.
Method test_tweet_tokenizer Test TweetTokenizer using words with special and accented characters.
Method test_word_tokenize Test the word_tokenize() function.
def test_legality_principle_syllable_tokenizer(self): (source)

Test the LegalitySyllableTokenizer.
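
A minimal usage sketch of what this test exercises (not the test's own fixture; the vocabulary source and the expected split are illustrative assumptions):

from nltk.corpus import words            # needs the "words" corpus (nltk.download("words"))
from nltk.tokenize import LegalitySyllableTokenizer

# LegalitySyllableTokenizer learns legal syllable onsets from the word list it is given.
tokenizer = LegalitySyllableTokenizer(words.words())
print(tokenizer.tokenize("wonderful"))   # e.g. ['won', 'der', 'ful'] (illustrative)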

def test_pad_asterisk(self): (source)

Test padding of asterisks for word tokenization (a combined usage sketch follows test_pad_dotdot below).

def test_pad_dotdot(self): (source)

Test padding of dot-dot sequences ('..', '...', etc.) for word tokenization.
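
A hedged sketch of the padding behaviour that test_pad_asterisk and test_pad_dotdot cover; the inputs are illustrative assumptions and no exact output is asserted:

from nltk.tokenize import word_tokenize  # requires the Punkt data (e.g. nltk.download("punkt"))

# "Padding" means the tokenizer splits these characters off as separate tokens.
print(word_tokenize("1 * 2"))            # '*' is expected to appear as its own token
print(word_tokenize("wait.. what now"))  # '..' is expected to appear as its own token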

def test_phone_tokenizer(self): (source)

Test a string that resembles a phone number but contains a newline.
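
For illustration, a hedged input of the kind this test describes (the string is an assumption, not the test's fixture):

from nltk.tokenize import word_tokenize

# A phone-number-like string with an embedded newline; the point is that
# tokenization copes with the line break.
print(word_tokenize("Call us at\n(555) 123-4567 today."))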

def test_punkt_pair_iter(self): (source)

Test the pair-iteration helper used by the Punkt tokenizer (a combined sketch follows the next method).

def test_punkt_pair_iter_handles_stop_iteration_exception(self): (source)

Test that the Punkt pair-iteration helper handles a StopIteration raised by its underlying iterator.
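
A hedged sketch of what these two tests appear to target: Punkt's private _pair_iter() helper. Because it is a private helper, the behaviour shown here is an assumption:

from nltk.tokenize.punkt import _pair_iter

# _pair_iter() yields (current, next) pairs, padding the final pair with None;
# the "...handles_stop_iteration_exception" test presumably checks that an
# early-exhausted iterator does not leak StopIteration to the caller.
print(list(_pair_iter(iter(["a", "b", "c"]))))
# assumed result: [('a', 'b'), ('b', 'c'), ('c', None)]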

def test_punkt_tokenize_custom_lang_vars(self): (source)

Test PunktSentenceTokenizer with custom language variables (a combined sketch follows the next method).

def test_punkt_tokenize_no_custom_lang_vars(self): (source)

Test PunktSentenceTokenizer with the default language variables.
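
A hedged sketch contrasting the two cases these tests cover: the default PunktLanguageVars versus a custom subclass. The extra sentence-ending character and the sample text are assumptions:

from nltk.tokenize.punkt import PunktLanguageVars, PunktSentenceTokenizer

class ExtraStopLangVars(PunktLanguageVars):
    # add the ideographic full stop as a sentence-ending character (illustrative)
    sent_end_chars = (".", "?", "!", "。")

text = "Rates dropped。The investigation continues."
print(PunktSentenceTokenizer().tokenize(text))                               # default vars
print(PunktSentenceTokenizer(lang_vars=ExtraStopLangVars()).tokenize(text))  # custom vars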

def test_punkt_tokenize_words_handles_stop_iteration_exception(self): (source)

Test that Punkt word tokenization handles a StopIteration raised by its underlying iterator.

def test_remove_handle(self): (source)

Test remove_handle() from casual.py with specially crafted edge cases.
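
A hedged sketch of the handle stripping this test targets; remove_handles() is assumed to be the casual.py helper the docstring refers to, and TweetTokenizer(strip_handles=True) exercises the same behaviour:

from nltk.tokenize.casual import TweetTokenizer, remove_handles

tweet = "@someuser thanks for the tip!"
print(remove_handles(tweet))                               # handle stripped from the raw text
print(TweetTokenizer(strip_handles=True).tokenize(tweet))  # tokens without the handle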

def test_sonority_sequencing_syllable_tokenizer(self): (source)

Test the sonority-sequencing SyllableTokenizer.
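
A minimal sketch of the sonority-sequencing tokenizer this test covers; the word and the expected split are illustrative:

from nltk.tokenize import SyllableTokenizer

ssp = SyllableTokenizer()             # sonority sequencing principle, English defaults
print(ssp.tokenize("justification"))  # e.g. ['jus', 'ti', 'fi', 'ca', 'tion']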

def test_stanford_segmenter_arabic(self): (source)

Test the Stanford Word Segmenter for Arabic (default config); a combined sketch follows the Chinese variant below.

def test_stanford_segmenter_chinese(self): (source)

Test the Stanford Word Segmenter for Chinese (default config).
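
A heavily hedged sketch covering both segmenter tests. The Stanford Word Segmenter is an external Java tool, so this only runs with the segmenter jars and models installed and the relevant environment variables set; default_config() and segment() are assumptions about the wrapper interface these tests rely on:

from nltk.tokenize.stanford_segmenter import StanfordSegmenter

seg = StanfordSegmenter()
seg.default_config("ar")           # "zh" for the Chinese test
text = "يبحث علم الحاسوب استخدام الحوسبة"
print(seg.segment(text.split()))   # whitespace-split input, re-segmented by the tool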

def test_treebank_span_tokenizer(self): (source)

Test the TreebankWordTokenizer.span_tokenize() method.
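
A minimal sketch of span_tokenize(), which yields (start, end) character offsets rather than the token strings themselves; the sample sentence is an assumption:

from nltk.tokenize import TreebankWordTokenizer

text = "Good muffins cost $3.88 in New York."
spans = list(TreebankWordTokenizer().span_tokenize(text))
print(spans)                                      # (start, end) offsets into text
print([text[start:end] for start, end in spans])  # the tokens those spans cover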

def test_tweet_tokenizer(self): (source)

Test TweetTokenizer using words with special and accented characters.
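
A minimal sketch of TweetTokenizer on accented and special characters; the example tweet and the options shown are assumptions, not the test's fixtures:

from nltk.tokenize import TweetTokenizer

tk = TweetTokenizer(preserve_case=False, reduce_len=True, strip_handles=True)
print(tk.tokenize("@remy: C'est déjà l'été!!! :-) #été"))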

def test_word_tokenize(self): (source)

Test the word_tokenize() function.
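
A minimal sketch of word_tokenize(); it requires the Punkt data (e.g. nltk.download("punkt")), and the expected tokens are illustrative:

from nltk.tokenize import word_tokenize

print(word_tokenize("Good muffins cost $3.88 in New York."))
# e.g. ['Good', 'muffins', 'cost', '$', '3.88', 'in', 'New', 'York', '.']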