class documentation
class SyllableTokenizer(TokenizerI):
Constructor: SyllableTokenizer(lang, sonority_hierarchy)
Syllabifies words based on the Sonority Sequencing Principle (SSP).
>>> from nltk.tokenize import SyllableTokenizer
>>> from nltk import word_tokenize
>>> SSP = SyllableTokenizer()
>>> SSP.tokenize('justification')
['jus', 'ti', 'fi', 'ca', 'tion']
>>> text = "This is a foobar-like sentence."
>>> [SSP.tokenize(token) for token in word_tokenize(text)]
[['This'], ['is'], ['a'], ['foo', 'bar', '-', 'li', 'ke'], ['sen', 'ten', 'ce'], ['.']]
Method | __init__ | Initialize the tokenizer from a language code and an optional sonority hierarchy.
Method | assign | Assigns each phoneme its value from the sonority hierarchy. Note: Sentence/text has to be tokenized first.
Method | tokenize | Apply the SSP to return a list of syllables. Note: Sentence/text has to be tokenized first.
Method | validate | Ensures each syllable has at least one vowel. If the following syllable doesn't have a vowel, add it to the current one.
Instance Variable | phoneme | Mapping from each phoneme (character) to its sonority value.
Instance Variable | vowels | Characters treated as vowels (the first level of the sonority hierarchy).
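The vowel check summarized in the validate row can be illustrated with a short, self-contained sketch. It is not NLTK's code: the function names `validate_syllables` and `is_punct` are hypothetical, the default vowel set `"aeiouy"` is an assumption, and the case-insensitive matching and handling of punctuation are choices made for this sketch. A vowel-less fragment is attached to the preceding syllable when there is one; otherwise it is held and prepended to the next syllable that contains a vowel.

```python
import re
from string import punctuation


def is_punct(s: str) -> bool:
    """True if s is non-empty and every character is ASCII punctuation."""
    return bool(s) and all(ch in punctuation for ch in s)


def validate_syllables(syllables: list[str], vowels: str = "aeiouy") -> list[str]:
    """Ensure each non-punctuation syllable contains a vowel (hypothetical sketch)."""
    has_vowel = re.compile("[" + re.escape(vowels) + "]", re.IGNORECASE)
    valid: list[str] = []
    front = ""  # vowel-less material waiting for a syllable to join
    for syl in syllables:
        if not syl:
            continue  # skip empty fragments
        if is_punct(syl):
            if front:  # flush held consonants rather than drop them
                valid.append(front)
                front = ""
            valid.append(syl)
        elif has_vowel.search(syl):
            valid.append(front + syl)  # held consonants join this syllable
            front = ""
        elif valid and not is_punct(valid[-1]):
            valid[-1] += syl  # attach to the previous syllable
        else:
            front += syl  # no suitable previous syllable yet: hold it
    if front:
        valid.append(front)
    return valid
```

For example, `validate_syllables(["ju", "st", "ice"])` folds the vowel-less `"st"` into `"ju"`, giving `["just", "ice"]`.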
Inherited from TokenizerI:
Method | span | Identify the tokens using integer offsets (start_i, end_i), where s[start_i:end_i] is the corresponding token.
Method | span | Apply self.span_tokenize() to each element of strings.
Method | tokenize | Apply self.tokenize() to each element of strings.
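The inherited rows describe two contracts: offsets `(start_i, end_i)` such that `s[start_i:end_i]` is the token, and a batch form that applies a single-string method to each element of a list. The sketch below illustrates both with hypothetical helpers (`spans_from_tokens`, `apply_to_each`); it is not TokenizerI's implementation. It derives offsets by searching for each token left to right, which assumes the tokens occur in order and unaltered in the text.

```python
from typing import Callable, Iterator


def spans_from_tokens(text: str, tokens: list[str]) -> Iterator[tuple[int, int]]:
    """Yield (start, end) so that text[start:end] == token (hypothetical sketch)."""
    pos = 0
    for tok in tokens:
        start = text.index(tok, pos)  # raises ValueError if tok is not found
        end = start + len(tok)
        yield start, end
        pos = end  # continue searching after this token


def apply_to_each(tokenize: Callable[[str], list[str]], strings: list[str]) -> list[list[str]]:
    """Batch helper: run a single-string tokenizer over several strings."""
    return [tokenize(s) for s in strings]
```

Applied to `"justification"` and the syllables from the class doctest, the offsets are contiguous: `(0, 3), (3, 5), (5, 7), (7, 9), (9, 13)`.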
Parameters | |
lang:str | Language code; defaults to English ('en').
sonority_hierarchy | Sonority hierarchy according to the Sonority Sequencing Principle, ordered from most to least sonorous; the first level is treated as the vowels.
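The following sketch shows how a hierarchy ordered from most to least sonorous can be turned into a per-character lookup table. The function `build_sonority_map` and the four-level `EXAMPLE_HIERARCHY` (vowels, nasals/liquids, fricatives, stops) are assumptions for illustration and are not claimed to match the library's built-in default. Earlier levels receive higher values, and uppercase letters share the value of their lowercase form.

```python
def build_sonority_map(hierarchy: list[str]) -> dict[str, int]:
    """Map each character to a sonority value; earlier levels rank higher (sketch)."""
    n = len(hierarchy)
    sonority: dict[str, int] = {}
    for i, level in enumerate(hierarchy):
        value = n - i  # first level gets n, last level gets 1
        for ch in level:
            sonority[ch] = value
            sonority[ch.upper()] = value  # case-insensitive lookups
    return sonority


# Assumed example hierarchy, most to least sonorous (vowels first).
EXAMPLE_HIERARCHY = ["aeiouy", "lmnrw", "zvsf", "bcdgtkpqxhj"]
VOWELS = EXAMPLE_HIERARCHY[0]  # the first level doubles as the vowel set
```

With this hierarchy, `a` maps to 4, `l` to 3, `s` to 2 and `t` to 1.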
Assigns each phoneme its value from the sonority hierarchy. Note: Sentence/text has to be tokenized first.
Parameters | |
token:str | Single word or token |
Returns | |
list(tuple(str, int)) | List of (character/phoneme, sonority value) tuples.
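A per-character assignment matching the documented return type, `list(tuple(str, int))`, might look like the sketch below. The name `sonority_profile`, the assumed four-level hierarchy, and the fallback rules (digits and punctuation get -1 so they can later force a boundary; unknown letters get the vowel level) are choices of this sketch rather than a description of the library's method.

```python
from string import punctuation

# Assumed four-level hierarchy, most to least sonorous (vowels first).
LEVELS = ["aeiouy", "lmnrw", "zvsf", "bcdgtkpqxhj"]
SONORITY = {ch: len(LEVELS) - i for i, level in enumerate(LEVELS) for ch in level}


def sonority_profile(token: str, sonority: dict[str, int] = SONORITY) -> list[tuple[str, int]]:
    """Pair each character of a single token with its sonority value (sketch)."""
    top = max(sonority.values())
    pairs: list[tuple[str, int]] = []
    for ch in token:
        key = ch.lower()  # case-insensitive lookup
        if key in sonority:
            pairs.append((ch, sonority[key]))
        elif ch.isdigit() or ch in punctuation:
            pairs.append((ch, -1))  # marks a forced boundary
        else:
            pairs.append((ch, top))  # unknown letter: treated like a vowel
    return pairs
```

Because the input is a single token, a sentence must be split into words before calling it, as the note above requires.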
Overrides nltk.tokenize.api.TokenizerI.tokenize.
Apply the SSP to return a list of syllables. Note: Sentence/text has to be tokenized first.
Parameters | |
token:str | Single word or token |
Returns | |
list(str) | Single word or token broken up into syllables. |
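One way to "apply the SSP" to a single token is to slide a (previous, focal, next) window over the sonority values and break at troughs and plateaus. The sketch below does that under stated assumptions: the hierarchy is the same assumed four-level one, `ssp_syllabify` is a hypothetical name, the three break rules are one reading of the SSP chosen for illustration, and the vowel check summarized under validate is omitted to keep it short. For the words in the class doctest (`justification`, `sentence`, `foobar-like`, `This`) it produces the same splits shown there.

```python
from string import punctuation

# Assumed four-level hierarchy, most to least sonorous (vowels first).
LEVELS = ["aeiouy", "lmnrw", "zvsf", "bcdgtkpqxhj"]
SONORITY = {ch: len(LEVELS) - i for i, level in enumerate(LEVELS) for ch in level}
VOWELS = LEVELS[0]


def sonority(ch: str) -> int:
    """Sonority of one character; -1 for digits/punctuation, vowel level if unknown."""
    key = ch.lower()
    if key in SONORITY:
        return SONORITY[key]
    return -1 if (ch.isdigit() or ch in punctuation) else len(LEVELS)


def ssp_syllabify(token: str) -> list[str]:
    """Split one pre-tokenized word at sonority troughs and plateaus (sketch)."""
    if sum(ch.lower() in VOWELS for ch in token) <= 1:
        return [token]  # zero or one vowel: a single syllable
    vals = [sonority(ch) for ch in token]
    syllables: list[str] = []
    current = token[0]  # start with the first phoneme
    # Slide a (previous, focal, next) window over the interior characters.
    for i in range(1, len(token) - 1):
        prev, focal, nxt = vals[i - 1], vals[i], vals[i + 1]
        ch = token[i]
        if focal == -1:  # punctuation: close the syllable, emit it alone
            syllables.extend([current, ch])
            current = ""
        elif prev >= focal == nxt:  # plateau: break after the focal phoneme
            syllables.append(current + ch)
            current = ""
        elif prev > focal < nxt:  # trough: break before the focal phoneme
            syllables.append(current)
            current = ch
        else:  # no break
            current += ch
    syllables.append(current + token[-1])  # the last phoneme closes the word
    return [s for s in syllables if s]  # drop empty fragments
```

The trough rule does most of the work: in `justification`, each stop between two vowels (the `t`, `f`, `c` and second `t`) is a local sonority minimum, so it starts a new syllable.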