class LegalitySyllableTokenizer documentation

Syllabifies words based on the Legality Principle and Onset Maximization.

>>> from nltk.tokenize import LegalitySyllableTokenizer
>>> from nltk import word_tokenize
>>> from nltk.corpus import words
>>> text = "This is a wonderful sentence."
>>> text_words = word_tokenize(text)
>>> LP = LegalitySyllableTokenizer(words.words())
>>> [LP.tokenize(word) for word in text_words]
[['This'], ['is'], ['a'], ['won', 'der', 'ful'], ['sen', 'ten', 'ce'], ['.']]
Method __init__ Initialize the tokenizer with a tokenized source text, a vowel set, and a legal-onset frequency threshold.
Method find_legal_onsets Gathers all onsets and returns only those above the frequency threshold.
Method onset Returns the consonant cluster of a word, i.e. all characters until the first vowel.
Method tokenize Apply the Legality Principle in combination with Onset Maximization to return a list of syllables.
Instance Variable legal_frequency_threshold Lowest frequency an onset may have and still be considered legal
Instance Variable legal_onsets Set of legal onsets, computed by find_legal_onsets()
Instance Variable vowels Valid vowels in the language

Inherited from TokenizerI:

Method span_tokenize Identify the tokens using integer offsets (start_i, end_i), where s[start_i:end_i] is the corresponding token.
Method span_tokenize_sents Apply self.span_tokenize() to each element of strings. I.e., return [self.span_tokenize(s) for s in strings].
Method tokenize_sents Apply self.tokenize() to each element of strings. I.e., return [self.tokenize(s) for s in strings].
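The inherited tokenize_sents helper simply maps tokenize over a list of strings. A minimal sketch, using a small hand-picked token list in place of the full nltk.corpus.words corpus (an assumption made here so no NLTK data download is needed):

```python
from nltk.tokenize import LegalitySyllableTokenizer

# Toy source corpus (illustrative assumption; the class docstring
# uses nltk.corpus.words.words(), which requires a data download).
corpus = ["wonderful", "sentence", "syllable", "tokenizer", "legal"]
LP = LegalitySyllableTokenizer(corpus)

# tokenize_sents applies self.tokenize() to each element of the list.
results = LP.tokenize_sents(["wonderful", "sentence"])
print(results)
```

Whatever split the toy corpus induces, each sublist concatenates back to the original token, since tokenize only partitions the characters.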
def __init__(self, tokenized_source_text, vowels='aeiouy', legal_frequency_threshold=0.001): (source)
Parameters
tokenized_source_text (list(str)): List of valid tokens in the language
vowels (str): Valid vowels in the language, or an IPA representation
legal_frequency_threshold (float): Lowest frequency an onset may have and still be considered a legal onset
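As a sketch of the constructor parameters, using a small hand-picked token list rather than a full corpus (an assumption for brevity):

```python
from nltk.tokenize import LegalitySyllableTokenizer

# A tiny token list stands in for a full language corpus here.
source = ["banana", "bandana", "cabana"]

# vowels='aeiouy' and legal_frequency_threshold=0.001 are the defaults;
# raising the threshold makes fewer onsets count as "legal".
LP = LegalitySyllableTokenizer(source, vowels="aeiouy",
                               legal_frequency_threshold=0.001)

# The legal onsets learned from this source are 'b' and 'c'.
print(LP.legal_onsets)
```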
def find_legal_onsets(self, words): (source)

Gathers all onsets and returns only those above the frequency threshold.

Parameters
words (list(str)): List of words in the language
Returns
set(str): Set of legal onsets
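A minimal sketch of calling find_legal_onsets directly on a tiny illustrative word list (an assumption; any token list works, and the constructor already calls this method on its source text):

```python
from nltk.tokenize import LegalitySyllableTokenizer

# Tiny illustrative word list (an assumption for the example).
word_list = ["tree", "street", "try", "apple", "strong"]
LP = LegalitySyllableTokenizer(word_list)

# Onsets are 'tr', 'str', 'tr', '' (vowel-initial), 'str'; with the
# default low threshold all of them pass and come back as a set.
print(LP.find_legal_onsets(word_list))
```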
def onset(self, word): (source)

Returns the consonant cluster of a word, i.e. all characters until the first vowel.

Parameters
word (str): Single word or token
Returns
str: String of characters forming the onset
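A short sketch of onset in isolation; the source list passed to the constructor does not affect this method (the single-element list below is just a placeholder assumption):

```python
from nltk.tokenize import LegalitySyllableTokenizer

LP = LegalitySyllableTokenizer(["placeholder"])

# onset() collects characters up to the first vowel ('aeiouy' by default).
print(LP.onset("string"))  # -> 'str'
print(LP.onset("apple"))   # -> '' (vowel-initial word, empty onset)
```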
def tokenize(self, token): (source)

Apply the Legality Principle in combination with Onset Maximization to return a list of syllables.

Parameters
token (str): Single word or token
Returns
list(str): Single word or token broken up into syllables
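A minimal sketch of tokenize with a toy corpus standing in for nltk.corpus.words.words() (an assumption for illustration; the exact syllable split depends on which onsets the source text makes legal):

```python
from nltk.tokenize import LegalitySyllableTokenizer

# Toy corpus used to learn legal onsets (illustrative assumption).
corpus = ["wonder", "wondering", "fulfill", "derive", "winner"]
LP = LegalitySyllableTokenizer(corpus)

syllables = LP.tokenize("wonderful")
print(syllables)

# Whatever the split, the syllables concatenate back to the token.
assert "".join(syllables) == "wonderful"
```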
legal_frequency_threshold = (source)

Lowest frequency an onset may have and still be considered legal.

legal_onsets = (source)

Set of legal onsets, computed by find_legal_onsets() from the source text.

vowels = (source)

Valid vowels in the language.