nltk.tokenize.sonority_sequencing

module documentation

The Sonority Sequencing Principle (SSP) is a language-agnostic algorithm proposed by Otto Jespersen in 1904. The sonorous quality of a phoneme is judged by the openness of the lips. Syllable breaks occur before troughs in sonority. For more on the SSP, see Selkirk (1984).
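The idea of breaking before sonority troughs can be illustrated with a small standalone sketch. It is not NLTK's implementation: the `HIERARCHY` table and the helper names `sonority_levels` and `naive_syllabify` are assumptions made for illustration, and the sketch ignores plateaus of equal sonority and fragments without vowels, which a practical tokenizer has to handle.

```python
# Illustrative hierarchy, ordered from most to least sonorous.
# This table is an assumption for the sketch, not taken from NLTK.
HIERARCHY = ["aeiouy", "lmnrw", "zvsf", "bcdgtkpqxhj"]


def sonority_levels(word, hierarchy=HIERARCHY):
    """Map each character to a sonority level (higher = more sonorous).

    Characters missing from the hierarchy get level 0.
    """
    n = len(hierarchy)
    table = {ch: n - i for i, level in enumerate(hierarchy) for ch in level}
    return [table.get(ch.lower(), 0) for ch in word]


def naive_syllabify(word, hierarchy=HIERARCHY):
    """Split a word before every strict local minimum (trough) in sonority."""
    if not word:
        return []
    levels = sonority_levels(word, hierarchy)
    # A trough is an interior position that is less sonorous than both neighbours.
    cuts = [
        i
        for i in range(1, len(levels) - 1)
        if levels[i - 1] > levels[i] < levels[i + 1]
    ]
    bounds = [0] + cuts + [len(word)]
    return [word[a:b] for a, b in zip(bounds, bounds[1:])]


print(sonority_levels("banana"))   # [1, 4, 3, 4, 3, 4]
print(naive_syllabify("banana"))   # ['ba', 'na', 'na']
```

In "banana", each "n" is less sonorous than the vowels on either side of it, so a break is placed before each one.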

The default implementation uses the English alphabet, but the sonority_hierarchy can be replaced with IPA or any other alphabet to suit the use case. The SSP is a universal syllabification algorithm, but that does not mean it performs equally well across languages. Bartlett et al. (2009) is a good benchmark for English accuracy when using IPA (p. 311).

Importantly, if a custom hierarchy is supplied and its vowels span more than one level, they should be given separately to the vowels class attribute.
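A minimal sketch of supplying a custom hierarchy follows. It assumes the constructor accepts the hierarchy through a `sonority_hierarchy` keyword, as a list of strings ordered from most to least sonorous, and that `vowels` can be overridden on the instance. The hierarchy itself is hypothetical: its vowels are split across two levels only to show why `vowels` has to be set explicitly.

```python
from nltk.tokenize import SyllableTokenizer

# Hypothetical hierarchy, most to least sonorous. Splitting the vowels
# into two levels is for illustration only.
custom_hierarchy = [
    "aeo",          # more open vowels
    "iuy",          # closer vowels
    "lmnrw",
    "zvsf",
    "bcdgtkpqxhj",
]

tokenizer = SyllableTokenizer(sonority_hierarchy=custom_hierarchy)

# The vowels span two levels here, so list all of them explicitly.
tokenizer.vowels = "aeoiuy"

syllables = tokenizer.tokenize("justification")
print(syllables)
```

Whatever breaks are chosen, the returned syllables concatenate back to the input word, which makes for a quick sanity check when experimenting with a new hierarchy.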

References:

- Otto Jespersen. 1904. Lehrbuch der Phonetik. Leipzig, Teubner. Chapter 13, Silbe, pp. 185-203.
- Elisabeth Selkirk. 1984. On the major class features and syllable theory. In Aronoff & Oehrle (eds.) Language Sound Structure: Studies in Phonology. Cambridge, MIT Press. pp. 107-136.
- Susan Bartlett, et al. 2009. On the Syllabification of Phonemes. In HLT-NAACL. pp. 308-316.

Class SyllableTokenizer: Syllabifies words based on the Sonority Sequencing Principle (SSP).
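A short usage sketch with the default English settings, assuming NLTK is installed. The example word is the one used in the class's own docstring, which reports the syllables `['jus', 'ti', 'fi', 'ca', 'tion']`.

```python
from nltk.tokenize import SyllableTokenizer

# With no arguments the tokenizer uses its default English hierarchy.
ssp = SyllableTokenizer()

# tokenize() takes a single word and returns its syllables as a list of strings.
syllables = ssp.tokenize("justification")
print(syllables)
```

The tokenizer works on one word at a time, so running text needs to be split into words first.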

nltk.tokenize.sonority_sequencing