module documentation

Undocumented

Function padded_everygram_pipeline: Default preprocessing for a sequence of sentences.
Function padded_everygrams: Helper with some useful defaults.
Variable pad_both_ends: Pads both ends of a sentence to length specified by ngram order.
def padded_everygram_pipeline(order, text):

Default preprocessing for a sequence of sentences.

Creates two iterators:
- sentences padded and turned into sequences of nltk.util.everygrams
- sentences padded as above and chained together for a flat stream of words

Parameters
order: Largest ngram length produced by everygrams.
text: Text to iterate over. Expected to be an iterable of sentences: Iterable[Iterable[str]].

Returns
A pair of iterators: one over the text as ngrams, one over the text as vocabulary data.
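The two iterators described above can be illustrated with a minimal standard-library sketch. It does not call NLTK and is not NLTK's implementation: `sketch_pipeline` and its inner helpers are hypothetical names, the pad symbols `<s>` and `</s>` follow the convention documented for pad_both_ends, and the order in which ngrams are yielded is an assumption of this sketch that may differ from nltk.util.everygrams.

```python
from itertools import chain, repeat


def sketch_pipeline(order, text):
    """Return (per-sentence everygram iterators, flat padded word stream).

    Hypothetical helper for illustration only; not the NLTK function.
    """

    def pad(sentence):
        # Assumed convention: order - 1 pad symbols on each side.
        return list(chain(repeat("<s>", order - 1), sentence, repeat("</s>", order - 1)))

    def everygrams(tokens):
        # All ngrams of length 1..order, grouped by start position.
        return (
            tuple(tokens[i:i + k])
            for i in range(len(tokens))
            for k in range(1, order + 1)
            if i + k <= len(tokens)
        )

    ngrams_per_sentence = (everygrams(pad(sent)) for sent in text)
    vocab_stream = chain.from_iterable(pad(sent) for sent in text)
    return ngrams_per_sentence, vocab_stream


text = [["a", "b"], ["c"]]
train, vocab = sketch_pipeline(2, text)

vocab_words = list(vocab)
first_sentence_ngrams = list(next(train))
print(vocab_words)
print(first_sentence_ngrams)
```

Both results in this sketch are single-pass iterators, so they are materialized with `list` before being inspected. Because `text` is traversed once for each result, the sketch assumes it is a re-iterable collection such as a list rather than a one-shot generator.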
def padded_everygrams(order, sentence):

Helper with some useful defaults.

Applies pad_both_ends to sentence and follows it up with everygrams.
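As a rough illustration of "pad, then take everygrams" for a single sentence, the following standard-library sketch may help. The name `sketch_padded_everygrams` is hypothetical, the code is not taken from NLTK, and the yield order shown here is an assumption; the real everygrams may order its output differently.

```python
from itertools import chain, repeat


def sketch_padded_everygrams(order, sentence):
    """Pad a sentence, then yield every ngram of length 1..order (illustrative only)."""
    pad = order - 1  # assumed: order - 1 pad symbols per side
    tokens = list(chain(repeat("<s>", pad), sentence, repeat("</s>", pad)))
    for start in range(len(tokens)):
        for length in range(1, order + 1):
            if start + length <= len(tokens):
                yield tuple(tokens[start:start + length])


grams = list(sketch_padded_everygrams(2, ["a", "b"]))
for gram in grams:
    print(gram)
```

With `order=2` the padded sentence has four tokens, so the sketch yields four unigrams and three bigrams.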

pad_both_ends

Pads both ends of a sentence to length specified by ngram order.

Following convention, <s> pads the start of a sentence and </s> pads its end.
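The padding convention can be sketched with the standard library alone. This is an illustration under stated assumptions, not the NLTK definition: `sketch_pad_both_ends` is a hypothetical name, and the sketch assumes that padding "to length specified by ngram order" means adding `n - 1` symbols on each side.

```python
from itertools import chain, repeat


def sketch_pad_both_ends(sentence, n, left="<s>", right="</s>"):
    """Return an iterator over the sentence with n - 1 pad symbols on each side."""
    return chain(repeat(left, n - 1), sentence, repeat(right, n - 1))


padded_bigram = list(sketch_pad_both_ends(["a", "b", "c"], n=2))
padded_trigram = list(sketch_pad_both_ends(["a"], n=3))
print(padded_bigram)   # ['<s>', 'a', 'b', 'c', '</s>']
print(padded_trigram)  # ['<s>', '<s>', 'a', '</s>', '</s>']
```

Padding by `n - 1` means every real token appears in an ngram of the full order, including the first and last words of the sentence.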