module documentation

Implementation of 'TnT - A Statistical Part of Speech Tagger' by Thorsten Brants

http://acl.ldc.upenn.edu/A/A00/A00-1031.pdf

Class TnT TnT - Statistical POS tagger
Function basic_sent_chop Basic method for tokenizing input into sentences for this tagger:
Function demo Undocumented
Function demo2 Undocumented
Function demo3 Undocumented
def basic_sent_chop(data, raw=True):

Basic method for tokenizing input into sentences for this tagger:

The function takes a list of tokens and splits it into lists, where each list represents a sentence fragment. It can separate both tagged and raw sequences into basic sentences.

Sentence markers are the set of [,.!?]

This is a simple method that improves the results of the TnT tagger. Better sentence tokenization will improve the results further.

Parameters
data: str or tuple(str, str)
    list of tokens (words or (word, tag) tuples)
raw: bool
    boolean flag marking the input data as a list of words or a list of tagged words
Returns
list of sentences; each sentence is a list of tokens, and the tokens have the same form as the input
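
The behaviour described above can be illustrated with a small standalone sketch. This is not the library's source: the name `chop_sketch` and the constant `SENTENCE_MARKERS` are hypothetical, and the choice to keep trailing tokens that follow the last marker is an assumption, since the documentation does not say what happens to them.

```python
# Hypothetical re-implementation for illustration only; not the module's code.
SENTENCE_MARKERS = {",", ".", "!", "?"}  # the documented marker set [,.!?]


def chop_sketch(data, raw=True):
    """Split a flat token list into sentence fragments.

    data: list of words (raw=True) or list of (word, tag) tuples (raw=False)
    Returns a list of fragments; tokens keep the same form as the input.
    """
    sentences = []
    current = []
    for token in data:
        # For tagged input the word is the first element of the tuple.
        word = token if raw else token[0]
        current.append(token)
        if word in SENTENCE_MARKERS:
            # A marker closes the current fragment (the marker stays in it).
            sentences.append(current)
            current = []
    # Assumption: tokens after the last marker are kept as a final fragment.
    if current:
        sentences.append(current)
    return sentences


raw_fragments = chop_sketch(["Hello", ",", "world", "."])
tagged_fragments = chop_sketch(
    [("Hi", "UH"), ("!", "."), ("Bye", "UH")], raw=False
)
```

Because the comma is one of the markers, the output lists are fragments rather than full sentences, which matches the wording of the description.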
def demo():

Undocumented

def demo2():

Undocumented

def demo3():

Undocumented