nltk.tag.tnt

module documentation


Implementation of 'TnT - A Statistical Part of Speech Tagger' by Thorsten Brants

http://acl.ldc.upenn.edu/A/A00/A00-1031.pdf

Class	`TnT`	TnT - Statistical POS tagger
Function	`basic_sent_chop`	Basic method for tokenizing input into sentences for this tagger.
Function	`demo`	Undocumented
Function	`demo2`	Undocumented
Function	`demo3`	Undocumented

def basic_sent_chop(data, raw=True):

Basic method for tokenizing input into sentences for this tagger:

The function takes a list of tokens and separates them into lists, where each list represents a sentence fragment. It can separate both tagged and raw sequences into basic sentences.

Sentence markers are the tokens in the set [,.!?]; each fragment ends at one of them.

This is a simple method which enhances the performance of the TnT tagger. Better sentence tokenization will further enhance the results.

Parameters
data:list(str) or list(tuple(str, str))	list of tokens (words or (word, tag) tuples)
raw:bool	flag marking the input as a list of words (True, the default) or a list of (word, tag) tuples (False)
Returns
list of sentences; each sentence is a list of tokens, and tokens have the same form as the input
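
The splitting behaviour described above can be illustrated with a minimal standalone sketch. This is not the NLTK source: the name `chop_sketch`, the `SENT_MARKS` constant, and the example tags are hypothetical, and the handling of trailing tokens that lack a closing marker is an assumption of this sketch that may differ from the library function.

```python
from typing import List, Sequence, Tuple, Union

Token = Union[str, Tuple[str, str]]

# Sentence markers named in the description above.
SENT_MARKS = {",", ".", "!", "?"}


def chop_sketch(data: Sequence[Token], raw: bool = True) -> List[List[Token]]:
    """Split a flat token list into fragments that each end at a marker."""
    fragments: List[List[Token]] = []
    current: List[Token] = []
    for token in data:
        # For tagged input, the word is the first element of the (word, tag) pair.
        word = token if raw else token[0]
        current.append(token)
        if word in SENT_MARKS:
            fragments.append(current)
            current = []
    # Assumption of this sketch: keep trailing tokens without a closing marker.
    if current:
        fragments.append(current)
    return fragments


raw_tokens = ["Hello", ",", "world", "!", "Bye"]
tagged_tokens = [("Hello", "UH"), (",", ","), ("world", "NN"), ("!", ".")]

print(chop_sketch(raw_tokens))
print(chop_sketch(tagged_tokens, raw=False))
```

Because the comma counts as a marker, the output consists of sentence fragments rather than full sentences, which matches the wording of the description.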

def demo():

Undocumented

def demo2():

Undocumented

def demo3():

Undocumented