module documentation
Implementation of 'TnT - A Statistical Part-of-Speech Tagger' by Thorsten Brants
Class |
|
TnT - Statistical POS tagger |
Function | basic |
Basic method for tokenizing input into sentences for this tagger |
Function | demo |
Undocumented |
Function | demo2 |
Undocumented |
Function | demo3 |
Undocumented |
Basic method for tokenizing input into sentences for this tagger.
The function takes a list of tokens and splits it into lists, each of which represents a sentence fragment. It can split both tagged and raw sequences.
Sentence markers are the characters in the set [,.!?]
This is a simple method that improves the performance of the TnT tagger. Better sentence tokenization will improve the results further.
Parameters | |
data: str or tuple(str, str) | list of tokens (words or (word, tag) tuples) |
raw: bool | flag describing the input: True if data is a list of words, False if it is a list of (word, tag) tuples |
Returns | |
list of sentences; each sentence is a list of tokens, and the tokens have the same form as the input |
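The splitting behaviour described above can be illustrated with a minimal sketch. This is not the module's own code: the name `chop_sentences`, the `Token` alias, and the `SENTENCE_MARKERS` constant are hypothetical, introduced here only for illustration. The sketch also assumes that tokens left over after the last sentence marker are kept as a final fragment, which the documentation above does not specify.

```python
from typing import List, Sequence, Tuple, Union

# A token is either a plain word or a (word, tag) pair.
Token = Union[str, Tuple[str, str]]

# Sentence markers named in the documentation: [,.!?]
SENTENCE_MARKERS = {",", ".", "!", "?"}


def chop_sentences(data: Sequence[Token], raw: bool = True) -> List[List[Token]]:
    """Hypothetical sketch: split a token list into sentence fragments."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for token in data:
        # For tagged input, the word is the first element of the pair.
        word = token if raw else token[0]
        current.append(token)
        if word in SENTENCE_MARKERS:
            # A marker closes the current fragment.
            sentences.append(current)
            current = []
    # Assumption of this sketch: keep trailing tokens with no closing marker.
    if current:
        sentences.append(current)
    return sentences


words = ["Hello", ",", "world", "!", "How", "are", "you", "?"]
print(chop_sentences(words))
# [['Hello', ','], ['world', '!'], ['How', 'are', 'you', '?']]

tagged = [("It", "PRP"), ("works", "VBZ"), (".", "."), ("Good", "JJ")]
print(chop_sentences(tagged, raw=False))
# [[('It', 'PRP'), ('works', 'VBZ'), ('.', '.')], [('Good', 'JJ')]]
```

Because the comma is one of the markers, the output consists of clause-level fragments rather than full sentences, which matches the "sentence fragment" wording in the description.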