module documentation

Undocumented

Class ChunkScore A utility class for scoring chunk parsers. ChunkScore can evaluate a chunk parser's output, based on a number of statistics (precision, recall, f-measure, missed chunks, incorrect chunks). It can also combine the scores from the parsing of multiple texts; this makes it significantly easier to evaluate a chunk parser that operates one sentence at a time.
Function accuracy Score the accuracy of the chunker against the gold standard. Strip the chunk information from the gold standard and rechunk it using the chunker, then compute the accuracy score.
Function conllstr2tree Return a chunk structure for a single sentence encoded in the given CoNLL 2000 style string. This function converts a CoNLL IOB string into a tree. It uses the specified chunk types (defaults to NP, PP and VP), and creates a tree rooted at a node labeled S (by default).
Function conlltags2tree Convert the CoNLL IOB format to a tree.
Function demo Undocumented
Function ieerstr2tree Return a chunk structure containing the chunked tagged text that is encoded in the given IEER style string. Convert a string of chunked tagged text in the IEER named entity format into a chunk structure...
Function tagstr2tree Divide a string of bracketed tagged text into chunks and unchunked tokens, and produce a Tree. Chunks are marked by square brackets ([...]). Words are delimited by whitespace, and each word should have the form ...
Function tree2conllstr Return a multiline string where each line contains a word, tag and IOB tag. Convert a tree to the CoNLL IOB string format.
Function tree2conlltags Return a list of 3-tuples containing (word, tag, IOB-tag). Convert a tree to the CoNLL IOB tag format.
Function _chunksets Undocumented
Function _ieer_read_text Undocumented
Constant _IEER_DOC_RE Undocumented
Constant _IEER_TYPE_RE Undocumented
Constant _LINE_RE Undocumented
def accuracy(chunker, gold):

Score the accuracy of the chunker against the gold standard. Strip the chunk information from the gold standard and rechunk it using the chunker, then compute the accuracy score.

Parameters
chunker: ChunkParserI - The chunker being evaluated.
gold: tree - The chunk structures to score the chunker on.
Returns
float - Undocumented
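As a rough illustration of what a score of this kind measures, the sketch below compares IOB tags position by position. It uses plain Python lists only. The helper name `iob_accuracy` and the sample tags are hypothetical, and comparing per-token IOB tags is an assumption about how the score is derived, not a statement of this module's implementation.

```python
def iob_accuracy(gold_tags, test_tags):
    """Fraction of positions where the predicted IOB tag equals the gold tag."""
    if len(gold_tags) != len(test_tags):
        raise ValueError("gold and test sequences must have the same length")
    if not gold_tags:
        return 0.0
    matches = sum(1 for g, t in zip(gold_tags, test_tags) if g == t)
    return matches / len(gold_tags)


# Hypothetical gold tags and chunker output for a four-token sentence.
gold = ["B-NP", "I-NP", "O", "B-NP"]
pred = ["B-NP", "I-NP", "O", "O"]
score = iob_accuracy(gold, pred)
print(score)  # 0.75
```

Three of the four tags agree, so the score is 0.75. A chunker that reproduces the gold tags exactly would score 1.0.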
def conllstr2tree(s, chunk_types=('NP', 'PP', 'VP'), root_label='S'):

Return a chunk structure for a single sentence encoded in the given CoNLL 2000 style string. This function converts a CoNLL IOB string into a tree. It uses the specified chunk types (defaults to NP, PP and VP), and creates a tree rooted at a node labeled S (by default).

Parameters
s: str - The CoNLL string to be converted.
chunk_types: tuple - The chunk types to be converted.
root_label: str - The node label to use for the root.
Returns
Tree - Undocumented
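The conversion described above can be sketched without the library by grouping lines into labelled chunks. This is a simplified stand-in, not the module's code: the function name `conll_to_chunks` is hypothetical, chunks are represented as `(label, [tokens])` tuples instead of Tree objects, and the root node is left implicit. The line pattern is copied from the `_LINE_RE` constant listed on this page.

```python
import re

# Same pattern as _LINE_RE: word, tag, IOB state, optional chunk type.
LINE_RE = re.compile(r"(\S+)\s+(\S+)\s+([IOB])-?(\S+)?")


def conll_to_chunks(s, chunk_types=("NP", "PP", "VP")):
    """Group CoNLL IOB lines into a flat list of tokens and labelled chunks."""
    result = []     # children of the implicit root node
    current = None  # (label, tokens) of the chunk being built, if any
    for lineno, line in enumerate(s.split("\n")):
        if not line.strip():
            continue
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"Error on line {lineno}")
        word, tag, state, chunk_type = match.groups()
        # Chunk types that were not requested count as outside any chunk.
        if chunk_types is not None and chunk_type not in chunk_types:
            state = "O"
        # An I- tag that does not continue the open chunk starts a new one.
        mismatch = state == "I" and (current is None or current[0] != chunk_type)
        if state in "BO" or mismatch:
            current = None
        if state == "B" or mismatch:
            current = (chunk_type, [])
            result.append(current)
        if current is not None:
            current[1].append((word, tag))
        else:
            result.append((word, tag))
    return result


sample = "he PRP B-NP\naccepted VBD B-VP\nthe DT B-NP\nposition NN I-NP\n. . O"
chunks = conll_to_chunks(sample)
for item in chunks:
    print(item)
```

Passing a narrower `chunk_types`, for example `("NP",)`, leaves the verb as an unchunked token, which mirrors the role of the `chunk_types` parameter described above.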
def conlltags2tree(sentence, chunk_types=('NP', 'PP', 'VP'), root_label='S', strict=False):

Convert the CoNLL IOB format to a tree.

def demo():

Undocumented

def ieerstr2tree(s, chunk_types=['LOCATION', 'ORGANIZATION', 'PERSON', 'DURATION', 'DATE', 'CARDINAL', 'PERCENT', 'MONEY', 'MEASURE'], root_label='S'):

Return a chunk structure containing the chunked tagged text that is encoded in the given IEER style string. Convert a string of chunked tagged text in the IEER named entity format into a chunk structure. Chunks are of several types: LOCATION, ORGANIZATION, PERSON, DURATION, DATE, CARDINAL, PERCENT, MONEY, and MEASURE.

Returns
Tree - Undocumented
def tagstr2tree(s, chunk_label='NP', root_label='S', sep='/', source_tagset=None, target_tagset=None):

Divide a string of bracketed tagged text into chunks and unchunked tokens, and produce a Tree. Chunks are marked by square brackets ([...]). Words are delimited by whitespace, and each word should have the form text/tag. Words that do not contain a slash are assigned a tag of None.

Parameters
s: str - The string to be converted.
chunk_label: str - The label to use for chunk nodes.
root_label: str - The label to use for the root of the tree.
sep - Undocumented
source_tagset - Undocumented
target_tagset - Undocumented
Returns
Tree - Undocumented
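The bracket format described above can be illustrated with a small standalone parser. This is a sketch under stated assumptions, not the library's implementation: `parse_bracketed` and `split_word` are hypothetical names, chunks are returned as `(label, [tokens])` tuples instead of a Tree, splitting on the last separator is an assumption, and the tagset-mapping parameters are ignored.

```python
import re

# A token is "[", "]", or a run of characters with no brackets or whitespace.
TOKEN_RE = re.compile(r"\[|\]|[^\[\]\s]+")


def split_word(token, sep="/"):
    """Split 'text/tag' on the last separator; the tag is None if absent."""
    text, found, tag = token.rpartition(sep)
    return (text, tag) if found else (token, None)


def parse_bracketed(s, chunk_label="NP", sep="/"):
    """Return a flat list of (word, tag) tokens and (label, tokens) chunks."""
    result = []
    chunk = None  # the chunk currently open, if any
    for match in TOKEN_RE.finditer(s):
        token = match.group()
        if token == "[":
            if chunk is not None:
                raise ValueError(f"Unexpected [ at char {match.start()}")
            chunk = (chunk_label, [])
            result.append(chunk)
        elif token == "]":
            if chunk is None:
                raise ValueError(f"Unexpected ] at char {match.start()}")
            chunk = None
        else:
            target = result if chunk is None else chunk[1]
            target.append(split_word(token, sep))
    if chunk is not None:
        raise ValueError("Expected ] before end of string")
    return result


parsed = parse_bracketed("[ the/DT cat/NN ] sat/VBD on [ the/DT mat/NN ]")
for item in parsed:
    print(item)
```

The word `on` carries no slash, so it comes back as `('on', None)`, matching the rule stated above for words without a tag.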
def tree2conllstr(t):

Return a multiline string where each line contains a word, tag and IOB tag. Convert a tree to the CoNLL IOB string format.

Parameters
t: Tree - The tree to be converted.
Returns
str - Undocumented
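To show the shape of the string described above, the following sketch joins (word, tag, IOB-tag) triples into one line per token. The helper name `conll_lines` is hypothetical, and the single-space separator is an assumption about the output layout.

```python
def conll_lines(triples):
    """Join (word, tag, IOB-tag) triples into one space-separated line each."""
    return "\n".join(" ".join(triple) for triple in triples)


triples = [("the", "DT", "B-NP"), ("cat", "NN", "I-NP"), ("sat", "VBD", "O")]
text = conll_lines(triples)
print(text)
# the DT B-NP
# cat NN I-NP
# sat VBD O
```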
def tree2conlltags(t):

Return a list of 3-tuples containing (word, tag, IOB-tag). Convert a tree to the CoNLL IOB tag format.

Parameters
t: Tree - The tree to be converted.
Returns
list(tuple) - Undocumented
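The IOB tagging scheme behind these 3-tuples can be sketched on a simplified structure. Here a chunk is a `(label, [tokens])` tuple and an unchunked token is a `(word, tag)` tuple; both the representation and the name `chunks_to_iob` are hypothetical stand-ins for a Tree and this function.

```python
def chunks_to_iob(children):
    """Flatten tokens and (label, tokens) chunks into (word, tag, IOB-tag) triples."""
    triples = []
    for first, second in children:
        if isinstance(second, list):
            # A chunk: the first token gets B-<label>, the rest get I-<label>.
            label = first
            for i, (word, tag) in enumerate(second):
                prefix = "B-" if i == 0 else "I-"
                triples.append((word, tag, prefix + label))
        else:
            # An unchunked (word, tag) token is outside any chunk.
            triples.append((first, second, "O"))
    return triples


children = [("NP", [("the", "DT"), ("cat", "NN")]), ("sat", "VBD")]
tags = chunks_to_iob(children)
print(tags)
# [('the', 'DT', 'B-NP'), ('cat', 'NN', 'I-NP'), ('sat', 'VBD', 'O')]
```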
def _chunksets(t, count, chunk_label):

Undocumented

def _ieer_read_text(s, root_label):

Undocumented

_IEER_DOC_RE =

Undocumented

Value
re.compile(r'<DOC>\s*(<DOCNO>\s*(?P<docno>.+?)\s*</DOCNO>\s*)?(<DOCTYPE>\s*(?P<doctype>.+?)\s*</DOCTYPE>\s*)?(<DATE_TIME>\s*(?P<date_time>.+?)\s*</DATE_TIME>\s*)?<BODY>\s*(<HEADLINE>\s*(?P<headline>.+?)\s*</HEADLINE>\s*)?<TEXT>(?P<text>.*?)</TEXT>\s*</BODY>\s*</DOC>\s*', re.DOTALL)
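To make the named groups of this pattern easier to read, the sketch below rebuilds the same expression piece by piece and applies it to a small document. The document text, including the `DOC-0001` identifier and the headline, is invented for illustration, and the variable name `IEER_DOC_RE` is a local copy, not an import from the module.

```python
import re

# The pattern shown above, split into one optional section per line.
IEER_DOC_RE = re.compile(
    r"<DOC>\s*"
    r"(<DOCNO>\s*(?P<docno>.+?)\s*</DOCNO>\s*)?"
    r"(<DOCTYPE>\s*(?P<doctype>.+?)\s*</DOCTYPE>\s*)?"
    r"(<DATE_TIME>\s*(?P<date_time>.+?)\s*</DATE_TIME>\s*)?"
    r"<BODY>\s*"
    r"(<HEADLINE>\s*(?P<headline>.+?)\s*</HEADLINE>\s*)?"
    r"<TEXT>(?P<text>.*?)</TEXT>\s*"
    r"</BODY>\s*</DOC>\s*",
    re.DOTALL,
)

# Hypothetical document: DOCTYPE and DATE_TIME are deliberately left out.
doc = (
    "<DOC>\n"
    "<DOCNO> DOC-0001 </DOCNO>\n"
    "<BODY>\n"
    "<HEADLINE> Example headline </HEADLINE>\n"
    "<TEXT>\nSome body text.\n</TEXT>\n"
    "</BODY>\n"
    "</DOC>\n"
)

fields = IEER_DOC_RE.match(doc).groupdict()
print(fields["docno"])     # DOC-0001
print(fields["headline"])  # Example headline
print(fields["doctype"])   # None
```

Sections that are missing from the input come back as `None`, while surrounding whitespace is trimmed from the ones that are present. The `text` group keeps its whitespace because it is not wrapped in `\s*`.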
_IEER_TYPE_RE =

Undocumented

Value
re.compile(r'<b_\w+\s+[^>]*?type="(?P<type>\w+)"')
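A quick way to see what this pattern captures is to run it against an opening tag. The tag name `b_enamex` and the closing tag `e_enamex` used here are assumptions about the IEER markup; the pattern itself only requires a tag starting with `b_` that carries a `type` attribute.

```python
import re

# Local copy of the pattern shown above.
IEER_TYPE_RE = re.compile(r'<b_\w+\s+[^>]*?type="(?P<type>\w+)"')

opening = IEER_TYPE_RE.match('<b_enamex type="PERSON">')
entity_type = opening.group("type")
print(entity_type)  # PERSON

# A tag without the b_ prefix and type attribute does not match.
closing = IEER_TYPE_RE.match("<e_enamex>")
print(closing)  # None
```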
_LINE_RE =

Undocumented

Value
re.compile(r'(\S+)\s+(\S+)\s+([IOB])-?(\S+)?')
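The four groups of this pattern correspond to the word, the part-of-speech tag, the IOB state, and an optional chunk type. The short sketch below applies a local copy of the pattern to two sample lines; the sample lines are invented for illustration.

```python
import re

# Local copy of the pattern shown above.
LINE_RE = re.compile(r"(\S+)\s+(\S+)\s+([IOB])-?(\S+)?")

chunked = LINE_RE.match("position NN I-NP").groups()
outside = LINE_RE.match(". . O").groups()

print(chunked)  # ('position', 'NN', 'I', 'NP')
print(outside)  # ('.', '.', 'O', None)
```

A token outside any chunk has no chunk type, so the last group is `None`.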