Code for extracting relational triples from the ieer and conll2002 corpora.
Relations are stored internally as dictionaries ('reldicts').
The two serialization outputs are "rtuple" and "clause".
- An rtuple is a tuple of the form (subj, filler, obj), where subj and obj are pairs of Named Entity mentions, and filler is the string of words occurring between subj and obj (with no intervening NEs). Strings are printed via repr() to circumvent locale variations in rendering utf-8 encoded strings.
- A clause is an atom of the form relsym(subjsym, objsym), where the relation, subject and object have been canonicalized to single strings.
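The two output forms can be illustrated with a minimal sketch. The helper names `format_rtuple` and `format_clause`, the exact layout of the strings, and the sample reldict values are assumptions made for illustration; they are not the module's own functions or data.

```python
def format_rtuple(reldict):
    """Render a reldict in an rtuple-like layout: [CLASS: subj] filler [CLASS: obj].

    Strings are passed through repr() (the !r conversion), as described above.
    """
    return "[{}: {!r}] {!r} [{}: {!r}]".format(
        reldict["subjclass"], reldict["subjtext"],
        reldict["filler"],
        reldict["objclass"], reldict["objtext"],
    )


def format_clause(reldict, relsym):
    """Render a reldict as an atom relsym(subjsym, objsym)."""
    return "{}({!r}, {!r})".format(relsym, reldict["subjsym"], reldict["objsym"])


# Hypothetical reldict, reduced to the keys the two formatters need.
example = {
    "subjclass": "ORG", "subjtext": "WHYY", "subjsym": "whyy",
    "filler": "in",
    "objclass": "LOC", "objtext": "Philadelphia", "objsym": "philadelphia",
}

print(format_rtuple(example))        # [ORG: 'WHYY'] 'in' [LOC: 'Philadelphia']
print(format_clause(example, "IN"))  # IN('whyy', 'philadelphia')
```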
Function | class |
Abbreviate an NE class name. :type type: str :rtype: str |
Function | clause |
Print the relation in clausal form. :param reldict: a relation dictionary :type reldict: defaultdict :param relsym: a label for the relation :type relsym: str |
Function | conllesp |
Undocumented |
Function | conllned |
Find the copula+'van' relation ('of') in the Dutch tagged training corpus from CoNLL 2002. |
Function | descape |
Translate one entity to its ISO Latin value. Inspired by example from effbot.org |
Function | extract |
Filter the output of semi_rel2reldict according to specified NE classes and a filler pattern. |
Function | ieer |
Undocumented |
Function | in |
Select pairs of organizations and locations whose mentions occur with an intervening occurrence of the preposition "in". |
Function | list2sym |
Convert a list of strings into a canonical symbol. :type lst: list :return: a Unicode string without whitespace :rtype: unicode |
Function | ne |
Undocumented |
Function | roles |
Undocumented |
Function | rtuple |
Pretty print the reldict as an rtuple. :param reldict: a relation dictionary :type reldict: defaultdict |
Function | semi |
Converts the pairs generated by tree2semi_rel into a 'reldict': a dictionary which stores information about the subject and object NEs plus the filler between them. Additionally, left and right contexts of length <= window are captured (within a given input sentence). |
Function | tree2semi |
Group a chunk structure into a list of 'semi-relations' of the form (list(str), Tree). |
Constant | NE |
Undocumented |
Variable | long2short |
Undocumented |
Variable | short2long |
Undocumented |
Function | _expand |
Expand an NE class name. :type type: str :rtype: str |
Function | _join |
Join a list into a string, turning tagged tuples into tag strings or just words. :param untag: if True, omit the tag from tagged input strings. :type lst: list :rtype: str |
Print the relation in clausal form.
Parameters | |
reldict:defaultdict | a relation dictionary |
relsym:str | a label for the relation |
Filter the output of semi_rel2reldict according to specified NE classes and a filler pattern.
The parameters subjclass and objclass can be used to restrict the Named Entities to particular types (any of 'LOCATION', 'ORGANIZATION', 'PERSON', 'DURATION', 'DATE', 'CARDINAL', 'PERCENT', 'MONEY', 'MEASURE').
Parameters | |
subjclass:str | the class of the subject Named Entity. |
objclass:str | the class of the object Named Entity. |
doc:ieer document or a list of chunk trees | input document |
corpus:str | name of the corpus to take as input; possible values are 'ieer' and 'conll2002' |
pattern:SRE_Pattern | a regular expression for filtering the fillers of retrieved triples. |
window:int | filters out fillers which exceed this threshold |
Returns | |
list(defaultdict) | see mk_reldicts |
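The filtering step described by these parameters might look roughly like the sketch below. The function name `filter_reldicts`, the decision to measure the filler length in whitespace-separated words, and the sample reldicts are assumptions for illustration, not the library's implementation.

```python
import re


def filter_reldicts(reldicts, subjclass, objclass, pattern=None, window=5):
    """Keep reldicts whose NE classes match and whose filler passes both checks."""
    kept = []
    for rel in reldicts:
        # Restrict subject and object to the requested NE classes.
        if rel["subjclass"] != subjclass or rel["objclass"] != objclass:
            continue
        # Assumption: the filler length is counted in words.
        if len(rel["filler"].split()) > window:
            continue
        # The pattern is matched from the start of the filler.
        if pattern is not None and not pattern.match(rel["filler"]):
            continue
        kept.append(rel)
    return kept


# Hypothetical candidates, reduced to the keys the filter inspects.
candidates = [
    {"subjclass": "ORGANIZATION", "objclass": "LOCATION", "filler": "based in"},
    {"subjclass": "ORGANIZATION", "objclass": "LOCATION", "filler": "of"},
    {"subjclass": "PERSON", "objclass": "LOCATION", "filler": "lives in"},
]

IN = re.compile(r".*\bin\b")
selected = filter_reldicts(candidates, "ORGANIZATION", "LOCATION", pattern=IN, window=5)
print([rel["filler"] for rel in selected])  # ['based in']
```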
Select pairs of organizations and locations whose mentions occur with an intervening occurrence of the preposition "in".
If the sql parameter is set to True, then the entity pairs are loaded into an in-memory database, and subsequently pulled out using an SQL "SELECT" query.
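The database step can be sketched with the standard-library `sqlite3` module. The table name, column names, helper name `orgs_in`, and the sample rows are all made up for illustration; the source does not specify the schema that is actually used.

```python
import sqlite3

# Hypothetical (organization, location, document id) rows.
pairs = [
    ("Example Corp", "Springfield", "doc1"),
    ("Sample Institute", "Shelbyville", "doc2"),
    ("Demo Labs", "Springfield", "doc3"),
]


def orgs_in(location, rows):
    """Load entity pairs into an in-memory database and select by location."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Locations (OrgName TEXT, LocationName TEXT, DocID TEXT)"
        )
        conn.executemany("INSERT INTO Locations VALUES (?, ?, ?)", rows)
        cursor = conn.execute(
            "SELECT OrgName FROM Locations WHERE LocationName = ? ORDER BY OrgName",
            (location,),
        )
        return [name for (name,) in cursor]
    finally:
        conn.close()


print(orgs_in("Springfield", pairs))  # ['Demo Labs', 'Example Corp']
```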
Convert a list of strings into a canonical symbol.
Parameters | |
lst:list | Undocumented |
Returns | |
unicode | a Unicode string without whitespace |
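One plausible canonicalization is sketched below. The name `to_symbol`, the underscore separator, the lowercasing, and the removal of full stops are assumptions; the only property taken from the description above is that the result contains no whitespace.

```python
def to_symbol(words):
    """Collapse a list of strings into a single whitespace-free symbol."""
    # Split on any whitespace (including inside items), then rejoin with "_".
    joined = "_".join(" ".join(words).split())
    # Assumption: symbols are lowercased and full stops are dropped.
    return joined.lower().replace(".", "")


print(to_symbol(["New", "York", "Inc."]))  # new_york_inc
```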
Pretty print the reldict as an rtuple.
Parameters | |
reldict:defaultdict | a relation dictionary |
Converts the pairs generated by tree2semi_rel into a 'reldict': a dictionary which stores information about the subject and object NEs plus the filler between them. Additionally, left and right contexts of length <= window are captured (within a given input sentence).
Parameters | |
pairs | a pair of list(str) and Tree, as generated by tree2semi_rel |
window:int | a threshold for the number of items to include in the left and right context |
trace | Undocumented |
Returns | |
list(defaultdict) | 'relation' dictionaries whose keys are 'lcon', 'subjclass', 'subjtext', 'subjsym', 'filler', 'objclass', 'objtext', 'objsym' and 'rcon' |
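The conversion can be sketched as follows, under stated assumptions: a plain `(label, words)` tuple stands in for a Tree, the name `pairs_to_reldicts` is hypothetical, symbols are built by a simple lowercase join, and every adjacent pair of NEs yields one reldict. The library's own handling of edge cases may differ.

```python
def pairs_to_reldicts(pairs, window=5):
    """Turn (words, (label, ne_words)) pairs into relation dictionaries."""
    reldicts = []
    for i in range(len(pairs) - 1):
        left_words, (subj_label, subj_words) = pairs[i]
        filler_words, (obj_label, obj_words) = pairs[i + 1]
        # The right context is the text preceding the next NE, if there is one.
        right_words = pairs[i + 2][0] if i + 2 < len(pairs) else []
        reldicts.append({
            # Keep at most `window` items on each side.
            "lcon": " ".join(left_words[max(0, len(left_words) - window):]),
            "subjclass": subj_label,
            "subjtext": " ".join(subj_words),
            "subjsym": "_".join(subj_words).lower(),
            "filler": " ".join(filler_words),
            "objclass": obj_label,
            "objtext": " ".join(obj_words),
            "objsym": "_".join(obj_words).lower(),
            "rcon": " ".join(right_words[:window]),
        })
    return reldicts


# Hypothetical semi-relations for one sentence.
pairs = [
    (["The"], ("ORGANIZATION", ["Acme", "Corp"])),
    (["is", "based", "in"], ("LOCATION", ["Springfield"])),
    (["and", "hired"], ("PERSON", ["Jane", "Doe"])),
]

for rel in pairs_to_reldicts(pairs, window=2):
    print(rel["subjsym"], "|", rel["filler"], "|", rel["objsym"])
```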
Group a chunk structure into a list of 'semi-relations' of the form (list(str), Tree).
In order to facilitate the construction of (Tree, string, Tree) triples, this identifies pairs whose first member is a list (possibly empty) of terminal strings, and whose second member is a Tree of the form (NE_label, terminals).
Parameters | |
tree | a chunk tree |
Returns | |
list of tuple | a list of pairs (list(str), Tree) |
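The grouping can be sketched without the Tree class by representing a chunked sentence as a flat list in which plain strings are ordinary tokens and `(label, words)` tuples stand in for NE subtrees. Both this representation and the name `chunks_to_pairs` are assumptions for illustration.

```python
def chunks_to_pairs(chunks):
    """Group a flat chunk sequence into (preceding_words, NE) pairs."""
    pairs = []
    pending = []  # terminal strings seen since the previous NE
    for item in chunks:
        if isinstance(item, str):
            pending.append(item)
        else:
            # An NE closes the current pair; the word list may be empty.
            pairs.append((pending, item))
            pending = []
    # In this sketch, words after the last NE are not attached to any pair.
    return pairs


# Hypothetical chunked sentence.
sentence = [
    ("ORGANIZATION", ["Acme", "Corp"]),
    "is", "based", "in",
    ("LOCATION", ["Springfield"]),
    ".",
]

print(chunks_to_pairs(sentence))
```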