module documentation

Code for extracting relational triples from the ieer and conll2002 corpora.

Relations are stored internally as dictionaries ('reldicts').

The two serialization outputs are "rtuple" and "clause".

  • An rtuple is a tuple of the form (subj, filler, obj), where subj and obj are pairs of Named Entity mentions, and filler is the string of words occurring between subj and obj (with no intervening NEs). Strings are printed via repr() to circumvent locale variations in rendering utf-8 encoded strings.
  • A clause is an atom of the form relsym(subjsym, objsym), where the relation, subject and object have been canonicalized to single strings.
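As a rough illustration of the two output formats, the sketch below renders a hand-built reldict both ways. The helper names `format_rtuple` and `format_clause`, the sample values, and the exact bracket layout are assumptions made for this example; they are not the module's own code.

```python
from collections import defaultdict

# Hand-built relation dictionary. The keys follow the reldict keys listed
# for semi_rel2reldict; the values are invented sample data.
reldict = defaultdict(str)
reldict.update(
    subjclass="ORGANIZATION",
    subjtext="WHYY",
    subjsym="whyy",
    filler="in",
    objclass="LOCATION",
    objtext="Philadelphia",
    objsym="philadelphia",
)


def format_rtuple(rel):
    """Render (subj, filler, obj), using repr() for every string (hypothetical helper)."""
    return "[%s: %r] %r [%s: %r]" % (
        rel["subjclass"], rel["subjtext"],
        rel["filler"],
        rel["objclass"], rel["objtext"],
    )


def format_clause(rel, relsym):
    """Render relsym(subjsym, objsym) from the canonicalized symbols (hypothetical helper)."""
    return "%s(%r, %r)" % (relsym, rel["subjsym"], rel["objsym"])


rtuple_text = format_rtuple(reldict)
clause_text = format_clause(reldict, "in")
print(rtuple_text)  # [ORGANIZATION: 'WHYY'] 'in' [LOCATION: 'Philadelphia']
print(clause_text)  # in('whyy', 'philadelphia')
```

The rtuple keeps the surface text of each mention, while the clause uses only the canonical symbols, which makes it easier to load into a logic or database backend.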
Function class_abbrev: Abbreviate an NE class name.
Function clause: Print the relation in clausal form.
Function conllesp: Undocumented
Function conllned: Find the copula+'van' relation ('of') in the Dutch tagged training corpus from CoNLL 2002.
Function descape_entity: Translate one entity to its ISO Latin value.
Function extract_rels: Filter the output of semi_rel2reldict according to specified NE classes and a filler pattern.
Function ieer_headlines: Undocumented
Function in_demo: Select pairs of organizations and locations whose mentions occur with an intervening occurrence of the preposition "in".
Function list2sym: Convert a list of strings into a canonical symbol.
Function ne_chunked: Undocumented
Function roles_demo: Undocumented
Function rtuple: Pretty print the reldict as an rtuple.
Function semi_rel2reldict: Converts the pairs generated by tree2semi_rel into a 'reldict': a dictionary which stores information about the subject and object NEs plus the filler between them. Additionally, a left and right context of at most window items are captured (within a given input sentence).
Function tree2semi_rel: Group a chunk structure into a list of 'semi-relations' of the form (list(str), Tree).
Constant NE_CLASSES: Undocumented
Variable long2short: Undocumented
Variable short2long: Undocumented
Function _expand: Expand an NE class name.
Function _join: Join a list into a string, turning tagged tuples into tag strings or just words.
def class_abbrev(type):

Abbreviate an NE class name.

Parameters
  type (str)
Returns
  str

def clause(reldict, relsym):

Print the relation in clausal form.

Parameters
  reldict (defaultdict): a relation dictionary
  relsym (str): a label for the relation

def conllesp():

Undocumented

def conllned(trace=1):

Find the copula+'van' relation ('of') in the Dutch tagged training corpus from CoNLL 2002.
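A filler pattern for this kind of relation could look roughly like the sketch below, which matches over word/TAG strings. The specific verb forms and the tag names `V` and `Prep` are assumptions about the corpus tagset, chosen only to illustrate the idea of "copula, then anything, then 'van'".

```python
import re

# Illustrative filler pattern over word/TAG strings: a form of 'zijn' or
# 'worden' tagged as a verb, then anything, then 'van' tagged as a
# preposition. The tag names V and Prep are assumptions.
VAN = re.compile(
    r"""
    (is/V | was/V | werd/V | wordt/V)  # copula-like verb forms
    .*                                 # any intervening material
    van/Prep                           # the preposition 'van'
    """,
    re.VERBOSE,
)

matches = bool(VAN.match("is/V de/Art voorzitter/N van/Prep"))
rejects = VAN.match("woont/V in/Prep") is None
print(matches, rejects)  # True True
```

Because `match` anchors at the start of the filler, the copula has to be the first token between the two entities.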

def descape_entity(m, defs=html.entities.entitydefs):

Translate one entity to its ISO Latin value. Inspired by an example from effbot.org.
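The signature suggests a callback for `re.sub`, where `m` is a match object whose first group is the entity name. A minimal sketch under that assumption follows; the function name `descape_entity_sketch`, the regular expression, and the fallback of leaving unknown entities untouched are illustrative choices rather than confirmed behaviour.

```python
import html.entities
import re


def descape_entity_sketch(m, defs=html.entities.entitydefs):
    """Return the character for a named entity, or the entity unchanged."""
    try:
        return defs[m.group(1)]
    except KeyError:
        return m.group(0)  # unknown entity: leave it as written


# Assumed entity pattern: an ampersand, a name, and a semicolon.
ENT = re.compile(r"&(\w+?);")
text = ENT.sub(descape_entity_sketch, "caf&eacute; &amp; bar &unknown;")
print(text)
```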

def extract_rels(subjclass, objclass, doc, corpus='ace', pattern=None, window=10):

Filter the output of semi_rel2reldict according to specified NE classes and a filler pattern.

The parameters subjclass and objclass can be used to restrict the Named Entities to particular types (any of 'LOCATION', 'ORGANIZATION', 'PERSON', 'DURATION', 'DATE', 'CARDINAL', 'PERCENT', 'MONEY', 'MEASURE').

Parameters
  subjclass (str): the class of the subject Named Entity.
  objclass (str): the class of the object Named Entity.
  doc (ieer document or a list of chunk trees): input document.
  corpus (str): name of the corpus to take as input; possible values are 'ieer' and 'conll2002'.
  pattern (SRE_Pattern): a regular expression for filtering the fillers of retrieved triples.
  window (int): filters out fillers which exceed this threshold.
Returns
  list(defaultdict): see mk_reldicts
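The filtering step described above can be sketched without the corpus readers. In the example below, `make_reldict` and `filter_reldicts` are hypothetical helpers, the sample reldicts are invented, and measuring the filler length in whitespace-separated tokens is an assumption about how `window` is applied.

```python
import re
from collections import defaultdict


def make_reldict(subjclass, filler, objclass):
    """Build a minimal reldict stand-in with only the fields used below."""
    rel = defaultdict(str)
    rel.update(subjclass=subjclass, filler=filler, objclass=objclass)
    return rel


def filter_reldicts(reldicts, subjclass, objclass, pattern, window=10):
    """Keep reldicts with matching NE classes, a filler of at most
    `window` tokens, and a filler that the compiled pattern matches."""
    return [
        rel for rel in reldicts
        if rel["subjclass"] == subjclass
        and rel["objclass"] == objclass
        and len(rel["filler"].split()) <= window
        and pattern.match(rel["filler"])
    ]


candidates = [
    make_reldict("ORGANIZATION", "in", "LOCATION"),
    make_reldict("PERSON", "in", "LOCATION"),
    make_reldict("ORGANIZATION", "said that", "LOCATION"),
]
IN = re.compile(r".*\bin\b")
kept = filter_reldicts(candidates, "ORGANIZATION", "LOCATION", IN)
print(len(kept))  # 1
```

Only the first candidate survives: the second has the wrong subject class, and the third has a filler that does not contain the word "in".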
def ieer_headlines():

Undocumented

def in_demo(trace=0, sql=True):

Select pairs of organizations and locations whose mentions occur with an intervening occurrence of the preposition "in".

If the sql parameter is set to True, then the entity pairs are loaded into an in-memory database, and subsequently pulled out using an SQL "SELECT" query.
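The database round trip can be pictured with the standard `sqlite3` module. The table name, column names, and sample pairs below are invented for this sketch; the demo's actual schema and query are not shown in this documentation.

```python
import sqlite3

# Invented sample pairs standing in for extracted (organization, location) mentions.
pairs = [
    ("WHYY", "Philadelphia"),
    ("Freedom Forum", "Arlington"),
]

# ":memory:" creates a database that lives only for this connection.
connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE Locations (OrgName TEXT, LocationName TEXT)")
cursor.executemany("INSERT INTO Locations VALUES (?, ?)", pairs)
connection.commit()

# Pull the organizations back out with a parameterized SELECT.
cursor.execute(
    "SELECT OrgName FROM Locations WHERE LocationName = ?", ("Philadelphia",)
)
rows = cursor.fetchall()
connection.close()
print(rows)  # [('WHYY',)]
```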

def list2sym(lst):

Convert a list of strings into a canonical symbol.

Parameters
  lst (list)
Returns
  unicode: a Unicode string without whitespace
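One plausible canonicalization is sketched below. Joining with underscores, lowercasing, and dropping periods are assumptions made for illustration; the documentation only promises a string without whitespace.

```python
def list2sym_sketch(words):
    """Join words with underscores, lowercase them, and drop periods."""
    symbol = "_".join(words).lower()
    return symbol.replace(".", "")


symbol = list2sym_sketch(["U.S.", "Justice", "Department"])
print(symbol)  # us_justice_department
```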

def ne_chunked():

Undocumented

def roles_demo(trace=0):

Undocumented

def rtuple(reldict, lcon=False, rcon=False):

Pretty print the reldict as an rtuple.

Parameters
  reldict (defaultdict): a relation dictionary

def semi_rel2reldict(pairs, window=5, trace=False):

Converts the pairs generated by tree2semi_rel into a 'reldict': a dictionary which stores information about the subject and object NEs plus the filler between them. Additionally, a left and right context of at most window items are captured (within a given input sentence).

Parameters
  pairs: a pair of list(str) and Tree, as generated by tree2semi_rel.
  window (int): a threshold for the number of items to include in the left and right context.
  trace: Undocumented
Returns
  list(defaultdict): 'relation' dictionaries whose keys are 'lcon', 'subjclass', 'subjtext', 'subjsym', 'filler', 'objclass', 'objtext', 'objsym' and 'rcon'.
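To make the shape of a reldict concrete, the sketch below assembles one from pre-split pieces and trims the contexts to the window. `build_reldict` is a hypothetical helper, the sample sentence is invented, and the way symbols are derived here is a simplification.

```python
from collections import defaultdict


def build_reldict(left, subj, filler, obj, right, window=5):
    """Assemble one reldict from pre-split pieces.

    subj and obj are (class, words) pairs; left, filler and right are
    lists of words. Only the nearest `window` context words are kept
    (this sketch assumes window > 0).
    """
    subjclass, subjwords = subj
    objclass, objwords = obj
    rel = defaultdict(str)
    rel["lcon"] = " ".join(left[-window:])
    rel["subjclass"] = subjclass
    rel["subjtext"] = " ".join(subjwords)
    rel["subjsym"] = "_".join(subjwords).lower()
    rel["filler"] = " ".join(filler)
    rel["objclass"] = objclass
    rel["objtext"] = " ".join(objwords)
    rel["objsym"] = "_".join(objwords).lower()
    rel["rcon"] = " ".join(right[:window])
    return rel


rel = build_reldict(
    left=["yesterday", "the", "station"],
    subj=("ORGANIZATION", ["WHYY"]),
    filler=["in"],
    obj=("LOCATION", ["Philadelphia"]),
    right=["announced", "a", "new", "show"],
    window=2,
)
print(rel["lcon"], "|", rel["rcon"])  # the station | announced a
```

Using a `defaultdict(str)` means that a missing key reads as an empty string instead of raising `KeyError`, which is convenient when formatting partial relations.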
def tree2semi_rel(tree):

Group a chunk structure into a list of 'semi-relations' of the form (list(str), Tree).

In order to facilitate the construction of (Tree, string, Tree) triples, this identifies pairs whose first member is a list (possibly empty) of terminal strings, and whose second member is a Tree of the form (NE_label, terminals).

Parameters
  tree: a chunk tree
Returns
  list of tuple: a list of pairs (list(str), Tree)
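The grouping can be sketched over a flat sequence in which plain strings are terminals and NE chunks are small records. `Chunk` is a hypothetical stand-in for a Tree of the form (NE_label, terminals), and `group_semi_rels` is an illustrative helper, not the function documented here.

```python
from typing import List, NamedTuple, Tuple, Union


class Chunk(NamedTuple):
    """Hypothetical stand-in for a Tree of the form (NE_label, terminals)."""
    label: str
    leaves: List[str]


def group_semi_rels(items: List[Union[str, Chunk]]) -> List[Tuple[List[str], Chunk]]:
    """Pair each NE chunk with the plain strings that precede it."""
    semi_rels = []
    pending: List[str] = []
    for item in items:
        if isinstance(item, Chunk):
            semi_rels.append((pending, item))
            pending = []
        else:
            pending.append(item)
    # Trailing strings with no following chunk are dropped in this sketch.
    return semi_rels


sentence = [
    Chunk("ORGANIZATION", ["WHYY"]),
    "in",
    Chunk("LOCATION", ["Philadelphia"]),
    "said",
]
pairs = group_semi_rels(sentence)
print(pairs[0][0], pairs[1][0])  # [] ['in']
```

The first pair has an empty list because nothing precedes the first chunk; the second pair carries the filler "in" that sits between the two entities.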
NE_CLASSES: dict

Undocumented

Value
{'ieer': ['LOCATION',
          'ORGANIZATION',
          'PERSON',
          'DURATION',
          'DATE',
          'CARDINAL',
          'PERCENT',
...
long2short

Undocumented

short2long

Undocumented

def _expand(type):

Expand an NE class name.

Parameters
  type (str)
Returns
  str
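Abbreviating and expanding class names can be pictured as two dictionary lookups that fall back to the input. The mapping contents below are assumed, since this documentation does not show the values of `long2short` and `short2long`; `abbreviate` and `expand` are hypothetical names for the sketch.

```python
# Assumed contents: the documentation does not show these mappings.
short2long = {"LOC": "LOCATION", "ORG": "ORGANIZATION", "PER": "PERSON"}
long2short = {long_name: short for short, long_name in short2long.items()}


def abbreviate(ne_class):
    """Return the short form of an NE class, or the input if unknown."""
    return long2short.get(ne_class, ne_class)


def expand(ne_class):
    """Return the long form of an NE class, or the input if unknown."""
    return short2long.get(ne_class, ne_class)


print(abbreviate("ORGANIZATION"), expand("LOC"), abbreviate("DATE"))
# ORG LOCATION DATE
```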

def _join(lst, sep=' ', untag=False):

Join a list into a string, turning tagged tuples into tag strings or just words.

Parameters
  lst (list)
  untag: if True, omit the tag from tagged input strings.
Returns
  str
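A minimal sketch of this joining behaviour follows. The `word/tag` rendering of tagged tuples is an assumption about the tag-string format, and `join_items` is a hypothetical name used only here.

```python
def join_items(items, sep=" ", untag=False):
    """Join plain strings and (word, tag) tuples into one string."""
    parts = []
    for item in items:
        if isinstance(item, tuple):
            word, tag = item
            # Keep only the word when untag is set, else render word/tag.
            parts.append(word if untag else "%s/%s" % (word, tag))
        else:
            parts.append(item)
    return sep.join(parts)


tagged = [("WHYY", "NNP"), ("in", "IN")]
print(join_items(tagged))                      # WHYY/NNP in/IN
print(join_items(tagged, sep="_", untag=True)) # WHYY_in
print(join_items(["plain", "words"]))          # plain words
```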