module documentation

Interface for converting POS tags from various treebanks to the universal tagset of Petrov, Das, & McDonald.

The tagset consists of the following 12 coarse tags:

VERB - verbs (all tenses and modes)
NOUN - nouns (common and proper)
PRON - pronouns
ADJ - adjectives
ADV - adverbs
ADP - adpositions (prepositions and postpositions)
CONJ - conjunctions
DET - determiners
NUM - cardinal numbers
PRT - particles or other function words
X - other: foreign words, typos, abbreviations
. - punctuation

See: http://arxiv.org/abs/1104.2086 and http://code.google.com/p/universal-pos-tags/
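
As a small illustration of working with the 12 coarse tags listed above, the sketch below counts tags in an already-tagged sentence and rejects anything outside the set. It does not use this module: `UNIVERSAL_TAGS`, `tag_counts`, and the sample sentence are names and data made up for the example.

```python
from collections import Counter

# The 12 coarse tags, in the order the documentation lists them.
UNIVERSAL_TAGS = (
    "VERB", "NOUN", "PRON", "ADJ", "ADV", "ADP",
    "CONJ", "DET", "NUM", "PRT", "X", ".",
)


def tag_counts(tagged_tokens):
    """Count coarse tags, raising ValueError for tags outside the 12-tag set."""
    counts = Counter()
    for _word, tag in tagged_tokens:
        if tag not in UNIVERSAL_TAGS:
            raise ValueError(f"not a universal tag: {tag!r}")
        counts[tag] += 1
    return counts


# Hypothetical hand-tagged sentence, for illustration only.
sentence = [
    ("The", "DET"), ("cat", "NOUN"), ("sat", "VERB"),
    ("on", "ADP"), ("the", "DET"), ("mat", "NOUN"), (".", "."),
]
counts = tag_counts(sentence)
print(counts["DET"], counts["NOUN"], counts["VERB"])
```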

Function map_tag Maps the tag from the source tagset to the target tagset.
Function tagset_mapping Retrieve the mapping dictionary between tagsets.
Function _load_universal_map Undocumented
Constant _MAPPINGS Undocumented
Constant _UNIVERSAL_DATA Undocumented
Constant _UNIVERSAL_TAGS Undocumented
def map_tag(source, target, source_tag):

Maps the tag from the source tagset to the target tagset.

>>> map_tag('en-ptb', 'universal', 'VBZ')
'VERB'
>>> map_tag('en-ptb', 'universal', 'VBP')
'VERB'
>>> map_tag('en-ptb', 'universal', '``')
'.'
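
To show how a per-tag lookup like this might be applied to a whole tagged sentence, here is a minimal sketch that does not call NLTK. `PTB_TO_UNIVERSAL` is a hypothetical hand-written subset containing only the three entries from the doctests above, `coarsen` is an illustrative helper rather than part of the module, and the `unknown` fallback value is an assumption.

```python
# Hypothetical subset of an en-ptb -> universal mapping; only the three
# entries shown in the doctests are included.
PTB_TO_UNIVERSAL = {"VBZ": "VERB", "VBP": "VERB", "``": "."}


def coarsen(tagged_tokens, mapping, unknown="X"):
    """Replace each fine-grained tag with its coarse tag.

    `unknown` is an assumed fallback for tags missing from `mapping`.
    """
    return [(word, mapping.get(tag, unknown)) for word, tag in tagged_tokens]


# The last token carries a deliberately invalid tag to exercise the fallback.
sentence = [("``", "``"), ("runs", "VBZ"), ("run", "VBP"), ("blorp", "NOT_A_TAG")]
coarse_sentence = coarsen(sentence, PTB_TO_UNIVERSAL)
print(coarse_sentence)
```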
def tagset_mapping(source, target):

Retrieve the mapping dictionary between tagsets.

>>> tagset_mapping('ru-rnc', 'universal') == {
...     '!': '.', 'A': 'ADJ', 'C': 'CONJ', 'AD': 'ADV',
...     'NN': 'NOUN', 'VG': 'VERB', 'COMP': 'CONJ', 'NC': 'NUM', 'VP': 'VERB', 'P': 'ADP',
...     'IJ': 'X', 'V': 'VERB', 'Z': 'X', 'VI': 'VERB', 'YES_NO_SENT': 'X', 'PTCL': 'PRT'
... }
True
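
Because the mapping is many-to-one, it can be useful to invert it and see which fine-grained tags collapse into each coarse tag. The sketch below does that for the `ru-rnc` dictionary, copied from the doctest above rather than loaded through NLTK. `group_by_coarse` is a hypothetical helper, not part of the module.

```python
from collections import defaultdict

# Copied verbatim from the doctest above; not loaded through NLTK.
RU_RNC_TO_UNIVERSAL = {
    '!': '.', 'A': 'ADJ', 'C': 'CONJ', 'AD': 'ADV',
    'NN': 'NOUN', 'VG': 'VERB', 'COMP': 'CONJ', 'NC': 'NUM', 'VP': 'VERB', 'P': 'ADP',
    'IJ': 'X', 'V': 'VERB', 'Z': 'X', 'VI': 'VERB', 'YES_NO_SENT': 'X', 'PTCL': 'PRT',
}


def group_by_coarse(mapping):
    """Invert a fine -> coarse mapping into coarse -> sorted list of fine tags."""
    groups = defaultdict(list)
    for fine, coarse in mapping.items():
        groups[coarse].append(fine)
    return {coarse: sorted(fines) for coarse, fines in groups.items()}


groups = group_by_coarse(RU_RNC_TO_UNIVERSAL)
for coarse in sorted(groups):
    print(coarse, groups[coarse])
```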
def _load_universal_map(fileid):

Undocumented
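
The function has no docstring, so the following is only a guess at the kind of work a loader like this might do. It assumes the mapping data is plain text with one tab-separated fine-tag/coarse-tag pair per line, which is the layout of the `.map` files in the universal-pos-tags project. `parse_tag_map`, its arguments, and the sample text are hypothetical and are not part of the module.

```python
def parse_tag_map(text, allowed_coarse):
    """Parse tab-separated 'fine<TAB>coarse' lines into a dict.

    Assumed format, not confirmed by the documentation: one pair per line,
    blank lines ignored. Raises ValueError on an unexpected coarse tag or a
    fine tag that appears twice.
    """
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fine, coarse = line.split("\t")
        if coarse not in allowed_coarse:
            raise ValueError(f"unexpected coarse tag: {coarse!r}")
        if fine in mapping:
            raise ValueError(f"multiple entries for tag: {fine!r}")
        mapping[fine] = coarse
    return mapping


# Hypothetical file contents, for illustration only.
sample = "VBZ\tVERB\nVBP\tVERB\n\n``\t.\n"
parsed = parse_tag_map(sample, allowed_coarse={"VERB", "."})
print(parsed)
```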

_MAPPINGS

Undocumented

Value
defaultdict((lambda : defaultdict(lambda : defaultdict(lambda : 'UNK'))))
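
The value above is a three-level nested `defaultdict`. The sketch below rebuilds a structure of the same shape to show what that implies: any missing key at the innermost level yields `'UNK'`. The variable `mappings` and the `demo-src` / `demo-tgt` keys are hypothetical, chosen only for illustration. Reading the levels as source tagset, target tagset, and tag is an interpretation based on the function signatures, not something the documentation states.

```python
from collections import defaultdict

# Same shape as the value shown above: three nested levels, with 'UNK'
# as the default at the innermost level.
mappings = defaultdict(lambda: defaultdict(lambda: defaultdict(lambda: "UNK")))

# Hypothetical entry, for illustration only.
mappings["demo-src"]["demo-tgt"]["VBZ"] = "VERB"

known = mappings["demo-src"]["demo-tgt"]["VBZ"]
missing = mappings["demo-src"]["demo-tgt"]["NOT_A_TAG"]
print(known, missing)

# A defaultdict stores the default when a missing key is read with [],
# so the lookup above has added 'NOT_A_TAG' to the inner dictionary.
print("NOT_A_TAG" in mappings["demo-src"]["demo-tgt"])
```

One consequence of this design is that a plain `[]` lookup never raises `KeyError`, so a caller who wants to detect unknown tags has to compare the result against the default value or test membership before indexing.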
_UNIVERSAL_DATA: str

Undocumented

Value
'taggers/universal_tagset'
_UNIVERSAL_TAGS: tuple[str, ...]

Undocumented

Value
('VERB',
 'NOUN',
 'PRON',
 'ADJ',
 'ADV',
 'ADP',
 'CONJ',
...