class documentation

A class for POS tagging with HunPos. The constructor takes:
  • the path to a model trained on training data
  • (optionally) the path to the hunpos-tag binary
  • (optionally) the encoding of the training data (default: ISO-8859-1)

Example:

>>> from nltk.tag import HunposTagger
>>> ht = HunposTagger('en_wsj.model')
>>> ht.tag('What is the airspeed of an unladen swallow ?'.split())
[('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('airspeed', 'NN'), ('of', 'IN'), ('an', 'DT'), ('unladen', 'NN'), ('swallow', 'VB'), ('?', '.')]
>>> ht.close()

This class communicates with the hunpos-tag binary via pipes. When the tagger object is no longer needed, the close() method should be called to free system resources. The class supports the context manager interface; if used in a with statement, the close() method is invoked automatically:

>>> with HunposTagger('en_wsj.model') as ht:
...     ht.tag('What is the airspeed of an unladen swallow ?'.split())
...
[('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('airspeed', 'NN'), ('of', 'IN'), ('an', 'DT'), ('unladen', 'NN'), ('swallow', 'VB'), ('?', '.')]
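
The pipe-plus-close() pattern described above can be illustrated with a small stand-in. The sketch below is not the HunposTagger implementation: LineWorker and its child process (a Python snippet that upper-cases each line) are hypothetical, chosen so the example runs without hunpos installed. It only shows how a with statement guarantees that close() runs and releases the child process.

```python
import subprocess
import sys

# Hypothetical child process: echoes each input line back in upper case.
_CHILD = (
    "import sys\n"
    "for line in iter(sys.stdin.readline, ''):\n"
    "    sys.stdout.write(line.upper())\n"
    "    sys.stdout.flush()\n"
)


class LineWorker:
    """Stand-in for a wrapper that talks to an external binary via pipes."""

    def __init__(self):
        self._closed = True
        self._proc = subprocess.Popen(
            [sys.executable, "-c", _CHILD],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
        )
        self._closed = False

    def send(self, text):
        # One request per line; the child answers with exactly one line.
        self._proc.stdin.write(text.encode("utf-8") + b"\n")
        self._proc.stdin.flush()
        return self._proc.stdout.readline().decode("utf-8").strip()

    def close(self):
        # Safe to call more than once.
        if not self._closed:
            self._proc.communicate()  # closes stdin, then waits for the child
            self._closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()


with LineWorker() as worker:
    reply = worker.send("unladen swallow")
```

Because __exit__ runs even when the body of the with block raises, the child process is released on both the normal and the error path.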

Method __del__ Undocumented
Method __enter__ Returns the tagger itself for use in a with statement.
Method __exit__ Calls close() when the with block exits.
Method __init__ Starts the hunpos-tag executable and establishes a connection with it.
Method close Closes the pipe to the hunpos executable.
Method tag Tags a single sentence: a list of words. The tokens should not contain any newline characters.
Instance Variable _closed Undocumented
Instance Variable _encoding Undocumented
Instance Variable _hunpos Undocumented
Instance Variable _hunpos_bin Undocumented
Instance Variable _hunpos_model Undocumented

Inherited from TaggerI:

Method evaluate Score the accuracy of the tagger against the gold standard. Strip the tags from the gold standard text, retag it using the tagger, then compute the accuracy score.
Method tag_sents Apply self.tag() to each element of sentences and return the list of results.
Method _check_params Undocumented
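
The inherited evaluate and tag_sents behaviour summarized above can be sketched in a few lines. This is a minimal illustration of the described steps (strip the gold tags, retag, compare), not the TaggerI source; the function names accuracy and noun_tagger and the toy data are assumptions made for the example.

```python
def tag_sents(tag, sentences):
    """Apply a single-sentence tagging function to every sentence."""
    return [tag(sent) for sent in sentences]


def accuracy(tag, gold):
    """Fraction of tokens whose predicted tag matches the gold tag.

    Assumes gold contains at least one token.
    """
    # Strip the tags from the gold standard, keeping only the words.
    untagged = [[word for word, _ in sent] for sent in gold]
    predicted = tag_sents(tag, untagged)
    # Flatten both sides and compare token by token.
    gold_tokens = [pair for sent in gold for pair in sent]
    pred_tokens = [pair for sent in predicted for pair in sent]
    correct = sum(g == p for g, p in zip(gold_tokens, pred_tokens))
    return correct / len(gold_tokens)


# Hypothetical toy tagger: labels every word as a noun.
def noun_tagger(tokens):
    return [(tok, "NN") for tok in tokens]


gold = [[("the", "DT"), ("swallow", "NN")], [("airspeed", "NN"), ("?", ".")]]
score = accuracy(noun_tagger, gold)
```

With a real tagger, the bound method ht.tag would take the place of noun_tagger.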
def __del__(self): (source)

Undocumented

def __enter__(self): (source)

Returns the tagger itself, so that it can be bound with as in a with statement.

def __exit__(self, exc_type, exc_value, traceback): (source)

Calls close() when the with block exits.

def __init__(self, path_to_model, path_to_bin=None, encoding=_hunpos_charset, verbose=False): (source)

Starts the hunpos-tag executable and establishes a connection with it.

Parameters
path_to_model: The model file.
path_to_bin: The hunpos-tag binary.
encoding: The encoding used by the model. Unicode tokens passed to the tag() and tag_sents() methods are converted to this charset when they are sent to hunpos-tag. The default is ISO-8859-1 (Latin-1).

This parameter is ignored for str tokens, which are sent as-is. The caller must ensure that tokens are encoded in the right charset.

verbose: Undocumented
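
The conversion described for the encoding parameter can be tried out with plain Python, without running hunpos-tag. The sketch below only demonstrates what encoding to ISO-8859-1 does to a token; the helper encodable is hypothetical and is not part of the HunposTagger API.

```python
ENCODING = "ISO-8859-1"  # the documented default

tokens = ["caf\u00e9", "na\u00efve", "swallow"]

# Latin-1 maps each of these characters to a single byte.
encoded = [tok.encode(ENCODING) for tok in tokens]


def encodable(token, encoding=ENCODING):
    """Return True if the token can be represented in the given charset."""
    try:
        token.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


ok = encodable("caf\u00e9")               # representable in Latin-1
not_ok = encodable("\u65e5\u672c\u8a9e")  # CJK text has no Latin-1 mapping
```

A check like this can be run over the input before tagging, so that tokens outside the model's charset are found before they are sent to the tagger.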
def close(self): (source)

Closes the pipe to the hunpos executable.

def tag(self, tokens): (source)

Tags a single sentence: a list of words. The tokens should not contain any newline characters.
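
The no-newline requirement can be enforced before calling tag(). The sketch below is one way to do so, under the assumption (not stated in this documentation) that newlines act as delimiters on the pipe to hunpos-tag; check_tokens and split_on_newlines are hypothetical helpers, not part of NLTK.

```python
def check_tokens(tokens):
    """Raise ValueError if any token contains a newline character."""
    for index, token in enumerate(tokens):
        if "\n" in token:
            raise ValueError(f"token {index} contains a newline: {token!r}")
    return list(tokens)


def split_on_newlines(tokens):
    """One possible repair: break offending tokens into newline-free pieces."""
    return [piece for token in tokens for piece in token.split("\n") if piece]


raw = ["What", "is\nthe", "airspeed"]
clean = split_on_newlines(raw)  # "is\nthe" becomes "is", "the"
checked = check_tokens(clean)   # passes: no newlines remain
```

Splitting changes the token count, so it only suits callers that do not need a one-to-one alignment with the original tokens; otherwise rejecting the input with check_tokens is the safer choice.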

_closed: bool = (source)

Undocumented

_encoding = (source)

Undocumented

_hunpos = (source)

Undocumented

_hunpos_bin = (source)

Undocumented

_hunpos_model = (source)

Undocumented