class HunposTagger(TaggerI):
Constructor: HunposTagger(path_to_model, path_to_bin, encoding, verbose)
A class for POS tagging with HunPos. The constructor takes:
- the path to a model trained on training data
- (optionally) the path to the hunpos-tag binary
- (optionally) the encoding of the training data (default: ISO-8859-1)
Example:
>>> from nltk.tag import HunposTagger
>>> ht = HunposTagger('en_wsj.model')
>>> ht.tag('What is the airspeed of an unladen swallow ?'.split())
[('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('airspeed', 'NN'), ('of', 'IN'), ('an', 'DT'), ('unladen', 'NN'), ('swallow', 'VB'), ('?', '.')]
>>> ht.close()
This class communicates with the hunpos-tag binary via pipes. When the tagger object is no longer needed, the close() method should be called to free system resources. The class supports the context manager interface; if used in a with statement, the close() method is invoked automatically:
>>> with HunposTagger('en_wsj.model') as ht:
...     ht.tag('What is the airspeed of an unladen swallow ?'.split())
...
[('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('airspeed', 'NN'), ('of', 'IN'), ('an', 'DT'), ('unladen', 'NN'), ('swallow', 'VB'), ('?', '.')]
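The lifecycle described above (a subprocess reached through pipes, an explicit close(), and context-manager support) can be sketched without HunPos installed. This is a minimal illustration, not NLTK's implementation: `PipeWorker` and its `send` method are hypothetical names, and a small Python child process that upper-cases each line stands in for the hunpos-tag binary.

```python
import subprocess
import sys

# Stand-in for an external binary: echoes each input line in upper case.
_CHILD = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
    "    sys.stdout.flush()\n"
)


class PipeWorker:
    """Hypothetical wrapper around a line-oriented subprocess."""

    def __init__(self):
        self._closed = True
        self._proc = subprocess.Popen(
            [sys.executable, "-c", _CHILD],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
        )
        self._closed = False

    def send(self, text):
        # One request per line; flush so the child sees it immediately.
        self._proc.stdin.write(text.encode("ascii") + b"\n")
        self._proc.stdin.flush()
        return self._proc.stdout.readline().decode("ascii").strip()

    def close(self):
        # Idempotent: calling close() twice is harmless.
        if not self._closed:
            self._proc.stdin.close()  # child sees EOF and exits
            self._proc.wait()
            self._proc.stdout.close()
            self._closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()


with PipeWorker() as worker:
    reply = worker.send("unladen swallow")
```

Leaving the with block calls close() even if an exception is raised inside it, which is why the context-manager form is the safer way to release the pipes.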
Method | __del__ | Undocumented |
Method | __enter__ | Undocumented |
Method | __exit__ | Undocumented |
Method | __init__ | Starts the hunpos-tag executable and establishes a connection with it. |
Method | close | Closes the pipe to the hunpos executable. |
Method | tag | Tags a single sentence: a list of words. The tokens should not contain any newline characters. |
Instance Variable | _closed | Undocumented |
Instance Variable | _encoding | Undocumented |
Instance Variable | _hunpos | Undocumented |
Inherited from TaggerI:
Method | evaluate | Score the accuracy of the tagger against the gold standard. Strip the tags from the gold standard text, retag it using the tagger, then compute the accuracy score. |
Method | tag_sents | Apply self.tag() to each element of sentences, i.e. return [self.tag(sent) for sent in sentences]. |
Method | _check | Undocumented |
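The inherited scoring and batch-tagging behaviour can be illustrated with a toy tagger. This is a sketch of the behaviour as described, not NLTK's code: `ToyTagger` and the standalone `accuracy` function are hypothetical, and accuracy is assumed to mean the fraction of tokens whose predicted tag matches the gold tag.

```python
class ToyTagger:
    """Hypothetical tagger that labels every token 'NN'."""

    def tag(self, tokens):
        return [(token, "NN") for token in tokens]

    def tag_sents(self, sentences):
        # Apply tag() to each sentence in turn.
        return [self.tag(sent) for sent in sentences]


def accuracy(tagger, gold):
    # Strip the tags from the gold standard, retag, then compare token by token.
    untagged = [[word for word, _ in sent] for sent in gold]
    predicted = tagger.tag_sents(untagged)
    gold_pairs = [pair for sent in gold for pair in sent]
    test_pairs = [pair for sent in predicted for pair in sent]
    correct = sum(g == t for g, t in zip(gold_pairs, test_pairs))
    return correct / len(gold_pairs)


gold = [
    [("the", "DT"), ("swallow", "NN")],
    [("airspeed", "NN"), ("is", "VBZ")],
]
score = accuracy(ToyTagger(), gold)
```

Here two of the four gold tokens are tagged NN, so the toy tagger scores 0.5.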
__init__(path_to_model, path_to_bin, encoding, verbose)
Starts the hunpos-tag executable and establishes a connection with it.
Parameters | |
path_to_model | The model file. |
path_to_bin | The hunpos-tag binary (optional). |
encoding | The encoding used by the model. Unicode tokens passed to the tag() and tag_sents() methods are converted to this charset when they are sent to hunpos-tag. The default is ISO-8859-1 (Latin-1). This parameter is ignored for str tokens, which are sent as-is. The caller must ensure that tokens are encoded in the right charset. |
verbose | Undocumented |
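To make the encoding parameter concrete, the snippet below shows what converting tokens to the default charset involves, using only Python's built-in str.encode. `encode_tokens` is a hypothetical helper for illustration, not part of NLTK. It also shows that characters outside ISO-8859-1 cannot be encoded, which is worth checking before tagging.

```python
def encode_tokens(tokens, encoding="ISO-8859-1"):
    """Hypothetical helper: encode each token to bytes in the given charset."""
    return [token.encode(encoding) for token in tokens]


# "caf\u00e9" is "cafe" with an acute accent; U+00E9 maps to byte 0xE9 in Latin-1.
encoded = encode_tokens(["caf\u00e9", "swallow"])

try:
    encode_tokens(["\u20ac"])  # the euro sign has no ISO-8859-1 code point
    euro_ok = True
except UnicodeEncodeError:
    euro_ok = False
```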
tag: overrides nltk.tag.api.TaggerI.tag
Tags a single sentence: a list of words. The tokens should not contain any newline characters.
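A caller can guard against the newline restriction before calling tag(). The helper below is a hypothetical pre-check, not an NLTK function; it assumes the exchange with hunpos-tag is line-oriented, so that a newline inside a token would be mistaken for a token boundary.

```python
def check_tokens(tokens):
    """Hypothetical pre-check: reject tokens containing newline characters."""
    for index, token in enumerate(tokens):
        if "\n" in token:
            raise ValueError(f"token {index} contains a newline: {token!r}")
    return list(tokens)


clean = check_tokens("What is the airspeed ?".split())

try:
    check_tokens(["unladen\nswallow"])
    rejected = False
except ValueError:
    rejected = True
```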