class CRFTagger(TaggerI):
Constructor: CRFTagger(feature_func, verbose, training_opt)
A class for part-of-speech (POS) tagging using CRFSuite through its Python bindings, python-crfsuite: https://pypi.python.org/pypi/python-crfsuite
>>> from nltk.tag import CRFTagger
>>> ct = CRFTagger()

>>> train_data = [[('University','Noun'), ('is','Verb'), ('a','Det'), ('good','Adj'), ('place','Noun')],
... [('dog','Noun'),('eat','Verb'),('meat','Noun')]]

>>> ct.train(train_data,'model.crf.tagger')
>>> ct.tag_sents([['dog','is','good'], ['Cat','eat','meat']])
[[('dog', 'Noun'), ('is', 'Verb'), ('good', 'Adj')], [('Cat', 'Noun'), ('eat', 'Verb'), ('meat', 'Noun')]]

>>> gold_sentences = [[('dog','Noun'),('is','Verb'),('good','Adj')] , [('Cat','Noun'),('eat','Verb'), ('meat','Noun')]]
>>> ct.evaluate(gold_sentences)
1.0
Loading a previously trained model file:

>>> ct = CRFTagger()
>>> ct.set_model_file('model.crf.tagger')
>>> ct.evaluate(gold_sentences)
1.0
Method | __init__ | Initialize the CRFSuite tagger with an optional feature function, verbosity flag, and training options.
Method | set_model_file | Undocumented
Method | tag | Tag a sentence using the Python CRFSuite tagger.
Method | tag_sents | Tag a list of sentences.
Method | train | Train the CRF tagger using CRFSuite and save the model to model_file.
Method | _get_features | Extract basic features for the token at a given index, starting with the current word.
Instance Variable | _feature | Undocumented
Instance Variable | _model | Undocumented
Instance Variable | _pattern | Undocumented
Instance Variable | _tagger | Undocumented
Instance Variable | _training | Undocumented
Instance Variable | _verbose | Undocumented
Inherited from TaggerI:
Method | evaluate | Score the accuracy of the tagger against the gold standard. Strip the tags from the gold standard text, retag it using the tagger, then compute the accuracy score.
Method | _check | Undocumented
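The three steps that evaluate describes (strip the gold tags, retag, compare) can be sketched in plain Python as below. This is a hypothetical helper written for illustration, not NLTK's implementation: it assumes token-level accuracy over all sentences and takes any callable with the same input/output shape as tag_sents, so it runs without NLTK or python-crfsuite installed.

```python
from typing import Callable, List, Tuple

TaggedSent = List[Tuple[str, str]]


def accuracy_sketch(
    tag_sents: Callable[[List[List[str]]], List[TaggedSent]],
    gold: List[TaggedSent],
) -> float:
    """Token-level accuracy: strip gold tags, retag, compare (hypothetical helper)."""
    # 1. Strip the tags from the gold standard, keeping only the words.
    untagged = [[word for word, _ in sent] for sent in gold]
    # 2. Retag the bare sentences with the tagger under test.
    predicted = tag_sents(untagged)
    # 3. Compare predicted and gold tags position by position.
    gold_tags = [tag for sent in gold for _, tag in sent]
    pred_tags = [tag for sent in predicted for _, tag in sent]
    if not gold_tags:
        raise ValueError("gold standard is empty")
    correct = sum(g == p for g, p in zip(gold_tags, pred_tags))
    return correct / len(gold_tags)


# Stand-in tagger that labels every token 'Noun' (hypothetical, for illustration).
def all_nouns(sents: List[List[str]]) -> List[TaggedSent]:
    return [[(w, "Noun") for w in sent] for sent in sents]


gold_sentences = [
    [("dog", "Noun"), ("is", "Verb"), ("good", "Adj")],
    [("Cat", "Noun"), ("eat", "Verb"), ("meat", "Noun")],
]
print(accuracy_sketch(all_nouns, gold_sentences))  # 3 of 6 tags match -> 0.5
```

Passing a trained tagger's tag_sents method in place of all_nouns would score that tagger the same way under these assumptions.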
Initialize the CRFSuite tagger.

:param feature_func: The function that extracts features for each token of a sentence. It should take two parameters, tokens and index, and extract the features of the token at position index in the tokens list. See the built-in _get_features function for more detail.
:param verbose: Output debugging messages during training.
:type verbose: boolean
:param training_opt: python-crfsuite training options.
:type training_opt: dictionary
Set of possible training options (using the L-BFGS training algorithm):

- 'feature.minfreq': The minimum frequency of features.
- 'feature.possible_states': Force generation of possible state features.
- 'feature.possible_transitions': Force generation of possible transition features.
- 'c1': Coefficient for L1 regularization.
- 'c2': Coefficient for L2 regularization.
- 'max_iterations': The maximum number of iterations for L-BFGS optimization.
- 'num_memories': The number of limited memories for approximating the inverse Hessian matrix.
- 'epsilon': Epsilon for testing the convergence of the objective.
- 'period': The number of iterations over which the stopping criterion is tested.
- 'delta': The threshold for the stopping criterion; L-BFGS stops when the improvement of the log likelihood over the last 'period' iterations is no greater than this threshold.
- 'linesearch': The line search algorithm used in L-BFGS updates:
  - 'MoreThuente': Moré and Thuente's method
  - 'Backtracking': backtracking method with the regular Wolfe condition
  - 'StrongBacktracking': backtracking method with the strong Wolfe condition
- 'max_linesearch': The maximum number of trials for the line search algorithm.
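As a sketch of how the two main constructor arguments might be filled in, the block below defines a custom feature function with the (tokens, index) signature described above and a training_opt dictionary built from the documented keys. The function name my_features, its feature strings, and the option values are all hypothetical choices for illustration; returning a list of strings per token is an assumption (python-crfsuite accepts string lists as item features). The actual CRFTagger call is left as a comment so the block runs without NLTK installed.

```python
import re
from typing import Dict, List, Union

_DIGIT = re.compile(r"\d")


def my_features(tokens: List[str], idx: int) -> List[str]:
    """Hypothetical feature function with the (tokens, index) signature of feature_func."""
    token = tokens[idx]
    feats = ["WORD_" + token.lower()]
    if token[:1].isupper():
        feats.append("CAPITALIZED")
    if _DIGIT.search(token):
        feats.append("HAS_NUM")
    # Suffixes of length 1..3, only when strictly shorter than the token itself.
    for n in range(1, 4):
        if len(token) > n:
            feats.append("SUF_" + token[-n:])
    # Context features: neighbouring words, or sentence-boundary markers.
    feats.append("PREV_" + (tokens[idx - 1].lower() if idx > 0 else "<S>"))
    feats.append("NEXT_" + (tokens[idx + 1].lower() if idx + 1 < len(tokens) else "</S>"))
    return feats


# Example L-BFGS options using keys from the list above; the values are arbitrary.
training_opt: Dict[str, Union[int, float, str]] = {
    "feature.minfreq": 1,
    "feature.possible_transitions": 1,  # CRFsuite flag: 1 = on, 0 = off
    "c1": 0.1,
    "c2": 0.01,
    "max_iterations": 100,
    "linesearch": "MoreThuente",
}

# With NLTK and python-crfsuite installed, both would be passed to the constructor:
# ct = CRFTagger(feature_func=my_features, training_opt=training_opt)
print(my_features(["Cat", "eat", "meat"], 0))
# ['WORD_cat', 'CAPITALIZED', 'SUF_t', 'SUF_at', 'PREV_<S>', 'NEXT_eat']
```

Keeping context features (previous and next word) inside the per-token function is one way to give a linear-chain CRF information about neighbouring tokens beyond what the transition weights capture.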
Overrides nltk.tag.api.TaggerI.tag.

Tag a sentence using the Python CRFSuite tagger. NB: before using this function, the user should specify the model file either by:
- training a new model with the ``train'' function, or
- using a pre-trained model set via the ``set_model_file'' function.

:param tokens: list of tokens to tag.
:type tokens: list(str)
:return: list of tagged tokens.
:rtype: list(tuple(str,str))
Overrides nltk.tag.api.TaggerI.tag_sents.

Tag a list of sentences. NB: before using this function, the user should specify the model file either by:
- training a new model with the ``train'' function, or
- using a pre-trained model set via the ``set_model_file'' function.

:param sentences: list of sentences to tag.
:type sentences: list(list(str))
:return: list of tagged sentences.
:rtype: list(list(tuple(str,str)))
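The documented contract of tag and tag_sents (a model must be available first; a single sentence maps to a list of (token, tag) pairs; a list of sentences maps to a list of such lists) can be illustrated with the stand-in below. SketchTagger and set_model are hypothetical names, and the "model" is a plain callable returning one label per token rather than a CRFSuite model, so the block runs on its own. That the real tag delegates to tag_sents with a one-sentence batch is an assumption made for the sketch.

```python
from typing import Callable, List, Optional, Tuple

Labeler = Callable[[List[str]], List[str]]


class SketchTagger:
    """Hypothetical stand-in that mimics the documented tag/tag_sents contract."""

    def __init__(self) -> None:
        self._labeler: Optional[Labeler] = None  # no model yet

    def set_model(self, labeler: Labeler) -> None:
        # Plays the role of train() or set_model_file(): a model must exist before tagging.
        self._labeler = labeler

    def tag_sents(self, sentences: List[List[str]]) -> List[List[Tuple[str, str]]]:
        if self._labeler is None:
            raise RuntimeError("No model: call train or set_model_file first")
        result = []
        for tokens in sentences:
            labels = self._labeler(tokens)  # one label per token
            result.append(list(zip(tokens, labels)))
        return result

    def tag(self, tokens: List[str]) -> List[Tuple[str, str]]:
        # A single sentence is treated as a batch of one.
        return self.tag_sents([tokens])[0]


t = SketchTagger()
t.set_model(lambda toks: ["Noun"] * len(toks))
print(t.tag(["dog", "barks"]))  # [('dog', 'Noun'), ('barks', 'Noun')]
```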
Train the CRF tagger using CRFSuite.

:param train_data: the list of annotated sentences.
:type train_data: list(list(tuple(str,str)))
:param model_file: the model will be saved to this file.
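A CRF trainer consumes each sentence as a pair of parallel sequences: one feature list per token and one label per token. The sketch below shows how train_data in the documented list(list(tuple(str,str))) format could be split that way. to_sequences and word_only are hypothetical helpers, and treating this as what train does internally is an assumption.

```python
from typing import Callable, List, Tuple

FeatureFunc = Callable[[List[str], int], List[str]]


def to_sequences(
    train_data: List[List[Tuple[str, str]]], feature_func: FeatureFunc
) -> List[Tuple[List[List[str]], List[str]]]:
    """Split annotated sentences into (feature sequence, label sequence) pairs."""
    pairs = []
    for sent in train_data:
        tokens = [word for word, _ in sent]
        labels = [tag for _, tag in sent]
        # One feature list per token, computed with the (tokens, index) signature.
        xseq = [feature_func(tokens, i) for i in range(len(tokens))]
        pairs.append((xseq, labels))
    return pairs


def word_only(tokens: List[str], idx: int) -> List[str]:
    # Minimal hypothetical feature function: just the lowercased word.
    return ["WORD_" + tokens[idx].lower()]


train_data = [
    [("University", "Noun"), ("is", "Verb"), ("a", "Det"), ("good", "Adj"), ("place", "Noun")],
    [("dog", "Noun"), ("eat", "Verb"), ("meat", "Noun")],
]
pairs = to_sequences(train_data, word_only)
print(pairs[1])
# ([['WORD_dog'], ['WORD_eat'], ['WORD_meat']], ['Noun', 'Verb', 'Noun'])
```

With python-crfsuite installed, each (xseq, yseq) pair could be passed to pycrfsuite.Trainer.append, followed by Trainer.set_params(training_opt) and Trainer.train(model_file) to write the model file.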