class SentimentAnalyzer(object):
Constructor: SentimentAnalyzer(classifier)
A Sentiment Analysis tool based on machine learning approaches.
Method | __init__ | Undocumented
Method | add_feat_extractor | Add a new function to extract features from a document. This function will be used in extract_features(). Important: in this step our kwargs are only representing additional parameters, and NOT the document we have to parse...
Method | all_words | Return all words/tokens from the documents (with duplicates).
Method | apply_features | Apply all feature extractor functions to the documents. This is a wrapper around nltk.classify.util.apply_features.
Method | bigram_collocation_feats | Return top_n bigram features (using assoc_measure). Note that this method is based on bigram collocation measures, and not on simple bigram frequency.
Method | classify | Classify a single instance by applying the features that have already been stored in the SentimentAnalyzer.
Method | evaluate | Evaluate and print classifier performance on the test set.
Method | extract_features | Apply extractor functions (and their parameters) to the present document. We pass document as the first parameter of the extractor functions...
Method | save_file | Store content in filename. Can be used to store a SentimentAnalyzer.
Method | train | Train classifier on the training set, optionally saving the output in the file specified by save_classifier. Additional arguments depend on the specific trainer used.
Method | unigram_word_feats | Return most common top_n word features.
Instance Variable | classifier | Undocumented
Instance Variable | feat_extractors | Undocumented
Add a new function to extract features from a document. This function will be used in extract_features(). Important: in this step kwargs only represent additional parameters, and NOT the document we have to parse. The document will always be the first parameter in the parameter list, and it will be added in the extract_features() function.
Parameters | |
function | the extractor function to add to the list of feature extractors. |
**kwargs | additional parameters required by the extractor function. |
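The registration pattern described above (store a function together with its extra kwargs, supply the document later) can be sketched in plain Python. The names here (FeatRegistry, word_presence) are illustrative stand-ins, not NLTK's own:

```python
from collections import defaultdict

class FeatRegistry:
    def __init__(self):
        # map: extractor function -> list of kwargs, one entry per registration
        self.feat_extractors = defaultdict(list)

    def add_feat_extractor(self, function, **kwargs):
        # kwargs hold only the *extra* parameters; the document comes later
        self.feat_extractors[function].append(kwargs)

    def extract_features(self, document):
        features = {}
        for function, kwargs_list in self.feat_extractors.items():
            for kwargs in kwargs_list:
                # the document is always passed as the first positional argument
                features.update(function(document, **kwargs))
        return features

def word_presence(document, words=()):
    # toy extractor: flag which of the given words occur in the document
    return {"contains({})".format(w): (w in document) for w in words}

reg = FeatRegistry()
reg.add_feat_extractor(word_presence, words=("good", "bad"))
feats = reg.extract_features(["a", "good", "movie"])
# feats == {"contains(good)": True, "contains(bad)": False}
```

Registering the same function twice with different kwargs simply appends a second entry, which is why extract_features() can call one extractor multiple times.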
Return all words/tokens from the documents (with duplicates).
Parameters | |
documents | a list of (words, label) tuples. |
labeled | if True, assume that each document is represented by a (words, label) tuple: (list(str), str). If False, each document is considered as being a simple list of strings: list(str). |
Returns | |
list(str) | A list of all words/tokens in documents. |
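The flattening behaviour described above can be sketched as follows; the function body and the default for labeled are assumptions for illustration, not NLTK's code:

```python
def all_words(documents, labeled=True):
    if labeled:
        # each document is a (words, label) tuple: (list(str), str)
        return [word for words, label in documents for word in words]
    # each document is a plain list of strings: list(str)
    return [word for words in documents for word in words]

docs = [(["good", "movie"], "pos"), (["bad", "plot"], "neg")]
flat = all_words(docs)  # -> ["good", "movie", "bad", "plot"]
```

Note that duplicates are deliberately kept, so the result can feed frequency-based feature selection such as unigram_word_feats.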
Apply all feature extractor functions to the documents. This is a wrapper around nltk.classify.util.apply_features.
- If labeled=False, return featuresets as: [feature_func(doc) for doc in documents]
- If labeled=True, return featuresets as: [(feature_func(tok), label) for (tok, label) in toks]
Parameters | |
documents | a list of documents. If labeled=True, the method expects a list of (words, label) tuples. |
labeled | Undocumented |
Returns | |
LazyMap | Undocumented |
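The two return shapes above can be illustrated with plain lists; the real method delegates to nltk.classify.util.apply_features and returns a lazy LazyMap rather than a list, and has_good here is a hypothetical extractor:

```python
def apply_features(feature_func, documents, labeled=False):
    if labeled:
        # documents are (tokens, label) tuples; labels are carried through
        return [(feature_func(tok), label) for tok, label in documents]
    # documents are plain token lists
    return [feature_func(doc) for doc in documents]

def has_good(tokens):
    return {"contains(good)": "good" in tokens}

labeled_docs = [(["good"], "pos"), (["bad"], "neg")]
featuresets = apply_features(has_good, labeled_docs, labeled=True)
# -> [({"contains(good)": True}, "pos"), ({"contains(good)": False}, "neg")]
```

The labeled=True shape is exactly what a classifier's train method expects as a training set.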
Return top_n bigram features (using assoc_measure).
Note that this method is based on bigram collocation measures, and not on simple bigram frequency.
Parameters | |
documents | a list (or iterable) of tokens. |
top_n | number of best words/tokens to use, sorted by association measure. |
min_freq | the minimum number of occurrences of bigrams to take into consideration. |
assoc_measure | bigram association measure to use as score function. |
Returns | |
top_n ngrams scored by the given association measure. |
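The difference between scoring by an association measure and by raw frequency can be sketched with PMI (pointwise mutual information) using only the stdlib. This mirrors the idea of the method above, not NLTK's implementation, which uses BigramCollocationFinder and BigramAssocMeasures:

```python
import math
from collections import Counter

def bigram_feats(tokens, top_n=5, min_freq=1):
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)

    def pmi(bg):
        # how much more often the pair co-occurs than chance would predict
        w1, w2 = bg
        return math.log2((bigrams[bg] / (n - 1)) /
                         ((unigrams[w1] / n) * (unigrams[w2] / n)))

    # drop rare bigrams first, then rank the rest by the association measure
    scored = [bg for bg in bigrams if bigrams[bg] >= min_freq]
    scored.sort(key=pmi, reverse=True)
    return scored[:top_n]

scores = bigram_feats(["a", "b", "a", "b", "c"], top_n=2)
```

A very frequent bigram of two individually frequent words can score low under PMI, which is why collocation measures surface more informative pairs than raw counts.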
Classify a single instance by applying the features that have already been stored in the SentimentAnalyzer.
Parameters | |
instance | a list (or iterable) of tokens. |
Returns | |
the classification result given by applying the classifier. |
Evaluate and print classifier performance on the test set.
Parameters | |
test_set | A list of (tokens, label) tuples to use as gold set. |
classifier | a classifier instance (previously trained). |
accuracy | if True, evaluate classifier accuracy. |
f_measure | if True, evaluate classifier f-measure. |
precision | if True, evaluate classifier precision. |
recall | if True, evaluate classifier recall. |
verbose | Undocumented |
Returns | |
dict(str): float | evaluation results. |
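The metrics that evaluate() reports can be sketched from predicted/gold pairs with the stdlib; this mirrors the idea, not NLTK's implementation, and the result-key format is an assumption:

```python
def score(predicted, gold, label):
    # per-label precision, recall, and F-measure from true/false positives
    tp = sum(p == label and g == label for p, g in zip(predicted, gold))
    fp = sum(p == label and g != label for p, g in zip(predicted, gold))
    fn = sum(p != label and g == label for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return {"Precision [{}]".format(label): precision,
            "Recall [{}]".format(label): recall,
            "F-measure [{}]".format(label): f}

predicted = ["pos", "pos", "neg", "neg"]
gold = ["pos", "neg", "neg", "neg"]
accuracy = sum(p == g for p, g in zip(predicted, gold)) / len(gold)  # 0.75
results = score(predicted, gold, "pos")
```

Precision, recall, and F-measure are computed per label, which is why the returned dict holds one float per metric-label pair rather than a single score.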
Apply extractor functions (and their parameters) to the present document. We pass document as the first parameter of the extractor functions. If we want to use the same extractor function multiple times, we have to add it to the extractors with add_feat_extractor using multiple sets of parameters (one for each call of the extractor function).
Parameters | |
document | the document that will be passed as argument to the feature extractor functions. |
Returns | |
dict | A dictionary of populated features extracted from the document. |
Train classifier on the training set, optionally saving the output in the file specified by save_classifier.
Additional arguments depend on the specific trainer used. For example, a MaxentClassifier can use a max_iter parameter to specify the number of iterations, while a NaiveBayesClassifier cannot.
Parameters | |
trainer | train method of a classifier. E.g.: NaiveBayesClassifier.train |
training_set | the training set to be passed as argument to the classifier train method. |
save_classifier | the filename of the file where the classifier will be stored (optional). |
**kwargs | additional parameters that will be passed as arguments to the classifier train function. |
Returns | |
A classifier instance trained on the training set. |
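The trainer-callable pattern above can be sketched without NLTK: trainer is simply a classifier class's train method, called on the training set. MajorityClassifier here is a toy stand-in; something like NaiveBayesClassifier.train would slot in the same way:

```python
import pickle
from collections import Counter

class MajorityClassifier:
    """Toy classifier that always predicts the most common training label."""
    def __init__(self, label):
        self.label = label

    @classmethod
    def train(cls, training_set):
        labels = Counter(label for feats, label in training_set)
        return cls(labels.most_common(1)[0][0])

    def classify(self, featureset):
        return self.label

def train(trainer, training_set, save_classifier=None, **kwargs):
    # delegate to the classifier's own train method; kwargs are trainer-specific
    classifier = trainer(training_set, **kwargs)
    if save_classifier:
        # persist the trained classifier, e.g. via pickle
        with open(save_classifier, "wb") as f:
            pickle.dump(classifier, f)
    return classifier

training_set = [({"f": 1}, "pos"), ({"f": 0}, "pos"), ({"f": 1}, "neg")]
clf = train(MajorityClassifier.train, training_set)
clf.classify({"f": 1})  # -> "pos"
```

Because only a callable is required, any trainer with the signature train(training_set, **kwargs) works unchanged, which is what lets trainer-specific options like max_iter pass through **kwargs.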