class documentation

class SentimentAnalyzer(object): (source)

Constructor: SentimentAnalyzer(classifier)


A Sentiment Analysis tool based on machine learning approaches.

Method __init__ Undocumented
Method add_feat_extractor Add a new function to extract features from a document. This function will be used in extract_features(). Important: in this step our kwargs are only representing additional parameters, and NOT the document we have to parse...
Method all_words Return all words/tokens from the documents (with duplicates).
Method apply_features Apply all feature extractor functions to the documents. This is a wrapper around nltk.classify.util.apply_features.
Method bigram_collocation_feats Return top_n bigram features (using assoc_measure). Note that this method is based on bigram collocation measures, and not on simple bigram frequency.
Method classify Classify a single instance applying the features that have already been stored in the SentimentAnalyzer.
Method evaluate Evaluate and print classifier performance on the test set.
Method extract_features Apply extractor functions (and their parameters) to the present document. We pass document as the first parameter of the extractor functions. If we want to use the same extractor function multiple times, we have to add it to the extractors with ...
Method save_file Store content in filename. Can be used to store a SentimentAnalyzer.
Method train Train classifier on the training set, optionally saving the output in the file specified by save_classifier. Additional arguments depend on the specific trainer used. For example, a MaxentClassifier can use ...
Method unigram_word_feats Return most common top_n word features.
Instance Variable classifier Undocumented
Instance Variable feat_extractors Undocumented
def __init__(self, classifier=None): (source)

Undocumented

def add_feat_extractor(self, function, **kwargs): (source)

Add a new function to extract features from a document. This function will be used in extract_features(). Important: at this step the kwargs represent only additional parameters, and NOT the document we have to parse. The document will always be the first parameter in the parameter list, and it will be added in the extract_features() function.

Parameters
function: the extractor function to add to the list of feature extractors.
**kwargs: additional parameters required by the extractor function.
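The registration pattern described above can be sketched in plain Python. This is a hypothetical stand-in for the analyzer's internal registry, not the actual NLTK implementation; `contains_words` is a toy extractor introduced only for this example.

```python
from collections import defaultdict

# Hypothetical stand-in for the analyzer's internal registry.
feat_extractors = defaultdict(list)

def add_feat_extractor(function, **kwargs):
    # Only the extra parameters are stored here; the document itself is
    # supplied later, as the first argument, by extract_features().
    feat_extractors[function].append(kwargs)

def contains_words(document, words=None):
    # Toy extractor: flag which of the given words occur in the document.
    return {"contains({})".format(w): (w in document) for w in words}

add_feat_extractor(contains_words, words=["good", "bad"])
```

Storing kwargs (rather than calling the extractor immediately) is what lets the same function be registered several times with different parameter sets.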
def all_words(self, documents, labeled=None): (source)

Return all words/tokens from the documents (with duplicates).

Parameters
documents: a list of (words, label) tuples.
labeled: if True, assume that each document is represented by a (words, label) tuple: (list(str), str). If False, each document is considered as being a simple list of strings: list(str).
Returns
list(str): A list of all words/tokens in documents.
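The flattening behaviour described above amounts to the following sketch (a simplified stand-in, not the library's code):

```python
def all_words(documents, labeled=True):
    # Flatten every token out of the documents, keeping duplicates.
    if labeled:
        # Each document is a (words, label) tuple: (list(str), str).
        return [word for words, _label in documents for word in words]
    # Each document is a plain list of tokens: list(str).
    return [word for words in documents for word in words]

docs = [(["good", "movie"], "pos"), (["bad", "plot"], "neg")]
all_words(docs)  # ['good', 'movie', 'bad', 'plot']
```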
def apply_features(self, documents, labeled=None): (source)

Apply all feature extractor functions to the documents. This is a wrapper around nltk.classify.util.apply_features.

If labeled=False, return featuresets as:
[feature_func(doc) for doc in documents]
If labeled=True, return featuresets as:
[(feature_func(tok), label) for (tok, label) in toks]
Parameters
documents: a list of documents. If labeled=True, the method expects a list of (words, label) tuples.
labeled: if True, each document is a (words, label) tuple; if False, a plain list of tokens.
Returns
LazyMap: a lazy mapping of the documents to their featuresets.
def bigram_collocation_feats(self, documents, top_n=None, min_freq=3, assoc_measure=BigramAssocMeasures.pmi): (source)

Return top_n bigram features (using assoc_measure). Note that this method is based on bigram collocations measures, and not on simple bigram frequency.

Parameters
documents: a list (or iterable) of tokens.
top_n: number of best words/tokens to use, sorted by association measure.
min_freq: the minimum number of occurrences of bigrams to take into consideration.
assoc_measure: bigram association measure to use as score function.
Returns
The top_n bigrams scored by the given association measure.
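To illustrate "association measure rather than raw frequency", here is a minimal sketch using plain pointwise mutual information as the score function. It is an assumption-laden simplification: the real method delegates to NLTK's collocation finders and the configured assoc_measure.

```python
import math
from collections import Counter

def bigram_pmi_feats(tokens, top_n=3, min_freq=2):
    # Score bigrams by PMI = log2(p(a,b) / (p(a) * p(b))), after dropping
    # bigrams seen fewer than min_freq times, then keep the top_n.
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni, n_bi = len(tokens), len(tokens) - 1
    pmi = {
        (a, b): math.log2((c / n_bi) /
                          ((unigrams[a] / n_uni) * (unigrams[b] / n_uni)))
        for (a, b), c in bigrams.items()
        if c >= min_freq
    }
    return sorted(pmi, key=pmi.get, reverse=True)[:top_n]
```

The min_freq filter matters because PMI inflates the score of rare bigrams; filtering first is the standard remedy.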
def classify(self, instance): (source)

Classify a single instance applying the features that have already been stored in the SentimentAnalyzer.

Parameters
instance: a list (or iterable) of tokens.
Returns
the classification result given by applying the classifier.
def evaluate(self, test_set, classifier=None, accuracy=True, f_measure=True, precision=True, recall=True, verbose=False): (source)

Evaluate and print classifier performance on the test set.

Parameters
test_set: a list of (tokens, label) tuples to use as gold set.
classifier: a classifier instance (previously trained).
accuracy: if True, evaluate classifier accuracy.
f_measure: if True, evaluate classifier f-measure.
precision: if True, evaluate classifier precision.
recall: if True, evaluate classifier recall.
verbose: if True, print the evaluation results.
Returns
dict(str, float): evaluation results.
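The metrics named above can be sketched for a single label; this is a simplified stand-in (the real method computes them for every label in the test set and can print the results).

```python
def evaluate_label(predicted, gold, label="pos"):
    # Count true positives, false positives, and false negatives
    # for one label over paired predicted/gold label sequences.
    tp = sum(p == label == g for p, g in zip(predicted, gold))
    fp = sum(p == label != g for p, g in zip(predicted, gold))
    fn = sum(g == label != p for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = sum(p == g for p, g in zip(predicted, gold)) / len(gold)
    return {"Accuracy": accuracy, "Precision": precision,
            "Recall": recall, "F-measure": f_measure}
```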
def extract_features(self, document): (source)

Apply extractor functions (and their parameters) to the present document. We pass document as the first parameter of the extractor functions. If we want to use the same extractor function multiple times, we have to add it to the extractors with add_feat_extractor using multiple sets of parameters (one for each call of the extractor function).

Parameters
document: the document that will be passed as argument to the feature extractor functions.
Returns
dict: A dictionary of populated features extracted from the document.
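The merging behaviour described above can be sketched as follows. The registry is passed in explicitly here to keep the example self-contained (the real method reads it from the analyzer instance), and `contains_word` is a toy extractor invented for this sketch.

```python
def extract_features(document, feat_extractors):
    # Call each registered extractor once per saved kwargs set, with the
    # document as the first argument, and merge the feature dicts.
    all_features = {}
    for extractor, kwargs_list in feat_extractors.items():
        for kwargs in kwargs_list:
            all_features.update(extractor(document, **kwargs))
    return all_features

def contains_word(document, word=None):
    # Toy extractor used only for this sketch.
    return {"contains({})".format(word): word in document}

registry = {contains_word: [{"word": "good"}, {"word": "bad"}]}
extract_features(["a", "good", "film"], registry)
# {'contains(good)': True, 'contains(bad)': False}
```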
def save_file(self, content, filename): (source)

Store content in filename. Can be used to store a SentimentAnalyzer.

def train(self, trainer, training_set, save_classifier=None, **kwargs): (source)

Train classifier on the training set, optionally saving the output in the file specified by save_classifier. Additional arguments depend on the specific trainer used. For example, a MaxentClassifier can use the max_iter parameter to specify the number of iterations, while a NaiveBayesClassifier cannot.

Parameters
trainer: the train method of a classifier, e.g. NaiveBayesClassifier.train.
training_set: the training set to be passed as argument to the classifier train method.
save_classifier: the filename of the file where the classifier will be stored (optional).
**kwargs: additional parameters that will be passed as arguments to the classifier train function.
Returns
A classifier instance trained on the training set.
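The flow described above amounts to the following sketch. Pickling as the storage format is an assumption made for this example, and `toy_trainer` is a stand-in for a real train method such as NaiveBayesClassifier.train.

```python
import pickle

def train(trainer, training_set, save_classifier=None, **kwargs):
    # Delegate training to the given train method, forwarding any
    # trainer-specific keyword arguments, then optionally persist it.
    classifier = trainer(training_set, **kwargs)
    if save_classifier:
        # Assumed storage format for this sketch: a pickle file.
        with open(save_classifier, "wb") as f:
            pickle.dump(classifier, f)
    return classifier

def toy_trainer(data, **kwargs):
    # Stand-in trainer used only for this sketch.
    return ("model", len(data))

train(toy_trainer, [1, 2, 3])  # ('model', 3)
```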
def unigram_word_feats(self, words, top_n=None, min_freq=0): (source)

Return most common top_n word features.

Parameters
words: a list of words/tokens.
top_n: number of best words/tokens to use, sorted by frequency.
min_freq: the minimum frequency a word/token must have to be included.
Returns
list(str): A list of top_n words/tokens (with no duplicates) sorted by frequency.
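A minimal sketch of the behaviour described above, using collections.Counter (a simplified stand-in, not the library's code):

```python
from collections import Counter

def unigram_word_feats(words, top_n=None, min_freq=0):
    # Counter deduplicates tokens; most_common sorts by frequency and
    # keeps the top_n; the comprehension drops tokens rarer than min_freq.
    return [word for word, count in Counter(words).most_common(top_n)
            if count >= min_freq]

unigram_word_feats(["good", "bad", "good", "ok"], top_n=2)
# ['good', 'bad']
```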
classifier = (source)

Undocumented

feat_extractors = (source)

Undocumented