class documentation

class SentimentAnalyzer(object): (source)

Constructor: SentimentAnalyzer(classifier)


A Sentiment Analysis tool based on machine learning approaches.

Method __init__ Undocumented
Method add_feat_extractor Add a new function to extract features from a document. This function will be used in extract_features(). Important: in this step our kwargs are only representing additional parameters, and NOT the document we have to parse...
Method all_words Return all words/tokens from the documents (with duplicates).
Method apply_features Apply all feature extractor functions to the documents. This is a wrapper around nltk.classify.util.apply_features.
Method bigram_collocation_feats Return top_n bigram features (using assoc_measure). Note that this method is based on bigram collocation measures, and not on simple bigram frequency.
Method classify Classify a single instance applying the features that have already been stored in the SentimentAnalyzer.
Method evaluate Evaluate and print classifier performance on the test set.
Method extract_features Apply extractor functions (and their parameters) to the present document. We pass document as the first parameter of the extractor functions. If we want to use the same extractor function multiple times, we have to add it to the extractors with ...
Method save_file Store content in filename. Can be used to store a SentimentAnalyzer.
Method train Train classifier on the training set, optionally saving the output in the file specified by save_classifier. Additional arguments depend on the specific trainer used. For example, a MaxentClassifier can use ...
Method unigram_word_feats Return most common top_n word features.
Instance Variable classifier Undocumented
Instance Variable feat_extractors Undocumented
def __init__(self, classifier=None): (source)

Undocumented

def add_feat_extractor(self, function, **kwargs): (source)

Add a new function to extract features from a document. This function will be used in extract_features(). Important: at this step the kwargs represent only additional parameters, and NOT the document we have to parse. The document will always be the first parameter in the parameter list, and it will be added in the extract_features() function.

Parameters
function: the extractor function to add to the list of feature extractors.
**kwargs: additional parameters required by the extractor function.
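The registration pattern described above can be sketched in plain Python. This is a hypothetical stand-in for the analyzer's internal registry, not the actual NLTK implementation; `contains_words` is a toy extractor introduced only for this example.

```python
from collections import defaultdict

# Hypothetical stand-in for the analyzer's internal registry.
feat_extractors = defaultdict(list)

def add_feat_extractor(function, **kwargs):
    # Only the extra parameters are stored here; the document itself is
    # supplied later, as the first argument, by extract_features().
    feat_extractors[function].append(kwargs)

def contains_words(document, words=None):
    # Toy extractor: flag which of the given words occur in the document.
    return {"contains({})".format(w): (w in document) for w in words}

add_feat_extractor(contains_words, words=["good", "bad"])
```

Storing kwargs (rather than calling the extractor immediately) is what lets the same function be registered several times with different parameter sets.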
def all_words(self, documents, labeled=None): (source)

Return all words/tokens from the documents (with duplicates).

Parameters
documents: a list of (words, label) tuples.
labeled: if True, assume that each document is represented by a (words, label) tuple: (list(str), str). If False, each document is considered as being a simple list of strings: list(str).
Returns
list(str): A list of all words/tokens in documents.
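The flattening behaviour described above amounts to the following sketch (a simplified stand-in, not the library's code):

```python
def all_words(documents, labeled=True):
    # Flatten every token out of the documents, keeping duplicates.
    if labeled:
        # Each document is a (words, label) tuple: (list(str), str).
        return [word for words, _label in documents for word in words]
    # Each document is a plain list of tokens: list(str).
    return [word for words in documents for word in words]

docs = [(["good", "movie"], "pos"), (["bad", "plot"], "neg")]
all_words(docs)  # ['good', 'movie', 'bad', 'plot']
```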
def apply_features(self, documents, labeled=None): (source)

Apply all feature extractor functions to the documents. This is a wrapper around nltk.classify.util.apply_features.

If labeled=False, return featuresets as:
[feature_func(doc) for doc in documents]
If labeled=True, return featuresets as:
[(feature_func(tok), label) for (tok, label) in toks]
Parameters
documents: a list of documents. If labeled=True, the method expects a list of (words, label) tuples.
labeled: if True, each document is a (words, label) tuple; if False, a plain list of tokens.
Returns
LazyMap: a lazy mapping of the documents to their featuresets.
def bigram_collocation_feats(self, documents, top_n=None, min_freq=3, assoc_measure=BigramAssocMeasures.pmi): (source)

Return top_n bigram features (using assoc_measure). Note that this method is based on bigram collocations measures, and not on simple bigram frequency.

Parameters
documents: a list (or iterable) of tokens.
top_n: number of best words/tokens to use, sorted by association measure.
min_freq: the minimum number of occurrences of bigrams to take into consideration.
assoc_measure: bigram association measure to use as score function.
Returns
The top_n bigrams scored by the given association measure.
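To illustrate "association measure rather than raw frequency", here is a minimal sketch using plain pointwise mutual information as the score function. It is an assumption-laden simplification: the real method delegates to NLTK's collocation finders and the configured assoc_measure.

```python
import math
from collections import Counter

def bigram_pmi_feats(tokens, top_n=3, min_freq=2):
    # Score bigrams by PMI = log2(p(a,b) / (p(a) * p(b))), after dropping
    # bigrams seen fewer than min_freq times, then keep the top_n.
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni, n_bi = len(tokens), len(tokens) - 1
    pmi = {
        (a, b): math.log2((c / n_bi) /
                          ((unigrams[a] / n_uni) * (unigrams[b] / n_uni)))
        for (a, b), c in bigrams.items()
        if c >= min_freq
    }
    return sorted(pmi, key=pmi.get, reverse=True)[:top_n]
```

The min_freq filter matters because PMI inflates the score of rare bigrams; filtering first is the standard remedy.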
def classify(self, instance): (source)

Classify a single instance applying the features that have already been stored in the SentimentAnalyzer.

Parameters
instance: a list (or iterable) of tokens.
Returns
the classification result given by applying the classifier.
def evaluate(self, test_set, classifier=None, accuracy=True, f_measure=True, precision=True, recall=True, verbose=False): (source)

Evaluate and print classifier performance on the test set.

Parameters
test_set: a list of (tokens, label) tuples to use as gold set.
classifier: a classifier instance (previously trained).
accuracy: if True, evaluate classifier accuracy.
f_measure: if True, evaluate classifier f-measure.
precision: if True, evaluate classifier precision.
recall: if True, evaluate classifier recall.
verbose: if True, print the evaluation results.
Returns
dict(str, float): evaluation results.
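The metrics named above can be sketched for a single label; this is a simplified stand-in (the real method computes them for every label in the test set and can print the results).

```python
def evaluate_label(predicted, gold, label="pos"):
    # Count true positives, false positives, and false negatives
    # for one label over paired predicted/gold label sequences.
    tp = sum(p == label == g for p, g in zip(predicted, gold))
    fp = sum(p == label != g for p, g in zip(predicted, gold))
    fn = sum(g == label != p for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = sum(p == g for p, g in zip(predicted, gold)) / len(gold)
    return {"Accuracy": accuracy, "Precision": precision,
            "Recall": recall, "F-measure": f_measure}
```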
def extract_features(self, document): (source)

Apply extractor functions (and their parameters) to the present document. We pass document as the first parameter of the extractor functions. If we want to use the same extractor function multiple times, we have to add it to the extractors with add_feat_extractor using multiple sets of parameters (one for each call of the extractor function).

Parameters
document: the document that will be passed as argument to the feature extractor functions.
Returns
dict: A dictionary of populated features extracted from the document.
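The merging behaviour described above can be sketched as follows. The registry is passed in explicitly here to keep the example self-contained (the real method reads it from the analyzer instance), and `contains_word` is a toy extractor invented for this sketch.

```python
def extract_features(document, feat_extractors):
    # Call each registered extractor once per saved kwargs set, with the
    # document as the first argument, and merge the feature dicts.
    all_features = {}
    for extractor, kwargs_list in feat_extractors.items():
        for kwargs in kwargs_list:
            all_features.update(extractor(document, **kwargs))
    return all_features

def contains_word(document, word=None):
    # Toy extractor used only for this sketch.
    return {"contains({})".format(word): word in document}

registry = {contains_word: [{"word": "good"}, {"word": "bad"}]}
extract_features(["a", "good", "film"], registry)
# {'contains(good)': True, 'contains(bad)': False}
```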
def save_file(self, content, filename): (source)

Store content in filename. Can be used to store a SentimentAnalyzer.

def train(self, trainer, training_set, save_classifier=None, **kwargs): (source)

Train classifier on the training set, optionally saving the output in the file specified by save_classifier. Additional arguments depend on the specific trainer used. For example, a MaxentClassifier can use the max_iter parameter to specify the number of iterations, while a NaiveBayesClassifier cannot.

Parameters
trainer: the train method of a classifier, e.g. NaiveBayesClassifier.train.
training_set: the training set to be passed as argument to the classifier train method.
save_classifier: the filename of the file where the classifier will be stored (optional).
**kwargs: additional parameters that will be passed as arguments to the classifier train function.
Returns
A classifier instance trained on the training set.
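The flow described above amounts to the following sketch. Pickling as the storage format is an assumption made for this example, and `toy_trainer` is a stand-in for a real train method such as NaiveBayesClassifier.train.

```python
import pickle

def train(trainer, training_set, save_classifier=None, **kwargs):
    # Delegate training to the given train method, forwarding any
    # trainer-specific keyword arguments, then optionally persist it.
    classifier = trainer(training_set, **kwargs)
    if save_classifier:
        # Assumed storage format for this sketch: a pickle file.
        with open(save_classifier, "wb") as f:
            pickle.dump(classifier, f)
    return classifier

def toy_trainer(data, **kwargs):
    # Stand-in trainer used only for this sketch.
    return ("model", len(data))

train(toy_trainer, [1, 2, 3])  # ('model', 3)
```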
def unigram_word_feats(self, words, top_n=None, min_freq=0): (source)

Return most common top_n word features.

Parameters
words: a list of words/tokens.
top_n: number of best words/tokens to use, sorted by frequency.
min_freq: the minimum frequency a word/token must have to be included.
Returns
list(str): A list of top_n words/tokens (with no duplicates) sorted by frequency.
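A minimal sketch of the behaviour described above, using collections.Counter (a simplified stand-in, not the library's code):

```python
from collections import Counter

def unigram_word_feats(words, top_n=None, min_freq=0):
    # Counter deduplicates tokens; most_common sorts by frequency and
    # keeps the top_n; the comprehension drops tokens rarer than min_freq.
    return [word for word, count in Counter(words).most_common(top_n)
            if count >= min_freq]

unigram_word_feats(["good", "bad", "good", "ok"], top_n=2)
# ['good', 'bad']
```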
classifier = (source)

Undocumented

feat_extractors = (source)

Undocumented