nltk.classify.naivebayes.NaiveBayesClassifier

class documentation

class NaiveBayesClassifier(ClassifierI):

Known subclasses: nltk.classify.positivenaivebayes.PositiveNaiveBayesClassifier

Constructor: NaiveBayesClassifier(label_probdist, feature_probdist)

A Naive Bayes classifier. Naive Bayes classifiers are parameterized by two probability distributions:

P(label) gives the probability that an input will receive each label, given no information about the input's features.

P(fname=fval|label) gives the probability that a given feature (fname) will receive a given value (fval), given that the input has the label (label).

If the classifier encounters an input with a feature that has never been seen with any label, then rather than assigning a probability of 0 to all labels, it will ignore that feature.

The feature value 'None' is reserved for unseen feature values; you generally should not use 'None' as a feature value for one of your own features.
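The way the two distributions combine can be illustrated with a minimal pure-Python sketch. It is not NLTK's implementation: the function name `naive_bayes_posterior`, the plain-dict representation of the distributions, and the toy probabilities are all assumptions made for illustration. It multiplies P(label) by each P(fname=fval|label) in log space, and drops any feature name that was never seen with any label, as described above.

```python
import math


def naive_bayes_posterior(label_prob, feature_prob, featureset):
    """Return {label: P(label | featureset)} under the naive independence assumption.

    label_prob:   dict mapping label -> P(label), all values assumed > 0
    feature_prob: dict mapping (label, fname) -> {fval: P(fname=fval | label)}
    """
    labels = list(label_prob)

    # Ignore feature names that were never seen with any label.
    known = {
        fname: fval
        for fname, fval in featureset.items()
        if any((label, fname) in feature_prob for label in labels)
    }

    # Sum log-probabilities instead of multiplying, to avoid underflow.
    log_scores = {}
    for label in labels:
        score = math.log(label_prob[label])
        for fname, fval in known.items():
            p = feature_prob.get((label, fname), {}).get(fval, 0.0)
            score += math.log(p) if p > 0 else -math.inf
        log_scores[label] = score

    top = max(log_scores.values())
    if top == -math.inf:
        # Every label was ruled out; fall back to a uniform distribution.
        return {label: 1.0 / len(labels) for label in labels}

    # Normalize so the scores sum to 1.
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


# Toy numbers, made up for illustration.
label_prob = {"spam": 0.5, "ham": 0.5}
feature_prob = {
    ("spam", "free"): {True: 0.8, False: 0.2},
    ("ham", "free"): {True: 0.1, False: 0.9},
}
posterior = naive_bayes_posterior(
    label_prob, feature_prob, {"free": True, "never_seen": 1}
)
```

Here `never_seen` has no entry for any label, so it contributes nothing and the result depends only on `free`.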

Class Method	`train`	No summary
Method	`__init__`	No summary
Method	`classify`	No summary
Method	`labels`	No summary
Method	`most_informative_features`	Return a list of the 'most informative' features used by this classifier. For the purpose of this function, the informativeness of a feature `(fname,fval)` is equal to the highest value of P(fname=fval|label), for any label, divided by the lowest value of P(fname=fval|label), for any label:...
Method	`prob_classify`	No summary
Method	`show_most_informative_features`	Undocumented
Instance Variable	`_feature_probdist`	Undocumented
Instance Variable	`_label_probdist`	Undocumented
Instance Variable	`_labels`	Undocumented
Instance Variable	`_most_informative_features`	Undocumented

Inherited from ClassifierI:

Method	`classify_many`	Apply `self.classify()` to each element of `featuresets`. I.e.:
Method	`prob_classify_many`	Apply `self.prob_classify()` to each element of `featuresets`. I.e.:

@classmethod
def train(cls, labeled_featuresets, estimator=ELEProbDist):

overridden in nltk.classify.positivenaivebayes.PositiveNaiveBayesClassifier

Parameters
labeled_featuresets	A list of classified featuresets, i.e., a list of tuples `(featureset, label)`.
estimator	Undocumented
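A short usage sketch of `train`, assuming NLTK is installed. The feature names and the four training examples are made up for illustration. The second call passes `LaplaceProbDist` from `nltk.probability` as the `estimator`, on the assumption that any `ProbDistI` class built from a frequency distribution can stand in for the default `ELEProbDist` shown in the signature.

```python
from nltk.classify import NaiveBayesClassifier
from nltk.probability import LaplaceProbDist

# Toy training data (made up): a list of (featureset, label) tuples,
# where each featureset is a dict mapping feature names to values.
train_set = [
    ({"free": True, "meeting": False}, "spam"),
    ({"free": True, "meeting": False}, "spam"),
    ({"free": False, "meeting": True}, "ham"),
    ({"free": False, "meeting": True}, "ham"),
]

# Train with the default estimator from the signature (ELEProbDist).
classifier = NaiveBayesClassifier.train(train_set)

# Classify a new featureset; it does not need to contain every feature.
predicted = classifier.classify({"free": True})
spam_prob = classifier.prob_classify({"free": True}).prob("spam")

# The same data with a different smoothing estimator.
laplace_classifier = NaiveBayesClassifier.train(train_set, estimator=LaplaceProbDist)
```

With so few examples the choice of estimator changes the probabilities noticeably, so it is worth comparing estimators on held-out data rather than relying on the default.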

def __init__(self, label_probdist, feature_probdist):

Parameters
label_probdist	P(label), the probability distribution over labels. It is expressed as a `ProbDistI` whose samples are labels. I.e., P(label) = `label_probdist.prob(label)`.
feature_probdist	P(fname=fval|label), the probability distribution for feature values, given labels. It is expressed as a dictionary whose keys are `(label, fname)` pairs and whose values are `ProbDistI` objects over feature values. I.e., P(fname=fval|label) = `feature_probdist[label,fname].prob(fval)`. If a given `(label,fname)` is not a key in `feature_probdist`, then it is assumed that the corresponding P(fname=fval|label) is 0 for all values of `fval`.
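The constructor can also be called directly when the distributions are already known. The sketch below assumes NLTK is installed and uses `DictionaryProbDist` from `nltk.probability` as a convenient `ProbDistI` implementation; the labels, the feature name `free`, and the probabilities are made up for illustration.

```python
from nltk.classify import NaiveBayesClassifier
from nltk.probability import DictionaryProbDist

# P(label): a ProbDistI whose samples are labels.
label_probdist = DictionaryProbDist({"spam": 0.5, "ham": 0.5})

# P(fname=fval|label): a dict keyed by (label, fname) pairs,
# each value a ProbDistI over the feature's values.
feature_probdist = {
    ("spam", "free"): DictionaryProbDist({True: 0.8, False: 0.2}),
    ("ham", "free"): DictionaryProbDist({True: 0.1, False: 0.9}),
}

classifier = NaiveBayesClassifier(label_probdist, feature_probdist)

# 0.5 * 0.8 for spam against 0.5 * 0.1 for ham, then normalized.
dist = classifier.prob_classify({"free": True})
spam_prob = dist.prob("spam")
predicted = classifier.classify({"free": True})
```

Building the classifier this way skips estimation entirely, which can be useful for testing or for loading probabilities computed elsewhere.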

def classify(self, featureset):

overrides nltk.classify.api.ClassifierI.classify

Returns
label	the most appropriate label for the given featureset.

def labels(self):

overrides nltk.classify.api.ClassifierI.labels

Returns
list of (immutable)	the list of category labels used by this classifier.

def most_informative_features(self, n=100):

Return a list of the 'most informative' features used by this classifier. For the purpose of this function, the informativeness of a feature (fname,fval) is equal to the highest value of P(fname=fval|label), for any label, divided by the lowest value of P(fname=fval|label), for any label:

max[ P(fname=fval|label1) / P(fname=fval|label2) ]
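The ratio can be illustrated with a small pure-Python sketch. It is not NLTK's code: the function name `rank_by_informativeness` and the plain-dict representation of the distributions are assumptions, and the sketch skips any `(fname, fval)` pair whose lowest probability is 0 so that the division is always defined.

```python
def rank_by_informativeness(feature_prob):
    """Rank (fname, fval) pairs by max P(fname=fval|label) / min P(fname=fval|label).

    feature_prob: dict mapping (label, fname) -> {fval: P(fname=fval | label)}
    """
    max_p, min_p = {}, {}
    for (label, fname), dist in feature_prob.items():
        for fval, p in dist.items():
            key = (fname, fval)
            max_p[key] = max(p, max_p.get(key, 0.0))
            min_p[key] = min(p, min_p.get(key, 1.0))

    # Skip pairs whose lowest probability is 0 (the ratio would be undefined).
    ratios = {key: max_p[key] / min_p[key] for key in max_p if min_p[key] > 0}

    # Highest ratio first: these features separate the labels most sharply.
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Toy numbers, made up for illustration.
feature_prob = {
    ("spam", "free"): {True: 0.8, False: 0.2},
    ("ham", "free"): {True: 0.1, False: 0.9},
}
ranked = rank_by_informativeness(feature_prob)
```

In this toy data, `free=True` is about 8 times more likely under one label than the other, while `free=False` is 4.5 times more likely, so `free=True` ranks first.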

def prob_classify(self, featureset):

overrides nltk.classify.api.ClassifierI.prob_classify

Returns
ProbDistI	a probability distribution over labels for the given featureset.

def show_most_informative_features(self, n=10):

Undocumented

_feature_probdist

Undocumented

_label_probdist

Undocumented

_labels

Undocumented

_most_informative_features

Undocumented