class documentation

A Naive Bayes classifier. Naive Bayes classifiers are parameterized by two probability distributions:

  • P(label) gives the probability that an input will receive each label, given no information about the input's features.
  • P(fname=fval|label) gives the probability that a given feature (fname) will receive a given value (fval), given the label (label).

If the classifier encounters an input with a feature that has never been seen with any label, then rather than assigning a probability of 0 to all labels, it will ignore that feature.

The feature value 'None' is reserved for unseen feature values; you generally should not use 'None' as a feature value for one of your own features.
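The two distributions above combine into a score for each label. The following is a minimal sketch of that combination and of the "ignore unseen features" rule. It is not NLTK's implementation: it assumes plain dictionaries stand in for the probability distributions, and the names `naive_bayes_posterior`, `demo_label_prob`, and `demo_feature_prob` are hypothetical.

```python
import math


def safe_log(p):
    """Natural log that maps probability 0 to negative infinity."""
    return math.log(p) if p > 0 else -math.inf


def naive_bayes_posterior(label_prob, feature_prob, featureset):
    """Return P(label | featureset) as a dict, assuming independent features.

    label_prob:   {label: P(label)}                      (assumed layout)
    feature_prob: {(label, fname): {fval: P(fval|label)}} (assumed layout)
    """
    labels = list(label_prob)
    # Ignore any feature that was never seen with any label.
    known = {
        fname: fval
        for fname, fval in featureset.items()
        if any((label, fname) in feature_prob for label in labels)
    }
    log_scores = {}
    for label in labels:
        score = safe_log(label_prob[label])
        for fname, fval in known.items():
            # A missing (label, fname) key or a missing value counts as 0.
            p = feature_prob.get((label, fname), {}).get(fval, 0.0)
            score += safe_log(p)
        log_scores[label] = score
    top = max(log_scores.values())
    if top == -math.inf:
        # Every label was ruled out; returning a uniform distribution
        # is a choice made for this sketch only.
        return {label: 1.0 / len(labels) for label in labels}
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


demo_label_prob = {"spam": 0.5, "ham": 0.5}
demo_feature_prob = {
    ("spam", "has_link"): {True: 0.8, False: 0.2},
    ("ham", "has_link"): {True: 0.1, False: 0.9},
}
# "font" was never seen with any label, so it does not affect the result.
demo_posterior = naive_bayes_posterior(
    demo_label_prob, demo_feature_prob, {"has_link": True, "font": "serif"}
)
demo_best = max(demo_posterior, key=demo_posterior.get)
```

Working in log space avoids underflow when many small probabilities are multiplied; the final normalization turns the scores back into a distribution over labels.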

Class Method train No summary
Method __init__ No summary
Method classify No summary
Method labels No summary
Method most_informative_features Return a list of the 'most informative' features used by this classifier. For the purpose of this function, the informativeness of a feature (fname,fval) is equal to the highest value of P(fname=fval|label), for any label, divided by the lowest value of P(fname=fval|label), for any label:...
Method prob_classify No summary
Method show_most_informative_features Undocumented
Instance Variable _feature_probdist Undocumented
Instance Variable _label_probdist Undocumented
Instance Variable _labels Undocumented
Instance Variable _most_informative_features Undocumented

Inherited from ClassifierI:

Method classify_many Apply self.classify() to each element of featuresets. I.e.:
Method prob_classify_many Apply self.prob_classify() to each element of featuresets. I.e.:
@classmethod
def train(cls, labeled_featuresets, estimator=ELEProbDist): (source)
Parameters
labeled_featuresets: A list of classified featuresets, i.e., a list of tuples (featureset, label).
estimator: Undocumented
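As an illustration of what training from a list of (featureset, label) tuples involves, here is a minimal sketch that counts labels and feature values and then smooths the counts. It is not the library's code. It assumes that every featureset contains every feature, and it assumes additive smoothing with 0.5 added to each count, which is how expected likelihood estimation (the technique the default `ELEProbDist` is named after) is usually defined. The names `train_sketch` and `demo_training_data` are hypothetical.

```python
from collections import Counter, defaultdict


def train_sketch(labeled_featuresets, gamma=0.5):
    """Estimate P(label) and P(fname=fval|label) from labeled featuresets.

    Returns (label_prob, feature_prob) as plain dictionaries.
    gamma is the amount added to every count (0.5 is an assumption here).
    """
    label_counts = Counter()
    value_counts = defaultdict(Counter)  # (label, fname) -> counts of fval
    seen_values = defaultdict(set)       # fname -> every fval observed

    for featureset, label in labeled_featuresets:
        label_counts[label] += 1
        for fname, fval in featureset.items():
            value_counts[label, fname][fval] += 1
            seen_values[fname].add(fval)

    def smooth(counts, outcomes):
        # Additive smoothing over a fixed set of possible outcomes.
        total = sum(counts.values())
        bins = len(outcomes)
        return {o: (counts[o] + gamma) / (total + gamma * bins) for o in outcomes}

    label_prob = smooth(label_counts, list(label_counts))
    feature_prob = {
        (label, fname): smooth(value_counts[label, fname], values)
        for label in label_counts
        for fname, values in seen_values.items()
    }
    return label_prob, feature_prob


demo_training_data = [
    ({"has_link": True}, "spam"),
    ({"has_link": True}, "spam"),
    ({"has_link": False}, "ham"),
]
demo_trained_labels, demo_trained_features = train_sketch(demo_training_data)
```

Smoothing keeps a value that was never observed with a label (here, `has_link=False` with "spam") from receiving probability 0, which would otherwise rule that label out entirely.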
def __init__(self, label_probdist, feature_probdist): (source)
Parameters
label_probdist: P(label), the probability distribution over labels. It is expressed as a ProbDistI whose samples are labels. I.e., P(label) = label_probdist.prob(label).
feature_probdist: P(fname=fval|label), the probability distribution for feature values, given labels. It is expressed as a dictionary whose keys are (label, fname) pairs and whose values are ProbDistI objects over feature values. I.e., P(fname=fval|label) = feature_probdist[label,fname].prob(fval). If a given (label,fname) is not a key in feature_probdist, then it is assumed that the corresponding P(fname=fval|label) is 0 for all values of fval.
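The layout of the two arguments can be sketched as follows. `TableProbDist` is a hypothetical stand-in that models only the `prob()` method described above; it is not an NLTK class, and `feature_likelihood` is a helper invented for this illustration.

```python
class TableProbDist:
    """Hypothetical stand-in for a ProbDistI: only prob() is modelled."""

    def __init__(self, table):
        self._table = dict(table)

    def prob(self, sample):
        # Samples that are not in the table get probability 0.
        return self._table.get(sample, 0.0)


# P(label): samples are labels.
demo_label_probdist = TableProbDist({"spam": 0.4, "ham": 0.6})

# P(fname=fval|label): keys are (label, fname) pairs.
demo_feature_probdist = {
    ("spam", "has_link"): TableProbDist({True: 0.8, False: 0.2}),
    ("ham", "has_link"): TableProbDist({True: 0.1, False: 0.9}),
}


def feature_likelihood(feature_probdist, label, fname, fval):
    """Look up P(fname=fval|label), treating a missing key as probability 0."""
    dist = feature_probdist.get((label, fname))
    return 0.0 if dist is None else dist.prob(fval)


p_label = demo_label_probdist.prob("spam")
p_seen = feature_likelihood(demo_feature_probdist, "spam", "has_link", True)
p_missing_key = feature_likelihood(demo_feature_probdist, "spam", "length", "long")
```

Note that `feature_probdist[label, fname]` and `feature_probdist[(label, fname)]` are the same lookup in Python: the key is a two-element tuple.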
def classify(self, featureset): (source)
Returns
label: the most appropriate label for the given featureset.
def labels(self): (source)
Returns
list of (immutable): the list of category labels used by this classifier.
def most_informative_features(self, n=100): (source)

Return a list of the 'most informative' features used by this classifier. For the purpose of this function, the informativeness of a feature (fname,fval) is equal to the highest value of P(fname=fval|label), for any label, divided by the lowest value of P(fname=fval|label), for any label:

max[ P(fname=fval|label1) / P(fname=fval|label2) ]
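The ratio above can be sketched in a few lines. This is an illustration of the definition only, not the method's actual code: it assumes the conditional distributions are given as nested dictionaries, and it skips any feature whose lowest probability is 0, since the ratio is undefined there. The names `informativeness_ranking` and `demo_conditional_probs` are hypothetical.

```python
def informativeness_ranking(feature_prob, n=100):
    """Rank (fname, fval) pairs by max P(fname=fval|label) / min P(fname=fval|label).

    feature_prob: {(label, fname): {fval: probability}}  (assumed layout)
    Returns up to n ((fname, fval), ratio) pairs, most informative first.
    """
    per_feature = {}
    for (label, fname), dist in feature_prob.items():
        for fval, p in dist.items():
            # Collect the probability of this feature value under each label.
            per_feature.setdefault((fname, fval), []).append(p)

    ratios = {}
    for feature, probs in per_feature.items():
        lowest = min(probs)
        if lowest > 0:  # the ratio is undefined when the lowest value is 0
            ratios[feature] = max(probs) / lowest

    ranked = sorted(ratios.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


demo_conditional_probs = {
    ("spam", "has_link"): {True: 0.8, False: 0.2},
    ("ham", "has_link"): {True: 0.1, False: 0.9},
}
demo_ranking = informativeness_ranking(demo_conditional_probs)
```

In this example `has_link=True` is eight times more likely under "spam" than under "ham", so it ranks above `has_link=False`, whose ratio is 0.9 / 0.2.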
def prob_classify(self, featureset): (source)
Returns
ProbDistI: a probability distribution over labels for the given featureset.
def show_most_informative_features(self, n=10): (source)

Undocumented

_feature_probdist = (source)

Undocumented

_label_probdist = (source)

Undocumented

_labels = (source)

Undocumented

_most_informative_features = (source)

Undocumented