class NaiveBayesClassifier(ClassifierI):
Known subclasses: nltk.classify.positivenaivebayes.PositiveNaiveBayesClassifier
Constructor: NaiveBayesClassifier(label_probdist, feature_probdist)
A Naive Bayes classifier. Naive Bayes classifiers are parameterized by two probability distributions:
- P(label) gives the probability that an input will receive each label, given no information about the input's features.
- P(fname=fval|label) gives the probability that a given feature (fname) will receive a given value (fval), given the label (label).
If the classifier encounters an input with a feature that has never been seen with any label, then rather than assigning a probability of 0 to all labels, it will ignore that feature.
The feature value 'None' is reserved for unseen feature values; you generally should not use 'None' as a feature value for one of your own features.
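A minimal sketch of how the two distributions combine when scoring an input. It assumes plain dicts stand in for ProbDistI objects, and the toy parameters and the names label_prob, feature_prob and posterior are hypothetical, not NLTK's API. Each label is scored by log P(label) plus the sum of log P(fname=fval|label); features never seen with any label are dropped first, as described above, and a missing (label, fname) entry contributes probability 0.

```python
import math

# Hypothetical toy parameters; plain dicts stand in for ProbDistI objects.
# P(label):
label_prob = {"pos": 0.5, "neg": 0.5}
# P(fname=fval|label), keyed by (label, fname):
feature_prob = {
    ("pos", "contains(good)"): {True: 0.8, False: 0.2},
    ("neg", "contains(good)"): {True: 0.1, False: 0.9},
}


def posterior(featureset):
    """Return a normalized dict label -> P(label | featureset)."""
    labels = list(label_prob)
    # Ignore features never seen with any label instead of zeroing every label.
    known = {
        fname: fval
        for fname, fval in featureset.items()
        if any((label, fname) in feature_prob for label in labels)
    }
    logp = {}
    for label in labels:
        score = math.log(label_prob[label])
        for fname, fval in known.items():
            # A missing (label, fname) key or an unseen fval counts as probability 0.
            p = feature_prob.get((label, fname), {}).get(fval, 0.0)
            score += math.log(p) if p > 0 else -math.inf
        logp[label] = score
    # Normalize in log space to avoid underflow when there are many features.
    top = max(logp.values())
    if top == -math.inf:
        raise ValueError("every label has probability 0 for this featureset")
    z = sum(math.exp(s - top) for s in logp.values())
    return {label: math.exp(s - top) / z for label, s in logp.items()}


print(posterior({"contains(good)": True, "contains(xyzzy)": True}))
```

The unknown feature contains(xyzzy) is ignored, so the result equals scoring contains(good) alone: "pos" gets 0.4 / 0.45, about 0.889.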
Class Method | train | No summary
Method | __init__ | No summary
Method | classify | No summary
Method | labels | No summary
Method | most | Return a list of the 'most informative' features used by this classifier. For the purpose of this function, the informativeness of a feature (fname,fval) is equal to the highest value of P(fname=fval|label), for any label, divided by the lowest value of P(fname=fval|label), for any label:...
Method | prob | No summary
Method | show | Undocumented
Instance Variable | _feature | Undocumented
Instance Variable | _label | Undocumented
Instance Variable | _labels | Undocumented
Instance Variable | _most | Undocumented
Inherited from ClassifierI:
Method | classify | Apply self.classify() to each element of featuresets. I.e.:
Method | prob | Apply self.prob_classify() to each element of featuresets. I.e.:
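The inherited batch methods are simple loops over their single-item counterparts. A sketch, assuming the batch method is named classify_many (as in NLTK's ClassifierI); ToyClassifier and its classification rule are hypothetical and exist only to make the example runnable.

```python
class ToyClassifier:
    """Hypothetical classifier; only the method names mirror ClassifierI."""

    def classify(self, featureset):
        # Invented rule for illustration: "pos" if the word "good" appears.
        return "pos" if featureset.get("contains(good)") else "neg"

    def classify_many(self, featuresets):
        # Apply self.classify() to each element of featuresets.
        return [self.classify(fs) for fs in featuresets]


clf = ToyClassifier()
print(clf.classify_many([{"contains(good)": True}, {"contains(bad)": True}]))
```

The probabilistic variant follows the same pattern, calling self.prob_classify() on each featureset instead.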
Parameters | |
labeled | A list of classified featuresets, i.e., a list of tuples (featureset, label). |
estimator | Undocumented |
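A rough sketch of what training from labeled featuresets involves: count labels and (label, fname, fval) occurrences, then turn the counts into distributions. This is not NLTK's implementation. The name train_counts is hypothetical, the add-k constant smoothing is an assumed stand-in for the estimator parameter, and P(label) here is plain relative frequency.

```python
from collections import Counter, defaultdict


def train_counts(labeled_featuresets, smoothing=0.5):
    """Estimate P(label) and P(fname=fval|label) from (featureset, label) pairs.

    `smoothing` is a hypothetical add-k constant standing in for `estimator`.
    """
    label_freq = Counter()
    fval_freq = defaultdict(Counter)  # (label, fname) -> Counter over fval
    fvalues = defaultdict(set)  # fname -> every value seen with any label
    for featureset, label in labeled_featuresets:
        label_freq[label] += 1
        for fname, fval in featureset.items():
            fval_freq[label, fname][fval] += 1
            fvalues[fname].add(fval)

    n = sum(label_freq.values())
    label_prob = {label: count / n for label, count in label_freq.items()}

    # Only (label, fname) pairs actually observed get an entry.
    feature_prob = {}
    for (label, fname), counts in fval_freq.items():
        bins = len(fvalues[fname])
        total = sum(counts.values())
        feature_prob[label, fname] = {
            v: (counts[v] + smoothing) / (total + smoothing * bins)
            for v in fvalues[fname]
        }
    return label_prob, feature_prob


data = [({"w": "a"}, "x"), ({"w": "a"}, "x"), ({"w": "b"}, "y")]
print(train_counts(data))
```

The two returned dicts have the same shape as the label_probdist and feature_probdist constructor arguments, with dicts in place of ProbDistI objects.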
Parameters | |
label | P(label), the probability distribution over labels. It is expressed as a ProbDistI whose samples are labels. I.e., P(label) = label_probdist.prob(label). |
feature | P(fname=fval|label), the probability distribution for feature values, given labels. It is expressed as a dictionary whose keys are (label, fname) pairs and whose values are ProbDistI objects over feature values. I.e., P(fname=fval|label) = feature_probdist[label,fname].prob(fval). If a given (label,fname) is not a key in feature_probdist, then it is assumed that the corresponding P(fname=fval|label) is 0 for all values of fval. |
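The layout of the two constructor arguments, and the rule that a missing (label, fname) key means probability 0, can be illustrated with plain dicts standing in for ProbDistI objects. The values and the helpers p_label and p_feature are hypothetical; they mirror the lookups label_probdist.prob(label) and feature_probdist[label, fname].prob(fval) described above.

```python
# Hypothetical parameters shaped like the constructor arguments, with plain
# dicts standing in for ProbDistI objects.
label_probdist = {"spam": 0.4, "ham": 0.6}
feature_probdist = {
    ("spam", "has_link"): {True: 0.7, False: 0.3},
    ("ham", "has_link"): {True: 0.2, False: 0.8},
    ("spam", "all_caps"): {True: 0.5, False: 0.5},
    # No ("ham", "all_caps") key, so P(all_caps=fval|ham) is 0 for every fval.
}


def p_label(label):
    """P(label), the analogue of label_probdist.prob(label)."""
    return label_probdist.get(label, 0.0)


def p_feature(fname, fval, label):
    """P(fname=fval|label), the analogue of feature_probdist[label, fname].prob(fval)."""
    dist = feature_probdist.get((label, fname))
    if dist is None:
        return 0.0  # missing (label, fname) key: probability 0
    return dist.get(fval, 0.0)


print(p_feature("has_link", True, "spam"))
print(p_feature("all_caps", True, "ham"))
```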
Overrides nltk.classify.api.ClassifierI.classify
Returns | |
label | the most appropriate label for the given featureset. |
Overrides nltk.classify.api.ClassifierI.labels
Returns | |
list of (immutable) | the list of category labels used by this classifier. |
Return a list of the 'most informative' features used by this classifier. For the purpose of this function, the informativeness of a feature (fname,fval) is equal to the highest value of P(fname=fval|label), for any label, divided by the lowest value of P(fname=fval|label), for any label:
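The ratio described above can be computed directly from the feature_probdist layout. A sketch, assuming plain dicts in place of ProbDistI objects; informativeness and most_informative are hypothetical helper names, and returning infinity when some label gives the value probability 0 is a choice made for this sketch, not something the docstring specifies.

```python
import math


def informativeness(feature_prob, labels, fname, fval):
    """Highest P(fname=fval|label) over labels divided by the lowest."""
    probs = [feature_prob.get((label, fname), {}).get(fval, 0.0) for label in labels]
    hi, lo = max(probs), min(probs)
    if lo == 0.0:
        # Some label never produces this value: treat as infinitely
        # informative, unless no label produces it at all.
        return math.inf if hi > 0.0 else 0.0
    return hi / lo


def most_informative(feature_prob, labels, n=10):
    """Return up to n (fname, fval) pairs, most informative first."""
    pairs = {(fname, fval) for (_, fname), dist in feature_prob.items() for fval in dist}
    return sorted(
        pairs,
        key=lambda pair: informativeness(feature_prob, labels, *pair),
        reverse=True,
    )[:n]


# Hypothetical toy parameters in the feature_probdist layout.
fp = {
    ("pos", "w"): {"good": 0.8, "bad": 0.2},
    ("neg", "w"): {"good": 0.1, "bad": 0.9},
}
print(most_informative(fp, ["pos", "neg"]))
```

Here ("w", "good") scores 0.8 / 0.1, about 8, and ("w", "bad") scores 0.9 / 0.2, about 4.5, so "good" ranks first.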