nltk.classify.scikitlearn.SklearnClassifier

class documentation

class SklearnClassifier(ClassifierI):

Wrapper for scikit-learn classifiers.

Method	`__init__`	Wrap a scikit-learn estimator.
Method	`__repr__`	Undocumented
Method	`classify_many`	Classify a batch of samples.
Method	`labels`	The class labels used by this classifier.
Method	`prob_classify_many`	Compute per-class probabilities for a batch of samples.
Method	`train`	Train (fit) the scikit-learn estimator.
Method	`_make_probdist`	Undocumented
Instance Variable	`_clf`	Undocumented
Instance Variable	`_encoder`	Undocumented
Instance Variable	`_vectorizer`	Undocumented

Inherited from ClassifierI:

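Method	`classify`	Return the most appropriate label for a single featureset.
Method	`prob_classify`	Return a probability distribution over labels for a single featureset.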
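
The inherited single-sample methods take one featureset rather than an iterable. The following is a minimal sketch, assuming scikit-learn is installed; the toy data, the choice of `MultinomialNB`, and all variable names are illustrative assumptions rather than part of this class.

```python
from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy training data: (featureset, label) pairs.
train_data = [
    ({"good": 2, "bad": 0}, "pos"),
    ({"good": 1, "bad": 0}, "pos"),
    ({"good": 0, "bad": 2}, "neg"),
    ({"good": 0, "bad": 1}, "neg"),
]

classifier = SklearnClassifier(MultinomialNB())
classifier.train(train_data)

# classify() takes a single featureset and returns a single label.
label = classifier.classify({"good": 3, "bad": 0})

# prob_classify() takes a single featureset and returns one distribution.
dist = classifier.prob_classify({"good": 3, "bad": 0})

print(label, dist.prob(label))
```

`prob_classify` only works when the wrapped estimator can produce class probabilities, which is why a naive Bayes model is used in this sketch.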

def __init__(self, estimator, dtype=float, sparse=True):

Parameters
estimator	scikit-learn classifier object.
dtype	Data type used when building the feature array. scikit-learn estimators work exclusively on numeric data. The default value should be fine for almost all situations.
sparse	Boolean. Whether to use sparse matrices internally. The estimator must support these; not all scikit-learn classifiers do (see their respective documentation and look for "sparse matrix"). The default value is True, since most NLP problems involve sparse feature sets. Setting this to False may use a large amount of memory.

Undocumented
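
As a minimal sketch of the constructor parameters, assuming scikit-learn is installed: the choice of `BernoulliNB` and the variable names are illustrative only.

```python
from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.naive_bayes import BernoulliNB

# Defaults: dtype=float and sparse=True.
sparse_classifier = SklearnClassifier(BernoulliNB())

# Dense feature arrays, for an estimator that cannot take sparse input.
# (BernoulliNB accepts both; it is used here only for illustration.)
dense_classifier = SklearnClassifier(BernoulliNB(), sparse=False)

print(sparse_classifier)
```

Constructing the wrapper does not fit the estimator; that happens in `train`.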

def classify_many(self, featuresets):

overrides nltk.classify.api.ClassifierI.classify_many

Classify a batch of samples.

Parameters
featuresets	An iterable over featuresets, each a dict mapping strings to either numbers, booleans or strings.
Returns
list	The predicted class label for each input sample.
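
A short sketch of batch classification, assuming scikit-learn is installed. The training data, the feature names, and the choice of `BernoulliNB` are hypothetical and serve only to show the call shape.

```python
from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.naive_bayes import BernoulliNB

# Hypothetical toy data mixing boolean and string feature values.
train_data = [
    ({"has_link": True, "sender": "external"}, "spam"),
    ({"has_link": True, "sender": "external"}, "spam"),
    ({"has_link": False, "sender": "internal"}, "ham"),
    ({"has_link": False, "sender": "internal"}, "ham"),
]

classifier = SklearnClassifier(BernoulliNB())
classifier.train(train_data)

# classify_many() takes an iterable of featuresets (no labels).
test_featuresets = [
    {"has_link": True, "sender": "external"},
    {"has_link": False, "sender": "internal"},
]
predictions = classifier.classify_many(test_featuresets)

# One predicted label per input featureset, in input order.
for featureset, label in zip(test_featuresets, predictions):
    print(featureset, "->", label)
```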

def labels(self):

overrides nltk.classify.api.ClassifierI.labels

The class labels used by this classifier.

Returns
list	The class labels the classifier can assign.
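
As an illustration, `labels` can be called after training to inspect which classes the classifier knows about. This is a sketch with hypothetical toy data, assuming scikit-learn is installed.

```python
from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.naive_bayes import BernoulliNB

# Hypothetical toy data with two classes.
train_data = [
    ({"good": True}, "pos"),
    ({"bad": True}, "neg"),
    ({"good": True, "great": True}, "pos"),
    ({"bad": True, "awful": True}, "neg"),
]

classifier = SklearnClassifier(BernoulliNB())
classifier.train(train_data)

# The labels are those seen during training.
known_labels = classifier.labels()
print(known_labels)
```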

def prob_classify_many(self, featuresets):

overrides nltk.classify.api.ClassifierI.prob_classify_many

Compute per-class probabilities for a batch of samples.

Parameters
featuresets	An iterable over featuresets, each a dict mapping strings to either numbers, booleans or strings.
Returns
list of `ProbDistI`	One probability distribution over the class labels for each input sample.
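
The sketch below shows one way to read the returned distributions. It assumes scikit-learn is installed and that the wrapped estimator can produce class probabilities (`MultinomialNB` is used here as an illustrative choice). The data and variable names are hypothetical.

```python
from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data: word counts per document.
train_data = [
    ({"free": 2, "meeting": 0}, "spam"),
    ({"free": 1, "meeting": 0}, "spam"),
    ({"free": 0, "meeting": 3}, "ham"),
    ({"free": 0, "meeting": 1}, "ham"),
]

classifier = SklearnClassifier(MultinomialNB())
classifier.train(train_data)

test_featuresets = [{"free": 3, "meeting": 0}, {"free": 0, "meeting": 2}]
distributions = classifier.prob_classify_many(test_featuresets)

# Each distribution maps every known label to a probability.
for dist in distributions:
    for label in dist.samples():
        print(label, round(float(dist.prob(label)), 3))
    print("most likely:", dist.max())
```

Estimators without probability support (for example, some margin-based classifiers) would not work with this method.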

def train(self, labeled_featuresets):

Train (fit) the scikit-learn estimator.

Parameters
labeled_featuresets	A list of `(featureset, label)` pairs, where each `featureset` is a dict mapping strings to either numbers, booleans or strings.
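
A minimal training sketch, assuming scikit-learn is installed. The featuresets, labels, and the choice of `MultinomialNB` are hypothetical; they show only the expected input shape, with numeric, boolean, and string feature values side by side.

```python
from nltk.classify.scikitlearn import SklearnClassifier
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data: a list of (featureset, label) pairs.
train_data = [
    ({"free": 2, "has_link": True, "sender": "external"}, "spam"),
    ({"free": 1, "has_link": True, "sender": "external"}, "spam"),
    ({"free": 0, "has_link": False, "sender": "internal"}, "ham"),
    ({"free": 0, "has_link": False, "sender": "internal"}, "ham"),
]

classifier = SklearnClassifier(MultinomialNB())

# Fit the wrapped estimator on the labeled featuresets.
classifier.train(train_data)

print(classifier.labels())
```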

def _make_probdist(self, y_proba):

Undocumented
