class documentation

Wrapper for scikit-learn classifiers.

Method __init__ No summary
Method __repr__ Undocumented
Method classify_many Classify a batch of samples.
Method labels The class labels used by this classifier.
Method prob_classify_many Compute per-class probabilities for a batch of samples.
Method train Train (fit) the scikit-learn estimator.
Method _make_probdist Undocumented
Instance Variable _clf Undocumented
Instance Variable _encoder Undocumented
Instance Variable _vectorizer Undocumented

Inherited from ClassifierI:

Method classify No summary
Method prob_classify No summary
def __init__(self, estimator, dtype=float, sparse=True): (source)
Parameters
estimator: scikit-learn classifier object.
dtype: data type used when building the feature array. scikit-learn estimators work exclusively on numeric data. The default value should be fine for almost all situations.
sparse: boolean. Whether to use sparse matrices internally. The estimator must support these; not all scikit-learn classifiers do (see their respective documentation and look for "sparse matrix"). The default value is True, since most NLP problems involve sparse feature sets. Setting this to False may use a large amount of memory.
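
A minimal sketch of the two constructor modes, assuming this page documents NLTK's SklearnClassifier (importable from nltk.classify) and that scikit-learn is installed. The estimators chosen and the toy training data are illustrative assumptions, not part of the original documentation. GaussianNB is used for the dense case because it does not accept sparse input.

```python
from nltk.classify import SklearnClassifier
from sklearn.naive_bayes import GaussianNB, MultinomialNB

# Default settings: sparse feature matrices, float feature values.
sparse_clf = SklearnClassifier(MultinomialNB())

# GaussianNB expects dense arrays, so sparse matrices are turned off.
dense_clf = SklearnClassifier(GaussianNB(), sparse=False)

# Toy training data, invented for illustration.
train_data = [
    ({"contains(free)": 1, "length": 3}, "spam"),
    ({"contains(free)": 0, "length": 12}, "ham"),
]
sparse_clf.train(train_data)
dense_clf.train(train_data)

dense_predictions = dense_clf.classify_many([{"contains(free)": 1, "length": 3}])
print(dense_predictions)
```

With a large vocabulary, the dense variant allocates one column per feature for every sample, which is why the sparse default is usually preferable.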
def __repr__(self): (source)

Undocumented

def classify_many(self, featuresets): (source)

Classify a batch of samples.

Parameters
featuresets: An iterable over featuresets, each a dict mapping strings to either numbers, booleans or strings.
Returns
list: The predicted class label for each input sample.
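
As a hedged illustration of batch classification, the sketch below trains on a tiny invented dataset and then classifies two samples in one call. It assumes the class is NLTK's SklearnClassifier and uses scikit-learn's BernoulliNB as an arbitrary example estimator.

```python
from nltk.classify import SklearnClassifier
from sklearn.naive_bayes import BernoulliNB

# Toy labeled featuresets, invented for illustration.
train_data = [
    ({"free": 1, "win": 1}, "spam"),
    ({"free": 1}, "spam"),
    ({"meeting": 1}, "ham"),
    ({"meeting": 1, "agenda": 1}, "ham"),
]

clf = SklearnClassifier(BernoulliNB())
clf.train(train_data)

# One featureset per sample; the result has one label per sample, in order.
test_data = [{"free": 1}, {"meeting": 1}]
predictions = clf.classify_many(test_data)
print(predictions)
```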
def labels(self): (source)

The class labels used by this classifier.

Returns
list: The class labels used by this classifier.
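
A short sketch of reading the labels back after training, under the assumption that the class is NLTK's SklearnClassifier. The estimator and data are illustrative only.

```python
from nltk.classify import SklearnClassifier
from sklearn.naive_bayes import BernoulliNB

# Toy labeled featuresets, invented for illustration.
train_data = [
    ({"free": 1}, "spam"),
    ({"meeting": 1}, "ham"),
    ({"win": 1}, "spam"),
]

clf = SklearnClassifier(BernoulliNB())
clf.train(train_data)

# Each distinct label seen during training appears once.
known_labels = clf.labels()
print(known_labels)
```

The labels only become available once train has been called, since they are collected from the training data.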
def prob_classify_many(self, featuresets): (source)

Compute per-class probabilities for a batch of samples.

Parameters
featuresets: An iterable over featuresets, each a dict mapping strings to either numbers, booleans or strings.
Returns
list of ProbDistI: One probability distribution over the class labels for each input sample.
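
The following sketch shows one way to inspect the returned distributions, assuming NLTK's SklearnClassifier and an estimator that provides predict_proba (BernoulliNB here, chosen only as an example). The data is invented. The prob and max methods are part of NLTK's ProbDistI interface.

```python
from nltk.classify import SklearnClassifier
from sklearn.naive_bayes import BernoulliNB

# Toy labeled featuresets, invented for illustration.
train_data = [
    ({"free": 1, "win": 1}, "spam"),
    ({"free": 1}, "spam"),
    ({"meeting": 1}, "ham"),
    ({"meeting": 1, "agenda": 1}, "ham"),
]

clf = SklearnClassifier(BernoulliNB())
clf.train(train_data)

dists = clf.prob_classify_many([{"free": 1}, {"meeting": 1}])

for dist in dists:
    # Probability of each known label, plus the most likely label.
    per_label = {label: dist.prob(label) for label in clf.labels()}
    print(per_label, dist.max())

# The per-class probabilities of the first sample should sum to 1.
total = sum(dists[0].prob(label) for label in clf.labels())
```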
def train(self, labeled_featuresets): (source)

Train (fit) the scikit-learn estimator.

Parameters
labeled_featuresets: A list of (featureset, label) pairs, where each featureset is a dict mapping strings to either numbers, booleans or strings.
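
To illustrate the accepted input shape, here is a sketch that trains on featuresets mixing numeric, boolean and string values. It assumes NLTK's SklearnClassifier. The feature names, the data and the choice of MultinomialNB are hypothetical.

```python
from nltk.classify import SklearnClassifier
from sklearn.naive_bayes import MultinomialNB

# Each item is a (featureset, label) pair. Values may be numbers,
# booleans or strings; the data here is invented for illustration.
labeled_featuresets = [
    ({"first_word": "free", "length": 4, "has_link": True}, "spam"),
    ({"first_word": "win", "length": 6, "has_link": True}, "spam"),
    ({"first_word": "hello", "length": 9, "has_link": False}, "ham"),
    ({"first_word": "agenda", "length": 12, "has_link": False}, "ham"),
]

clf = SklearnClassifier(MultinomialNB())
clf.train(labeled_featuresets)

predictions = clf.classify_many(
    [{"first_word": "free", "length": 5, "has_link": True}]
)
print(clf.labels(), predictions)
```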
def _make_probdist(self, y_proba): (source)

Undocumented

_clf = (source)

Undocumented

_encoder = (source)

Undocumented

_vectorizer = (source)

Undocumented