class documentation

A mixin class used to aid in the implementation of corpus readers for categorized corpora. This class defines the method categories(), which returns a list of the categories for the corpus or for a specified set of fileids; and overrides fileids() to take a categories argument, restricting the set of fileids to be returned.

Subclasses are expected to:

  • Call __init__() to set up the mapping.
  • Override all view methods to accept a categories parameter, which can be used instead of the fileids parameter, to select which fileids should be included in the returned view.
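The two expectations above can be sketched with a small standalone stand-in. This is not the library's implementation: ToyCategorizedMixin, ToyReader, the in-memory texts and the words() view are all hypothetical names chosen for illustration, and the sketch assumes that cat_map maps each file identifier to a list of category labels.

```python
class ToyCategorizedMixin:
    """Hypothetical stand-in for the mixin; it only understands cat_map."""

    def __init__(self, kwargs):
        # kwargs is a plain dict; the consumed argument is removed from it.
        self._f2c = dict(kwargs.pop("cat_map"))

    def fileids(self, categories=None):
        if categories is None:
            return sorted(self._f2c)
        if isinstance(categories, str):
            categories = [categories]
        wanted = set(categories)
        return sorted(f for f, cats in self._f2c.items() if wanted & set(cats))


class ToyReader(ToyCategorizedMixin):
    """Hypothetical reader whose 'files' are in-memory strings."""

    def __init__(self, texts, **kwargs):
        # Expectation 1: call the mixin's __init__() to set up the mapping.
        ToyCategorizedMixin.__init__(self, kwargs)
        self._texts = texts

    def _resolve(self, fileids, categories):
        # Turn a (fileids, categories) pair into a concrete list of fileids.
        if fileids is not None and categories is not None:
            raise ValueError("Specify fileids or categories, not both")
        if categories is not None:
            return self.fileids(categories)
        if fileids is None:
            return sorted(self._texts)
        return [fileids] if isinstance(fileids, str) else list(fileids)

    def words(self, fileids=None, categories=None):
        # Expectation 2: a view method that accepts categories
        # as an alternative to fileids.
        return [
            word
            for fileid in self._resolve(fileids, categories)
            for word in self._texts[fileid].split()
        ]


reader = ToyReader(
    {"a.txt": "stocks rose", "b.txt": "team won"},
    cat_map={"a.txt": ["news"], "b.txt": ["sports"]},
)
print(reader.words(categories="sports"))  # ['team', 'won']
```

Keeping the category lookup in one helper (here the hypothetical _resolve) means each view method only needs a single extra line to support the categories parameter.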
Method __init__ Initialize this mapping based on one of the keyword arguments cat_pattern, cat_map, or cat_file.
Method categories Return a list of the categories that are defined for this corpus, or for the given file(s) if fileids is specified.
Method fileids Return a list of file identifiers for the files that make up this corpus, or that make up the given category or categories if specified.
Method _add Undocumented
Method _init Undocumented
Instance Variable _c2f Undocumented
Instance Variable _delimiter Undocumented
Instance Variable _f2c Undocumented
Instance Variable _file Undocumented
Instance Variable _map Undocumented
Instance Variable _pattern Undocumented
def __init__(self, kwargs): (source)

Initialize this mapping based on keyword arguments, as follows:

  • cat_pattern: A regular expression pattern used to find the category for each file identifier. The pattern will be applied to each file identifier, and the first matching group will be used as the category label for that file.
  • cat_map: A dictionary, mapping from file identifiers to category labels.
  • cat_file: The name of a file that contains the mapping from file identifiers to categories. The argument cat_delimiter can be used to specify a delimiter.

The argument that is used will be deleted from kwargs. If more than one of cat_pattern, cat_map, and cat_file is specified, an exception will be raised.
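As a rough illustration of the rules above, the following standalone sketch picks the single category argument out of a kwargs dictionary and applies a cat_pattern to a few file identifiers. It does not call the library: the helper names pop_category_spec and categories_from_pattern and the sample file identifiers are hypothetical, and the use of ValueError, the rejection of an empty specification, and matching from the start of the identifier with re.match are assumptions of this sketch.

```python
import re

CATEGORY_KEYS = ("cat_pattern", "cat_map", "cat_file")


def pop_category_spec(kwargs):
    """Remove and return the one category argument present in kwargs."""
    present = [key for key in CATEGORY_KEYS if key in kwargs]
    if len(present) != 1:
        # Assumption: ValueError; the text only says "an exception".
        raise ValueError("Specify exactly one of: " + ", ".join(CATEGORY_KEYS))
    key = present[0]
    return key, kwargs.pop(key)  # the argument is deleted from kwargs


def categories_from_pattern(fileids, cat_pattern):
    """Map each file identifier to the first group matched by cat_pattern."""
    mapping = {}
    for fileid in fileids:
        match = re.match(cat_pattern, fileid)  # assumption: anchored at start
        if match is None:
            continue  # this sketch skips identifiers that do not match
        mapping[fileid] = match.group(1)
    return mapping


kwargs = {"cat_pattern": r"(\w+)/.*", "encoding": "utf8"}
key, pattern = pop_category_spec(kwargs)
print(key, kwargs)  # cat_pattern {'encoding': 'utf8'}

fileids = ["pos/cv001.txt", "pos/cv002.txt", "neg/cv003.txt"]
mapping = categories_from_pattern(fileids, pattern)
print(mapping)
```

With the pattern `(\w+)/.*`, the directory name in front of the slash becomes the category label for each file.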

def categories(self, fileids=None): (source)

Return a list of the categories that are defined for this corpus, or for the given file(s) if fileids is specified.

def fileids(self, categories=None): (source)

Return a list of file identifiers for the files that make up this corpus, or that make up the given category or categories if specified.
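The categories() and fileids() lookups are inverses of each other, which the following standalone sketch illustrates with two dictionaries built from a single mapping. It is a minimal sketch and not the library's code: the sample cat_map, the assumption that each file identifier maps to a list of labels, and the choice to silently ignore unknown names are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical mapping from file identifiers to lists of category labels.
cat_map = {
    "a.txt": ["news"],
    "b.txt": ["news", "sports"],
    "c.txt": ["fiction"],
}

# Build both directions once: file -> categories and category -> files.
f2c = defaultdict(set)
c2f = defaultdict(set)
for fileid, labels in cat_map.items():
    for label in labels:
        f2c[fileid].add(label)
        c2f[label].add(fileid)


def categories(fileids=None):
    """All categories, or only those of the given file(s)."""
    if fileids is None:
        return sorted(c2f)
    if isinstance(fileids, str):
        fileids = [fileids]
    # Unknown file identifiers contribute nothing in this sketch.
    return sorted(set().union(*(f2c.get(f, set()) for f in fileids)))


def fileids(categories=None):
    """All file identifiers, or only those in the given category or categories."""
    if categories is None:
        return sorted(f2c)
    if isinstance(categories, str):
        categories = [categories]
    # Unknown categories contribute nothing in this sketch.
    return sorted(set().union(*(c2f.get(c, set()) for c in categories)))


print(categories())                     # ['fiction', 'news', 'sports']
print(categories("b.txt"))              # ['news', 'sports']
print(fileids("news"))                  # ['a.txt', 'b.txt']
print(fileids(["sports", "fiction"]))   # ['b.txt', 'c.txt']
```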

def _add(self, file_id, category): (source)

Undocumented

def _init(self): (source)

Undocumented

_c2f = (source)

Undocumented

_delimiter = (source)

Undocumented

_f2c = (source)

Undocumented

_file = (source)

Undocumented

_map = (source)

Undocumented

_pattern = (source)

Undocumented