class CategorizedCorpusReader(object):
Known subclasses: nltk.corpus.reader.CategorizedBracketParseCorpusReader, nltk.corpus.reader.CategorizedPlaintextCorpusReader, nltk.corpus.reader.CategorizedSentencesCorpusReader, nltk.corpus.reader.CategorizedTaggedCorpusReader, nltk.corpus.reader.Pl196xCorpusReader, nltk.corpus.reader.ProsConsCorpusReader
Constructor: CategorizedCorpusReader(kwargs)
A mixin class used to aid in the implementation of corpus readers for categorized corpora. This class defines the method categories(), which returns a list of the categories for the corpus or for a specified set of fileids; and overrides fileids() to take a categories argument, restricting the set of fileids to be returned.
Subclasses are expected to:
- Call __init__() to set up the mapping.
- Override all view methods to accept a categories parameter, which can be used instead of the fileids parameter, to select which fileids should be included in the returned view.
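The second expectation can be illustrated with a toy reader. This is a minimal sketch in plain Python, not NLTK code: the class `TinyCategorizedReader`, its constructor arguments, and the `words()` view are hypothetical names chosen for illustration. It shows one plausible way a view method can accept either `fileids` or `categories` and resolve the latter through `fileids()`.

```python
class TinyCategorizedReader:
    """Toy stand-in for a categorized corpus reader (hypothetical, not an NLTK class)."""

    def __init__(self, texts, file_categories):
        self._texts = texts  # fileid -> raw text
        self._c2f = {}       # category -> set of fileids
        for fileid, cats in file_categories.items():
            for cat in cats:
                self._c2f.setdefault(cat, set()).add(fileid)

    def fileids(self, categories=None):
        # With no categories, return every file; otherwise only matching files.
        if categories is None:
            return sorted(self._texts)
        if isinstance(categories, str):
            categories = [categories]
        selected = set()
        for cat in categories:
            selected |= self._c2f[cat]
        return sorted(selected)

    def words(self, fileids=None, categories=None):
        # A view method: categories may be given instead of fileids.
        if fileids is not None and categories is not None:
            raise ValueError("Specify fileids or categories, not both")
        if categories is not None:
            fileids = self.fileids(categories)
        elif fileids is None:
            fileids = self.fileids()
        elif isinstance(fileids, str):
            fileids = [fileids]
        return [word for f in fileids for word in self._texts[f].split()]


reader = TinyCategorizedReader(
    {"a.txt": "stocks fell", "b.txt": "team won", "c.txt": "rates rose"},
    {"a.txt": ["news"], "b.txt": ["sports"], "c.txt": ["news"]},
)
print(reader.words(categories="news"))
```

Rejecting calls that pass both `fileids` and `categories` is a design choice made in this sketch to keep the selection unambiguous.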
Method | __init__ | Initialize this mapping based on keyword arguments (cat_pattern, cat_map, or cat_file).
Method | categories | Return a list of the categories that are defined for this corpus, or for the given file(s) if specified.
Method | fileids | Return a list of file identifiers for the files that make up this corpus, or that make up the given categories if specified.
Method | _add | Undocumented
Method | _init | Undocumented
Instance Variable | _c2f | Undocumented
Instance Variable | _delimiter | Undocumented
Instance Variable | _f2c | Undocumented
Instance Variable | _file | Undocumented
Instance Variable | _map | Undocumented
Instance Variable | _pattern | Undocumented
Initialize this mapping based on keyword arguments, as follows:
- cat_pattern: A regular expression pattern used to find the category for each file identifier. The pattern will be applied to each file identifier, and the first matching group will be used as the category label for that file.
- cat_map: A dictionary, mapping from file identifiers to category labels.
- cat_file: The name of a file that contains the mapping from file identifiers to categories. The argument cat_delimiter can be used to specify a delimiter.
Whichever of these arguments is given is deleted from kwargs. If more than one of them is specified, an exception is raised.
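The three ways of specifying the mapping can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the function `file_to_categories` and its `mapping_lines` parameter (standing in for the contents of the file named by `cat_file`) are hypothetical, the `cat_map` values are assumed to be lists of labels, and each mapping line is assumed to hold a file identifier followed by its categories.

```python
import re


def file_to_categories(fileids, kwargs, mapping_lines=()):
    """Build a fileid -> set-of-categories dict from exactly one cat_* argument."""
    spec_keys = [k for k in ("cat_pattern", "cat_map", "cat_file") if k in kwargs]
    if len(spec_keys) != 1:
        raise ValueError("Specify exactly one of: cat_pattern, cat_map, cat_file")
    key = spec_keys[0]
    spec = kwargs.pop(key)                       # the argument is removed from kwargs
    delimiter = kwargs.pop("cat_delimiter", None)

    f2c = {}
    if key == "cat_pattern":
        # The first group of the match is used as the category label.
        # (Every fileid is assumed to match the pattern.)
        for fileid in fileids:
            f2c[fileid] = {re.match(spec, fileid).group(1)}
    elif key == "cat_map":
        # Assumption: values are lists of category labels.
        for fileid, cats in spec.items():
            f2c[fileid] = set(cats)
    else:
        # Assumption: each line is "<fileid><delim><cat1><delim><cat2>...".
        for line in mapping_lines:
            fileid, cats = line.strip().split(delimiter, 1)
            f2c[fileid] = set(cats.split(delimiter))
    return f2c


kwargs = {"cat_pattern": r"(\w+)/.*", "encoding": "utf8"}
by_pattern = file_to_categories(["news/a.txt", "sports/b.txt"], kwargs)
by_file = file_to_categories(
    [], {"cat_file": "cats.txt"}, mapping_lines=["a.txt news economy\n"]
)
```

After the first call, `kwargs` holds only the unrelated `encoding` entry, which mirrors the deletion behaviour described above.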
Return a list of the categories that are defined for this corpus, or for the given file(s) if specified.
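As a rough sketch of this lookup, assuming a file-to-categories dictionary is already available, the behaviour can be written as a standalone function. The function name mirrors the method, but the `f2c` parameter and the sorted return order are assumptions made for this illustration.

```python
def categories(f2c, fileids=None):
    """Return the sorted category labels for all files, or for the given file(s)."""
    if fileids is None:
        fileids = list(f2c)          # no restriction: consider every file
    elif isinstance(fileids, str):
        fileids = [fileids]          # accept a single file identifier
    # Union the category sets of the selected files.
    return sorted(set().union(*(f2c[f] for f in fileids)))


f2c = {"a.txt": {"news"}, "b.txt": {"sports", "news"}, "c.txt": {"fiction"}}
all_categories = categories(f2c)
some_categories = categories(f2c, ["a.txt", "b.txt"])
```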