class documentation
class StandardFormat(object):
Known subclasses: nltk.toolbox.ToolboxData, nltk.toolbox.ToolboxSettings
Constructor: StandardFormat(filename, encoding)
Class for reading and processing standard format marker files and strings.
Method | __init__ |
Undocumented |
Method | close |
Close a previously opened standard format marker file or string. |
Method | fields |
Return an iterator that yields each field as a (marker, value) tuple, where marker and value are unicode strings if an encoding was specified in the fields() method. Otherwise they are non-unicode strings. |
Method | open |
Open a standard format marker file for sequential reading. |
Method | open_string |
Open a standard format marker string for sequential reading. |
Method | raw_fields |
Return an iterator that yields each field as a (marker, value) tuple. Linebreaks and trailing whitespace are preserved except for the final newline in each field. |
Instance Variable | line |
Undocumented |
Instance Variable | _encoding |
Undocumented |
Instance Variable | _file |
Undocumented |
def fields(self, strip=True, unwrap=True, encoding=None, errors='strict', unicode_fields=None):
Return an iterator that yields each field as a (marker, value) tuple, where marker and value are unicode strings if an encoding was specified in the fields() method. Otherwise they are non-unicode strings.
Parameters | |
strip:bool | Strip trailing whitespace from the last line of each field. |
unwrap:bool | Convert newlines in a field to spaces. |
encoding:str or None | Name of an encoding to use. If it is specified, the fields() method returns unicode strings rather than non-unicode strings. |
errors:str | Error handling scheme for codec. Same as the decode() builtin string method. |
unicode_fields | Set of marker names whose values are UTF-8 encoded. Ignored if encoding is None. If the whole file is UTF-8 encoded, set encoding='utf8' and leave unicode_fields with its default value of None. |
Returns | |
iter(tuple(str, str)) | Undocumented |
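The effect of the strip and unwrap parameters can be illustrated with a small standalone sketch. It uses only the standard library and does not call NLTK. The helper name postprocess_field is hypothetical, and the sketch assumes that unwrap replaces every newline with a single space and that strip removes whitespace at the very end of the value. The library's actual implementation may differ in detail.

```python
from typing import Tuple


def postprocess_field(marker: str, value: str,
                      strip: bool = True, unwrap: bool = True) -> Tuple[str, str]:
    """Hypothetical helper mirroring the documented strip/unwrap options."""
    if unwrap:
        # Assumption: each newline inside the value becomes one space.
        value = value.replace("\n", " ")
    if strip:
        # Assumption: trailing whitespace at the end of the value is removed.
        value = value.rstrip()
    return marker, value


# A raw field whose value spans two lines and ends in trailing spaces.
raw_field = ("ge", "gag\nsecond line  ")

default_result = postprocess_field(*raw_field)
kept_result = postprocess_field(*raw_field, strip=False, unwrap=False)
```

With both options left at their defaults, the value comes back on one line without trailing spaces. With both disabled, the value is returned unchanged.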
def open(self, sfm_file):
Open a standard format marker file for sequential reading.
Parameters | |
sfm_file | name of the standard format marker input file |
def open_string(self, s):
Open a standard format marker string for sequential reading.
Parameters | |
s:str | string to parse as a standard format marker input file |
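For readers unfamiliar with the input format, the following is a minimal sketch of how a standard format marker string could be split into (marker, value) tuples. It is a standalone illustration, not NLTK's code. The function name iter_raw_fields and the regular expression are assumptions. It further assumes that a field starts with a backslash marker at the beginning of a line and that lines without a marker continue the previous field.

```python
import re
from typing import Iterator, List, Optional, Tuple

# Assumption: a field line looks like "\marker value"; the marker is optional
# so that continuation lines (no leading backslash) also match.
_LINE = re.compile(r"^(?:\\(\S+)[ \t]*)?(.*)$")


def iter_raw_fields(text: str) -> Iterator[Tuple[str, str]]:
    """Hypothetical parser: yield (marker, value) tuples from an SFM string."""
    marker: Optional[str] = None
    lines: List[str] = []
    for line in text.splitlines():
        line_marker, value = _LINE.match(line).groups()
        if line_marker:
            # A new marker closes the previous field, if there was one.
            if marker is not None:
                yield marker, "\n".join(lines)
            marker, lines = line_marker, [value]
        elif marker is not None:
            # Continuation line: keep it verbatim, trailing spaces included.
            lines.append(value)
    if marker is not None:
        yield marker, "\n".join(lines)


sample = "\\lx kaa\n\\ps N\n\\ge gag\nsecond line  \n"
records = list(iter_raw_fields(sample))
```

In this sketch the line break inside the last field and its trailing spaces are kept, while the final newline of each field is dropped, which matches the behaviour described for the raw field iterator above.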