nltk.internals module documentation

Undocumented

Class Counter A counter that auto-increments each time its value is read.
Class Deprecated A base class used to mark deprecated classes. A typical usage is to alert users that the name of a class has changed:
Class ElementWrapper A wrapper around ElementTree Element objects whose main purpose is to provide nicer __repr__ and __str__ methods. In addition, any of the wrapped Element's methods that return other Element objects are overridden to wrap those values before returning them.
Exception ReadError Exception raised by read_* functions when they fail. Parameters: position (the index in the input string where an error occurred) and expected (what was expected when an error occurred).
Function config_java Configure nltk's java interface, by letting nltk know where it can find the Java binary, and what extra options (if any) should be passed to Java when it is run.
Function deprecated A decorator used to mark functions as deprecated. This will cause a warning to be printed when the function is used.
Function find_binary Undocumented
Function find_binary_iter Search for a file to be used by nltk.
Function find_dir Undocumented
Function find_file Undocumented
Function find_file_iter Search for a file to be used by nltk.
Function find_jar Undocumented
Function find_jar_iter Search for a jar that is used by nltk.
Function find_jars_within_path Undocumented
Function import_from_stdlib When python is run from within the nltk/ directory tree, the current directory is included at the beginning of the search path. Unfortunately, that means that modules within nltk can sometimes shadow standard library modules...
Function is_writable Undocumented
Function java Execute the given java command, by opening a subprocess that calls Java. If java has not yet been configured, it will be configured by calling config_java() with no arguments.
Function overridden Return True if method overrides some method with the same name in a base class. This is typically used when defining abstract base classes or interfaces, to allow subclasses to define either of two related methods.
Function raise_unorderable_types Undocumented
Function read_int If an integer begins at the specified position in the given string, then return a tuple (val, end_position) containing the value of the integer and the position where it ends. Otherwise, raise a ReadError...
Function read_number If an integer or float begins at the specified position in the given string, then return a tuple (val, end_position) containing the value of the number and the position where it ends. Otherwise, raise a ...
Function read_str If a Python string literal begins at the specified position in the given string, then return a tuple (val, end_position) containing the value of the string literal and the position where it ends. Otherwise, raise a ...
Function slice_bounds Given a slice, return the corresponding (start, stop) bounds, taking into account None indices and negative indices. The following guarantees are made for the returned start and stop values:
Variable a Undocumented
Variable b Undocumented
Function _add_epytext_field Add an epytext @field to a given object's docstring.
Function _decode_stdoutdata Convert data read from stdout/stderr to unicode
Function _mro Return the method resolution order for cls -- i.e., a list containing cls and all its base classes, in the order in which they would be checked by getattr. For new-style classes, this is just cls.__mro__...
Constant _READ_INT_RE Undocumented
Constant _READ_NUMBER_VALUE Undocumented
Constant _STRING_START_RE Undocumented
Variable _java_bin Undocumented
Variable _java_options Undocumented
def config_java(bin=None, options=None, verbose=False):

Configure nltk's java interface, by letting nltk know where it can find the Java binary, and what extra options (if any) should be passed to Java when it is run.

Parameters
bin (str): The full path to the Java binary. If not specified, then nltk will search the system for a Java binary; if one is not found, it will raise a LookupError exception.
options (list(str)): A list of options that should be passed to the Java binary when it is called. A common value is '-Xmx512m', which tells the Java binary to increase the maximum heap size to 512 megabytes. If no options are specified, then the existing options list is left unchanged.
verbose: Undocumented
def deprecated(message):

A decorator used to mark functions as deprecated. This will cause a warning to be printed when the function is used. Usage:

>>> from nltk.internals import deprecated
>>> @deprecated('Use foo() instead')
... def bar(x):
...     print(x/10)
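As an illustration of how such a decorator can work, here is a sketch built on the standard warnings and functools modules. The name deprecated_sketch and the warning text are hypothetical, and the choice of DeprecationWarning as the category is an assumption. Python's default warning filters may hide DeprecationWarning unless a filter such as warnings.simplefilter("always") is active.

```python
import functools
import warnings


def deprecated_sketch(message):
    """Decorator factory: emit a DeprecationWarning on every call."""
    def decorator(func):
        @functools.wraps(func)  # keep the original name and docstring
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"Function {func.__name__}() has been deprecated.  {message}",
                category=DeprecationWarning,
                stacklevel=2,  # attribute the warning to the caller
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator
```

Decorating bar with @deprecated_sketch("Use foo() instead") and calling bar(50) inside warnings.catch_warnings(record=True) records one DeprecationWarning while still returning the wrapped function's result.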
def find_binary(name, path_to_bin=None, env_vars=(), searchpath=(), binary_names=None, url=None, verbose=False):

Undocumented

def find_binary_iter(name, path_to_bin=None, env_vars=(), searchpath=(), binary_names=None, url=None, verbose=False):

Search for a file to be used by nltk.

Parameters
name: The name or path of the file.
path_to_bin: The user-supplied binary location (deprecated).
env_vars: A list of environment variable names to check.
searchpath: List of directories to search.
binary_names: A list of alternative file names to check.
url: URL presented to user for download help.
verbose: Whether or not to print path when a file is found.
def find_dir(filename, env_vars=(), searchpath=(), file_names=None, url=None, verbose=False):

Undocumented

def find_file(filename, env_vars=(), searchpath=(), file_names=None, url=None, verbose=False):

Undocumented

def find_file_iter(filename, env_vars=(), searchpath=(), file_names=None, url=None, verbose=False, finding_dir=False):

Search for a file to be used by nltk.

Parameters
filename: The name or path of the file.
env_vars: A list of environment variable names to check.
searchpath: List of directories to search.
file_names: A list of alternative file names to check.
url: URL presented to user for download help.
verbose: Whether or not to print path when a file is found.
finding_dir: Undocumented
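The search described above can be sketched as a generator. This is a simplified, hypothetical version (find_file_iter_sketch is not NLTK's function): the order it checks (the name itself, then each environment variable as a file or directory, then each search directory) is an assumption, and it neither prints paths (verbose) nor raises an error pointing at the download url when nothing is found.

```python
import os
from typing import Iterator, Sequence


def find_file_iter_sketch(filename: str, env_vars: Sequence[str] = (),
                          searchpath: Sequence[str] = (),
                          file_names: Sequence[str] = ()) -> Iterator[str]:
    """Yield existing paths for filename, in an assumed search order."""
    names = [filename, *file_names]
    # 1. The name may already be a usable path.
    if os.path.isfile(filename):
        yield filename
    # 2. Environment variables may point at the file or at a directory.
    for var in env_vars:
        value = os.environ.get(var)
        if not value:
            continue
        if os.path.isfile(value):
            yield value
        elif os.path.isdir(value):
            for name in names:
                candidate = os.path.join(value, name)
                if os.path.isfile(candidate):
                    yield candidate
    # 3. Finally, look inside each directory on the search path.
    for directory in searchpath:
        for name in names:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate):
                yield candidate
```

Because it is a generator, a caller that only needs the first hit could use next(find_file_iter_sketch(...), None).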
def find_jar(name_pattern, path_to_jar=None, env_vars=(), searchpath=(), url=None, verbose=False, is_regex=False):

Undocumented

def find_jar_iter(name_pattern, path_to_jar=None, env_vars=(), searchpath=(), url=None, verbose=False, is_regex=False):

Search for a jar that is used by nltk.

Parameters
name_pattern: The name of the jar file, or a regular expression matching it when is_regex is true.
path_to_jar: The user-supplied jar location, or None.
env_vars: A list of environment variable names to check in addition to the CLASSPATH variable, which is checked by default.
searchpath: List of directories to search.
url: Undocumented
verbose: Undocumented
is_regex: Whether name_pattern is a regular expression.
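A sketch of the name-matching step, assuming the candidate locations have already been gathered (for example by splitting the CLASSPATH environment variable on os.pathsep). iter_matching_jars is a hypothetical helper, and using re.fullmatch on the file's base name for the is_regex case is an assumption; NLTK's matching rules may differ.

```python
import os
import re
from typing import Iterator, Sequence


def iter_matching_jars(name_pattern: str, classpath_entries: Sequence[str],
                       is_regex: bool = False) -> Iterator[str]:
    """Yield classpath entries whose file name matches name_pattern."""
    for entry in classpath_entries:
        base = os.path.basename(entry)
        if is_regex:
            # Assumption: the whole base name must match the pattern.
            matched = re.fullmatch(name_pattern, base) is not None
        else:
            matched = base == name_pattern
        if matched:
            yield entry
```

For example, list(iter_matching_jars("parser.jar", os.environ.get("CLASSPATH", "").split(os.pathsep))) would list exact-name matches found on the classpath.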
def find_jars_within_path(path_to_jars):

Undocumented

def import_from_stdlib(module):

When python is run from within the nltk/ directory tree, the current directory is included at the beginning of the search path. Unfortunately, that means that modules within nltk can sometimes shadow standard library modules. As an example, the stdlib 'inspect' module will attempt to import the stdlib 'tokenize' module, but will instead end up importing NLTK's 'tokenize' module (causing the import to fail).
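One way to work around the shadowing problem is to remove the current-directory entries ('' and '.') from sys.path for the duration of the import. The sketch below does that with importlib.import_module and restores sys.path in a finally block; import_from_stdlib_sketch is a hypothetical name, and this is not necessarily how NLTK's function is written. One caveat: if a module with the requested name is already in sys.modules, the cached copy is returned regardless of sys.path.

```python
import importlib
import sys
from types import ModuleType


def import_from_stdlib_sketch(module: str) -> ModuleType:
    """Import module with the current directory removed from sys.path."""
    old_path = sys.path
    # '' and '.' both stand for the current directory on the search path.
    sys.path = [d for d in sys.path if d not in ("", ".")]
    try:
        return importlib.import_module(module)
    finally:
        sys.path = old_path  # always restore the original search path
```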

def is_writable(path):

Undocumented

def java(cmd, classpath=None, stdin=None, stdout=None, stderr=None, blocking=True):

Execute the given java command, by opening a subprocess that calls Java. If java has not yet been configured, it will be configured by calling config_java() with no arguments.

Parameters
cmd (list(str)): The java command that should be called, formatted as a list of strings. Typically, the first string will be the name of the java class, and the remaining strings will be arguments for that java class.
classpath (str): A ':' separated list of directories, JAR archives, and ZIP archives to search for class files.
blocking: If false, then return immediately after spawning the subprocess. In this case, the return value is the Popen object, and not a (stdout, stderr) tuple.
stdin, stdout, stderr: Specify the executed program's standard input, standard output and standard error file handles, respectively. Valid values are subprocess.PIPE, an existing file descriptor (a positive integer), an existing file object, 'pipe', 'stdout', 'devnull' and None. subprocess.PIPE indicates that a new pipe to the child should be created. With None, no redirection will occur; the child's file handles will be inherited from the parent. Additionally, stderr can be subprocess.STDOUT, which indicates that the stderr data from the application should be captured into the same file handle as for stdout.
Returns
If blocking=True, then return a tuple (stdout, stderr), containing the stdout and stderr outputs generated by the java command if the stdout and stderr parameters were set to subprocess.PIPE; or None otherwise. If blocking=False, then return a subprocess.Popen object.
Raises
OSError: If the java command returns a nonzero return code.
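The command-line assembly can be sketched as follows. build_java_command and resolve_stream are hypothetical helpers; the argument order (binary, options, -cp classpath, then cmd) is an assumption, and joining classpath entries with os.pathsep (':' on POSIX, ';' on Windows) generalizes the ':' separator mentioned above.

```python
import os
import subprocess
from typing import List, Sequence, Union

# String shorthands accepted for stdin/stdout/stderr, per the description above.
_STREAM_ALIASES = {
    "pipe": subprocess.PIPE,
    "stdout": subprocess.STDOUT,
    "devnull": subprocess.DEVNULL,
}


def resolve_stream(value):
    """Map 'pipe'/'stdout'/'devnull' to subprocess constants; pass others through."""
    if isinstance(value, str):
        if value not in _STREAM_ALIASES:
            raise ValueError(f"unknown stream alias: {value!r}")
        return _STREAM_ALIASES[value]
    return value


def build_java_command(java_bin: str, options: Sequence[str],
                       classpath: Union[str, Sequence[str]],
                       cmd: Sequence[str]) -> List[str]:
    """Assemble an argv list: java binary, options, -cp classpath, then cmd."""
    if isinstance(cmd, str):
        raise TypeError("cmd should be a list of strings")
    entries = [classpath] if isinstance(classpath, str) else list(classpath)
    return [java_bin, *options, "-cp", os.pathsep.join(entries), *cmd]
```

The resulting list could be passed to subprocess.Popen(argv, stdout=resolve_stream("pipe"), stderr=resolve_stream("pipe")); in blocking mode, calling communicate() and raising OSError on a nonzero returncode would match the Returns and Raises sections above.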
def overridden(method):

Return True if method overrides some method with the same name in a base class. This is typically used when defining abstract base classes or interfaces, to allow subclasses to define either of two related methods:

>>> class EaterI:
...     '''Subclass must define eat() or batch_eat().'''
...     def eat(self, food):
...         if overridden(self.batch_eat):
...             return self.batch_eat([food])[0]
...         else:
...             raise NotImplementedError()
...     def batch_eat(self, foods):
...         return [self.eat(food) for food in foods]
Parameters
method (instance method): Undocumented
Returns
True if method overrides some method with the same name in a base class.
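A sketch of how such a check can be done by walking the method resolution order: a bound method counts as overridden when more than one class in its owner's MRO defines an attribute with that name. overridden_sketch is a hypothetical name and is not claimed to be NLTK's code.

```python
import types


def overridden_sketch(method) -> bool:
    """True if more than one class in the owner's MRO defines method's name."""
    if not isinstance(method, types.MethodType):
        raise TypeError("Expected an instance method.")
    name = method.__name__
    owners = [cls for cls in type(method.__self__).__mro__ if name in vars(cls)]
    return len(owners) > 1
```

With the EaterI interface above, a subclass that defines only batch_eat makes overridden_sketch(self.batch_eat) true, so eat delegates to it; on a bare EaterI instance the check is false and eat raises NotImplementedError.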
def raise_unorderable_types(ordering, a, b):

Undocumented

def read_int(s, start_position):

If an integer begins at the specified position in the given string, then return a tuple (val, end_position) containing the value of the integer and the position where it ends. Otherwise, raise a ReadError.

>>> from nltk.internals import read_int
>>> read_int('42 is the answer', 0)
(42, 2)
Parameters
s (str): The string in which to look for a Python integer.
start_position (int): The position in s at which to begin regex matching.
Returns
tuple(int, int): A tuple containing the matched integer, cast to an int, and the end position of the integer in s.
Raises
ReadError: If the _READ_INT_RE regex does not match in s at start_position.
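A minimal re-implementation for illustration, using the _READ_INT_RE pattern listed for this module. ReadError here is a hypothetical stand-in for the module's exception, and read_int_sketch is not NLTK's source. A compiled pattern's match(s, pos) is anchored at pos, which is what "begins at the specified position" requires.

```python
import re

# Same pattern as the _READ_INT_RE constant documented for this module.
_READ_INT_RE = re.compile(r"-?\d+")


class ReadError(ValueError):
    """Hypothetical stand-in for nltk.internals.ReadError."""


def read_int_sketch(s: str, start_position: int):
    """Return (value, end_position) for an integer starting at start_position."""
    m = _READ_INT_RE.match(s, start_position)  # anchored at start_position
    if m is None:
        raise ReadError(f"expected integer at position {start_position}")
    return int(m.group()), m.end()
```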
def read_number(s, start_position):

If an integer or float begins at the specified position in the given string, then return a tuple (val, end_position) containing the value of the number and the position where it ends. Otherwise, raise a ReadError.

>>> from nltk.internals import read_number
>>> read_number('Pi is 3.14159', 6)
(3.14159, 13)
Parameters
s (str): The string in which to look for a Python number.
start_position (int): The position in s at which to begin regex matching.
Returns
tuple(float, int): A tuple containing the matched number, cast to a float, and the end position of the number in s.
Raises
ReadError: If the _READ_NUMBER_VALUE regex does not match in s at start_position.
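A similar sketch for read_number. Every part of the _READ_NUMBER_VALUE pattern is optional, so it can match the empty string (or a lone '-' or '.'); the sketch therefore rejects matches that contain no digit. Returning a float follows the Returns field above; ReadError and read_number_sketch are hypothetical stand-ins, not NLTK's code.

```python
import re

# Same pattern as the _READ_NUMBER_VALUE constant documented for this module.
_READ_NUMBER_VALUE = re.compile(r"-?(\d*)(\.?\d*)?")


class ReadError(ValueError):
    """Hypothetical stand-in for nltk.internals.ReadError."""


def read_number_sketch(s: str, start_position: int):
    """Return (value, end_position) for a number starting at start_position."""
    m = _READ_NUMBER_VALUE.match(s, start_position)
    # The pattern can match "", "-" or "."; require at least one digit.
    if m is None or not any(ch.isdigit() for ch in m.group()):
        raise ReadError(f"expected number at position {start_position}")
    return float(m.group()), m.end()
```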
def read_str(s, start_position):

If a Python string literal begins at the specified position in the given string, then return a tuple (val, end_position) containing the value of the string literal and the position where it ends. Otherwise, raise a ReadError.

>>> from nltk.internals import read_str
>>> read_str('"Hello", World!', 0)
('Hello', 7)
Parameters
s (str): The string in which to look for a Python string literal.
start_position (int): The position in s at which to begin regex matching.
Returns
tuple(str, int): A tuple containing the matched string literal, evaluated as a string, and the end position of the string literal.
Raises
ReadError: If the _STRING_START_RE regex does not match an opening quote in s at start_position, or if the _STRING_END_RE regex finds no closing quote after that match.
ValueError: If an invalid string (for example, one containing an invalid escape sequence) is passed to eval.
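A sketch of the read_str steps: match the opening quote with _STRING_START_RE, scan for the matching close quote while skipping backslash-escaped characters, then evaluate the literal. This version swaps eval for ast.literal_eval, which only accepts Python literals; that substitution, the hypothetical ReadError stand-in, and the name read_str_sketch are assumptions, not NLTK's code. Malformed escapes surface as SyntaxError or ValueError from ast.literal_eval.

```python
import ast
import re

# Same pattern as the _STRING_START_RE constant documented for this module.
_STRING_START_RE = re.compile(r'[uU]?[rR]?("""|\'\'\'|"|\')')


class ReadError(ValueError):
    """Hypothetical stand-in for nltk.internals.ReadError."""


def read_str_sketch(s: str, start_position: int):
    """Return (value, end_position) for a string literal at start_position."""
    m = _STRING_START_RE.match(s, start_position)
    if m is None:
        raise ReadError(f"expected open quote at position {start_position}")
    quote = m.group(1)
    # Stop at either a backslash (escape) or the matching quote.
    end_re = re.compile(r"\\|" + re.escape(quote))
    position = m.end()
    while True:
        match = end_re.search(s, position)
        if match is None:
            raise ReadError(f"expected close quote after position {position}")
        if match.group() == "\\":
            position = match.end() + 1  # skip the escaped character
        else:
            break
    literal = s[start_position:match.end()]
    return ast.literal_eval(literal), match.end()
```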
def slice_bounds(sequence, slice_obj, allow_step=False):

Given a slice, return the corresponding (start, stop) bounds, taking into account None indices and negative indices. The following guarantees are made for the returned start and stop values:

  • 0 <= start <= len(sequence)
  • 0 <= stop <= len(sequence)
  • start <= stop
Parameters
sequence: Undocumented
slice_obj: Undocumented
allow_step: If true, then the slice object may have a non-None step. If it does, then return a tuple (start, stop, step).
Raises
ValueError: If slice_obj.step is not None and allow_step is false.
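One way to meet the three guarantees is to let slice.indices() resolve None and negative indices and clip both bounds to the sequence length, then pull start back to stop when the slice is empty (as in [4:2]). This sketch only covers the allow_step=False case; slice_bounds_sketch is a hypothetical name, and NLTK's version may compute the bounds differently.

```python
def slice_bounds_sketch(sequence, slice_obj):
    """Return (start, stop) with 0 <= start <= stop <= len(sequence)."""
    if slice_obj.step is not None:
        raise ValueError("slices with steps are not supported")
    # slice.indices() resolves None and negative indices and clips both
    # bounds into [0, len(sequence)] for a positive step.
    start, stop, _ = slice_obj.indices(len(sequence))
    # An empty slice such as [4:2] leaves start > stop; move start back.
    start = min(start, stop)
    return start, stop
```

For a five-element list, slice(-2, None) gives (3, 5) and slice(4, 2) gives (2, 2).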

a

Undocumented

b

Undocumented

def _add_epytext_field(obj, field, message):

Add an epytext @field to a given object's docstring.

def _decode_stdoutdata(stdoutdata):

Convert data read from stdout/stderr to unicode

def _mro(cls):

Return the method resolution order for cls -- i.e., a list containing cls and all its base classes, in the order in which they would be checked by getattr. For new-style classes, this is just cls.__mro__. For classic classes, this can be obtained by a depth-first left-to-right traversal of __bases__.
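To make the classic-class case concrete, the sketch below implements the depth-first, left-to-right traversal of __bases__ and compares it with cls.__mro__ on a diamond hierarchy. classic_mro is a hypothetical helper. In Python 3 every class is new-style and inherits from object, so object shows up in the traversal; for the diamond, the depth-first order reaches A before C, whereas the C3 order stored in __mro__ puts C first.

```python
def classic_mro(cls):
    """Depth-first, left-to-right walk over __bases__ (classic-class order).

    Duplicates are kept; attribute lookup simply uses the first hit.
    """
    order = [cls]
    for base in cls.__bases__:
        order.extend(classic_mro(base))
    return order


class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass

print([k.__name__ for k in classic_mro(D)])  # ['D', 'B', 'A', 'object', 'C', 'A', 'object']
print([k.__name__ for k in D.__mro__])       # ['D', 'B', 'C', 'A', 'object']
```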

_READ_INT_RE

Undocumented

Value
re.compile(r'-?\d+')
_READ_NUMBER_VALUE

Undocumented

Value
re.compile(r'-?(\d*)(\.?\d*)?')
_STRING_START_RE

Undocumented

Value
re.compile(r'[uU]?[rR]?("""|\'\'\'|"|\')')
_java_bin

Undocumented

_java_options: list

Undocumented