class documentation

A class used to access the NLTK data server, which can be used to download corpora and other data packages.

Method __init__ Undocumented
Method clear_status_cache Undocumented
Method collections Undocumented
Method corpora Undocumented
Method default_download_dir Return the directory to which packages will be downloaded by default. This value can be overridden using the constructor, or on a case-by-case basis using the download_dir argument when calling download()...
Method download Undocumented
Method incr_download Undocumented
Method index Return the XML index describing the packages available from the data server. If necessary, this index will be downloaded from the data server.
Method info Return the Package or Collection record for the given item.
Method is_installed Undocumented
Method is_stale Undocumented
Method list Undocumented
Method models Undocumented
Method packages Undocumented
Method status Return a constant describing the status of the given package or collection. Status can be one of INSTALLED, NOT_INSTALLED, STALE, or PARTIAL.
Method update Re-download any packages whose status is STALE.
Method xmlinfo Return the XML info record for the given item.
Constant DEFAULT_URL The default URL for the NLTK data server's index. An alternative URL can be specified when creating a new Downloader object.
Constant INDEX_TIMEOUT The time, in seconds, after which the cached copy of the data server index is considered stale and is re-downloaded.
Constant INSTALLED A status string indicating that a package or collection is installed and up-to-date.
Constant NOT_INSTALLED A status string indicating that a package or collection is not installed.
Constant PARTIAL A status string indicating that a collection is partially installed (i.e., only some of its packages are installed).
Constant STALE A status string indicating that a package or collection is corrupt or out-of-date.
Class Variable download_dir Undocumented
Class Variable url Undocumented
Method _download_list Undocumented
Method _download_package Undocumented
Method _get_download_dir The default directory to which packages will be downloaded. This defaults to the value returned by default_download_dir(). To override this default on a case-by-case basis, use the download_dir argument when calling ...
Method _get_url The URL for the data server's index file.
Method _info_or_id Undocumented
Method _interactive_download Undocumented
Method _num_packages Undocumented
Method _pkg_status Undocumented
Method _set_download_dir Undocumented
Method _set_url Set a new URL for the data server. If the given URL cannot be contacted, the original URL is kept.
Method _update_index A helper function that ensures that self._index is up-to-date. If the index is older than self.INDEX_TIMEOUT, then download it again.
Instance Variable _collections Dictionary from collection identifier to Collection.
Instance Variable _download_dir The default directory to which packages will be downloaded.
Instance Variable _errors Flag indicating whether all packages were downloaded successfully.
Instance Variable _index The XML index file downloaded from the data server.
Instance Variable _index_timestamp Time at which self._index was downloaded. If it is more than INDEX_TIMEOUT seconds old, it will be re-downloaded.
Instance Variable _packages Dictionary from package identifier to Package.
Instance Variable _status_cache Dictionary from package/collection identifier to status string (INSTALLED, NOT_INSTALLED, STALE, or PARTIAL). Cache is used for packages only, not collections.
Instance Variable _url The URL for the data server's index file.
def __init__(self, server_index_url=None, download_dir=None): (source)

Undocumented

def clear_status_cache(self, id=None): (source)

Undocumented

def collections(self): (source)

Undocumented

def corpora(self): (source)

Undocumented

def default_download_dir(self): (source)

Return the directory to which packages will be downloaded by default. This value can be overridden using the constructor, or on a case-by-case basis using the download_dir argument when calling download().

On Windows, the default download directory is PYTHONHOME/lib/nltk, where PYTHONHOME is the directory containing Python, e.g. C:\Python25.

On all other platforms, the default directory is the first of the following which exists or which can be created with write permission: /usr/share/nltk_data, /usr/local/share/nltk_data, /usr/lib/nltk_data, /usr/local/lib/nltk_data, ~/nltk_data.
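
The non-Windows rule above (take the first candidate that exists or can be created with write permission) can be sketched roughly as follows. This is an illustrative stand-alone helper, not NLTK's implementation: the name first_usable_dir is hypothetical, and the reading of "can be created" as "the parent directory exists and is writable" is an assumption.

```python
import os


def first_usable_dir(candidates):
    """Return the first candidate directory that is usable, or None.

    Hypothetical helper. A candidate counts as usable if it already exists
    and is writable, or if it does not exist but its parent directory exists
    and is writable (so the candidate could be created).
    """
    for path in candidates:
        path = os.path.expanduser(path)  # resolve a leading "~"
        if os.path.isdir(path):
            if os.access(path, os.W_OK):
                return path
            continue  # exists but is read-only: try the next candidate
        parent = os.path.dirname(path) or os.curdir
        if os.path.isdir(parent) and os.access(parent, os.W_OK):
            return path
    return None


# Candidate list copied from the description above.
CANDIDATES = [
    "/usr/share/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/local/lib/nltk_data",
    "~/nltk_data",
]
print(first_usable_dir(CANDIDATES))
```

For an unprivileged user the system-wide locations are typically not writable, so a search like this would usually end at the directory under the home folder.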

def download(self, info_or_id=None, download_dir=None, quiet=False, force=False, prefix='[nltk_data] ', halt_on_error=True, raise_on_error=False, print_error_to=sys.stderr): (source)

Undocumented
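
Since download() is undocumented here, the following is only a usage sketch based on the signatures listed on this page. It assumes the nltk package is installed and the data server is reachable; ensure_package is a hypothetical helper name, and the sketch deliberately does not rely on any return value from download().

```python
def ensure_package(package_id, download_dir=None):
    """Download package_id unless status() already reports it as installed.

    Hypothetical helper; requires the nltk package and network access.
    """
    # Imported lazily so that defining this helper does not require nltk.
    from nltk.downloader import Downloader

    downloader = Downloader(download_dir=download_dir)
    if downloader.status(package_id, download_dir) == Downloader.INSTALLED:
        return  # nothing to do
    # quiet=True suppresses the '[nltk_data] ' progress messages.
    downloader.download(package_id, download_dir=download_dir, quiet=True)
```

A call such as ensure_package("punkt") would contact the data server for its index the first time the status is checked; "punkt" is used only as an example identifier.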

def incr_download(self, info_or_id, download_dir=None, force=False): (source)

Undocumented

def index(self): (source)

Return the XML index describing the packages available from the data server. If necessary, this index will be downloaded from the data server.

def info(self, id): (source)

Return the Package or Collection record for the given item.

def is_installed(self, info_or_id, download_dir=None): (source)

Undocumented

def is_stale(self, info_or_id, download_dir=None): (source)

Undocumented

def list(self, download_dir=None, show_packages=True, show_collections=True, header=True, more_prompt=False, skip_installed=False): (source)

Undocumented

def models(self): (source)

Undocumented

def packages(self): (source)

Undocumented

def status(self, info_or_id, download_dir=None): (source)

Return a constant describing the status of the given package or collection. Status can be one of INSTALLED, NOT_INSTALLED, STALE, or PARTIAL.
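
One plausible way to derive a collection's status from the statuses of its packages is sketched below. The string values are the documented constants; the function name collection_status and the precedence among the states (STALE first, then PARTIAL) are assumptions made for illustration, not a statement of how status() is implemented.

```python
INSTALLED = "installed"
NOT_INSTALLED = "not installed"
PARTIAL = "partial"
STALE = "out of date"


def collection_status(package_statuses):
    """Combine per-package statuses into a single collection status.

    Hypothetical helper; the precedence order is an assumption.
    """
    statuses = set(package_statuses)
    if STALE in statuses:
        return STALE  # any stale member makes the collection stale
    if PARTIAL in statuses:
        return PARTIAL  # a nested, partially installed collection
    if INSTALLED in statuses and NOT_INSTALLED in statuses:
        return PARTIAL  # only some of the packages are installed
    if NOT_INSTALLED in statuses:
        return NOT_INSTALLED
    return INSTALLED
```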

def update(self, quiet=False, prefix='[nltk_data] '): (source)

Re-download any packages whose status is STALE.

def xmlinfo(self, id): (source)

Return the XML info record for the given item.

DEFAULT_URL: str = (source)

The default URL for the NLTK data server's index. An alternative URL can be specified when creating a new Downloader object.

Value
'https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml'
INDEX_TIMEOUT = (source)

The time, in seconds, after which the cached copy of the data server index is considered stale and is re-downloaded.

Value
60 * 60
INSTALLED: str = (source)

A status string indicating that a package or collection is installed and up-to-date.

Value
'installed'
NOT_INSTALLED: str = (source)

A status string indicating that a package or collection is not installed.

Value
'not installed'
PARTIAL: str = (source)

A status string indicating that a collection is partially installed (i.e., only some of its packages are installed).

Value
'partial'
STALE: str = (source)

A status string indicating that a package or collection is corrupt or out-of-date.

Value
'out of date'
download_dir = (source)

Undocumented

url = (source)

Undocumented

def _download_list(self, items, download_dir, force): (source)

Undocumented

def _download_package(self, info, download_dir, force): (source)

Undocumented

def _get_download_dir(self): (source)

The default directory to which packages will be downloaded. This defaults to the value returned by default_download_dir(). To override this default on a case-by-case basis, use the download_dir argument when calling download().

def _get_url(self): (source)

The URL for the data server's index file.

def _info_or_id(self, info_or_id): (source)

Undocumented

def _interactive_download(self): (source)

Undocumented

def _num_packages(self, item): (source)

Undocumented

def _pkg_status(self, info, filepath): (source)

Undocumented

def _set_download_dir(self, download_dir): (source)

Undocumented

def _set_url(self, url): (source)

Set a new URL for the data server. If the given URL cannot be contacted, the original URL is kept.
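
The keep-the-original-on-failure behaviour can be illustrated with a small stand-alone sketch. IndexSource and its injected fetch callable are hypothetical names, not part of NLTK, and re-raising the error after restoring the old URL is an assumption.

```python
class IndexSource:
    """Hypothetical holder for an index URL with rollback on failure."""

    def __init__(self, url, fetch):
        self._url = url
        self._fetch = fetch  # callable(url); raises OSError if unreachable

    @property
    def url(self):
        return self._url

    def set_url(self, url):
        original = self._url
        self._url = url
        try:
            self._fetch(url)  # try to contact the new location
        except OSError:
            self._url = original  # contact failed: keep the original URL
            raise
```

Injecting the fetch callable keeps the sketch testable without network access.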

def _update_index(self, url=None): (source)

A helper function that ensures that self._index is up-to-date. If the index is older than self.INDEX_TIMEOUT, then download it again.
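
The staleness test described here amounts to comparing the age of the cached index against INDEX_TIMEOUT. A minimal sketch follows, assuming timestamps are seconds since the epoch as returned by time.time(); index_needs_refresh is a hypothetical name, and treating a missing timestamp as "needs refresh" is an assumption.

```python
import time

INDEX_TIMEOUT = 60 * 60  # seconds, the documented value (one hour)


def index_needs_refresh(index_timestamp, now=None, timeout=INDEX_TIMEOUT):
    """Return True if the cached index should be downloaded again.

    Hypothetical helper. index_timestamp is the time the index was last
    downloaded, or None if it has never been downloaded.
    """
    if index_timestamp is None:
        return True
    if now is None:
        now = time.time()
    return (now - index_timestamp) > timeout
```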

_collections = (source)

Dictionary from collection identifier to Collection.

_download_dir = (source)

The default directory to which packages will be downloaded.

_errors = (source)

Flag indicating whether all packages were downloaded successfully.

_index = (source)

The XML index file downloaded from the data server.

_index_timestamp = (source)

Time at which self._index was downloaded. If it is more than INDEX_TIMEOUT seconds old, it will be re-downloaded.

_packages = (source)

Dictionary from package identifier to Package.

_status_cache: dict = (source)

Dictionary from package/collection identifier to status string (INSTALLED, NOT_INSTALLED, STALE, or PARTIAL). Cache is used for packages only, not collections.

_url = (source)

The URL for the data server's index file.