nltk-3.6.2 API Documentation

Module Index

  • nltk - The Natural Language Toolkit (NLTK) is an open source Python library for Natural Language Processing. A free online book is available. (If you use the library for academic research, please cite the book...
    • app - Interactive NLTK Applications:
      • chartparser_app - A graphical tool for exploring chart parsing.
      • chunkparser_app - A graphical tool for exploring the regular expression based chunk parser nltk.chunk.RegexpChunkParser.
      • collocations_app - Undocumented
      • concordance_app - Undocumented
      • nemo_app - Finding (and Replacing) Nemo
      • rdparser_app - A graphical tool for exploring the recursive descent parser.
      • srparser_app - A graphical tool for exploring the shift-reduce parser.
      • wordfreq_app - Undocumented
      • wordnet_app - A WordNet Browser application which launches the default browser (if it is not already running) and opens a new tab with a connection to http://localhost:port/. It also starts an HTTP server on the specified port and begins serving browser requests...
    • book - Undocumented
    • ccg - Combinatory Categorial Grammar.
      • api - No module docstring; 5/5 classes documented
      • chart - The lexicon is constructed by calling lexicon.fromstring(<lexicon string>).
      • combinator - CCG Combinators
      • lexicon - CCG Lexicons
      • logic - Helper functions for CCG semantics computation
    • chat - A class for simple chatbots. These perform simple pattern matching on sentences typed by users, and respond with automatically generated sentences.
      • eliza - Undocumented
      • iesha - This chatbot is a tongue-in-cheek take on the average teen anime junky that frequents YahooMessenger or MSNM. All spelling mistakes and flawed grammar are intentional.
      • rude - Undocumented
      • suntsu - Tsu bot responds to all queries with Sun Tsu sayings.
      • util - Undocumented
      • zen - Zen Chatbot talks in gems of Zen wisdom.
    • chunk - Classes and interfaces for identifying non-overlapping linguistic groups (such as base noun phrases) in unrestricted text. This task is called "chunk parsing" or "chunking", and the identified groups are called "chunks"...
      • api - No module docstring; 1/1 class documented
      • named_entity - Named entity chunker
      • regexp - No module docstring; 0/1 constant, 3/3 functions, 12/12 classes documented
      • util - No module docstring; 0/3 constant, 7/10 functions, 1/1 class documented
    • classify - Classes and interfaces for labeling tokens with category labels (or "class labels"). Typically, labels are represented with strings (such as 'health' or 'sports'). Classifiers can be used to perform a wide range of classification tasks...
      • api - Interfaces for labeling tokens with category labels (or "class labels").
      • decisiontree - A classifier model that decides which label to assign to a token on the basis of a tree structure, where branches correspond to conditions on feature values, and leaves correspond to label assignments.
      • maxent - A classifier model based on the maximum entropy modeling framework. This framework considers all of the probability distributions that are empirically consistent with the training data, and chooses the distribution with the highest entropy...
      • megam - A set of functions used to interface with the external megam maxent optimization package. Before megam can be used, you should tell NLTK where it can find the megam binary, using the config_megam() function...
      • naivebayes - A classifier based on the Naive Bayes algorithm. In order to find the probability for a label, this algorithm first uses the Bayes rule to express P(label|features) in terms of P(label) and P(features|label):...
      • positivenaivebayes - A variant of the Naive Bayes Classifier that performs binary classification with partially-labeled training sets. In other words, assume we want to build a classifier that assigns each example to one of two complementary classes (e...
      • rte_classify - Simple classifier for RTE corpus.
      • scikitlearn - scikit-learn (http://scikit-learn.org) is a machine learning library for Python. It supports many classification algorithms, including SVMs, Naive Bayes, logistic regression (MaxEnt) and decision trees.
      • senna - A general interface to the SENNA pipeline that supports any of the operations specified in SUPPORTED_OPERATIONS.
      • svm - nltk.classify.svm was deprecated. For classification based on support vector machines (SVMs), use nltk.classify.scikitlearn (or scikit-learn directly).
      • tadm - No module docstring; 0/1 variable, 3/6 functions documented
      • textcat - A module for language identification using the TextCat algorithm. An implementation of the text categorization algorithm presented in Cavnar, W. B. and J. M. Trenkle, "N-Gram-Based Text Categorization".
      • util - Utility functions and classes for classifiers.
      • weka - Classifiers that make use of the external 'Weka' package.
    • cli - No module docstring; 0/1 constant, 1/2 function documented
    • cluster - This module contains a number of basic clustering algorithms. Clustering describes the task of discovering groups of similar items within a large collection. It is also described as unsupervised machine learning, because the data from which it learns is not annotated with class information, unlike the data used in supervised learning...
      • api - No module docstring; 1/1 class documented
      • em - No module docstring; 1/1 function, 1/1 class documented
      • gaac - No module docstring; 1/1 function, 1/1 class documented
      • kmeans - No module docstring; 0/1 function, 1/1 class documented
      • util - No module docstring; 2/2 functions, 3/3 classes documented
    • collections - No module docstring; 8/9 classes documented
    • collocations - Tools to identify collocations --- words that often appear consecutively --- within corpora. They may also be used to find other associations between word occurrences. See Manning and Schutze ch. 5 at ...
    • compat - Undocumented
    • corpus - NLTK corpus readers. The modules in this package provide functions that can be used to read corpus files in a variety of formats. These functions can be used to read both the corpus files that are distributed in the NLTK corpus package, and corpus files that are part of external corpora.
      • europarl_raw - Undocumented
      • reader - NLTK corpus readers. The modules in this package provide functions that can be used to read corpus fileids in a variety of formats. These functions can be used to read both the corpus fileids that are distributed in the NLTK corpus package, and corpus fileids that are part of external corpora.
        • aligned - No module docstring; 1/1 class documented
        • api - API for corpus readers.
        • bnc - Corpus reader for the XML version of the British National Corpus.
        • bracket_parse - Corpus reader for corpora that consist of parenthesis-delineated parse trees.
        • categorized_sents - CorpusReader structured for corpora that contain one instance on each row. This CorpusReader is specifically used for the Subjectivity Dataset and the Sentence Polarity Dataset.
        • chasen - No module docstring; 0/2 function, 1/1 class documented
        • childes - Corpus reader for the XML version of the CHILDES corpus.
        • chunked - A reader for corpora that contain chunked (and optionally tagged) documents.
        • cmudict - The Carnegie Mellon Pronouncing Dictionary [cmudict.0.6] ftp://ftp.cs.cmu.edu/project/speech/dict/ Copyright 1998 Carnegie Mellon University
        • comparative_sents - CorpusReader for the Comparative Sentence Dataset.
        • conll - Read CoNLL-style chunk fileids.
        • crubadan - An NLTK interface for the n-gram statistics gathered from the corpora for each language using An Crubadan.
        • dependency - Undocumented
        • framenet - Corpus reader for the FrameNet 1.7 lexicon and corpus.
        • ieer - Corpus reader for the Information Extraction and Entity Recognition Corpus.
        • indian - Indian Language POS-Tagged Corpus. Collected by A Kumaran, Microsoft Research, India. Distributed with permission.
        • ipipan - Undocumented
        • knbc - Undocumented
        • lin - Undocumented
        • mte - A reader for corpora whose documents are in MTE format.
        • nkjp - No module docstring; 1/1 function, 4/5 classes documented
        • nombank - No module docstring; 2/5 classes documented
        • nps_chat - Undocumented
        • opinion_lexicon - CorpusReader for the Opinion Lexicon.
        • panlex_lite - CorpusReader for PanLex Lite, a stripped down version of PanLex distributed as an SQLite database. See the README.txt in the panlex_lite corpus directory for more information on PanLex Lite.
        • panlex_swadesh - Undocumented
        • pl196x - Undocumented
        • plaintext - A reader for corpora that consist of plaintext documents.
        • ppattach - Read lines from the Prepositional Phrase Attachment Corpus.
        • propbank - No module docstring; 2/6 classes documented
        • pros_cons - CorpusReader for the Pros and Cons dataset.
        • reviews - CorpusReader for reviews corpora (syntax based on Customer Review Corpus).
        • rte - Corpus reader for the Recognizing Textual Entailment (RTE) Challenge Corpora.
        • semcor - Corpus reader for the SemCor Corpus.
        • senseval - Read from the Senseval 2 Corpus.
        • sentiwordnet - An NLTK interface for SentiWordNet
        • sinica_treebank - Sinica Treebank Corpus Sample
        • string_category - Read tuples from a corpus consisting of categorized strings. For example, from the question classification corpus:
        • switchboard - No module docstring; 1/1 class documented
        • tagged - A reader for corpora whose documents contain part-of-speech-tagged words.
        • timit - Read tokens, phonemes and audio data from the NLTK TIMIT Corpus.
        • toolbox - Module for reading, writing and manipulating Toolbox databases and settings fileids.
        • twitter - A reader for corpora that consist of Tweets. It is assumed that the Tweets have been serialised into line-delimited JSON.
        • udhr - UDHR corpus reader. It mostly deals with encodings.
        • util - No module docstring; 4/11 functions, 3/3 classes documented
        • verbnet - An NLTK interface to the VerbNet verb lexicon
        • wordlist - Undocumented
        • wordnet - An NLTK interface for WordNet
        • xmldocs - Corpus reader for corpora whose documents are xml files.
        • ycoe - Corpus reader for the York-Toronto-Helsinki Parsed Corpus of Old English Prose (YCOE), a 1.5 million word syntactically-annotated corpus of Old English prose texts. The corpus is distributed by the Oxford Text Archive: ...
      • util - No module docstring; 0/1 constant, 1/1 function, 1/1 class documented
    • data - Functions to find and load NLTK resource files, such as corpora, grammars, and saved processing objects. Resource files are identified using URLs, such as nltk:corpora/abc/rural.txt or http://nltk.org/sample/toy.cfg...
    • decorators - Decorator module by Michele Simionato <michelesimionato@libero.it> Copyright Michele Simionato, distributed under the terms of the BSD License (see below). http://www.phyast.pitt.edu/~micheles/python/documentation.html...
    • downloader - The NLTK corpus and module downloader. This module defines several interfaces which can be used to download corpora, models, and other data packages that can be used with NLTK.
    • draw - No package docstring; 5/5 modules documented
      • cfg - Visualization tools for CFGs.
      • dispersion - A utility for displaying lexical dispersion.
      • table - Tkinter widgets for displaying multi-column listboxes and tables.
      • tree - Graphically display a Tree.
      • util - Tools for graphically displaying and interacting with the objects and processing classes defined by the Toolkit. These tools are primarily intended to help students visualize the objects that they create.
    • featstruct - Basic data classes for representing feature structures, and for performing basic operations on those feature structures. A feature structure is a mapping from feature identifiers to feature values, where each feature value is either a basic value (such as a string or an integer), or a nested feature structure...
    • grammar - Basic data classes for representing context free grammars. A "grammar" specifies which trees can represent the structure of a given text. Each of these trees is called a "parse tree" for the text (or simply a "parse")...
    • help - Provide structured access to documentation.
    • inference - Classes and interfaces for theorem proving and model building.
      • api - Interfaces and base classes for theorem provers and model builders.
      • discourse - Module for incrementally developing simple discourses, and checking for semantic ambiguity, consistency and informativeness.
      • mace - A model builder that makes use of the external 'Mace4' package.
      • nonmonotonic - A module to perform nonmonotonic reasoning. The ideas and demonstrations in this module are based on "Logical Foundations of Artificial Intelligence" by Michael R. Genesereth and Nils J. Nilsson.
      • prover9 - A theorem prover that makes use of the external 'Prover9' package.
      • resolution - Module for a resolution-based First Order theorem prover.
      • tableau - Module for a tableau-based First Order theorem prover.
    • internals - No module docstring; 0/4 variable, 0/3 constant, 15/22 functions, 1/1 exception, 3/3 classes documented
    • jsontags - Register JSON tags, so the nltk data loader knows what module and class to look for.
    • lazyimport - Helper to enable simple lazy module import.
    • lm - Currently this module covers only ngram language models, but it should be easy to extend to neural models.
      • api - Language Model Interface.
      • counter - No summary
      • models - Language Models
      • preprocessing - No module docstring; 1/1 variable, 2/2 functions documented
      • smoothing - Smoothing algorithms for language modeling.
      • util - Language Model Utilities
      • vocabulary - Language Model Vocabulary
    • metrics - NLTK Metrics
      • agreement - Implementations of inter-annotator agreement coefficients surveyed by Artstein and Poesio (2007), Inter-Coder Agreement for Computational Linguistics.
      • aline - ALINE http://webdocs.cs.ualberta.ca/~kondrak/ Copyright 2002 by Grzegorz Kondrak.
      • association - Provides scoring functions for a number of association measures through a generic, abstract implementation in NgramAssocMeasures, and n-specific BigramAssocMeasures and TrigramAssocMeasures.
      • confusionmatrix - No module docstring; 0/1 function, 1/1 class documented
      • distance - Distance Metrics.
      • paice - Counts Paice's performance statistics for evaluating stemming algorithms.
      • scores - No module docstring; 6/7 functions documented
      • segmentation - Text Segmentation Metrics
      • spearman - Tools for comparing ranked lists.
    • misc - No package docstring; 3/5 modules documented
      • babelfish - This module previously provided an interface to Babelfish online translation service; this service is no longer available; this module is kept in NLTK source code in order to provide better error messages for people following the NLTK Book 2...
      • chomsky - CHOMSKY is an aid to writing linguistic papers in the style of the great master. It is based on selected phrases taken from actual books and articles written by Noam Chomsky. Upon request, it assembles the phrases in the elegant stylistic patterns that Chomsky is noted for...
      • minimalset - No module docstring; 1/1 class documented
      • sort - This module provides a variety of list sorting algorithms, to illustrate the many different algorithms (recipes) for solving a problem, and how to analyze algorithms experimentally.
      • wordfinder - No module docstring; 1/5 function documented
    • parse - NLTK Parsers
      • api - No module docstring; 1/1 class documented
      • bllip - No module docstring; 1/4 function, 1/1 class documented
      • chart - Data classes and parser implementations for "chart parsers", which use dynamic programming to efficiently parse a text. A chart parser derives parse trees for a text by iteratively adding "edges" to a "chart...
      • corenlp - No module docstring; 0/1 variable, 0/2 function, 1/1 exception, 3/4 classes documented
      • dependencygraph - Tools for reading and writing dependency trees. The input is assumed to be in Malt-TAB format (http://stp.lingfil.uu.se/~nivre/research/MaltXML.html).
      • earleychart - Data classes and parser implementations for incremental chart parsers, which use dynamic programming to efficiently parse a text. A "chart parser" derives parse trees for a text by iteratively adding "edges" to a "chart"...
      • evaluate - No module docstring; 1/1 class documented
      • featurechart - Extension of chart parsing implementation to handle grammars with feature structures as nodes.
      • generate - No module docstring; 0/1 variable, 1/4 function documented
      • malt - No module docstring; 2/3 functions, 1/1 class documented
      • nonprojectivedependencyparser - No module docstring; 0/1 variable, 0/4 function, 4/5 classes documented
      • pchart - Classes and interfaces for associating probabilities with tree structures that represent the internal organization of a text. The probabilistic parser module defines BottomUpProbabilisticChartParser.
      • projectivedependencyparser - No module docstring; 3/4 functions, 4/4 classes documented
      • recursivedescent - No module docstring; 1/1 function, 2/2 classes documented
      • shiftreduce - No module docstring; 1/1 function, 2/2 classes documented
      • stanford - No module docstring; 0/1 variable, 4/4 classes documented
      • transitionparser - No module docstring; 1/1 function, 3/3 classes documented
      • util - Utility functions for parsers.
      • viterbi - No module docstring; 1/1 function, 1/1 class documented
    • probability - Classes for representing and processing probabilistic information.
    • sem - NLTK Semantic Interpretation Package
      • boxer - An interface to Boxer.
      • chat80 - Chat-80 was a natural language system which allowed the user to interrogate a Prolog knowledge base in the domain of world geography. It was developed in the early '80s by Warren and Pereira; see http://www.aclweb.org/anthology/J82-3002.pdf...
      • cooper_storage - No module docstring; 1/2 function, 1/1 class documented
      • drt - No module docstring; 1/5 function, 0/1 exception, 4/20 classes documented
      • drt_glue_demo - Undocumented
      • evaluate - This module provides data structures for representing first-order models.
      • glue - Undocumented
      • hole - An implementation of the Hole Semantics model, following Blackburn and Bos, Representation and Inference for Natural Language (CSLI, 2005).
      • lfg - Undocumented
      • linearlogic - No module docstring; 0/1 variable, 0/1 function, 0/3 exception, 1/9 class documented
      • logic - A version of first order predicate logic, built on top of the typed lambda calculus.
      • relextract - Code for extracting relational triples from the ieer and conll2002 corpora.
      • skolemize - No module docstring; 2/2 functions documented
      • util - Utility functions for batch-processing sentences: parsing and extraction of the semantic representation of the root node of the syntax tree, followed by evaluation of the semantic representation in a first-order model.
    • sentiment - NLTK Sentiment Analysis Package
      • sentiment_analyzer - A SentimentAnalyzer is a tool to implement and facilitate Sentiment Analysis tasks using NLTK features and classifiers, especially for teaching and demonstrative purposes.
      • util - Utility methods for Sentiment Analysis.
      • vader - If you use the VADER sentiment analysis tools, please cite:
    • stem - NLTK Stemmers
      • api - No module docstring; 1/1 class documented
      • arlstem - ARLSTem Arabic Stemmer. The details about the implementation of this algorithm are described in: K. Abainia, S. Ouamour and H. Sayoud, A Novel Robust Arabic Light Stemmer, Journal of Experimental & Theoretical Artificial Intelligence (JETAI'17), Vol...
      • arlstem2 - ARLSTem2 Arabic Light Stemmer. The details about the implementation of this algorithm are described in: K. Abainia and H. Rebbani, Comparing the Effectiveness of the Improved ARLSTem Algorithm with Existing Arabic Light Stemmers, International Conference on Theoretical and Applicative Aspects of Computer Science (ICTAACS'19), Skikda, Algeria, December 15-16, 2019...
      • cistem - No module docstring; 1/1 class documented
      • isri - ISRI Arabic Stemmer
      • lancaster - A word stemmer based on the Lancaster (Paice/Husk) stemming algorithm. Paice, Chris D. "Another Stemmer." ACM SIGIR Forum 24.3 (1990): 56-61.
      • porter - Porter Stemmer
      • regexp - No module docstring; 1/1 class documented
      • rslp - No module docstring; 1/1 class documented
      • snowball - Snowball stemmers
      • util - No module docstring; 2/2 functions documented
      • wordnet - No module docstring; 1/1 class documented
    • tag - NLTK Taggers
      • api - Interface for tagging each token in a sentence with supplementary information, such as its part of speech.
      • brill - No module docstring; 5/5 functions, 3/3 classes documented
      • brill_trainer - No module docstring; 1/1 class documented
      • crf - A module for POS tagging using CRFSuite
      • hmm - Hidden Markov Models (HMMs) are largely used to assign the correct label sequence to sequential data or to assess the probability of a given label and data sequence. These models are finite state machines characterised by a number of states, transitions between these states, and output symbols emitted while in each state...
      • hunpos - A module for interfacing with the HunPos open-source POS-tagger.
      • mapping - Interface for converting POS tags from various treebanks to the universal tagset of Petrov, Das, & McDonald.
      • perceptron - No module docstring; 0/1 constant, 0/3 function, 2/2 classes documented
      • senna - Senna POS tagger, NER Tagger, Chunk Tagger
      • sequential - Classes for tagging sentences sequentially, left to right. The abstract base class SequentialBackoffTagger serves as the base class for all the taggers in this module. Tagging of individual words is performed by the method ...
      • stanford - A module for interfacing with the Stanford taggers.
      • tnt - Implementation of 'TnT - A Statistical Part of Speech Tagger' by Thorsten Brants
      • util - No module docstring; 3/3 functions documented
    • tbl - Transformation Based Learning
      • api - Undocumented
      • demo - No module docstring; 0/2 constant, 13/16 functions documented
      • erroranalysis - No module docstring; 1/1 function documented
      • feature - No module docstring; 1/1 class documented
      • rule - No module docstring; 2/2 classes documented
      • template - No module docstring; 2/2 classes documented
    • test - Unit tests for the NLTK modules. These tests are intended to ensure that source code changes don't accidentally introduce bugs. For instructions, please see:
      • all - Test suite that runs all NLTK tests.
      • childes_fixt - Undocumented
      • classify_fixt - Undocumented
      • conftest - No module docstring; 2/2 functions documented
      • discourse_fixt - Undocumented
      • gensim_fixt - Undocumented
      • gluesemantics_malt_fixt - Undocumented
      • inference_fixt - Undocumented
      • nonmonotonic_fixt - Undocumented
      • portuguese_en_fixt - Undocumented
      • probability_fixt - Undocumented
      • unit - No package docstring; 14/32 modules, 0/2 package documented
        • lm - Undocumented
          • test_counter - No module docstring; 1/2 class documented
          • test_models - No module docstring; 0/1 function, 6/9 classes documented
          • test_preprocessing - Undocumented
          • test_vocabulary - No module docstring; 1/1 class documented
        • test_aline - Unit tests for nltk.metrics.aline
        • test_brill - Tests for Brill tagger.
        • test_cfd_mutation - Undocumented
        • test_cfg2chomsky - Undocumented
        • test_chunk - Undocumented
        • test_classify - Unit tests for nltk.classify. See also: nltk/test/classify.doctest
        • test_collocations - No module docstring; 0/1 constant, 1/1 function, 0/1 class documented
        • test_concordance - No module docstring; 0/1 function, 1/1 class documented
        • test_corenlp - Mock test for Stanford CoreNLP wrappers.
        • test_corpora - Undocumented
        • test_corpus_views - Corpus View Regression Tests
        • test_data - Undocumented
        • test_disagreement - No module docstring; 1/1 class documented
        • test_freqdist - Undocumented
        • test_hmm - Undocumented
        • test_json2csv_corpus - Regression tests for json2csv() and json2csv_entities() in Twitter package.
        • test_json_serialization - Undocumented
        • test_metrics - Undocumented
        • test_naivebayes - Undocumented
        • test_nombank - Unit tests for nltk.corpus.nombank
        • test_pl196x - Undocumented
        • test_pos_tag - Tests for nltk.pos_tag
        • test_rte_classify - Undocumented
        • test_seekable_unicode_stream_reader - Undocumented
        • test_senna - Unit tests for Senna
        • test_stem - Undocumented
        • test_tag - Undocumented
        • test_tgrep - Unit tests for nltk.tgrep.
        • test_tokenize - Unit tests for nltk.tokenize. See also nltk/test/tokenize.doctest
        • test_twitter_auth - Tests for static parts of Twitter package
        • test_util - Unit tests for nltk.util.
        • test_wordnet - Unit tests for nltk.corpus.wordnet. See also nltk/test/wordnet.doctest
        • translate - No package docstring; 10/11 modules documented
          • test_bleu - Tests for BLEU translation evaluation metric
          • test_gdfa - Tests GDFA alignments
          • test_ibm1 - Tests for IBM Model 1 training methods
          • test_ibm2 - Tests for IBM Model 2 training methods
          • test_ibm3 - Tests for IBM Model 3 training methods
          • test_ibm4 - Tests for IBM Model 4 training methods
          • test_ibm5 - Tests for IBM Model 5 training methods
          • test_ibm_model - Tests for common methods of IBM translation models
          • test_meteor - Undocumented
          • test_nist - Tests for NIST translation evaluation metric
          • test_stack_decoder - Tests for stack decoder
    • text - This module brings together a variety of NLTK functionality for text analysis, and provides simple, interactive interfaces. Functionality includes: concordancing, collocation discovery, regular expression search over tokenized strings, and distributional similarity.
    • tgrep - This module supports TGrep2 syntax for matching parts of NLTK Trees. Note that many tgrep operators require the tree passed to be a ParentedTree.
    • tokenize - NLTK Tokenizer Package
      • api - Tokenizer Interface
      • casual - Twitter-aware tokenizer, designed to be flexible and easy to adapt to new domains and tasks. The basic logic is this:
      • destructive - No module docstring; 2/2 classes documented
      • legality_principle - The Legality Principle is a language agnostic principle maintaining that syllable onsets and codas (the beginning and ends of syllables not including the vowel) are only legal if they are found as word onsets or codas in the language...
      • mwe - Multi-Word Expression Tokenizer
      • nist - This is an NLTK port of the tokenizer used in the NIST BLEU evaluation script, https://github.com/moses-smt/mosesdecoder/blob/master/scripts/generic/mteval-v14.pl#L926 which was also ported into Python in ...
      • punkt - Punkt Sentence Tokenizer
      • regexp - Regular-Expression Tokenizers
      • repp - No module docstring; 1/1 class documented
      • sexpr - S-Expression Tokenizer
      • simple - Simple Tokenizers
      • sonority_sequencing - The Sonority Sequencing Principle (SSP) is a language agnostic algorithm proposed by Otto Jespersen in 1904. The sonorous quality of a phoneme is judged by the openness of the lips. Syllable breaks occur before troughs in sonority...
      • stanford - No module docstring; 0/1 variable, 1/1 class documented
      • stanford_segmenter - No module docstring; 0/1 variable, 1/1 class documented
      • texttiling - No module docstring; 0/4 variable, 0/1 constant, 1/2 function, 3/3 classes documented
      • toktok - The tok-tok tokenizer is a simple, general tokenizer, where the input has one sentence per line; thus only the final period is tokenized.
      • treebank - Penn Treebank Tokenizer
      • util - No module docstring; 7/7 functions, 1/1 class documented
    • toolbox - Module for reading, writing and manipulating Toolbox databases and settings files.
    • translate - Experimental features for machine translation. These interfaces are prone to change.
      • api - No module docstring; 0/1 variable, 1/3 function, 3/3 classes documented
      • bleu_score - BLEU score implementation.
      • chrf_score - ChrF score implementation
      • gale_church - A port of the Gale-Church Aligner.
      • gdfa - No module docstring; 1/1 function documented
      • gleu_score - GLEU score implementation.
      • ibm1 - Lexical translation model that ignores word order.
      • ibm2 - Lexical translation model that considers word order.
      • ibm3 - Translation model that considers how a word can be aligned to multiple words in another language.
      • ibm4 - Translation model that reorders output words based on their type and distance from other related words in the output sentence.
      • ibm5 - Translation model that keeps track of vacant positions in the target sentence to decide where to place translated words.
      • ibm_model - Common methods and classes for all IBM models. See IBMModel1, IBMModel2, IBMModel3, IBMModel4, and IBMModel5 for specific implementations.
      • meteor_score - No module docstring; 12/12 functions documented
      • metrics - No module docstring; 1/1 function documented
      • nist_score - NIST score implementation.
      • phrase_based - No module docstring; 2/2 functions documented
      • ribes_score - RIBES score implementation
      • stack_decoder - A decoder that uses stacks to implement phrase-based translation.
    • tree - Class for representing hierarchical language structures, such as syntax trees and morphological trees.
    • treeprettyprinter - Pretty-printing of discontinuous trees. Adapted from the disco-dop project, by Andreas van Cranenburgh. https://github.com/andreasvc/disco-dop
    • treetransforms - A collection of methods for tree (grammar) transformations used in parsing natural language.
    • twitter - NLTK Twitter Package
      • api - This module provides an interface for TweetHandlers, and support for timezone handling.
      • common - Utility functions for the twitterclient module which do not require the twython library to have been installed.
      • twitter_demo - Examples to demo the twitterclient code.
      • twitterclient - NLTK Twitter client
      • util - Authentication utilities to accompany the twitterclient module.
    • util - No module docstring; 26/34 functions, 0/1 class documented
    • wsd - No module docstring; 1/1 function documented
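
The nesting in the index above mirrors the package hierarchy, so each entry's import path is its name prefixed by the names of its ancestors (for example, wordnet under reader under corpus under nltk is nltk.corpus.reader.wordnet). A minimal sketch of that mapping follows. The helper qualified_names, its indent_width parameter, and the shortened OUTLINE sample are hypothetical illustrations and are not part of NLTK or pydoctor.

```python
def qualified_names(outline, indent_width=2):
    """Turn an indented outline of module names into dotted import paths.

    Hypothetical helper: assumes one entry per line, a fixed number of
    spaces per nesting level, and the form "• name - summary".
    """
    stack = []  # names of the current entry and its ancestors
    paths = []
    for line in outline.splitlines():
        if not line.strip():
            continue
        stripped = line.lstrip(" ")
        depth = (len(line) - len(stripped)) // indent_width
        # Drop the bullet, then keep only the name before " - summary".
        name = stripped.lstrip("•").strip().split(" - ")[0].strip()
        del stack[depth:]  # forget entries at this depth or deeper
        stack.append(name)
        paths.append(".".join(stack))
    return paths


# Shortened sample in the same shape as the index (illustrative only).
OUTLINE = """\
• nltk - The Natural Language Toolkit
  • corpus - NLTK corpus readers
    • reader - NLTK corpus readers
      • wordnet - An NLTK interface for WordNet
  • stem - NLTK Stemmers
    • porter - Porter Stemmer
"""

paths = qualified_names(OUTLINE)
for path in paths:
    print(path)
```

With NLTK installed, a path produced this way can be passed to the standard library's importlib.import_module, for instance importlib.import_module("nltk.stem.porter"), to load the corresponding module.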
API Documentation for nltk-3.6.2, generated by pydoctor 24.11.2 at 2025-02-28 00:02:28.