class LanguageModel:
Known subclasses: nltk.lm.Lidstone, nltk.lm.MLE, nltk.lm.models.InterpolatedLanguageModel
Constructor: LanguageModel(order, vocabulary, counter)
ABC for Language Models.
Cannot be directly instantiated itself.
Method | __init__ | Creates new LanguageModel.
Method | context_counts | Helper method for retrieving counts for a given context.
Method | entropy | Calculate cross-entropy of model for given evaluation text.
Method | fit | Trains the model on a text.
Method | generate | Generate words from the model.
Method | logscore | Evaluate the log score of this word in this context.
Method | perplexity | Calculates the perplexity of the given text.
Method | score | Masks out of vocab (OOV) words and computes their model score.
Method | unmasked_score | Score a word given some optional context.
Instance Variable | counts | The NgramCounter holding the model's ngram counts.
Instance Variable | order | Largest ngram size the model works with.
Instance Variable | vocab | The Vocabulary used to mask OOV words.
Overridden in nltk.lm.Lidstone, nltk.lm.models.InterpolatedLanguageModel
Creates new LanguageModel.

:param vocabulary: If provided, this vocabulary will be used instead
    of creating a new one when training.
:type vocabulary: nltk.lm.Vocabulary or None
:param counter: If provided, use this object to count ngrams.
:type counter: nltk.lm.NgramCounter or None
:param ngrams_fn: If given, defines how sentences in training text are turned to ngram
    sequences.
:param pad_fn: If given, defines how sentences in training text are padded.
Parameters |
order | Largest ngram size the model works with.
vocabulary | If provided, this vocabulary will be used instead of creating a new one when training.
counter | If provided, use this object to count ngrams.
ngrams_fn | If given, defines how sentences in training text are turned to ngram sequences.
pad_fn | If given, defines how sentences in training text are padded.
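Since the class is abstract, a concrete subclass such as MLE is what gets constructed in practice. A minimal sketch (the toy ngrams and vocabulary are illustrative only):

```python
from nltk.lm import MLE

# LanguageModel cannot be instantiated directly; use a concrete
# subclass such as MLE. order=2 makes this a bigram model.
lm = MLE(2)

# fit() takes sentences as sequences of ngram tuples; vocabulary_text
# supplies the words used to build the vocabulary.
lm.fit([[("a", "b"), ("b", "c")]], vocabulary_text=["a", "b", "c"])

print(lm.order)               # 2
print(lm.counts[["a"]]["b"])  # count of the bigram ("a", "b")
```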
Helper method for retrieving counts for a given context.

Assumes context has been checked and oov words in it masked.
:type context: tuple(str) or None
Calculate cross-entropy of model for given evaluation text.

Parameters |
text_ngrams | A sequence of ngram tuples.
Returns |
float | The cross-entropy of the model on the given ngrams.
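For instance, on a hypothetical toy corpus (padded_everygram_pipeline is the usual helper for preparing MLE training data):

```python
from nltk.lm import MLE
from nltk.lm.preprocessing import padded_everygram_pipeline

# Two toy sentences; the pipeline pads them and produces everygrams
# plus a flat word stream for building the vocabulary.
train, vocab = padded_everygram_pipeline(2, [["a", "b"], ["a", "c"]])
lm = MLE(2)
lm.fit(train, vocab)

# "b" follows "a" in one of two cases, so P(b | a) = 0.5 and the
# average negative log2 score of this single bigram is 1.0.
print(lm.entropy([("a", "b")]))  # 1.0
```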
Trains the model on a text.

Parameters |
text | Training text as a sequence of sentences.
vocabulary_text | If provided, used to build the model's vocabulary before counting ngrams.
Generate words from the model.

:return: One (str) word or a list of words generated from model.

Examples:

>>> from nltk.lm import MLE
>>> lm = MLE(2)
>>> lm.fit([[("a", "b"), ("b", "c")]], vocabulary_text=['a', 'b', 'c'])
>>> lm.fit([[("a",), ("b",), ("c",)]])
>>> lm.generate(random_seed=3)
'a'
>>> lm.generate(text_seed=['a'])
'b'

Parameters |
num_words | How many words to generate. By default 1.
text_seed | Generation can be conditioned on preceding context.
random_seed | A random seed or an instance of random.Random. If provided, makes the random sampling part of generation reproducible.
Evaluate the log score of this word in this context.

The arguments are the same as for score and unmasked_score.
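A sketch of the relationship on a toy bigram model: logscore is the base-2 logarithm of score (the training data here is illustrative only).

```python
import math

from nltk.lm import MLE
from nltk.lm.preprocessing import padded_everygram_pipeline

train, vocab = padded_everygram_pipeline(2, [["a", "b"], ["a", "c"]])
lm = MLE(2)
lm.fit(train, vocab)

# P(b | a) = 0.5, and logscore is simply log2 of that probability.
print(lm.score("b", ["a"]))             # 0.5
print(lm.logscore("b", ["a"]))          # -1.0
print(math.log2(lm.score("b", ["a"])))  # -1.0
```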
Calculates the perplexity of the given text.
This is simply 2 ** cross-entropy for the text, so the arguments are the same.
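The 2 ** cross-entropy relationship can be checked directly on a toy model (training data illustrative only):

```python
from nltk.lm import MLE
from nltk.lm.preprocessing import padded_everygram_pipeline

train, vocab = padded_everygram_pipeline(2, [["a", "b"], ["a", "c"]])
lm = MLE(2)
lm.fit(train, vocab)

ngrams = [("a", "b")]
# perplexity(text) is exactly 2 ** entropy(text) for the same ngrams.
print(lm.perplexity(ngrams))    # 2.0
print(2 ** lm.entropy(ngrams))  # 2.0
```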
Masks out of vocab (OOV) words and computes their model score.

For model-specific logic of calculating scores, see the unmasked_score method.
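The masking can be observed with a word outside the vocabulary; score maps it to the vocabulary's <UNK> label before delegating to unmasked_score ("z" here is an arbitrary OOV token):

```python
from nltk.lm import MLE
from nltk.lm.preprocessing import padded_everygram_pipeline

train, vocab = padded_everygram_pipeline(2, [["a", "b"], ["a", "c"]])
lm = MLE(2)
lm.fit(train, vocab)

# "z" never occurred in training, so the vocabulary maps it to <UNK>
# and it receives whatever score the model assigns to <UNK>.
print(lm.vocab.lookup("z"))                # '<UNK>'
print(lm.score("z") == lm.score("<UNK>"))  # True
```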
Score a word given some optional context.

Concrete models are expected to provide an implementation. Note that this
method does not mask its arguments with the OOV label. Use the score method
for that.

:rtype: float
Parameters |
str word | Word for which we want the score
tuple(str) context | Context the word is in. If None, compute unigram score.
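A minimal concrete subclass sketch: only unmasked_score needs an implementation. UniformLM is a hypothetical toy model, not part of nltk.

```python
from nltk.lm.api import LanguageModel

class UniformLM(LanguageModel):
    """Hypothetical toy model: every vocabulary entry is equally likely."""

    def unmasked_score(self, word, context=None):
        # Ignore the context entirely and spread probability mass
        # uniformly over the vocabulary (which includes <UNK>).
        return 1.0 / len(self.vocab)

lm = UniformLM(1)
lm.fit([[("a",), ("b",), ("c",)]], vocabulary_text=["a", "b", "c"])
print(lm.score("a") == lm.score("b"))  # True: every word scores the same
```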