class documentation

A probabilistic context-free grammar. A PCFG consists of a start state and a set of productions with probabilities. The set of terminals and nonterminals is implicitly specified by the productions.

PCFG productions use the ProbabilisticProduction class. PCFGs impose the constraint that the productions sharing any given left-hand side must have probabilities that sum to 1, within a small margin of error (see EPSILON).

If you need efficient key-based access to productions, you can use a subclass to implement it.

Class Method fromstring Return a probabilistic context-free grammar corresponding to the input string(s).
Method __init__ Create a new probabilistic context-free grammar from the given start state and set of ProbabilisticProductions.
Constant EPSILON The acceptable margin of error for checking that productions with a given left-hand side have probabilities that sum to 1.

Inherited from CFG:

Class Method binarize Convert all non-binary rules into binary rules by introducing new tokens.
Class Method eliminate_start Eliminate the start rule if it appears on a right-hand side. Example: given S -> S0 S1 and S0 -> S1 S, another rule S0_Sigma -> S is added.
Class Method remove_unitary_rules Remove nonlexical unitary rules and convert them to lexical rules.
Method __repr__ Undocumented
Method __str__ Undocumented
Method check_coverage Check whether the grammar rules cover the given list of tokens. If not, then raise an exception.
Method chomsky_normal_form Return a new grammar that is in Chomsky normal form. Takes a new_token_padding parameter.
Method is_binarised Return True if all productions are at most binary. Note that there can still be empty and unary productions.
Method is_chomsky_normal_form Return True if the grammar is of Chomsky Normal Form, i.e. all productions are of the form A -> B C, or A -> "s".
Method is_flexible_chomsky_normal_form Return True if all productions are of the forms A -> B C, A -> B, or A -> "s".
Method is_leftcorner True if left is a leftcorner of cat, where left can be a terminal or a nonterminal.
Method is_lexical Return True if all productions are lexicalised.
Method is_nonempty Return True if there are no empty productions.
Method is_nonlexical Return True if all lexical rules are "preterminals", that is, unary rules which can be separated in a preprocessing step.
Method leftcorner_parents Return the set of all nonterminals for which the given category is a left corner. This is the inverse of the leftcorner relation.
Method leftcorners Return the set of all nonterminals that the given nonterminal can start with, including itself.
Method max_len Return the right-hand side length of the longest grammar production.
Method min_len Return the right-hand side length of the shortest grammar production.
Method productions Return the grammar productions, filtered by the left-hand side or the first item in the right-hand side.
Method start Return the start symbol of the grammar.
Method _calculate_grammar_forms Pre-calculate which form(s) the grammar is in.
Method _calculate_indexes Undocumented
Method _calculate_leftcorners Undocumented
Instance Variable _all_unary_are_lexical Undocumented
Instance Variable _categories Undocumented
Instance Variable _empty_index Undocumented
Instance Variable _immediate_leftcorner_categories Undocumented
Instance Variable _immediate_leftcorner_words Undocumented
Instance Variable _is_lexical Undocumented
Instance Variable _is_nonlexical Undocumented
Instance Variable _leftcorner_parents Undocumented
Instance Variable _leftcorner_words Undocumented
Instance Variable _leftcorners Undocumented
Instance Variable _lexical_index Undocumented
Instance Variable _lhs_index Undocumented
Instance Variable _max_len Undocumented
Instance Variable _min_len Undocumented
Instance Variable _productions Undocumented
Instance Variable _rhs_index Undocumented
Instance Variable _start Undocumented
@classmethod
def fromstring(cls, input, encoding=None):

Return a probabilistic context-free grammar corresponding to the input string(s).

Parameters
input: A grammar, either in the form of a string or as a list of strings.
encoding: Undocumented
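As a rough illustration of fromstring, the sketch below builds a toy grammar from a multi-line string. The grammar itself (its symbols, words, and probabilities) is made up for this example; the bracketed number after each right-hand side is that production's probability, and the left-hand side of the first rule is taken as the start symbol.

```python
from nltk.grammar import PCFG, Nonterminal

# Toy grammar invented for illustration. Alternatives for one left-hand side
# are separated by "|", each followed by its probability in brackets.
toy_pcfg = PCFG.fromstring("""
    S -> NP VP [1.0]
    NP -> Det N [0.7] | 'John' [0.3]
    VP -> V NP [0.6] | V [0.4]
    Det -> 'the' [1.0]
    N -> 'dog' [1.0]
    V -> 'sees' [1.0]
""")

print("start symbol:", toy_pcfg.start())
for production in toy_pcfg.productions():
    print(production.lhs(), "->", production.rhs(), production.prob())

# Productions can be filtered by left-hand side; their probabilities
# should add up to (approximately) 1.
np_total = sum(p.prob() for p in toy_pcfg.productions(lhs=Nonterminal("NP")))
print("NP total:", np_total)
```

If the probabilities for some left-hand side were far from 1, the call would be expected to fail with the ValueError described under __init__.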
def __init__(self, start, productions, calculate_leftcorners=True):

Create a new probabilistic context-free grammar from the given start state and set of ProbabilisticProductions.

Parameters
start (Nonterminal): The start symbol.
productions (list(Production)): The list of productions that defines the grammar.
calculate_leftcorners (bool): False if we don't want to calculate the leftcorner relation. In that case, some optimized chart parsers won't work.
Raises
ValueError: if the productions with any given left-hand side do not have probabilities that sum to a value within EPSILON of 1.
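A minimal sketch of calling the constructor directly, assuming the productions are built by hand with ProbabilisticProduction. The symbol S and the terminals "a" and "b" are placeholders chosen for this example. The second call uses probabilities that sum to 0.8, so it should trigger the ValueError listed above.

```python
from nltk.grammar import PCFG, Nonterminal, ProbabilisticProduction

S = Nonterminal("S")

# Probabilities for S sum to 1.0, so construction succeeds.
good_productions = [
    ProbabilisticProduction(S, ["a"], prob=0.5),
    ProbabilisticProduction(S, ["b"], prob=0.5),
]
grammar = PCFG(S, good_productions)
print(grammar)

# Probabilities for S sum to 0.8, far outside the allowed margin.
bad_productions = [
    ProbabilisticProduction(S, ["a"], prob=0.5),
    ProbabilisticProduction(S, ["b"], prob=0.3),
]
try:
    PCFG(S, bad_productions)
    raised = False
except ValueError as err:
    raised = True
    print("rejected:", err)
```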
EPSILON: float = 0.01

The acceptable margin of error for checking that productions with a given left-hand side have probabilities that sum to 1.

Value
0.01
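To make the margin concrete, here is a library-independent sketch of the kind of check the constraint implies. The helper lhs_sums_within_margin and the rule tuples are hypothetical, written only for this illustration; the exact boundary handling (strict versus inclusive comparison) is an assumption and may differ from the library's own check.

```python
from collections import defaultdict

EPSILON = 0.01  # mirrors the documented value of PCFG.EPSILON


def lhs_sums_within_margin(weighted_rules, epsilon=EPSILON):
    """Hypothetical helper, not part of NLTK.

    weighted_rules is an iterable of (lhs, rhs, prob) tuples. Returns a dict
    mapping each left-hand side to True if its probabilities sum to a value
    within epsilon of 1, and False otherwise.
    """
    totals = defaultdict(float)
    for lhs, _rhs, prob in weighted_rules:
        totals[lhs] += prob
    return {lhs: abs(total - 1.0) < epsilon for lhs, total in totals.items()}


rules = [
    ("NP", ("Det", "N"), 0.7),
    ("NP", ("John",), 0.295),  # NP sums to about 0.995: inside the margin
    ("VP", ("V", "NP"), 0.6),
    ("VP", ("V",), 0.3),       # VP sums to about 0.9: outside the margin
]
result = lhs_sums_within_margin(rules)
print(result)
```

The margin exists mainly to absorb rounding in hand-written or estimated probabilities, so a left-hand side that sums to 0.995 is accepted while one that sums to 0.9 is not.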