module documentation

This is the docutils.parsers.rst.states module, the core of the reStructuredText parser. The classes, functions, and attributes it defines are summarized after the parser overview.

Parser Overview

The reStructuredText parser is implemented as a recursive state machine, examining its input one line at a time. To understand how the parser works, please first become familiar with the docutils.statemachine module. In the description below, references are made to classes defined in this module; please see the individual classes for details.

Parsing proceeds as follows:

  1. The state machine examines each line of input, checking each of the transition patterns of the state Body, in order, looking for a match. The implicit transitions (blank lines and indentation) are checked before any others. The 'text' transition is a catch-all (matches anything).
  2. The method associated with the matched transition pattern is called.
    1. Some transition methods are self-contained, appending elements to the document tree (for example, Body.doctest parses a doctest block). The parser's current line index is advanced to the end of the element, and parsing continues with step 1.
    2. Other transition methods trigger the creation of a nested state machine, whose job is to parse a compound construct ('indent' does a block quote, 'bullet' does a bullet list, 'overline' does a section [first checking for a valid section header], etc.).
      • In the case of lists and explicit markup, a one-off state machine is created and run to parse contents of the first item.
      • A new state machine is created and its initial state is set to the appropriate specialized state (BulletList in the case of the 'bullet' transition; see SpecializedBody for more detail). This state machine is run to parse the compound element (or series of explicit markup elements), and returns as soon as a non-member element is encountered. For example, the BulletList state machine ends as soon as it encounters an element which is not a list item of that bullet list. This nested state machine is what allows the blank lines between elements to be omitted.
      • The current line index is advanced to the end of the elements parsed, and parsing continues with step 1.
    3. The result of the 'text' transition depends on the next line of text. The current state is changed to Text, under which the second line is examined. If the second line is:
      • Indented: The element is a definition list item, and parsing proceeds similarly to step 2.2, using the DefinitionList state.
      • A line of uniform punctuation characters: The element is a section header; again, parsing proceeds as in step 2.2, and Body is still used.
      • Anything else: The element is a paragraph, which is examined for inline markup and appended to the parent element. Processing continues with step 1.
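
The flow described above can be sketched with a toy, standard-library-only classifier. This is not the docutils implementation: the transition table, the simplified patterns, and the names `classify`, `classify_text_block`, and `parse_blocks` are all assumptions made for illustration. The sketch checks transitions in order with 'text' as the catch-all, uses an inner loop as a stand-in for a nested state machine, and inspects the second line to resolve a 'text' match.

```python
import re

# Hypothetical, simplified transition table; order matters, as in step 1.
# The real Body state has many more transitions and richer patterns.
TRANSITIONS = [
    ("blank", re.compile(r"\s*$")),
    ("indent", re.compile(r" +\S")),
    ("bullet", re.compile(r"[-+*]( +|$)")),
    ("text", re.compile(r"")),  # catch-all: matches anything
]

# A line made of one repeated punctuation character, e.g. "=====".
UNDERLINE = re.compile(r"([!-/:-@\[-`{-~])\1+\s*$")


def classify(line):
    """Return the name of the first transition whose pattern matches."""
    for name, pattern in TRANSITIONS:
        if pattern.match(line):
            return name
    raise AssertionError("unreachable: 'text' matches anything")


def classify_text_block(second_line):
    """Mimic step 2.3: decide what a 'text' line starts from the next line."""
    if second_line is None:
        return "paragraph"
    if classify(second_line) == "indent":
        return "definition_list_item"
    if UNDERLINE.match(second_line):
        return "section_title"
    return "paragraph"


def parse_blocks(lines):
    """Walk the lines once and return a list of (kind, content) blocks."""
    blocks = []
    i = 0
    while i < len(lines):
        kind = classify(lines[i])
        if kind == "blank":
            i += 1
            continue
        if kind in ("bullet", "indent"):
            # Stand-in for a nested state machine: consume member lines and
            # stop as soon as a non-member line is encountered.
            start = i
            while i < len(lines) and classify(lines[i]) == kind:
                i += 1
            name = "bullet_list" if kind == "bullet" else "block_quote"
            blocks.append((name, [ln.lstrip(" -+*") for ln in lines[start:i]]))
            continue
        # 'text': the outcome depends on the second line.
        second = lines[i + 1] if i + 1 < len(lines) else None
        result = classify_text_block(second)
        start = i
        if result == "section_title":
            i += 2  # title line plus its underline
            blocks.append((result, [lines[start]]))
        elif result == "definition_list_item":
            i += 1
            while i < len(lines) and classify(lines[i]) == "indent":
                i += 1
            blocks.append((result, [ln.strip() for ln in lines[start:i]]))
        else:
            while i < len(lines) and classify(lines[i]) != "blank":
                i += 1
            blocks.append((result, [ln.strip() for ln in lines[start:i]]))
    return blocks


sample = [
    "Title",
    "=====",
    "",
    "A plain paragraph",
    "spanning two lines.",
    "",
    "- first",
    "- second",
    "",
    "term",
    "  its definition",
]

for kind, content in parse_blocks(sample):
    print(kind, content)
```

The toy keeps everything in one function for brevity; the real parser instead delegates each compound construct to its own state class, which is why the class list in this module is so long.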
Functions:
  • escape2null(): Return a string, escape-backslashes converted to nulls.
  • unescape(): Return a string, nulls removed or restored to backslashes.
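
As a rough illustration of the two helpers listed above, the sketch below shows one way the round trip could work using only the standard library. The names `escape2null_sketch` and `unescape_sketch` and the `restore_backslashes` flag are assumptions for this example, and the sketch ignores any special handling the real functions may apply; consult docutils itself for the actual signatures and behavior.

```python
def escape2null_sketch(text):
    """Replace each escaping backslash with a null character."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\":
            # Mark the escape with a null and copy the escaped character
            # verbatim, so an escaped backslash is not treated as an escape.
            out.append("\x00" + text[i + 1:i + 2])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def unescape_sketch(text, restore_backslashes=False):
    """Drop the null markers, or turn them back into backslashes."""
    return text.replace("\x00", "\\" if restore_backslashes else "")


marked = escape2null_sketch(r"2 \* 3")
print(repr(marked))
print(unescape_sketch(marked))
print(unescape_sketch(marked, restore_backslashes=True))
```

Using a null as the marker works because a null character cannot otherwise appear in the text being parsed, so the marker never collides with real content.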
Class Body: Generic classifier of the first line of a block.
Class BulletList: Second and subsequent bullet_list list_items.
Class Definition: Second line of potential definition_list_item.
Class DefinitionList: Second and subsequent definition_list_items.
Class EnumeratedList: Second and subsequent enumerated_list list_items.
Class Explicit: Second and subsequent explicit markup construct.
Class ExtensionOptions: Parse field_list fields for extension options.
Class FieldList: Second and subsequent field_list fields.
Class Inliner: Parse inline markup; call the parse() method.
Class InterpretedRoleNotImplementedError: Undocumented
Class Line: Second line of over- & underlined section title or transition marker.
Class LineBlock: Second and subsequent lines of a line_block.
Class MarkupError: Undocumented
Class MarkupMismatch: Undocumented
Class NestedStateMachine: StateMachine run from within other StateMachine runs, to parse nested document structures.
Class OptionList: Second and subsequent option_list option_list_items.
Class ParserError: Undocumented
Class QuotedLiteralBlock: Nested parse handler for quoted (unindented) literal blocks.
Class RFC2822Body: RFC2822 headers are only valid as the first constructs in documents. As soon as anything else appears, the Body state should take over.
Class RFC2822List: Second and subsequent RFC2822-style field_list fields.
Class RSTState: reStructuredText State superclass.
Class RSTStateMachine: reStructuredText's master StateMachine.
Class SpecializedBody: Superclass for second and subsequent compound element members. Compound elements are lists and list-like constructs.
Class SpecializedText: Superclass for second and subsequent lines of Text-variants.
Class Struct: Stores data attributes for dotted-attribute access.
Class SubstitutionDef: Parser for the contents of a substitution_definition element.
Class Text: Classifier of second line of a text block.
Class UnknownInterpretedRoleError: Undocumented
Function build_regexp: Build, compile and return a regular expression based on definition.
Variable state_classes: Standard set of State classes used to start RSTStateMachine.
Function _loweralpha_to_int: Undocumented
Function _lowerroman_to_int: Undocumented
Function _upperalpha_to_int: Undocumented
def build_regexp(definition, compile=True):
Build, compile and return a regular expression based on definition.
Parameter:
definition: a 4-tuple (group name, prefix, suffix, parts), where "parts" is a list of regular expressions and/or regular expression definitions to be joined into an or-group.
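
To make the shape of `definition` concrete, here is a minimal standard-library sketch of how such a 4-tuple could be expanded. It assumes (this is not stated above) that the or-group becomes a named group carrying the group name, wrapped by the prefix and suffix, and that nested definitions are expanded recursively. The name `build_regexp_sketch` and the sample definition are hypothetical.

```python
import re


def build_regexp_sketch(definition, compile=True):
    """Expand (name, prefix, suffix, parts) into one regular expression."""
    name, prefix, suffix, parts = definition
    alternatives = []
    for part in parts:
        if isinstance(part, tuple):
            # A nested definition: expand it to a pattern string first.
            alternatives.append(build_regexp_sketch(part, compile=False))
        else:
            alternatives.append(part)
    pattern = "%s(?P<%s>%s)%s" % (prefix, name, "|".join(alternatives), suffix)
    return re.compile(pattern) if compile else pattern


# Hypothetical definition: a "marker" that is either a run of digits or a
# nested "letter" group, anchored at the start and followed by a period.
definition = (
    "marker", r"^", r"\.",
    [r"[0-9]+", ("letter", r"", r"", [r"[a-z]"])],
)

regex = build_regexp_sketch(definition)
print(build_regexp_sketch(definition, compile=False))
print(regex.match("12.").group("marker"))
print(regex.match("b.").groupdict())
```

Describing patterns as nested tuples keeps each alternative named, so the caller can tell from the match object which branch of the or-group fired.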
state_classes
Standard set of State classes used to start RSTStateMachine.
def _loweralpha_to_int(s, _zero=ord('a')-1):
Undocumented
def _lowerroman_to_int(s):
Undocumented
def _upperalpha_to_int(s, _zero=ord('A')-1):
Undocumented