module documentation
This is the docutils.parsers.rst.states module, the core of the reStructuredText parser. It defines the following:
Parser Overview
The reStructuredText parser is implemented as a recursive state machine,
examining its input one line at a time. To understand how the parser works,
please first become familiar with the docutils.statemachine module. In the
description below, references are made to classes defined in this module;
please see the individual classes for details.
Parsing proceeds as follows:
1. The state machine examines each line of input, checking each of the
   transition patterns of the state Body, in order, looking for a match.
   The implicit transitions (blank lines and indentation) are checked before
   any others. The 'text' transition is a catch-all (matches anything).
2. The method associated with the matched transition pattern is called.
   A. Some transition methods are self-contained, appending elements to the
      document tree (Body.doctest parses a doctest block). The parser's
      current line index is advanced to the end of the element, and parsing
      continues with step 1.
   B. Other transition methods trigger the creation of a nested state machine,
      whose job is to parse a compound construct ('indent' does a block quote,
      'bullet' does a bullet list, 'overline' does a section [first checking
      for a valid section header], etc.).
      - In the case of lists and explicit markup, a one-off state machine is
        created and run to parse contents of the first item.
      - A new state machine is created and its initial state is set to the
        appropriate specialized state (BulletList in the case of the
        'bullet' transition; see SpecializedBody for more detail). This
        state machine is run to parse the compound element (or series of
        explicit markup elements), and returns as soon as a non-member element
        is encountered. For example, the BulletList state machine ends as
        soon as it encounters an element which is not a list item of that
        bullet list. The optional omission of inter-element blank lines is
        enabled by this nested state machine.
      - The current line index is advanced to the end of the elements parsed,
        and parsing continues with step 1.
   C. The result of the 'text' transition depends on the next line of text.
      The current state is changed to Text, under which the second line is
      examined. If the second line is:
      - Indented: The element is a definition list item, and parsing proceeds
        similarly to step 2.B, using the DefinitionList state.
      - A line of uniform punctuation characters: The element is a section
        header; again, parsing proceeds as in step 2.B, and Body is still
        used.
      - Anything else: The element is a paragraph, which is examined for
        inline markup and appended to the parent element. Processing
        continues with step 1.
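The first-line and second-line decisions described in the overview can be illustrated with a toy classifier. This is a minimal sketch, not the docutils implementation: the transition table, its simplified patterns, and the helper names `first_line_transition` and `classify_text_block` are all hypothetical, and the real Body and Text states define many more transitions and build document-tree nodes rather than returning strings.

```python
import re

# Hypothetical, simplified transition table: (name, pattern) pairs checked
# in order. The implicit transitions (blank, indent) come first and the
# 'text' transition is a catch-all that matches anything.
TRANSITIONS = [
    ("blank", re.compile(r"\s*$")),
    ("indent", re.compile(r" +")),
    ("bullet", re.compile(r"[-+*]( +|$)")),
    ("doctest", re.compile(r">>>( +|$)")),
    ("text", re.compile(r"")),
]


def first_line_transition(line):
    """Return the name of the first transition whose pattern matches."""
    for name, pattern in TRANSITIONS:
        if pattern.match(line):
            return name
    raise AssertionError("unreachable: 'text' matches every line")


def classify_text_block(second_line):
    """After a 'text' transition, classify the block by its second line."""
    if second_line[:1] == " " and second_line.strip():
        # Indented second line: the first line was a definition list term.
        return "definition_list_item"
    stripped = second_line.rstrip()
    if stripped and len(set(stripped)) == 1 and not stripped[0].isalnum():
        # One punctuation character repeated: a section title underline.
        return "section_title"
    return "paragraph"


for line in ["", "  quoted", "- item", ">>> print(1)", "Title"]:
    print(repr(line), "->", first_line_transition(line))
print(classify_text_block("====="))
```

Because the table is ordered, a line such as `"-5 degrees"` falls through the bullet pattern (no space after the dash) and is picked up by the catch-all.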
Class | | Generic classifier of the first line of a block. |
Class | | Second and subsequent bullet_list list_items. |
Class | | Second line of potential definition_list_item. |
Class | | Second and subsequent definition_list_items. |
Class | | Second and subsequent enumerated_list list_items. |
Class | | Second and subsequent explicit markup construct. |
Class | | Parse field_list fields for extension options. |
Class | | Second and subsequent field_list fields. |
Class | | Parse inline markup; call the parse() method. |
Class | | Second line of over- & underlined section title or transition marker. |
Class | | Second and subsequent lines of a line_block. |
Class | | StateMachine run from within other StateMachine runs, to parse nested document structures. |
Class | | Second and subsequent option_list option_list_items. |
Class | | Nested parse handler for quoted (unindented) literal blocks. |
Class | | RFC2822 headers are only valid as the first constructs in documents. As soon as anything else appears, the Body state should take over. |
Class | | Second and subsequent RFC2822-style field_list fields. |
Class | | reStructuredText State superclass. |
Class | | reStructuredText's master StateMachine. |
Class | | Superclass for second and subsequent compound element members. Compound elements are lists and list-like constructs. |
Class | | Superclass for second and subsequent lines of Text-variants. |
Class | | Stores data attributes for dotted-attribute access. |
Class | | Parser for the contents of a substitution_definition element. |
Class | | Classifier of second line of a text block. |
Exception | | Undocumented |
Exception | | Undocumented |
Exception | | Undocumented |
Exception | | Undocumented |
Exception | | Undocumented |
Function | build | Build, compile and return a regular expression based on definition. |
Variable | state | Standard set of State classes used to start RSTStateMachine. |
Function | _loweralpha | Undocumented |
Function | _lowerroman | Undocumented |
Function | _upperalpha | Undocumented |
Build, compile and return a regular expression based on definition.
Parameter: definition: a 4-tuple (group name, prefix, suffix, parts),
where "parts" is a list of regular expressions and/or regular
expression definitions to be joined into an or-group.
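A short sketch may help make the 4-tuple format concrete. It is written from the description above rather than copied from the docutils source, so treat the details as assumptions: the function name `build_pattern_sketch`, the `compile_result` flag, the use of a named group for the group name, and the treatment of nested tuples as "regular expression definitions" are all illustrative choices, as is the sample `definition`.

```python
import re


def build_pattern_sketch(definition, compile_result=True):
    """Join `parts` into a named or-group wrapped by `prefix` and `suffix`.

    Hypothetical helper: `definition` is a 4-tuple
    (group name, prefix, suffix, parts). Each part is either a regular
    expression string or another 4-tuple, which is expanded recursively.
    """
    name, prefix, suffix, parts = definition
    part_strings = []
    for part in parts:
        if isinstance(part, tuple):
            # Nested definition: expand to a string, do not compile yet.
            part_strings.append(build_pattern_sketch(part, compile_result=False))
        else:
            part_strings.append(part)
    or_group = "|".join(part_strings)
    regexp = f"{prefix}(?P<{name}>{or_group}){suffix}"
    return re.compile(regexp) if compile_result else regexp


# Hypothetical definition: a marker at line start followed by whitespace.
definition = ("marker", r"^", r"\s", [r"\*", r"\+", ("dash", "", "", ["-", "="])])
pattern = build_pattern_sketch(definition)

print(pattern.pattern)
match = pattern.match("- item")
print(match.group("marker"), match.group("dash"))
```

Using named groups means a caller can ask which alternative matched (`match.group("dash")` is `None` when the marker was `*` or `+`) without counting parentheses.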