module documentation
        
        
        Utility functions for parsers.
| Class |  | Unit tests for CFG. | 
| Function | extract | Parses a string with one test sentence per line. Each line can optionally begin with an expected result (a bool or an int). |
| Function | load_parser | Load a grammar from a file, and build a parser based on that grammar. The parser depends on the grammar format, and might also depend on properties of the grammar itself. |
| Function | taggedsent_to_conll | Converts a single POS-tagged sentence into CONLL format. |
| Function | taggedsents_to_conll | Converts a POS-tagged document stream (a list of sentences, each a list of tuples) and yields lines in CONLL format: one line per word and two newlines at the end of each sentence. |
Parses a string with one test sentence per line. Lines can optionally begin with:
- a bool, saying whether the sentence is grammatical or not, or
- an int, giving the number of parse trees it should have.
The result information is followed by a colon, and then the sentence. Empty lines and lines beginning with a comment character are ignored.
| Parameters | |
| string | the string to parse, with one test sentence per line |
| comment | str of possible comment characters |
| encoding | the encoding of the string, if it is binary |
| Returns | |
| a list of (sentence, expected result) tuples, where a sentence is a list of str and a result is None, a bool, or an int | |
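The input format described above can be illustrated with a small stand-alone sketch. This is not NLTK's implementation: the function name `parse_test_sentences` is hypothetical, and the default comment characters `#%;` are an assumption made for the example.

```python
def parse_test_sentences(text, comment_chars="#%;"):
    """Illustrative parser for the test-sentence format (not NLTK's code)."""
    results = []
    for line in text.split("\n"):
        # Skip empty lines and lines that start with a comment character.
        if not line.strip() or line[0] in comment_chars:
            continue
        expected = None
        prefix, sep, rest = line.partition(":")
        if sep:
            label = prefix.strip()
            if label.lower() in ("true", "false"):
                # A bool prefix: is the sentence grammatical?
                expected = label.lower() == "true"
                line = rest
            elif label.isdigit():
                # An int prefix: the expected number of parse trees.
                expected = int(label)
                line = rest
        tokens = line.split()
        if tokens:
            results.append((tokens, expected))
    return results


# Hypothetical sample input, one test sentence per line.
sample = """# grammatical sentences
True: the dog barks
2: I saw the man with the telescope
false: dog the barks
the cat sleeps
"""
parsed = parse_test_sentences(sample)
for tokens, expected in parsed:
    print(expected, tokens)
```

Each entry pairs the token list with `None`, a bool, or an int, matching the return type described in the table above.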
    
    
def load_parser(grammar_url, trace=0, parser=None, chart_class=None, beam_size=0, **load_args):
Load a grammar from a file, and build a parser based on that grammar. The parser depends on the grammar format, and might also depend on properties of the grammar itself.
The following grammar formats are currently supported:
- 'cfg' (CFGs: CFG)
- 'pcfg' (probabilistic CFGs: PCFG)
- 'fcfg' (feature-based CFGs: FeatureGrammar)
| Parameters | |
| grammar_url | A URL specifying where the grammar is located. The default protocol is "nltk:", which searches for the file in the NLTK data package. |
| trace:int | The level of tracing that should be used when parsing a text. 0 generates no tracing output; higher numbers produce more verbose tracing output. |
| parser | The class used for parsing; should be ChartParser or a subclass. If None, the class depends on the grammar format. |
| chart_class | The class used for storing the chart; should be Chart or a subclass. Only used for CFGs and feature CFGs. If None, the chart class depends on the grammar format. |
| beam_size | The maximum length for the parser's edge queue. Only used for probabilistic CFGs. |
| **load_args | Keyword parameters used when loading the grammar. See data.load for more information. |
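The parameter table says that the chart class only applies to CFGs and feature CFGs, while the beam size only applies to probabilistic CFGs. A minimal sketch of that rule follows. The helper `parser_keyword_args` is hypothetical and does not exist in NLTK; it only models which options are relevant for each grammar format and does not load a grammar or build a parser.

```python
def parser_keyword_args(grammar_format, trace=0, chart_class=None, beam_size=0):
    """Hypothetical helper: pick the options relevant to a grammar format."""
    if grammar_format == "pcfg":
        # Probabilistic CFGs: the beam size applies, the chart class does not.
        return {"trace": trace, "beam_size": beam_size}
    if grammar_format in ("cfg", "fcfg"):
        # Plain and feature-based CFGs: the chart class applies, the beam size does not.
        return {"trace": trace, "chart_class": chart_class}
    raise ValueError(f"unsupported grammar format: {grammar_format!r}")


for fmt in ("cfg", "pcfg", "fcfg"):
    print(fmt, sorted(parser_keyword_args(fmt, trace=1, beam_size=100)))
```

In this sketch a `beam_size` passed alongside a 'cfg' or 'fcfg' grammar is silently dropped, which mirrors the "only used for" wording in the table.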
Converts a single POS-tagged sentence into CONLL format.
>>> from nltk import word_tokenize, pos_tag
>>> from nltk.parse.util import taggedsent_to_conll
>>> text = "This is a foobar sentence."
>>> for line in taggedsent_to_conll(pos_tag(word_tokenize(text))):
...     print(line, end="")
1 This _ DT DT _ 0 a _ _
2 is _ VBZ VBZ _ 0 a _ _
3 a _ DT DT _ 0 a _ _
4 foobar _ JJ JJ _ 0 a _ _
5 sentence _ NN NN _ 0 a _ _
6 . _ . . _ 0 a _ _
| Parameters | |
| sentence:list(tuple(str, str)) | A single input sentence to parse | 
| Returns | |
| iter(str) | a generator yielding a single sentence in CONLL format. | 
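The column layout shown in the example output can be reproduced without a tagger. The sketch below is an illustration based on that output, not NLTK's source: the name `tagged_sentence_to_conll` is hypothetical, the input is tagged by hand, and tab-separated columns are an assumption.

```python
def tagged_sentence_to_conll(sentence):
    """Illustrative generator: one 10-column line per (word, tag) pair."""
    for index, (word, tag) in enumerate(sentence, start=1):
        # Columns follow the example output: index, word, "_", tag, tag,
        # then the fixed placeholders "_", "0", "a", "_", "_".
        columns = [str(index), word, "_", tag, tag, "_", "0", "a", "_", "_"]
        yield "\t".join(columns) + "\n"


# Hand-tagged input, so no tokenizer or tagger models are needed.
tagged = [
    ("This", "DT"), ("is", "VBZ"), ("a", "DT"),
    ("foobar", "JJ"), ("sentence", "NN"), (".", "."),
]
conll_lines = list(tagged_sentence_to_conll(tagged))
print("".join(conll_lines), end="")
```

Because each yielded string already ends with a newline, the lines can be written to a file or joined directly, as the `print(line, end="")` call in the doctest above suggests.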
Converts a POS-tagged document stream (a list of sentences, each a list of tuples) and yields lines in CONLL format. It yields one line per word and two newlines at the end of each sentence.
>>> from nltk import word_tokenize, sent_tokenize, pos_tag
>>> from nltk.parse.util import taggedsents_to_conll
>>> text = "This is a foobar sentence. Is that right?"
>>> sentences = [pos_tag(word_tokenize(sent)) for sent in sent_tokenize(text)]
>>> for line in taggedsents_to_conll(sentences):
...     if line:
...         print(line, end="")
1 This _ DT DT _ 0 a _ _
2 is _ VBZ VBZ _ 0 a _ _
3 a _ DT DT _ 0 a _ _
4 foobar _ JJ JJ _ 0 a _ _
5 sentence _ NN NN _ 0 a _ _
6 . _ . . _ 0 a _ _
<BLANKLINE>
<BLANKLINE>
1 Is _ VBZ VBZ _ 0 a _ _
2 that _ IN IN _ 0 a _ _
3 right _ NN NN _ 0 a _ _
4 ? _ . . _ 0 a _ _
<BLANKLINE>
<BLANKLINE>
| Parameters | |
| sentences:list(list(tuple(str, str))) | Input sentences to parse |
| Returns | |
| iter(str) | a generator yielding sentences in CONLL format. |
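The document-level stream can be sketched in the same spirit. This is an assumption-laden illustration rather than NLTK's code: `tagged_sentences_to_conll` is a hypothetical name, the columns are tab-separated by assumption, and the end-of-sentence marker is modeled as a single `"\n\n"` item.

```python
def tagged_sentences_to_conll(sentences):
    """Illustrative generator: word lines, then "\\n\\n" after each sentence."""
    for sentence in sentences:
        for index, (word, tag) in enumerate(sentence, start=1):
            columns = [str(index), word, "_", tag, tag, "_", "0", "a", "_", "_"]
            yield "\t".join(columns) + "\n"
        # Two newlines mark the end of the sentence.
        yield "\n\n"


# Hand-tagged document with two sentences.
document = [
    [("This", "DT"), ("is", "VBZ")],
    [("Is", "VBZ"), ("that", "IN"), ("right", "NN"), ("?", ".")],
]
stream = list(tagged_sentences_to_conll(document))

# Separate the word lines from the end-of-sentence markers.
word_lines = [item for item in stream if item != "\n\n"]
sentence_count = stream.count("\n\n")
print(len(word_lines), "word lines in", sentence_count, "sentences")
```

Word indices restart at 1 for every sentence, so a consumer can also detect sentence boundaries from the first column.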