CoreNLPDependencyParser class documentation

Dependency parser.

>>> dep_parser = CoreNLPDependencyParser(url='http://localhost:9000')
>>> parse, = dep_parser.raw_parse(
...     'The quick brown fox jumps over the lazy dog.'
... )
>>> print(parse.to_conll(4))  # doctest: +NORMALIZE_WHITESPACE
The     DT      4       det
quick   JJ      4       amod
brown   JJ      4       amod
fox     NN      5       nsubj
jumps   VBZ     0       ROOT
over    IN      9       case
the     DT      9       det
lazy    JJ      9       amod
dog     NN      5       nmod
.       .       5       punct
>>> print(parse.tree())  # doctest: +NORMALIZE_WHITESPACE
(jumps (fox The quick brown) (dog over the lazy) .)
>>> for governor, dep, dependent in parse.triples():
...     print(governor, dep, dependent)  # doctest: +NORMALIZE_WHITESPACE
    ('jumps', 'VBZ') nsubj ('fox', 'NN')
    ('fox', 'NN') det ('The', 'DT')
    ('fox', 'NN') amod ('quick', 'JJ')
    ('fox', 'NN') amod ('brown', 'JJ')
    ('jumps', 'VBZ') nmod ('dog', 'NN')
    ('dog', 'NN') case ('over', 'IN')
    ('dog', 'NN') det ('the', 'DT')
    ('dog', 'NN') amod ('lazy', 'JJ')
    ('jumps', 'VBZ') punct ('.', '.')
>>> (parse_fox, ), (parse_dog, ) = dep_parser.raw_parse_sents(
...     [
...         'The quick brown fox jumps over the lazy dog.',
...         'The quick grey wolf jumps over the lazy fox.',
...     ]
... )
>>> print(parse_fox.to_conll(4))  # doctest: +NORMALIZE_WHITESPACE
The DT      4       det
quick       JJ      4       amod
brown       JJ      4       amod
fox NN      5       nsubj
jumps       VBZ     0       ROOT
over        IN      9       case
the DT      9       det
lazy        JJ      9       amod
dog NN      5       nmod
.   .       5       punct
>>> print(parse_dog.to_conll(4))  # doctest: +NORMALIZE_WHITESPACE
The DT      4       det
quick       JJ      4       amod
grey        JJ      4       amod
wolf        NN      5       nsubj
jumps       VBZ     0       ROOT
over        IN      9       case
the DT      9       det
lazy        JJ      9       amod
fox NN      5       nmod
.   .       5       punct
>>> (parse_dog, ), (parse_friends, ) = dep_parser.parse_sents(
...     [
...         "I 'm a dog".split(),
...         "This is my friends ' cat ( the tabby )".split(),
...     ]
... )
>>> print(parse_dog.to_conll(4))  # doctest: +NORMALIZE_WHITESPACE
I   PRP     4       nsubj
'm  VBP     4       cop
a   DT      4       det
dog NN      0       ROOT
>>> print(parse_friends.to_conll(4))  # doctest: +NORMALIZE_WHITESPACE
This        DT      6       nsubj
is  VBZ     6       cop
my  PRP$    4       nmod:poss
friends     NNS     6       nmod:poss
'   POS     4       case
cat NN      0       ROOT
-LRB-       -LRB-   9       punct
the DT      9       det
tabby       NN      6       appos
-RRB-       -RRB-   9       punct
>>> parse_john, parse_mary, = dep_parser.parse_text(
...     'John loves Mary. Mary walks.'
... )
>>> print(parse_john.to_conll(4))  # doctest: +NORMALIZE_WHITESPACE
John        NNP     2       nsubj
loves       VBZ     0       ROOT
Mary        NNP     2       dobj
.   .       2       punct
>>> print(parse_mary.to_conll(4))  # doctest: +NORMALIZE_WHITESPACE
Mary        NNP     2       nsubj
walks       VBZ     0       ROOT
.   .       2       punct

Special cases

Non-breaking space inside of a token.

>>> len(
...     next(
...         dep_parser.raw_parse(
...             'Anhalt said children typically treat a 20-ounce soda bottle as one '
...             'serving, while it actually contains 2 1/2 servings.'
...         )
...     ).nodes
... )
21

Phone numbers.

>>> len(
...     next(
...         dep_parser.raw_parse('This is not going to crash: 01 111 555.')
...     ).nodes
... )
10
>>> print(
...     next(
...         dep_parser.raw_parse('The underscore _ should not simply disappear.')
...     ).to_conll(4)
... )  # doctest: +NORMALIZE_WHITESPACE
The         DT  3   det
underscore  VBP 3   amod
_           NN  7   nsubj
should      MD  7   aux
not         RB  7   neg
simply      RB  7   advmod
disappear   VB  0   ROOT
.           .   7   punct
>>> print(
...     '\n'.join(
...         next(
...             dep_parser.raw_parse(
...                 'for all of its insights into the dream world of teen life , and its electronic expression through '
...                 'cyber culture , the film gives no quarter to anyone seeking to pull a cohesive story out of its 2 '
...                 '1/2-hour running time .'
...             )
...         ).to_conll(4).split('\n')[-8:]
...     )
... )
its PRP$    40      nmod:poss
2 1/2       CD      40      nummod
-   :       40      punct
hour        NN      31      nmod
running     VBG     42      amod
time        NN      40      dep
.   .       24      punct
<BLANKLINE>
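
For programmatic access beyond the CoNLL dump, each parse is an NLTK DependencyGraph. The sketch below is not part of the original docstring; it assumes the same server as above and shows how the .nodes mapping used in the length checks above is laid out:

# Hedged sketch: walk the nodes of a DependencyGraph returned by raw_parse().
graph = next(dep_parser.raw_parse('John loves Mary.'))
for address, node in sorted(graph.nodes.items()):
    if node['word'] is None:
        continue  # address 0 is the artificial root placeholder
    print(address, node['word'], node['tag'], node['head'], node['rel'])
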
Method make_tree: Undocumented
Class Variable parser_annotator: Undocumented
Constant _OUTPUT_FORMAT: Undocumented

Inherited from GenericCoreNLPParser:

Method __init__: Undocumented
Method api_call: Undocumented
Method parse_sents: Parse multiple sentences.
Method parse_text: Parse a piece of text.
Method raw_parse: Parse a sentence.
Method raw_parse_sents: Parse multiple sentences.
Method raw_tag_sents: Tag multiple sentences.
Method tag: Tag a list of tokens (see the sketch after this list).
Method tag_sents: Tag multiple sentences.
Method tokenize: Tokenize a string of text.
Instance Variable encoding: Undocumented
Instance Variable session: Undocumented
Instance Variable tagtype: Undocumented
Instance Variable url: Undocumented
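
A hedged sketch of the inherited tokenize and tag methods; tagging requires constructing the parser with tagtype set to 'pos' or 'ner', and everything below is illustrative rather than taken from this page:

# Sketch: tokenization and POS tagging against the same CoreNLP server.
tagging_parser = CoreNLPDependencyParser(url='http://localhost:9000', tagtype='pos')
tokens = list(tagging_parser.tokenize('The quick brown fox jumps over the lazy dog.'))
tagged = tagging_parser.tag(tokens)  # list of (token, tag) pairs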

Inherited from ParserI (via GenericCoreNLPParser):

Method grammar: No summary
Method parse: Return an iterator of parse trees for the sentence; when possible this list is sorted from most likely to least likely.
Method parse_all: No summary
Method parse_one: No summary (see the sketch after this list)
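
A hedged sketch of how parse_one and parse_all behave on this class, assuming the dep_parser above: parse_one returns the first parse or None, parse_all a list of parses.

# Sketch: single-best and all-parses convenience wrappers inherited from ParserI.
sent = "I 'm a dog".split()
best = dep_parser.parse_one(sent)        # first DependencyGraph, or None
all_parses = dep_parser.parse_all(sent)  # list of DependencyGraph objects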

Inherited from TokenizerI (via GenericCoreNLPParser, ParserI):

Method span_tokenize: Identify the tokens using integer offsets (start_i, end_i), where s[start_i:end_i] is the corresponding token.
Method span_tokenize_sents: Apply self.span_tokenize() to each element of strings.
Method tokenize_sents: Apply self.tokenize() to each element of strings.

Inherited from TaggerI (via GenericCoreNLPParser, ParserI, TokenizerI):

Method evaluate: Score the accuracy of the tagger against the gold standard. Strip the tags from the gold standard text, retag it using the tagger, then compute the accuracy score (see the sketch after this list).
Method _check_params: Undocumented
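
A hedged sketch of the inherited evaluate method, reusing the tagging_parser from the sketch above; the gold data here is made up purely for illustration:

# Sketch: score the CoreNLP POS tagger against hand-tagged gold sentences.
gold = [[('The', 'DT'), ('dog', 'NN'), ('barks', 'VBZ'), ('.', '.')]]
accuracy = tagging_parser.evaluate(gold)  # strips the tags, retags, scores
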
def make_tree(self, result): (source)

Undocumented
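
Inferred behaviour (a hedged description, since the method is undocumented here): make_tree turns one sentence of the CoreNLP server's JSON response into an NLTK DependencyGraph. A sketch of how a caller might exercise it, assuming the dep_parser above:

# Hedged sketch: api_call() returns the raw JSON; make_tree() builds one
# DependencyGraph per sentence entry in that response.
result = dep_parser.api_call('John loves Mary.')
graph = dep_parser.make_tree(result['sentences'][0])
print(graph.to_conll(4))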

parser_annotator: str = (source)

Undocumented

_OUTPUT_FORMAT: str = (source)

Undocumented

Value: 'conll2007'