CoreNLPDependencyParser class documentation

Dependency parser.

>>> dep_parser = CoreNLPDependencyParser(url='http://localhost:9000')
>>> parse, = dep_parser.raw_parse(
...     'The quick brown fox jumps over the lazy dog.'
... )
>>> print(parse.to_conll(4))  # doctest: +NORMALIZE_WHITESPACE
The     DT      4       det
quick   JJ      4       amod
brown   JJ      4       amod
fox     NN      5       nsubj
jumps   VBZ     0       ROOT
over    IN      9       case
the     DT      9       det
lazy    JJ      9       amod
dog     NN      5       nmod
.       .       5       punct
>>> print(parse.tree())  # doctest: +NORMALIZE_WHITESPACE
(jumps (fox The quick brown) (dog over the lazy) .)
>>> for governor, dep, dependent in parse.triples():
...     print(governor, dep, dependent)  # doctest: +NORMALIZE_WHITESPACE
    ('jumps', 'VBZ') nsubj ('fox', 'NN')
    ('fox', 'NN') det ('The', 'DT')
    ('fox', 'NN') amod ('quick', 'JJ')
    ('fox', 'NN') amod ('brown', 'JJ')
    ('jumps', 'VBZ') nmod ('dog', 'NN')
    ('dog', 'NN') case ('over', 'IN')
    ('dog', 'NN') det ('the', 'DT')
    ('dog', 'NN') amod ('lazy', 'JJ')
    ('jumps', 'VBZ') punct ('.', '.')
>>> (parse_fox, ), (parse_dog, ) = dep_parser.raw_parse_sents(
...     [
...         'The quick brown fox jumps over the lazy dog.',
...         'The quick grey wolf jumps over the lazy fox.',
...     ]
... )
>>> print(parse_fox.to_conll(4))  # doctest: +NORMALIZE_WHITESPACE
The DT      4       det
quick       JJ      4       amod
brown       JJ      4       amod
fox NN      5       nsubj
jumps       VBZ     0       ROOT
over        IN      9       case
the DT      9       det
lazy        JJ      9       amod
dog NN      5       nmod
.   .       5       punct
>>> print(parse_dog.to_conll(4))  # doctest: +NORMALIZE_WHITESPACE
The DT      4       det
quick       JJ      4       amod
grey        JJ      4       amod
wolf        NN      5       nsubj
jumps       VBZ     0       ROOT
over        IN      9       case
the DT      9       det
lazy        JJ      9       amod
fox NN      5       nmod
.   .       5       punct
>>> (parse_dog, ), (parse_friends, ) = dep_parser.parse_sents(
...     [
...         "I 'm a dog".split(),
...         "This is my friends ' cat ( the tabby )".split(),
...     ]
... )
>>> print(parse_dog.to_conll(4))  # doctest: +NORMALIZE_WHITESPACE
I   PRP     4       nsubj
'm  VBP     4       cop
a   DT      4       det
dog NN      0       ROOT
>>> print(parse_friends.to_conll(4))  # doctest: +NORMALIZE_WHITESPACE
This        DT      6       nsubj
is  VBZ     6       cop
my  PRP$    4       nmod:poss
friends     NNS     6       nmod:poss
'   POS     4       case
cat NN      0       ROOT
-LRB-       -LRB-   9       punct
the DT      9       det
tabby       NN      6       appos
-RRB-       -RRB-   9       punct
>>> parse_john, parse_mary, = dep_parser.parse_text(
...     'John loves Mary. Mary walks.'
... )
>>> print(parse_john.to_conll(4))  # doctest: +NORMALIZE_WHITESPACE
John        NNP     2       nsubj
loves       VBZ     0       ROOT
Mary        NNP     2       dobj
.   .       2       punct
>>> print(parse_mary.to_conll(4))  # doctest: +NORMALIZE_WHITESPACE
Mary        NNP     2       nsubj
walks       VBZ     0       ROOT
.   .       2       punct

Special cases

Non-breaking space inside of a token.

>>> len(
...     next(
...         dep_parser.raw_parse(
...             'Anhalt said children typically treat a 20-ounce soda bottle as one '
...             'serving, while it actually contains 2 1/2 servings.'
...         )
...     ).nodes
... )
21

Phone numbers.

>>> len(
...     next(
...         dep_parser.raw_parse('This is not going to crash: 01 111 555.')
...     ).nodes
... )
10
>>> print(
...     next(
...         dep_parser.raw_parse('The underscore _ should not simply disappear.')
...     ).to_conll(4)
... )  # doctest: +NORMALIZE_WHITESPACE
The         DT  3   det
underscore  VBP 3   amod
_           NN  7   nsubj
should      MD  7   aux
not         RB  7   neg
simply      RB  7   advmod
disappear   VB  0   ROOT
.           .   7   punct
>>> print(
...     '\n'.join(
...         next(
...             dep_parser.raw_parse(
...                 'for all of its insights into the dream world of teen life , and its electronic expression through '
...                 'cyber culture , the film gives no quarter to anyone seeking to pull a cohesive story out of its 2 '
...                 '1/2-hour running time .'
...             )
...         ).to_conll(4).split('\n')[-8:]
...     )
... )
its PRP$    40      nmod:poss
2 1/2       CD      40      nummod
-   :       40      punct
hour        NN      31      nmod
running     VBG     42      amod
time        NN      40      dep
.   .       24      punct
<BLANKLINE>
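
For programmatic access beyond the CoNLL dump, each parse is an NLTK DependencyGraph. The sketch below is not part of the original docstring; it assumes the same server as above and shows how the .nodes mapping used in the length checks above is laid out:

# Hedged sketch: walk the nodes of a DependencyGraph returned by raw_parse().
graph = next(dep_parser.raw_parse('John loves Mary.'))
for address, node in sorted(graph.nodes.items()):
    if node['word'] is None:
        continue  # address 0 is the artificial root placeholder
    print(address, node['word'], node['tag'], node['head'], node['rel'])
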
Method make_tree: Undocumented
Class Variable parser_annotator: Undocumented
Constant _OUTPUT_FORMAT: Undocumented

Inherited from GenericCoreNLPParser:

Method __init__: Undocumented
Method api_call: Undocumented
Method parse_sents: Parse multiple sentences.
Method parse_text: Parse a piece of text.
Method raw_parse: Parse a sentence.
Method raw_parse_sents: Parse multiple sentences.
Method raw_tag_sents: Tag multiple sentences.
Method tag: Tag a list of tokens (see the sketch after this list).
Method tag_sents: Tag multiple sentences.
Method tokenize: Tokenize a string of text.
Instance Variable encoding: Undocumented
Instance Variable session: Undocumented
Instance Variable tagtype: Undocumented
Instance Variable url: Undocumented
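
A hedged sketch of the inherited tokenize and tag methods; tagging requires constructing the parser with tagtype set to 'pos' or 'ner', and everything below is illustrative rather than taken from this page:

# Sketch: tokenization and POS tagging against the same CoreNLP server.
tagging_parser = CoreNLPDependencyParser(url='http://localhost:9000', tagtype='pos')
tokens = list(tagging_parser.tokenize('The quick brown fox jumps over the lazy dog.'))
tagged = tagging_parser.tag(tokens)  # list of (token, tag) pairs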

Inherited from ParserI (via GenericCoreNLPParser):

Method grammar: No summary
Method parse: Return an iterator of parse trees for the sentence; when possible this list is sorted from most likely to least likely.
Method parse_all: No summary
Method parse_one: No summary (see the sketch after this list)
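
A hedged sketch of how parse_one and parse_all behave on this class, assuming the dep_parser above: parse_one returns the first parse or None, parse_all a list of parses.

# Sketch: single-best and all-parses convenience wrappers inherited from ParserI.
sent = "I 'm a dog".split()
best = dep_parser.parse_one(sent)        # first DependencyGraph, or None
all_parses = dep_parser.parse_all(sent)  # list of DependencyGraph objects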

Inherited from TokenizerI (via GenericCoreNLPParser, ParserI):

Method span_tokenize: Identify the tokens using integer offsets (start_i, end_i), where s[start_i:end_i] is the corresponding token.
Method span_tokenize_sents: Apply self.span_tokenize() to each element of strings.
Method tokenize_sents: Apply self.tokenize() to each element of strings.

Inherited from TaggerI (via GenericCoreNLPParser, ParserI, TokenizerI):

Method evaluate: Score the accuracy of the tagger against the gold standard. Strip the tags from the gold standard text, retag it using the tagger, then compute the accuracy score (see the sketch after this list).
Method _check_params: Undocumented
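
A hedged sketch of the inherited evaluate method, reusing the tagging_parser from the sketch above; the gold data here is made up purely for illustration:

# Sketch: score the CoreNLP POS tagger against hand-tagged gold sentences.
gold = [[('The', 'DT'), ('dog', 'NN'), ('barks', 'VBZ'), ('.', '.')]]
accuracy = tagging_parser.evaluate(gold)  # strips the tags, retags, scores
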
def make_tree(self, result): (source)

Undocumented
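
Inferred behaviour (a hedged description, since the method is undocumented here): make_tree turns one sentence of the CoreNLP server's JSON response into an NLTK DependencyGraph. A sketch of how a caller might exercise it, assuming the dep_parser above:

# Hedged sketch: api_call() returns the raw JSON; make_tree() builds one
# DependencyGraph per sentence entry in that response.
result = dep_parser.api_call('John loves Mary.')
graph = dep_parser.make_tree(result['sentences'][0])
print(graph.to_conll(4))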

parser_annotator: str = (source)

Undocumented

_OUTPUT_FORMAT: str = (source)

Undocumented

Value: 'conll2007'