class GenericCoreNLPParser(ParserI, TokenizerI, TaggerI):
Known subclasses: nltk.parse.corenlp.CoreNLPDependencyParser, nltk.parse.corenlp.CoreNLPParser
Constructor: GenericCoreNLPParser(url, encoding, tagtype)
Interface to the CoreNLP Parser.
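A minimal usage sketch (added here, not part of the original documentation). It assumes a Stanford CoreNLP server is already running and reachable at http://localhost:9000; the URL, port, and example sentence are assumptions, and the CoreNLPParser subclass listed above is used:

>>> # Assumes a CoreNLP server was started separately, e.g.:
>>> #   java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
>>> from nltk.parse.corenlp import CoreNLPParser
>>> parser = CoreNLPParser(url='http://localhost:9000', encoding='utf8')
>>> tree = next(parser.raw_parse('The quick brown fox jumps over the lazy dog.'))
>>> tree.pretty_print()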
Method | __init__ | Undocumented |
Method | api_call | Undocumented |
Method | parse_sents | Parse multiple sentences. |
Method | parse_text | Parse a piece of text. |
Method | raw_parse | Parse a sentence. |
Method | raw_parse_sents | Parse multiple sentences. |
Method | raw_tag_sents | Tag multiple sentences. |
Method | tag | Tag a list of tokens. |
Method | tag_sents | Tag multiple sentences. |
Method | tokenize | Tokenize a string of text. |
Instance Variable | encoding | Undocumented |
Instance Variable | session | Undocumented |
Instance Variable | tagtype | Undocumented |
Instance Variable | url | Undocumented |
Inherited from ParserI:
Method | grammar | No summary |
Method | parse | When possible this list is sorted from most likely to least likely. |
Method | parse_all | No summary |
Method | parse_one | No summary |
Inherited from TokenizerI (via ParserI):
Method | span_tokenize | Identify the tokens using integer offsets (start_i, end_i), where s[start_i:end_i] is the corresponding token. |
Method | span_tokenize_sents | Apply self.span_tokenize() to each element of strings. |
Method | tokenize_sents | Apply self.tokenize() to each element of strings. |
Inherited from TaggerI (via ParserI, TokenizerI):
Method | evaluate | Score the accuracy of the tagger against the gold standard. Strip the tags from the gold standard text, retag it using the tagger, then compute the accuracy score. |
Method | _check_params | Undocumented |
parse_sents (overrides nltk.parse.api.ParserI.parse_sents)
Parse multiple sentences.
Takes multiple sentences as a list where each sentence is a list of words. Each sentence will be automatically tagged with this CoreNLPParser instance's tagger.
If whitespace exists inside a token, the token will be treated as several tokens.
Parameters | |
sentences:list(list(str)) | Input sentences to parse |
*args | Undocumented |
**kwargs | Undocumented |
Returns | |
iter(iter(Tree)) | Undocumented |
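A usage sketch (not from the original docs); the server URL and the pre-tokenized example sentences are assumptions:

>>> parser = CoreNLPParser(url='http://localhost:9000')  # assumes a running CoreNLP server
>>> pre_tokenized = [
...     'The quick brown fox jumps over the lazy dog'.split(),
...     'What is the airspeed of an unladen swallow ?'.split(),
... ]
>>> for sentence_parses in parser.parse_sents(pre_tokenized):
...     for tree in sentence_parses:
...         tree.pretty_print()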
parse_text
Parse a piece of text.
The text may contain several sentences, which will be split by CoreNLP.
Parameters | |
text:str | The text to be parsed; it is split into sentences by CoreNLP. |
*args | Undocumented |
**kwargs | Undocumented |
Returns | |
An iterable of syntactic structures. # TODO: should it be an iterable of iterables? |
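A sketch of parse_text (added here, not in the original docs); it assumes the same local server and lets CoreNLP perform the sentence splitting:

>>> parser = CoreNLPParser(url='http://localhost:9000')
>>> text = 'The quick brown fox jumps over the lazy dog. What is the airspeed of an unladen swallow?'
>>> for tree in parser.parse_text(text):  # one Tree per sentence found by CoreNLP
...     print(tree)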
raw_parse
Parse a sentence.
Takes a sentence as a string; before parsing, it will be automatically tokenized and tagged by the CoreNLP Parser.
Parameters | |
sentence:str | Input sentence to parse |
properties | Undocumented |
*args | Undocumented |
**kwargs | Undocumented |
Returns | |
iter(Tree) | Undocumented |
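A usage sketch (assumed local server at http://localhost:9000); the input string mirrors the tag() examples further below:

>>> parser = CoreNLPParser(url='http://localhost:9000')
>>> next(parser.raw_parse('What is the airspeed of an unladen swallow ?')).pretty_print()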
raw_parse_sents
Parse multiple sentences.
Takes multiple sentences as a list of strings. Each sentence will be automatically tokenized and tagged.
Parameters | |
sentences:list(str) | Input sentences to parse. |
verbose | Undocumented |
properties | Undocumented |
*args | Undocumented |
**kwargs | Undocumented |
Returns | |
iter(iter(Tree)) | Undocumented |
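A sketch (not from the original docs), assuming the same local server; each input is a raw sentence string that CoreNLP tokenizes and tags before parsing:

>>> parser = CoreNLPParser(url='http://localhost:9000')
>>> raw_sentences = [
...     'The quick brown fox jumps over the lazy dog.',
...     'What is the airspeed of an unladen swallow ?',
... ]
>>> for sentence_parses in parser.raw_parse_sents(raw_sentences):
...     for tree in sentence_parses:
...         print(tree)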
raw_tag_sents
Tag multiple sentences.
Takes multiple sentences as a list where each sentence is a string.
Parameters | |
sentences:list(str) | Input sentences to tag |
Returns | |
list(list(list(tuple(str, str)))) | Undocumented |
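A sketch (not from the original docs) assuming a local server and the 'pos' tagtype; each raw string is tokenized and tagged by CoreNLP:

>>> parser = CoreNLPParser(url='http://localhost:9000', tagtype='pos')
>>> parser.raw_tag_sents(['What is the airspeed of an unladen swallow ?'])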
tag (overrides nltk.tag.api.TaggerI.tag)
Tag a list of tokens.
>>> parser = CoreNLPParser(url='http://localhost:9000', tagtype='ner')
>>> tokens = 'Rami Eid is studying at Stony Brook University in NY'.split()
>>> parser.tag(tokens)
[('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'), ('at', 'O'), ('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'), ('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'O')]
>>> parser = CoreNLPParser(url='http://localhost:9000', tagtype='pos')
>>> tokens = "What is the airspeed of an unladen swallow ?".split()
>>> parser.tag(tokens)
[('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('airspeed', 'NN'), ('of', 'IN'), ('an', 'DT'), ('unladen', 'JJ'), ('swallow', 'VB'), ('?', '.')]
Returns | |
list(tuple(str, str)) | Undocumented |
tag_sents (overrides nltk.tag.api.TaggerI.tag_sents)
Tag multiple sentences.
Takes multiple sentences as a list where each sentence is a list of tokens.
Parameters | |
sentences:list(list(str)) | Input sentences to tag |
Returns | |
list(list(tuple(str, str))) | Undocumented |
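A sketch assuming the same local server and the 'pos' tagtype; each inner list is a pre-tokenized sentence:

>>> parser = CoreNLPParser(url='http://localhost:9000', tagtype='pos')
>>> parser.tag_sents([
...     'What is the airspeed of an unladen swallow ?'.split(),
...     'The quick brown fox jumps over the lazy dog .'.split(),
... ])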
tokenize (overrides nltk.tokenize.api.TokenizerI.tokenize)
Tokenize a string of text.
>>> parser = CoreNLPParser(url='http://localhost:9000')
>>> text = 'Good muffins cost $3.88\nin New York. Please buy me\ntwo of them.\nThanks.'
>>> list(parser.tokenize(text))
['Good', 'muffins', 'cost', '$', '3.88', 'in', 'New', 'York', '.', 'Please', 'buy', 'me', 'two', 'of', 'them', '.', 'Thanks', '.']
>>> s = "The colour of the wall is blue."
>>> list(
...     parser.tokenize(
...         'The colour of the wall is blue.',
...         properties={'tokenize.options': 'americanize=true'},
...     )
... )
['The', 'color', 'of', 'the', 'wall', 'is', 'blue', '.']