class documentation

Interface to the CoreNLP Parser.

Method __init__ Undocumented
Method api_call Undocumented
Method parse_sents Parse multiple sentences.
Method parse_text Parse a piece of text.
Method raw_parse Parse a sentence.
Method raw_parse_sents Parse multiple sentences.
Method raw_tag_sents Tag multiple sentences.
Method tag Tag a list of tokens.
Method tag_sents Tag multiple sentences.
Method tokenize Tokenize a string of text.
Instance Variable encoding Undocumented
Instance Variable session Undocumented
Instance Variable tagtype Undocumented
Instance Variable url Undocumented

Inherited from ParserI:

Method grammar No summary
Method parse When possible this list is sorted from most likely to least likely.
Method parse_all No summary
Method parse_one No summary

Inherited from TokenizerI (via ParserI):

Method span_tokenize Identify the tokens using integer offsets (start_i, end_i), where s[start_i:end_i] is the corresponding token.
Method span_tokenize_sents Apply self.span_tokenize() to each element of strings. I.e.:
Method tokenize_sents Apply self.tokenize() to each element of strings. I.e.:

Inherited from TaggerI (via ParserI, TokenizerI):

Method evaluate Score the accuracy of the tagger against the gold standard. Strip the tags from the gold standard text, retag it using the tagger, then compute the accuracy score.
Method _check_params Undocumented
def __init__(self, url='http://localhost:9000', encoding='utf8', tagtype=None): (source)

Undocumented
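
A construction sketch (not part of the original documentation), assuming a CoreNLP server has already been started and is listening on http://localhost:9000; the tagtype argument, when used, matches the 'pos'/'ner' values shown in the tag() examples below:

>>> from nltk.parse.corenlp import CoreNLPParser
>>> # Defaults made explicit; omit tagtype if you only need parsing or tokenizing.
>>> parser = CoreNLPParser(url='http://localhost:9000', encoding='utf8', tagtype='pos')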

def api_call(self, data, properties=None, timeout=60): (source)

Undocumented
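
A hedged sketch of calling api_call directly (not part of the original documentation); it assumes the method posts data to the CoreNLP server with the given properties and returns the server's decoded JSON response:

>>> parser = CoreNLPParser(url='http://localhost:9000')
>>> response = parser.api_call(
...     'The quick brown fox jumps over the lazy dog.',
...     properties={'annotators': 'tokenize,ssplit,pos'},
...     timeout=60,
... )
>>> # response is the server's raw JSON output, typically a dict
>>> # containing a 'sentences' list with token and tag information.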

def parse_sents(self, sentences, *args, **kwargs): (source)

Parse multiple sentences.

Takes multiple sentences as a list where each sentence is a list of words. Each sentence will be automatically tagged with this CoreNLPParser instance's tagger.

If whitespace occurs inside a token, the token will be split into several tokens.

Parameters
    sentences: list(list(str)) - Input sentences to parse
    *args: Undocumented
    **kwargs: Undocumented
Returns
    iter(iter(Tree)): Undocumented
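
A usage sketch (not part of the original documentation), assuming a server at http://localhost:9000; the exact trees depend on the models loaded by that server:

>>> parser = CoreNLPParser(url='http://localhost:9000')
>>> sentences = [
...     'The quick brown fox jumps over the lazy dog'.split(),
...     'What is the airspeed of an unladen swallow ?'.split(),
... ]
>>> parses = parser.parse_sents(sentences)
>>> first_tree = next(next(parses))  # an nltk.Tree for the first sentence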
def parse_text(self, text, *args, **kwargs): (source)

Parse a piece of text.

The text might contain several sentences which will be split by CoreNLP.

Parameters
    text: str - Text to be split into sentences and parsed.
    *args: Undocumented
    **kwargs: Undocumented
Returns
    an iterable of syntactic structures. # TODO: should it be an iterable of iterables?
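
A usage sketch (not part of the original documentation), again assuming a local server; sentence splitting is done by CoreNLP, so the number of trees follows its splitter:

>>> parser = CoreNLPParser(url='http://localhost:9000')
>>> text = 'The quick brown fox jumps over the lazy dog. What is the airspeed of an unladen swallow ?'
>>> trees = list(parser.parse_text(text))
>>> # Expect one nltk.Tree per sentence that CoreNLP finds in the text.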
def raw_parse(self, sentence, properties=None, *args, **kwargs): (source)

Parse a sentence.

Takes a sentence as a string; before parsing, it will be automatically tokenized and tagged by the CoreNLP Parser.

Parameters
    sentence: str - Input sentence to parse
    properties: Undocumented
    *args: Undocumented
    **kwargs: Undocumented
Returns
    iter(Tree): Undocumented
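
A usage sketch (not part of the original documentation), assuming a local server; tokenization and tagging happen on the server side:

>>> parser = CoreNLPParser(url='http://localhost:9000')
>>> tree = next(parser.raw_parse('What is the airspeed of an unladen swallow ?'))
>>> # tree is an nltk.Tree; tree.pretty_print() will draw it.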
def raw_parse_sents(self, sentences, verbose=False, properties=None, *args, **kwargs): (source)

Parse multiple sentences.

Takes multiple sentences as a list of strings. Each sentence will be automatically tokenized and tagged.

Parameters
    sentences: list(str) - Input sentences to parse.
    verbose: Undocumented
    properties: Undocumented
    *args: Undocumented
    **kwargs: Undocumented
Returns
    iter(iter(Tree)): Undocumented
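
A usage sketch (not part of the original documentation), assuming a local server:

>>> parser = CoreNLPParser(url='http://localhost:9000')
>>> parses = parser.raw_parse_sents([
...     'The quick brown fox jumps over the lazy dog.',
...     'What is the airspeed of an unladen swallow ?',
... ])
>>> trees = [next(sentence_parses) for sentence_parses in parses]
>>> # One nltk.Tree per input string (the most likely parse of each sentence).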
def raw_tag_sents(self, sentences): (source)

Tag multiple sentences.

Takes multiple sentences as a list where each sentence is a string.

Parameters
    sentences: list(str) - Input sentences to tag
Returns
    list(list(list(tuple(str, str)))): Undocumented
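
A usage sketch (not part of the original documentation), assuming a local server and a parser constructed with tagtype='pos' (as in the tag() examples below):

>>> parser = CoreNLPParser(url='http://localhost:9000', tagtype='pos')
>>> tagged = parser.raw_tag_sents(['What is the airspeed of an unladen swallow ?'])
>>> # tagged[0] contains the tagged sentence(s) CoreNLP found in the first string,
>>> # each as a list of (token, tag) pairs.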
def tag(self, sentence): (source)

Tag a list of tokens.

>>> parser = CoreNLPParser(url='http://localhost:9000', tagtype='ner')
>>> tokens = 'Rami Eid is studying at Stony Brook University in NY'.split()
>>> parser.tag(tokens)
[('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'), ('at', 'O'), ('Stony', 'ORGANIZATION'),
('Brook', 'ORGANIZATION'), ('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'O')]
>>> parser = CoreNLPParser(url='http://localhost:9000', tagtype='pos')
>>> tokens = "What is the airspeed of an unladen swallow ?".split()
>>> parser.tag(tokens)
[('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'),
('airspeed', 'NN'), ('of', 'IN'), ('an', 'DT'),
('unladen', 'JJ'), ('swallow', 'VB'), ('?', '.')]
Returns
    list(tuple(str, str)): Undocumented
def tag_sents(self, sentences): (source)

Tag multiple sentences.

Takes multiple sentences as a list where each sentence is a list of tokens.

Parameters
    sentences: list(list(str)) - Input sentences to tag
Returns
    list(list(tuple(str, str))): Undocumented
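
A usage sketch (not part of the original documentation), reusing the sentences from the tag() examples above and assuming a local server:

>>> parser = CoreNLPParser(url='http://localhost:9000', tagtype='ner')
>>> sentences = [
...     'Rami Eid is studying at Stony Brook University in NY'.split(),
...     'What is the airspeed of an unladen swallow ?'.split(),
... ]
>>> tagged_sents = parser.tag_sents(sentences)
>>> # One list of (token, tag) pairs per input sentence.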
def tokenize(self, text, properties=None): (source)

Tokenize a string of text.

>>> parser = CoreNLPParser(url='http://localhost:9000')
>>> text = 'Good muffins cost $3.88\nin New York.  Please buy me\ntwo of them.\nThanks.'
>>> list(parser.tokenize(text))
['Good', 'muffins', 'cost', '$', '3.88', 'in', 'New', 'York', '.', 'Please', 'buy', 'me', 'two', 'of', 'them', '.', 'Thanks', '.']
>>> s = "The colour of the wall is blue."
>>> list(
...     parser.tokenize(
...         s,
...         properties={'tokenize.options': 'americanize=true'},
...     )
... )
['The', 'color', 'of', 'the', 'wall', 'is', 'blue', '.']
encoding = (source)

Undocumented

session = (source)

Undocumented

tagtype = (source)

Undocumented

url = (source)

Undocumented