class GenericCoreNLPParser(ParserI, TokenizerI, TaggerI):
Known subclasses: nltk.parse.corenlp.CoreNLPDependencyParser, nltk.parse.corenlp.CoreNLPParser
Constructor: GenericCoreNLPParser(url, encoding, tagtype)
Interface to the CoreNLP Parser.
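A minimal usage sketch (added here, not part of the original documentation). It assumes a Stanford CoreNLP server is already running and reachable at http://localhost:9000; the URL, port, and example sentence are assumptions, and the CoreNLPParser subclass listed above is used:

>>> # Assumes a CoreNLP server was started separately, e.g.:
>>> #   java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
>>> from nltk.parse.corenlp import CoreNLPParser
>>> parser = CoreNLPParser(url='http://localhost:9000', encoding='utf8')
>>> tree = next(parser.raw_parse('The quick brown fox jumps over the lazy dog.'))
>>> tree.pretty_print()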
Method | __init__ | Undocumented |
Method | api_call | Undocumented |
Method | parse_sents | Parse multiple sentences. |
Method | parse_text | Parse a piece of text. |
Method | raw_parse | Parse a sentence. |
Method | raw_parse_sents | Parse multiple sentences. |
Method | raw_tag_sents | Tag multiple sentences. |
Method | tag | Tag a list of tokens. |
Method | tag_sents | Tag multiple sentences. |
Method | tokenize | Tokenize a string of text. |
Instance Variable | encoding | Undocumented |
Instance Variable | session | Undocumented |
Instance Variable | tagtype | Undocumented |
Instance Variable | url | Undocumented |
Inherited from ParserI:
Method | grammar | No summary |
Method | parse | When possible this list is sorted from most likely to least likely. |
Method | parse_all | No summary |
Method | parse_one | No summary |
Inherited from TokenizerI (via ParserI):
Method | span_tokenize | Identify the tokens using integer offsets (start_i, end_i), where s[start_i:end_i] is the corresponding token. |
Method | span_tokenize_sents | Apply self.span_tokenize() to each element of strings. |
Method | tokenize_sents | Apply self.tokenize() to each element of strings. |
Inherited from TaggerI (via ParserI, TokenizerI):
Method | evaluate | Score the accuracy of the tagger against the gold standard. Strip the tags from the gold standard text, retag it using the tagger, then compute the accuracy score. |
Method | _check_params | Undocumented |
parse_sents (overrides nltk.parse.api.ParserI.parse_sents)
Parse multiple sentences.
Takes multiple sentences as a list where each sentence is a list of words. Each sentence will be automatically tagged with this CoreNLPParser instance's tagger.
If whitespace exists inside a token, the token will be treated as several tokens.
Parameters | |
sentences:list(list(str)) | Input sentences to parse |
*args | Undocumented |
**kwargs | Undocumented |
Returns | |
iter(iter(Tree)) | Undocumented |
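A usage sketch (not from the original docs); the server URL and the pre-tokenized example sentences are assumptions:

>>> parser = CoreNLPParser(url='http://localhost:9000')  # assumes a running CoreNLP server
>>> pre_tokenized = [
...     'The quick brown fox jumps over the lazy dog'.split(),
...     'What is the airspeed of an unladen swallow ?'.split(),
... ]
>>> for sentence_parses in parser.parse_sents(pre_tokenized):
...     for tree in sentence_parses:
...         tree.pretty_print()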
parse_text
Parse a piece of text.
The text may contain several sentences, which will be split by CoreNLP.
Parameters | |
text:str | The text to be parsed; it is split into sentences by CoreNLP. |
*args | Undocumented |
**kwargs | Undocumented |
Returns | |
An iterable of syntactic structures. # TODO: should it be an iterable of iterables? |
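A sketch of parse_text (added here, not in the original docs); it assumes the same local server and lets CoreNLP perform the sentence splitting:

>>> parser = CoreNLPParser(url='http://localhost:9000')
>>> text = 'The quick brown fox jumps over the lazy dog. What is the airspeed of an unladen swallow?'
>>> for tree in parser.parse_text(text):  # one Tree per sentence found by CoreNLP
...     print(tree)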
raw_parse
Parse a sentence.
Takes a sentence as a string; before parsing, it will be automatically tokenized and tagged by the CoreNLP Parser.
Parameters | |
sentence:str | Input sentence to parse |
properties | Undocumented |
*args | Undocumented |
**kwargs | Undocumented |
Returns | |
iter(Tree) | Undocumented |
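A usage sketch (assumed local server at http://localhost:9000); the input string mirrors the tag() examples further below:

>>> parser = CoreNLPParser(url='http://localhost:9000')
>>> next(parser.raw_parse('What is the airspeed of an unladen swallow ?')).pretty_print()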
raw_parse_sents
Parse multiple sentences.
Takes multiple sentences as a list of strings. Each sentence will be automatically tokenized and tagged.
Parameters | |
sentences:list(str) | Input sentences to parse. |
verbose | Undocumented |
properties | Undocumented |
*args | Undocumented |
**kwargs | Undocumented |
Returns | |
iter(iter(Tree)) | Undocumented |
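A sketch (not from the original docs), assuming the same local server; each input is a raw sentence string that CoreNLP tokenizes and tags before parsing:

>>> parser = CoreNLPParser(url='http://localhost:9000')
>>> raw_sentences = [
...     'The quick brown fox jumps over the lazy dog.',
...     'What is the airspeed of an unladen swallow ?',
... ]
>>> for sentence_parses in parser.raw_parse_sents(raw_sentences):
...     for tree in sentence_parses:
...         print(tree)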
raw_tag_sents
Tag multiple sentences.
Takes multiple sentences as a list where each sentence is a string.
Parameters | |
sentences:list(str) | Input sentences to tag |
Returns | |
list(list(list(tuple(str, str)))) | Undocumented |
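A sketch (not from the original docs) assuming a local server and the 'pos' tagtype; each raw string is tokenized and tagged by CoreNLP:

>>> parser = CoreNLPParser(url='http://localhost:9000', tagtype='pos')
>>> parser.raw_tag_sents(['What is the airspeed of an unladen swallow ?'])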
tag (overrides nltk.tag.api.TaggerI.tag)
Tag a list of tokens.
>>> parser = CoreNLPParser(url='http://localhost:9000', tagtype='ner')
>>> tokens = 'Rami Eid is studying at Stony Brook University in NY'.split()
>>> parser.tag(tokens)
[('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'), ('at', 'O'), ('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'), ('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'O')]
>>> parser = CoreNLPParser(url='http://localhost:9000', tagtype='pos')
>>> tokens = "What is the airspeed of an unladen swallow ?".split()
>>> parser.tag(tokens)
[('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('airspeed', 'NN'), ('of', 'IN'), ('an', 'DT'), ('unladen', 'JJ'), ('swallow', 'VB'), ('?', '.')]
Returns | |
list(tuple(str, str)) | Undocumented |
tag_sents (overrides nltk.tag.api.TaggerI.tag_sents)
Tag multiple sentences.
Takes multiple sentences as a list where each sentence is a list of tokens.
Parameters | |
sentences:list(list(str)) | Input sentences to tag |
Returns | |
list(list(tuple(str, str))) | Undocumented |
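A sketch assuming the same local server and the 'pos' tagtype; each inner list is a pre-tokenized sentence:

>>> parser = CoreNLPParser(url='http://localhost:9000', tagtype='pos')
>>> parser.tag_sents([
...     'What is the airspeed of an unladen swallow ?'.split(),
...     'The quick brown fox jumps over the lazy dog .'.split(),
... ])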
tokenize (overrides nltk.tokenize.api.TokenizerI.tokenize)
Tokenize a string of text.
>>> parser = CoreNLPParser(url='http://localhost:9000')
>>> text = 'Good muffins cost $3.88\nin New York. Please buy me\ntwo of them.\nThanks.'
>>> list(parser.tokenize(text))
['Good', 'muffins', 'cost', '$', '3.88', 'in', 'New', 'York', '.', 'Please', 'buy', 'me', 'two', 'of', 'them', '.', 'Thanks', '.']
>>> s = "The colour of the wall is blue."
>>> list(
...     parser.tokenize(
...         'The colour of the wall is blue.',
...         properties={'tokenize.options': 'americanize=true'},
...     )
... )
['The', 'color', 'of', 'the', 'wall', 'is', 'blue', '.']