class GenericCoreNLPParser(ParserI, TokenizerI, TaggerI): (source)
Known subclasses: nltk.parse.corenlp.CoreNLPDependencyParser, nltk.parse.corenlp.CoreNLPParser
Constructor: GenericCoreNLPParser(url, encoding, tagtype)
Interface to the CoreNLP Parser.
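A minimal usage sketch, assuming a Stanford CoreNLP server is already running and reachable at http://localhost:9000; the concrete subclasses CoreNLPParser and CoreNLPDependencyParser produce constituency Trees and DependencyGraphs respectively:

# A minimal sketch, assuming a Stanford CoreNLP server is already
# running and reachable at http://localhost:9000.
from nltk.parse.corenlp import CoreNLPDependencyParser, CoreNLPParser

# Constituency parsing with the concrete CoreNLPParser subclass.
parser = CoreNLPParser(url='http://localhost:9000')
tree = next(parser.raw_parse('The quick brown fox jumps over the lazy dog.'))
tree.pretty_print()

# Dependency parsing with CoreNLPDependencyParser.
dep_parser = CoreNLPDependencyParser(url='http://localhost:9000')
graph = next(dep_parser.raw_parse('The quick brown fox jumps over the lazy dog.'))
print(graph.to_conll(4))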
| Method | __init__ | Undocumented |
| Method | api_call | Undocumented |
| Method | parse_sents | Parse multiple sentences. |
| Method | parse_text | Parse a piece of text. |
| Method | raw_parse | Parse a sentence. |
| Method | raw_parse_sents | Parse multiple sentences. |
| Method | raw_tag_sents | Tag multiple sentences. |
| Method | tag | Tag a list of tokens. |
| Method | tag_sents | Tag multiple sentences. |
| Method | tokenize | Tokenize a string of text. |
| Instance Variable | encoding | Undocumented |
| Instance Variable | session | Undocumented |
| Instance Variable | tagtype | Undocumented |
| Instance Variable | url | Undocumented |
Inherited from ParserI:
| Method | grammar | No summary |
| Method | parse | When possible this list is sorted from most likely to least likely. |
| Method | parse_all | No summary |
| Method | parse_one | No summary |
Inherited from TokenizerI (via ParserI):
| Method | span_tokenize | Identify the tokens using integer offsets (start_i, end_i), where s[start_i:end_i] is the corresponding token. |
| Method | span_tokenize_sents | Apply self.span_tokenize() to each element of strings. I.e.: |
| Method | tokenize_sents | Apply self.tokenize() to each element of strings. I.e.: |
Inherited from TaggerI (via ParserI, TokenizerI):
| Method | evaluate | Score the accuracy of the tagger against the gold standard. Strip the tags from the gold standard text, retag it using the tagger, then compute the accuracy score. |
| Method | _check_params | Undocumented |
parse_sents (overrides nltk.parse.api.ParserI.parse_sents)
Parse multiple sentences.
Takes multiple sentences as a list where each sentence is a list of words. Each sentence will be automatically tagged with this CoreNLPParser instance's tagger.
If a token contains whitespace, it will be treated as several tokens.
| Parameters | |
| sentences:list(list(str)) | Input sentences to parse |
| *args | Undocumented |
| **kwargs | Undocumented |
| Returns | |
| iter(iter(Tree)) | Undocumented |
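A short sketch of how parse_sents might be called, assuming a CoreNLP server at http://localhost:9000 and two example sentences that are already tokenized:

from nltk.parse.corenlp import CoreNLPParser

parser = CoreNLPParser(url='http://localhost:9000')
sentences = [
    'The quick brown fox jumps over the lazy dog .'.split(),
    'I saw the man with the telescope .'.split(),
]
# parse_sents yields one iterator of Tree objects per input sentence.
for parse_iter in parser.parse_sents(sentences):
    for tree in parse_iter:
        print(tree)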
parse_text
Parse a piece of text.
The text might contain several sentences which will be split by CoreNLP.
| Parameters | |
| text:str | Text to be split. |
| *args | Undocumented |
| **kwargs | Undocumented |
| Returns | |
| An iterable of syntactic structures. # TODO: should it be an iterable of iterables? | |
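A hedged sketch of parse_text, again assuming a server at http://localhost:9000; CoreNLP performs the sentence splitting, so the input is raw text:

from nltk.parse.corenlp import CoreNLPParser

parser = CoreNLPParser(url='http://localhost:9000')
text = 'The quick brown fox jumps over the lazy dog. I saw her duck.'
# One syntactic structure is yielded per sentence found by CoreNLP.
for tree in parser.parse_text(text):
    print(tree)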
raw_parse
Parse a sentence.
Takes a sentence as a string; before parsing, it will be automatically tokenized and tagged by the CoreNLP Parser.
| Parameters | |
| sentence:str | Input sentence to parse |
| properties | Undocumented |
| *args | Undocumented |
| **kwargs | Undocumented |
| Returns | |
| iter(Tree) | Undocumented |
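A sketch of raw_parse under the same assumption of a server at http://localhost:9000; the input is a single untokenized sentence string:

from nltk.parse.corenlp import CoreNLPParser

parser = CoreNLPParser(url='http://localhost:9000')
# raw_parse returns an iterator of Tree objects; here we take the first parse.
tree = next(parser.raw_parse('What is the airspeed of an unladen swallow ?'))
tree.pretty_print()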
raw_parse_sents
Parse multiple sentences.
Takes multiple sentences as a list of strings. Each sentence will be automatically tokenized and tagged.
| Parameters | |
| sentences:list(str) | Input sentences to parse. |
| verbose | Undocumented |
| properties | Undocumented |
| *args | Undocumented |
| **kwargs | Undocumented |
| Returns | |
| iter(iter(Tree)) | Undocumented |
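A sketch of raw_parse_sents, assuming a server at http://localhost:9000; each list element is a raw sentence string, and the result nests one iterator of trees per input:

from nltk.parse.corenlp import CoreNLPParser

parser = CoreNLPParser(url='http://localhost:9000')
sentences = [
    'The quick brown fox jumps over the lazy dog.',
    'I saw the man with the telescope.',
]
for parse_iter in parser.raw_parse_sents(sentences):
    for tree in parse_iter:
        print(tree)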
raw_tag_sents
Tag multiple sentences.
Takes multiple sentences as a list where each sentence is a string.
| Parameters | |
| sentences:list(str) | Input sentences to tag |
| Returns | |
| list(list(list(tuple(str, str)))) | Undocumented |
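A sketch of raw_tag_sents, assuming a server at http://localhost:9000 and tagtype='pos'; each input element is an untokenized sentence string, and each result element is a list of tagged sentences, since CoreNLP may split one string into several sentences:

from nltk.parse.corenlp import CoreNLPParser

tagger = CoreNLPParser(url='http://localhost:9000', tagtype='pos')
for tagged_doc in tagger.raw_tag_sents(['The quick brown fox jumps over the lazy dog.']):
    for tagged_sentence in tagged_doc:
        print(tagged_sentence)  # list of (token, tag) tuples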
tag (overrides nltk.tag.api.TaggerI.tag)
Tag a list of tokens.
>>> parser = CoreNLPParser(url='http://localhost:9000', tagtype='ner')
>>> tokens = 'Rami Eid is studying at Stony Brook University in NY'.split()
>>> parser.tag(tokens)
[('Rami', 'PERSON'), ('Eid', 'PERSON'), ('is', 'O'), ('studying', 'O'), ('at', 'O'), ('Stony', 'ORGANIZATION'), ('Brook', 'ORGANIZATION'), ('University', 'ORGANIZATION'), ('in', 'O'), ('NY', 'O')]
>>> parser = CoreNLPParser(url='http://localhost:9000', tagtype='pos')
>>> tokens = "What is the airspeed of an unladen swallow ?".split()
>>> parser.tag(tokens)
[('What', 'WP'), ('is', 'VBZ'), ('the', 'DT'), ('airspeed', 'NN'), ('of', 'IN'), ('an', 'DT'), ('unladen', 'JJ'), ('swallow', 'VB'), ('?', '.')]
| Returns | |
| list(tuple(str, str)) | Undocumented |
tag_sents (overrides nltk.tag.api.TaggerI.tag_sents)
Tag multiple sentences.
Takes multiple sentences as a list where each sentence is a list of tokens.
| Parameters | |
| sentences:list(list(str)) | Input sentences to tag |
| Returns | |
| list(list(tuple(str, str))) | Undocumented |
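A sketch of tag_sents, assuming a server at http://localhost:9000 and tagtype='ner'; each sentence is passed as a list of tokens, mirroring the tag() examples above:

from nltk.parse.corenlp import CoreNLPParser

tagger = CoreNLPParser(url='http://localhost:9000', tagtype='ner')
sentences = [
    'Rami Eid is studying at Stony Brook University in NY'.split(),
    'The quick brown fox jumps over the lazy dog'.split(),
]
# One list of (token, tag) tuples is returned per input sentence.
for tagged_sentence in tagger.tag_sents(sentences):
    print(tagged_sentence)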
tokenize (overrides nltk.tokenize.api.TokenizerI.tokenize)
Tokenize a string of text.
>>> parser = CoreNLPParser(url='http://localhost:9000')
>>> text = 'Good muffins cost $3.88\nin New York. Please buy me\ntwo of them.\nThanks.'
>>> list(parser.tokenize(text))
['Good', 'muffins', 'cost', '$', '3.88', 'in', 'New', 'York', '.', 'Please', 'buy', 'me', 'two', 'of', 'them', '.', 'Thanks', '.']
>>> s = "The colour of the wall is blue."
>>> list(
...     parser.tokenize(
...         'The colour of the wall is blue.',
...         properties={'tokenize.options': 'americanize=true'},
...     )
... )
['The', 'color', 'of', 'the', 'wall', 'is', 'blue', '.']