Proto commits in stanfordnlp/stanza

The Protocol Buffers files changed in these 23 commits:

Commit:8a6c543
Author:John Bauer
Committer:John Bauer

Update constituency evaluation to accommodate the per-tree f1 (this will be sent back via the proto in the next version of CoreNLP after 4.5.6)

The documentation is generated from this commit.

Commit:9c53cfa
Author:John Bauer
Committer:John Bauer

Clean up some whitespace in the protobuf definitions

Commit:7efd4fb
Author:John Bauer
Committer:John Bauer

Add the fields needed to send empty nodes in protobufs as part of enhanced dependencies from UD

Commit:69a1c60
Author:John Bauer
Committer:John Bauer

Ssurgeon now passes the misc column for MWT as well as words to CoreNLP. Only does anything useful if there is a new CoreNLP release, but in the meantime it doesn't crash or throw away the whitespace markings, at least. Adds a commented out test of the mwt_misc column

Commit:fd8b524
Author:John Bauer
Committer:John Bauer

Copy two functionalities from an updated CoreNLP proto: send named edges back as part of a semgrex search & add an Ssurgeon request/response

Commit:0987794
Author:John Bauer
Committer:John Bauer

Add an interface for the CoreNLP conversion from English constituencies to dependencies. Only works for English. Not currently unit tested (obviously tested during development) because it requires a new CoreNLP release first. Return the doc after processing - makes it more pipelineable

Commit:17b8d03
Author:John Bauer
Committer:John Bauer

Update corenlp.proto with definitions that will connect to the Morphology annotator

Commit:f660b95
Author:John Bauer

Add the graphIndex and semgrexIndex from CoreNLP 4.5.0 to make the semgrex interface a bit more readable (hopefully)

Commit:fc68b55
Author:John Bauer
Committer:John Bauer

Add tsurgeon interface to the python/corenlp interface. Includes a context manager for tsurgeon. Add a unit test to tsurgeon.

Commit:f15c38b
Author:John Bauer
Committer:John Bauer

Add the kbestF1 field to the parser eval

Commit:07a63bf
Author:John Bauer
Committer:John Bauer

Add tsurgeon interface to the python/corenlp interface. Includes a context manager for tsurgeon. Add a unit test to tsurgeon.

Commit:2134ae8
Author:John Bauer
Committer:John Bauer

Add the kbestF1 field to the parser eval

Commit:9031802
Author:John Bauer
Committer:John Bauer

Constituency parser based on word embeddings to create Trees out of a sequence of words.
This is a squash of what was originally a long list of changes. See 2f7db846e14ce73ca95416172b1ba5ba512821f5 for the original sequence.
Primary methods are either top-down or in-order transition sequences, as per this paper: "In-Order Transition-based Constituent Parsing", Jiangming Liu and Yue Zhang.
Parser eval interface which calls the CoreNLP parser eval.
Model is based on LSTMs.
Includes a treebank evaluation request to CoreNLP via a protobuf.
Has options to use a variety of small modifications to the models.
Constraints on the transitions hopefully prevent the parser from getting stuck.
Allow either adadelta or sgd as optimizer.
Allow choice of relu or tanh for nonlinearity.
Includes a bunch of tests.
Move the constituency tests into their own directory.
Defaults are set to reasonable values for the WSJ PTB.
Lots of effort put into bulk operations instead of doing single transitions at a time.
Saves the optimizer state when saving a model. Makes the model much larger, but allows for restarting training from the same optimizer. Also, a mode to remove the optimizer from a model (which shrinks it).
Uses a mechanism similar to the original implementation to avoid too many "unary" transitions, e.g. an open immediately followed by a close. However, some training trees have too many unary transitions for the original limit=3 to be sufficient.
Charlm integration, including batching, although that didn't seem to help.
Also has some doc on things which didn't help.

Commit:450041b
Author:John Bauer
Committer:John Bauer

Add protobuf for an enhancement request

Commit:2c528e8
Author:John Bauer

Updated proto file, including a tokensregex interface

Commit:48ee26d
Author:John Bauer
Committer:John Bauer

Python interface to the semgrex processor

Commit:8a90cf2
Author:John Bauer

Transfer a couple proto updates from corenlp

Commit:08c5de0
Author:John Bauer

Update a couple more uint32->int32 to be compatible with the CoreNLP use of these fields, where -1 can represent null
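
The reason for the uint32->int32 switch can be illustrated with a minimal Python sketch. It only checks value ranges using the standard struct module; it does not use the stanza or CoreNLP code, and it does not model the protobuf wire format. The names NOT_PRESENT and fits are hypothetical, chosen for this illustration.

```python
import struct

# Hypothetical name for the sentinel the commit message describes:
# CoreNLP uses -1 in these fields to mean "null".
NOT_PRESENT = -1


def fits(fmt: str, value: int) -> bool:
    """Return True if value is within the range of the given struct format."""
    try:
        struct.pack(fmt, value)
        return True
    except struct.error:
        return False


# "<i" is a 32-bit signed integer, "<I" a 32-bit unsigned integer.
signed_ok = fits("<i", NOT_PRESENT)
unsigned_ok = fits("<I", NOT_PRESENT)

# Reinterpreting the signed bit pattern as unsigned shows the value an
# unsigned 32-bit reader would see in place of -1.
wrapped = struct.unpack("<I", struct.pack("<i", NOT_PRESENT))[0]

print(signed_ok, unsigned_ok, wrapped)
```

A signed 32-bit field can hold the sentinel directly, while an unsigned one either rejects it or sees a large positive number, which is consistent with the motivation given in the commit message.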

Commit:43ec748
Author:John Bauer

Update CoreNLP.proto to use int32 for some annotators which use that to signify 'not present'

Commit:1ab2a2e
Author:John Bauer

Update corenlp protocol buffer to the candidate version for corenlp 4.0.0

Commit:f15677b
Author:Sina

Update to 3.9.1

Commit:d05093e
Author:Arun Tejasvi Chaganty

Updated tests and MANIFEST

Commit:6755b5f
Author:Arun Tejasvi Chaganty

Initialized with protobuf and tests