A coreference chain. These fields are not *really* optional. CoreNLP will crash without them.
the second element of position
A request for converting constituency trees to dependency graphs
The result of using the CoreNLP dependency converter. One graph per tree
A protobuf for passing in a document with basic dependencies to be converted to enhanced dependencies
The expected value of this is a regex which matches relative pronouns
A dependency graph representation.
optional: if this graph message is not part of a larger context, the tokens will help reconstruct the actual sentence
The values in this field will index directly into the node list. This is useful so that additional information such as emptyIndex can be considered without having to pass it around a second time
A document; that is, the equivalent of an Annotation.
A peculiar field, for the corner case when a Document is serialized without any sentences. Otherwise
This field is for entity mentions across the document.
used to differentiate between null and empty list
xml information
coref mentions for entire document
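Protobuf repeated fields carry no presence information, so a list that was never set and a list that was set but empty look the same on the wire; that is why a separate flag is needed. A minimal Python sketch of the idea follows. The `DocumentSketch` class and its field names are hypothetical illustrations, not the actual CoreNLP message.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DocumentSketch:
    """Hypothetical stand-in for a serialized document (not the real message)."""
    mentions: List[str] = field(default_factory=list)  # a repeated field is never "null"
    has_mentions: bool = False  # explicit flag: was a list (even an empty one) set?


def serialize(mentions: Optional[List[str]]) -> DocumentSketch:
    # None means "no list was ever set"; [] means "a list was set and it is empty".
    if mentions is None:
        return DocumentSketch()
    return DocumentSketch(mentions=list(mentions), has_mentions=True)


def deserialize(doc: DocumentSketch) -> Optional[List[str]]:
    # Without the flag, None and [] would both come back as [].
    return list(doc.mentions) if doc.has_mentions else None
```

With the flag, a round trip keeps `None` and `[]` distinct; without it, both would deserialize to an empty list.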
A representation of an entity in a relation. This corresponds to the EntityMention, and more broadly the ExtractionObject classes.
inherited from ExtractionObject
Implicit uint32 sentence @see implicit in sentence
A protobuf for calling the java constituency parser evaluator from elsewhere
repeated so you can send in kbest parses, if your parser handles that. Note that this already includes a score field
keep track of the individual tree F1 scores
A version of ParseTree with a flattened structure so that deep trees don't exceed the protobuf stack depth
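One way to picture the flattening is a linear sequence of open, label, and close entries that can be written and read back with an explicit stack instead of recursion. The sketch below assumes that encoding; the `OPEN`/`CLOSE` markers and the `(label, children)` tree shape are illustrative choices, not the message's actual fields.

```python
# A tree is (label, [children]); a leaf is (word, []).
OPEN = object()   # hypothetical marker: a subtree starts here
CLOSE = object()  # hypothetical marker: the current subtree ends here


def flatten(tree):
    """Turn a nested tree into a flat list of OPEN / label / CLOSE entries, iteratively."""
    out = []
    stack = [tree]
    while stack:
        item = stack.pop()
        if item is CLOSE:
            out.append(CLOSE)
            continue
        label, children = item
        out.append(OPEN)
        out.append(label)
        stack.append(CLOSE)               # emitted after all children are done
        stack.extend(reversed(children))  # leftmost child is processed first
    return out


def unflatten(nodes):
    """Rebuild the nested tree from the flat list, again without recursion."""
    stack = []  # subtrees under construction, each as [label, children]
    root = None
    for node in nodes:
        if node is OPEN:
            stack.append([None, []])
        elif node is CLOSE:
            label, children = stack.pop()
            finished = (label, children)
            if stack:
                stack[-1][1].append(finished)
            else:
                root = finished
        else:
            stack[-1][0] = node  # the label of the subtree just opened
    return root
```

Because both directions use a loop and an explicit stack, the depth of the tree only affects the length of the list, not the depth of any call stack or nested message.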
An enumeration for the valid languages allowed in CoreNLP
A map from integers to strings. Used, minimally, in the CoNLLU featurizer
A map from strings to strings. Used, minimally, in the CoNLLU featurizer
Sent in Morphology requests - a stream of sentences with tagged words
Sent back from the Morphology request - the words and their tags
An NER mention in the text
The seven informative Natural Logic relations
A Natural Logic operator
A syntactic parse tree, with scores.
The polarity of a word, according to Natural Logic
A quotation marker in text
A representation of a relation, mirroring RelationMention
inherited from ExtractionObject
Implicit uint32 sentence @see implicit in sentence
An OpenIE relation triple. Created by the openie annotator.
The surface form of the subject
The surface form of the relation (required)
The surface form of the object
The [optional] confidence of the extraction
The tokens comprising the subject of the triple
The tokens comprising the relation of the triple
The tokens comprising the object of the triple
The dependency graph fragment for this triple
If true, this expresses an implicit tmod relation
If true, this relation string is missing a 'be' prefix
If true, this relation string is missing a 'be' suffix
If true, this relation string is missing an 'of' suffix
A message for requesting a semgrex. Each sentence stores information about the tokens making up the corresponding graph. An alternative would have been to use the existing Document or Sentence classes, but the problem with that is it would be ambiguous which dependency object to use.
The response from running a semgrex. If you pass in M semgrex expressions and N dependency graphs, this returns MxN nested results. Each SemgrexResult can match multiple times in one graph. You may want to send multiple semgrexes per query because translating large numbers of dependency graphs to protobufs will be expensive, so doing several queries at once will save time.
when processing multiple dependency graphs at once, which dependency graph this applies to (indexed from 0)
index of the semgrex expression this match applies to (indexed from 0)
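The MxN shape of the response can be illustrated with a toy matcher. This is only a sketch: graphs are plain lists of `(governor, relation, dependent)` edges, a "pattern" is just a relation name rather than real semgrex syntax, and the nesting order (graph on the outside, expression on the inside) is an assumption made for the example.

```python
from typing import List, Tuple

Edge = Tuple[str, str, str]  # (governor, relation, dependent)


def run_queries(patterns: List[str], graphs: List[List[Edge]]) -> List[List[List[Edge]]]:
    """Return results[g][p]: all matches of pattern p in graph g (both indexed from 0)."""
    results = []
    for graph in graphs:            # N dependency graphs
        per_graph = []
        for relation in patterns:   # M expressions
            # Toy matcher: a pattern matches every edge with that relation name,
            # so one pattern can match several times in the same graph.
            matches = [edge for edge in graph if edge[1] == relation]
            per_graph.append(matches)
        results.append(per_graph)
    return results
```

Sending all M patterns in one call means each graph is converted once and reused for every pattern, which is the saving the comment above describes.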
The serialized version of a CoreMap representing a sentence.
The OpenIE triples in the sentence
The KBP triples in this sentence
The entailed sentences, by natural logic
The entailed clauses, by natural logic
Only needed if we're only saving the sentence.
Fields set by other annotators in CoreNLP
Useful when storing sentences (e.g. ForEach)
date of section
section index for this sentence's section
name of section
author of section
doc id
is this sentence in an xml quote in a post
check if there are entity mentions
check if there are KBP triples
check if there are OpenIE triples
quote stuff
the quote annotator can sometimes add merged sentences
speaker stuff
The speaker speaking this sentence
The type of speaker speaking this sentence
An entailed sentence fragment. Created by the openie annotator.
An enumeration of valid sentiment values for the sentiment classifier.
A Span of text
A message for processing an Ssurgeon. Each sentence stores information about the tokens making up the corresponding graph. An alternative would have been to use the existing Document or Sentence classes, but the problem with that is it would be ambiguous which dependency object to use. Another problem is that if the intent is to use multiple graphs from a Sentence, then edits to the nodes of one graph would show up in the nodes of the other graph (same backing CoreLabels) and the operations themselves may not have the intended effect. The Ssurgeon is composed of two pieces, the semgrex and the ssurgeon operations, along with some optional documentation.
A Timex object, representing a temporal expression (TIMe EXpression). These fields are not *really* optional. CoreNLP will crash without them.
The serialized version of a Token (a CoreLabel).
Fields set by the default annotators [new CoreNLP(new Properties())]
the word's gloss (post-tokenization)
The word's part of speech tag
The word's 'value', (e.g., parse tree node)
The word's 'category' (e.g., parse tree node)
The whitespace/xml before the token
The whitespace/xml after the token
The original text for this token
The word's NER tag
The word's coarse NER tag
The word's fine-grained NER tag
listing of probs
The word's normalized NER tag
The word's lemma
The character offset begin, in the document
The character offset end, in the document
The utterance tag used in dcoref
The speaker speaking this word
The type of speaker speaking this word
The begin index of, e.g., a span
The end index of, e.g., a span
The begin index of the token
The end index of the token
The time this word refers to
Used by clean xml annotator
Used by clean xml annotator
The [primary] cluster id for this token
A temporary annotation which is occasionally left in
optional string projectedCategory = 25; // The syntactic category of the maximal constituent headed by the word. Not used anywhere, so deleted.
The index of the head word of this word.
If this is an operator, which one is it and what is its scope (as per Natural Logic)?
The polarity of this word, according to Natural Logic
The polarity of this word, either "up", "down", or "flat"
The span of a leaf node of a tree
The final sentiment of the sentence
The index of the quotation this token refers to
The coarse POS tag (used to store the UPOS tag)
Fields set by other annotators in CoreNLP
gender annotation (machine reading)
true case type of token
true case gloss of token
Chinese character info
Arabic character info
Section info
French tokens have parents
mention index info
mwt stuff
setting this to a map might be nice, but there are a couple issues: for one, there can be values with no key; for another, it's a pain to correctly parse, since different treebanks can have different standards for how to write out the misc field
number info
Most serialized annotations will not have this. Some code paths may not correctly process this if serialized, since many places will read the index off the position in a sentence. In particular, deserializing a Document using ProtobufAnnotationSerializer will clobber any index value. But Semgrex and Ssurgeon in particular need a way to pass around nodes where the node's index is not strictly 1, 2, 3, ... thanks to the empty nodes in UD treebanks such as English EWT or Estonian EWT (not related to each other).
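To see why position in the sentence is not enough, consider a UD-style sentence with one empty node, whose ID is written as `5.1`. The sketch below uses a hypothetical `NodeSketch` class; representing a regular node with an empty index of 0 is an assumption made for the example, not a statement about how CoreNLP encodes it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeSketch:
    """Hypothetical node: an explicit index plus an empty-node counter."""
    word: str
    index: int
    empty_index: int = 0  # assumed encoding: 0 = regular node, k > 0 = k-th empty node after `index`


def explicit_ids(nodes: List[NodeSketch]) -> List[str]:
    """IDs taken from the stored index, CoNLL-U style (e.g. '5.1' for an empty node)."""
    return [f"{n.index}.{n.empty_index}" if n.empty_index else str(n.index) for n in nodes]


def positional_ids(nodes: List[NodeSketch]) -> List[str]:
    """IDs you would get by reading the index off the list position (1-based)."""
    return [str(position) for position, _ in enumerate(nodes, start=1)]


# "Sue likes coffee and Bill tea", with the elided second "likes" as an empty node.
sentence = [
    NodeSketch("Sue", 1), NodeSketch("likes", 2), NodeSketch("coffee", 3),
    NodeSketch("and", 4), NodeSketch("Bill", 5), NodeSketch("likes", 5, 1),
    NodeSketch("tea", 6),
]
```

Reading indices off the position numbers the empty node 6 and shifts "tea" to 7, which is the clobbering the comment above warns about.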
The index of a token in a document, including the sentence index and the offset.
It's possible to send in a whole document, but we only care about the Sentences and Tokens
The result will be a nested structure: repeated PatternMatch, one for each pattern. Each PatternMatch has a repeated Match, which tells you which sentence matched and where.
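The nesting can be sketched with a toy matcher in which a "pattern" is simply a literal token sequence. The dictionary keys (`pattern`, `matches`, `sentence`, `begin`, `end`) are hypothetical names chosen for the example; they are not the fields of the actual messages.

```python
from typing import Dict, List


def find_matches(patterns: List[List[str]], sentences: List[List[str]]) -> List[Dict]:
    """One entry per pattern; each entry lists where that pattern matched."""
    result = []
    for pattern in patterns:
        matches = []
        width = len(pattern)
        for sentence_index, tokens in enumerate(sentences):
            # Slide a window over the sentence and compare token by token.
            for begin in range(len(tokens) - width + 1):
                if tokens[begin:begin + width] == pattern:
                    matches.append(
                        {"sentence": sentence_index, "begin": begin, "end": begin + width}
                    )
        # A pattern with no matches still gets an entry, so the outer list
        # lines up with the list of patterns.
        result.append({"pattern": pattern, "matches": matches})
    return result
```

Keeping one outer entry per pattern, even when its match list is empty, lets the caller pair results with patterns by position.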
A protobuf for running Tsurgeon operations on constituency trees
The results of the Tsurgeon operation