package syntaxnet

message AffixTableEntry

Affix table entry, for serialization of the affix tables.

required string type = 1
The type of affix table, as a string.
required int32 max_length = 2
The maximum affix length.
repeated AffixTableEntry.AffixEntry affix = 3
The list of affixes, in order of affix ID.

message AffixTableEntry.AffixEntry

Nested message for serializing a single affix.

required string form = 1
The affix as a string.
required int32 length = 2
The length of the affix (this is non-trivial to compute due to UTF-8).
required int32 shorter_id = 3
The ID of the affix that is one character shorter, or -1 if none exists.
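As a rough illustration of why `length` is stored explicitly and how `shorter_id` links entries, the following Python sketch builds prefix entries for one word. The helper `build_prefix_entries` and the dict layout are hypothetical; only the field names `form`, `length`, and `shorter_id` come from the message above, and the sketch assumes that `length` counts characters rather than bytes.

```python
def build_prefix_entries(word, max_length):
    """Return prefix entries of `word`, shortest first, as plain dicts.

    Hypothetical helper that mirrors the AffixEntry fields; it is not
    the SyntaxNet implementation. The list index plays the role of the
    affix ID.
    """
    entries = []
    for n in range(1, min(max_length, len(word)) + 1):
        entries.append({
            "form": word[:n],
            "length": n,          # assumed: characters, not UTF-8 bytes
            "shorter_id": n - 2,  # index of the prefix one character shorter; -1 when n == 1
        })
    return entries


entries = build_prefix_entries("\u00fcber", 3)
for affix_id, entry in enumerate(entries):
    # The byte count differs from `length` whenever a character needs
    # more than one byte in UTF-8.
    print(affix_id, entry, len(entry["form"].encode("utf-8")))
```

The first entry has a `length` of 1 but occupies two bytes in UTF-8, which is the mismatch the field comment alludes to.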

message AlternativeTokenAnalysis

An alternative analysis of tokens in the document. The repeated fields are indexed relative to the beginning of a sentence. Fields not represented in the alternative analysis are assumed to be unchanged. Currently only alternatives for tags, categories and (labeled) dependency heads are supported. Each repeated field should either have length=0 or length=number of tokens.

Used in: KBestSyntaxAnalysesForSentence, KBestSyntaxAnalysesForToken

repeated int32 head = 1
Head of this token in the dependency tree: the id of the token which has an arc going to this one. If it is the root token of a sentence, then it is set to -1.
repeated string tag = 2
Part-of-speech tag for token.
repeated string category = 3
Coarse-grained word category for token.
repeated string label = 4
Label for dependency relation between this token and its head.
optional double score = 5
The score of this analysis, where bigger values typically indicate better quality, but there are no guarantees and there is also no pre-defined range.
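The length rule for the repeated fields (each is either empty or has one entry per token) can be checked mechanically. The sketch below uses plain dicts in place of the generated message class; `is_consistent` is a hypothetical helper, not part of SyntaxNet.

```python
REPEATED_FIELDS = ("head", "tag", "category", "label")


def is_consistent(analysis, num_tokens):
    """Return True if every repeated field is empty or has num_tokens entries."""
    for field in REPEATED_FIELDS:
        n = len(analysis.get(field, []))
        if n not in (0, num_tokens):
            return False
    return True


# Three-token sentence: heads and tags are given, category and label are
# left empty, so those are treated as unchanged.
ok = {"head": [1, -1, 1], "tag": ["DT", "NN", "VB"], "score": -2.5}
# Two heads for three tokens violates the rule.
bad = {"head": [1, -1], "tag": ["DT", "NN", "VB"]}

print(is_consistent(ok, 3))
print(is_consistent(bad, 3))
```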

message FeatureExtractorDescriptor

Descriptor for feature extractor.

repeated FeatureFunctionDescriptor feature = 1
Top-level feature function for extractor.

message FeatureFunctionDescriptor

Descriptor for feature function.

Used in: FeatureExtractorDescriptor

required string type = 1
Feature function type.
optional string name = 2
Feature function name.
optional int32 argument = 3
Default argument for feature function.
repeated Parameter parameter = 4
Named parameters for feature descriptor.
repeated FeatureFunctionDescriptor feature = 7
Nested sub-feature function descriptors.

message KBestSyntaxAnalyses

A list of alternative (k-best) syntax analyses, grouped by sentences.

repeated KBestSyntaxAnalysesForSentence sentence = 1
Alternative analyses for each sentence. Sentences are listed in the order visited by a SentenceIterator.
repeated KBestSyntaxAnalysesForToken token = 2
Alternative analyses for each token.

message KBestSyntaxAnalysesForSentence

A list of alternative (k-best) analyses for a sentence spanning from a start token index to an end token index. The alternative analyses are ordered by decreasing model score from best to worst. The first analysis is the 1-best analysis, which is typically also stored in the document tokens.

Used in: KBestSyntaxAnalyses

optional int32 start = 1
First token of sentence.
optional int32 end = 2
Last token of sentence.
repeated AlternativeTokenAnalysis token_analysis = 3
K-best analyses for the tokens in this sentence. All of the analyses in the list have the same "type"; e.g., k-best taggings, k-best {tagging+parse}s, etc. Note also that the type of analysis stored in this list can change depending on where we are in the document processing pipeline; e.g., may initially be taggings, and then switch to parses. The first token_analysis would be the 1-best analysis, which is typically also stored in the document. Note: some post-processors will update the document's syntax trees, but will leave these unchanged.
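The ordering contract (analyses sorted by decreasing score, with the 1-best first) is easy to verify on the consumer side. The following sketch again uses plain dicts; `is_sorted_by_score` and `one_best` are hypothetical helper names, and it assumes every analysis carries a `score`.

```python
def is_sorted_by_score(token_analyses):
    """Return True if scores never increase from one analysis to the next."""
    scores = [a["score"] for a in token_analyses]
    return all(x >= y for x, y in zip(scores, scores[1:]))


def one_best(token_analyses):
    """Return the first (highest-scoring) analysis, or None if the list is empty."""
    return token_analyses[0] if token_analyses else None


k_best = [
    {"tag": ["DT", "NN"], "score": -0.4},
    {"tag": ["DT", "VB"], "score": -1.9},
    {"tag": ["PRP", "NN"], "score": -3.2},
]

print(is_sorted_by_score(k_best))
print(one_best(k_best)["tag"])
```

Because scores have no pre-defined range, the check compares them only relative to each other.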

message KBestSyntaxAnalysesForToken

A list of scored alternative (k-best) analyses for a particular token. These are all distinct from each other and ordered by decreasing model score. The first is the 1-best analysis, which may or may not match the document tokens depending on how the k-best analyses are selected.

Used in: KBestSyntaxAnalyses

repeated AlternativeTokenAnalysis token_analysis = 3
All token analyses in this repeated field refer to the same token. Each alternative analysis will contain a single entry for repeated fields such as head, tag, category and label.

message Parameter

Used in: FeatureFunctionDescriptor

optional string name = 1
optional string value = 2

message Sentence

A Sentence contains the raw text contents of a sentence, as well as an analysis.

optional string docid = 1
Identifier for document.
optional string text = 2
Raw text contents of the sentence.
repeated Token token = 3
Tokenization of the sentence.

message SparseFeatures

A sparse set of features. If using SparseStringToIdTransformer, description is required and id should be omitted; otherwise, id is required and description is optional. The id, weight, and description fields are all aligned if present (i.e., any of these that are non-empty should have the same number of items). If weight is omitted, 1.0 is used.

repeated uint64 id = 1
repeated float weight = 2
repeated string description = 3
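The alignment rule and the default weight can be made concrete with a small sketch. It represents the message as a plain dict and expands it into per-feature rows; `expand_sparse_features` is a hypothetical helper, not a SyntaxNet function.

```python
def expand_sparse_features(features):
    """Expand aligned id/weight/description lists into (id, weight, description) rows.

    Empty fields are filled with None, except weight, which defaults to 1.0.
    Raises ValueError if the non-empty fields differ in length.
    """
    ids = features.get("id", [])
    weights = features.get("weight", [])
    descriptions = features.get("description", [])

    lengths = {len(x) for x in (ids, weights, descriptions) if x}
    if len(lengths) > 1:
        raise ValueError("id, weight and description must be aligned")
    n = lengths.pop() if lengths else 0

    return [
        (
            ids[i] if ids else None,
            weights[i] if weights else 1.0,
            descriptions[i] if descriptions else None,
        )
        for i in range(n)
    ]


print(expand_sparse_features({"id": [7, 9]}))
print(expand_sparse_features({"description": ["w=the"], "weight": [0.5]}))
```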

message StringToStringMap

Serializable representation of a string=>string mapping.

repeated StringToStringPair pair = 1
Key=>value pairs.

message StringToStringPair

Serializable representation of a string=>string pair.

Used in: StringToStringMap

required string key = 1
String representing the key.
required string value = 2
String representing the value.

message TaskInput

Task input descriptor.

Used in: TaskSpec

required string name = 1
Name of input resource.
optional string creator = 2
Name of stage responsible for creating this resource.
repeated string file_format = 3
File format for resource.
repeated string record_format = 4
Record format for resource.
optional bool multi_file = 5
Is this resource multi-file?
repeated group syntaxnet.TaskInput.Part = 6

message TaskInput.Part

An input can consist of multiple file sets.

Used in: TaskInput

optional string file_pattern = 7
File pattern for file set.
optional string file_format = 8
File format for file set.
optional string record_format = 9
Record format for file set.

message TaskOutput

Task output descriptor.

Used in: TaskSpec

required string name = 1
Name of output resource.
optional string file_format = 2
File format for output resource.
optional string record_format = 3
Record format for output resource.
optional int32 shards = 4
Number of shards in output. If it is different from zero, this output is sharded. If the number of shards is set to -1, the output is sharded but the number of shards is unknown. The files are then named 'base-*-of-*'.
optional string file_base = 5
Base file name for output resource. If this is not set by the task component it is set to a default value by the workflow engine.
optional string file_extension = 6
Optional extension added to the file name.
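One possible reading of the `shards`, `file_base`, and `file_extension` fields is sketched below. Only the 'base-*-of-*' pattern comes from the field comment; the five-digit zero padding, the placement of the extension after the shard suffix, and the helper name `output_file_names` are all assumptions made for illustration.

```python
def output_file_names(file_base, shards=0, file_extension=""):
    """Return the file names (or a glob pattern) implied by a TaskOutput.

    Assumed conventions, not confirmed by the schema:
      * shard indices are zero-padded to five digits,
      * the extension follows the shard suffix.
    """
    if shards == 0:
        # Unsharded output: a single file.
        return [file_base + file_extension]
    if shards == -1:
        # Sharded, but the shard count is unknown: fall back to a pattern.
        return [f"{file_base}-*-of-*{file_extension}"]
    return [
        f"{file_base}-{i:05d}-of-{shards:05d}{file_extension}"
        for i in range(shards)
    ]


print(output_file_names("parsed", 0, ".txt"))
print(output_file_names("parsed", -1))
print(output_file_names("parsed", 2, ".txt"))
```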

message TaskSpec

A task specification is used for describing execution parameters.

optional string task_name = 1
Name of task.
optional string task_type = 2
Workflow task type.
repeated group syntaxnet.TaskSpec.Parameter = 3
repeated TaskInput input = 6
Task inputs.
repeated TaskOutput output = 7
Task outputs.

message TaskSpec.Parameter

Task parameters.

Used in: TaskSpec

required string name = 4
optional string value = 5

message Token

A document token marks a span of bytes in the document text as a token or word.

Used in: Sentence

required string word = 1
Token word form.
required int32 start = 2
Start position of token in text.
required int32 end = 3
End position of token in text. Gives index of last byte, not one past the last byte. If token came from lexer, excludes any trailing HTML tags.
optional int32 head = 4
Head of this token in the dependency tree: the id of the token which has an arc going to this one. If it is the root token of a sentence, then it is set to -1.
optional string tag = 5
Part-of-speech tag for token.
optional string category = 6
Coarse-grained word category for token.
optional string label = 7
Label for dependency relation between this token and its head.
optional Token.BreakLevel break_level = 8
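Two details of Token are easy to get wrong: `start` and `end` are byte offsets with an inclusive `end`, and the root token has `head` set to -1. The sketch below shows both using plain dicts; `token_text` and `root_index` are hypothetical helpers, and it assumes that a token's id is its index in the sentence's token list.

```python
def token_text(raw, token):
    """Slice the token's bytes out of the UTF-8 text; `end` is inclusive."""
    return raw[token["start"]:token["end"] + 1].decode("utf-8")


def root_index(tokens):
    """Return the index of the first token whose head is -1 (the root)."""
    return next(i for i, t in enumerate(tokens) if t["head"] == -1)


text = "Caf\u00e9 opens"
raw = text.encode("utf-8")  # the accented character takes two bytes

tokens = [
    {"word": "Caf\u00e9", "start": 0, "end": 4, "head": 1},
    {"word": "opens", "start": 6, "end": 10, "head": -1},
]

for t in tokens:
    print(token_text(raw, t), t["head"])
print(root_index(tokens))
```

Slicing the decoded string with these offsets instead of the encoded bytes would cut the second token incorrectly, since the first token spans five bytes but only four characters.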

enum Token.BreakLevel

Break level for tokens that indicates how it was separated from the previous token in the text.

Used in: Token

NO_BREAK = 0
No separation between tokens.
SPACE_BREAK = 1
Tokens separated by space.
LINE_BREAK = 2
Tokens separated by line break.
SENTENCE_BREAK = 3
Tokens separated by sentence break.

message TokenEmbedding

A light-weight proto to store vectors in binary format.

required bytes token = 1
Can be a word, a phrase, a URL, etc.
optional int64 count = 3
If available, raw count of this token in the training corpus.
optional TokenEmbedding.Vector vector = 2

message TokenEmbedding.Vector

Used in: TokenEmbedding

repeated float values = 1

message TokenMorphology

Stores information about the morphology of a token.

repeated TokenMorphology.Attribute attribute = 3
This attribute field is designated to hold a single disambiguated analysis.

message TokenMorphology.Attribute

Morphology is represented by a set of attribute values.

Used in: TokenMorphology

required string name = 1
required string value = 2

package syntaxnet

message AffixTableEntry

required string type = 1

required int32 max_length = 2

repeated AffixTableEntry.AffixEntry affix = 3

message AffixTableEntry.AffixEntry

required string form = 1

required int32 length = 2

required int32 shorter_id = 3

message AlternativeTokenAnalysis

repeated int32 head = 1

repeated string tag = 2

repeated string category = 3

repeated string label = 4

optional double score = 5

message FeatureExtractorDescriptor

repeated FeatureFunctionDescriptor feature = 1

message FeatureFunctionDescriptor

required string type = 1

optional string name = 2

optional int32 argument = 3

repeated Parameter parameter = 4

repeated FeatureFunctionDescriptor feature = 7

message KBestSyntaxAnalyses

repeated KBestSyntaxAnalysesForSentence sentence = 1

repeated KBestSyntaxAnalysesForToken token = 2

message KBestSyntaxAnalysesForSentence

optional int32 start = 1

optional int32 end = 2

repeated AlternativeTokenAnalysis token_analysis = 3

message KBestSyntaxAnalysesForToken

repeated AlternativeTokenAnalysis token_analysis = 3

message Parameter

optional string name = 1

optional string value = 2

message Sentence

optional string docid = 1

optional string text = 2

repeated Token token = 3

message SparseFeatures

repeated uint64 id = 1

repeated float weight = 2

repeated string description = 3

message StringToStringMap

repeated StringToStringPair pair = 1

message StringToStringPair

required string key = 1

required string value = 2

message TaskInput

required string name = 1

optional string creator = 2

repeated string file_format = 3

repeated string record_format = 4

optional bool multi_file = 5

repeated group syntaxnet.TaskInput.Part = 6

message TaskInput.Part

optional string file_pattern = 7

optional string file_format = 8

optional string record_format = 9

message TaskOutput

required string name = 1

optional string file_format = 2

optional string record_format = 3

optional int32 shards = 4

optional string file_base = 5

optional string file_extension = 6

message TaskSpec

optional string task_name = 1

optional string task_type = 2

repeated group syntaxnet.TaskSpec.Parameter = 3

repeated TaskInput input = 6

repeated TaskOutput output = 7

message TaskSpec.Parameter

required string name = 4

optional string value = 5

message Token

required string word = 1

required int32 start = 2

required int32 end = 3

optional int32 head = 4