Specification of a boolean column.
Used in:
Number of true values.
Number of false values.
Used in:
Minimum frequency of a categorical value not to be replaced by the <RARE> special value.
Maximum number of unique categorical values. If more values are present, the less frequent values are considered <OOD>.
If is_already_integerized=false, a dictionary is built for the feature, even if the feature is an integer or a float. If is_already_integerized=true, the value is directly interpreted as an index and should follow the following convention:
- The value should be greater than or equal to -1.
- The value -1 is the "missing value".
- The value 0 is the "out-of-dictionary value".
- Several YDF algorithms assume this is a "dense index", i.e. if the column is an input feature, it is best for the indices to be dense.
If "is_already_integerized=true" and "number_of_already_integerized_values" is set, "number_of_already_integerized_values" is the number of unique values. Such an attribute accepts values in [-1, number_of_already_integerized_values). Values outside of this range are considered "out-of-vocabulary". Note that if the dataset used to infer the dataspec contains an example with a value > number_of_already_integerized_values, the example value will be used instead of "number_of_already_integerized_values".
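The integerized-value convention above can be sketched in Python. This is a minimal illustration, not part of the YDF API; the function name is ours:

```python
def interpret_integerized_value(value: int, num_values: int) -> str:
    """Interprets a pre-integerized categorical value: -1 is the missing
    value, 0 is out-of-dictionary, and values outside
    [-1, num_values) are also treated as out-of-vocabulary."""
    if value < -1:
        raise ValueError("Integerized values must be >= -1")
    if value == -1:
        return "missing"
    if value == 0 or value >= num_values:
        return "out-of-dictionary"
    return f"in-dictionary index {value}"

print(interpret_integerized_value(-1, 10))  # missing
print(interpret_integerized_value(0, 10))   # out-of-dictionary
print(interpret_integerized_value(12, 10))  # out-of-dictionary
print(interpret_integerized_value(3, 10))   # in-dictionary index 3
```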
If set, overrides the most_frequent_item. The most frequent item is used by the global imputation algorithm to handle missing values, i.e. missing values are treated as the most frequent item. Overriding the most frequent item is only allowed on columns that do not contain any missing values.
Used in:
Overriding is only possible for non-integerized columns.
Specification of a categorical column.
Used in:
The most frequent value.
The number of unique values (including the reserved OOD(=0) value). All the values should satisfy 0 <= value < number_of_unique_values. The value "0" is reserved for the out-of-dictionary value. Therefore, in the case of a categorical column with two possible values "X" and "Y", the proto will be:
  number_of_unique_values = 3
  is_already_integerized = false
  items { key: "OOD" value { index: 0 } }
  items { key: "X" value { index: 1 } }
  items { key: "Y" value { index: 2 } }
Missing values are implicit and take index=-1. They don't need to be specified in "items".
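The dictionary convention can be illustrated with a small Python sketch (the dictionary and function name are ours, mirroring the "X"/"Y" example above):

```python
# Dictionary of the example: OOD is reserved at index 0.
items = {"OOD": 0, "X": 1, "Y": 2}  # number_of_unique_values == 3

def encode(value):
    """Maps a raw string value to its categorical index. Missing values
    (None here) map to the implicit index -1; values absent from the
    dictionary fall back to the out-of-dictionary index 0."""
    if value is None:
        return -1  # missing value, implicit in "items"
    return items.get(value, 0)  # unseen value -> OOD

print(encode("X"), encode("Y"), encode("Z"), encode(None))
```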
Minimum frequency of a value not to be replaced by the <OOD> special value. Used when computing value dictionary.
Maximum number of unique categorical values. If more values are present, the less frequent values are considered <OOD>. Used when computing value dictionary. If "max_number_of_unique_values" == -1, the items are not pruned.
If true, values are interpreted directly as integers. If false, values are indexed in the "items" dictionary.
Dictionary of values. Only available if is_already_integerized=false. In this case, items.size() is equal to number_of_unique_values.
If true, integer categorical values provided by the user have been offset by 1. Such pre-processing is done in TensorFlow Decision Forests. See "CATEGORICAL_INTEGER_OFFSET".
This is an alternative to the CategoricalSpec that does not use a map. This message is for internal use only. It is binary compatible with CategoricalSpec. This message may be removed at any point without warning.
Used in:
Possible value of a non-integerized categorical, categorical set, or categorical list attribute.
Used in:
Index of the value.
Frequency of the value.
Definition of a column in a dataset.
Used in:
Type of data.
Column unique name.
If true, the type was set manually by the user (instead of being automatically detected). This field is purely used for debugging purposes and has no impact on the computation. Note that if a column guide matches this column, and this column guide does not contain a type, is_manual_type is set to false (as if there were no column guide match).
Tokenization. For non-integerized list or sets columns (numerical or categorical).
Data for numerical (simple, list or set) attribute types.
Data for categorical (simple, list or set) attribute types.
Number of NA (i.e. not available) values recorded when building the dataspec.
Numerical value stored as an index + a dictionary.
Data for boolean attribute types.
For all the types defined as a collection of multiple values.
For data of type NUMERICAL_VECTOR_SEQUENCE.
Is the feature derived from unstacking a multi-dimensional column?
Storage representation of a column. Internally, a feature's representation is determined by its semantic: for instance, a NUMERICAL feature is always stored as a float32. DTypes are used to record the feature representation fed to YDF, and are then used by APIs without automatic casting.
Used in:
Regular expression on the column name.
Type of the column.
If "tokenizer" is specified, and if the dataset container can represent a list of tokens natively (i.e. a list of strings, e.g. tf.Example), the first string entry (if any) will be tokenized. If the attribute contains more than one entry, an error will be raised.
If true, a column can be matched against multiple different "ColumnGuide"s, with the last ColumnGuide having higher priority. For example, if the "type" is set in two matching column guides, the type defined in the last column guide will be used. If false, an error will be raised if more than one column guide matches a column.
If true, matching columns are ignored and won't be in the dataspec.
Type of dataset columns.
Used in:
A numerical vector sequence value is a sequence (e.g. a list) of numerical vectors. A numerical vector is a sequence of floats. The number of vectors in a sequence can vary from one example to another, and some examples can have empty sequences. The length of the vectors is fixed for all the vectors in a dataset (i.e., not just for the vectors in a sequence). Semantically, the i-th values of all vectors are expected to represent the same type of data. An empty sequence is different from an unknown value sequence (i.e., there is a sequence but we don't know what it is). Numerical vector sequences can be used to represent multivariate time series or sequences of embeddings (such as the ones in transformer architectures).
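The constraints above (variable sequence length, fixed vector length, empty vs. missing sequence) can be sketched in Python; the function and the use of None for a missing sequence are illustrative choices, not YDF API:

```python
def check_vector_sequence(sequence, vector_length):
    """Validates a numerical vector sequence: every vector must have the
    dataset-wide fixed length. An empty sequence is valid and distinct
    from a missing sequence (represented as None in this sketch)."""
    if sequence is None:
        return "missing"
    for vector in sequence:
        if len(vector) != vector_length:
            raise ValueError("all vectors in a dataset must have the same length")
    return f"sequence of {len(sequence)} vectors"

print(check_vector_sequence([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], 3))
print(check_vector_sequence([], 3))    # empty, but not missing
print(check_vector_sequence(None, 3))  # missing
```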
Storage representation of a column.
Used in:
Specification of the columns of a dataset. Lists the available columns, including their name, type, and extra information (e.g. dictionaries).
Used in:
The columns.
The number of rows of the dataset used to create this dataspec (if a dataset was used).
Meta-data about features that were unstacked e.g. with the "unstack_numerical_set_as_numericals" control field.
Structure containing intermediary information for the computation of a DataSpecification.
Used in:
Sum and sum of error for the Kahan summation. Used for numerical columns.
Mapping between float values (represented as uint32) and the number of times each value was seen. Note: proto maps don't allow float keys.
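The "sum and sum of error" fields above correspond to compensated (Kahan) summation. A minimal sketch of the technique (not the YDF implementation):

```python
def kahan_sum(values):
    """Kahan (compensated) summation: keeps a running error term so that
    the low-order bits lost by naive float accumulation are re-injected
    into subsequent additions."""
    total = 0.0
    error = 0.0  # compensation for lost low-order bits
    for value in values:
        y = value - error      # re-inject the previous error
        t = total + y          # low-order bits of y may be lost here
        error = (t - total) - y  # recover what was lost
        total = t
    return total, error

# Naive summation loses the small values entirely; Kahan keeps them.
values = [1e16] + [1.0] * 1000
print(sum(values))           # 1e16
print(kahan_sum(values)[0])  # 1.0000000000001e16
```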
Configuration for the automated "inference" logic of the data specification (see header for the definition of data specification). For example, the DataSpecificationGuide makes it possible to express the following:
- The column called "feature_1" is NUMERICAL.
- The columns matching the regex "num_feature_.*" are NUMERICAL.
- Ignore the column called "feature_1".
- Ignore the columns matching the regex "num_feature_.*".
- Ignore the columns matching none of the set rules.
- The column called "feature_1" is a CATEGORICAL_SET and should be tokenized by commas.
- The column called "feature_1" is a CATEGORICAL, and categorical values seen less than 50 times should be ignored (considered out-of-bag).
- The dictionaries of CATEGORICAL and CATEGORICAL_SET columns should not have more than 1000 items.
- Columns that look BOOLEAN should be interpreted as NUMERICAL.
- Use the first 100'000 records in the dataset to best infer the semantic of the columns.
Used in:
Guide applied to one or a sub-set of columns according to a regular expression match.
Default guide applied to all columns. Also applies to columns matched by "column_guides", but with a lower priority: for example, if a configuration option is set both in "default_column_guide" and "column_guides", the value in "column_guides" will be used.
If true, columns that don't match any "column_guides" regular expression are ignored.
Maximum number of rows to scan to infer the column types. Set the value "-1" to use all rows (i.e. use the entire dataset). Note: The type inference logic is only used if the user does not specify the type manually.
If true, columns initially detected as BOOLEAN (i.e. only containing "0" and "1" values) will be detected as NUMERICAL.
Detects numerical values (i.e. NUMERICAL) as DISCRETIZED_NUMERICAL. DISCRETIZED_NUMERICAL values are discretized at loading time. Some algorithms (e.g. the YDF decision forest algorithms) will handle NUMERICAL and DISCRETIZED_NUMERICAL types differently. Generally, discretized columns are faster to train but can lead to sub-optimal models.
Maximum number of rows to scan to compute column statistics (e.g. dictionary, ratio of missing values, mean value). Set the value "-1" to use all rows (i.e. use the entire dataset).
If true, numerical sets are unstacked into multiple numerical features. This operation is useful to consume multi-dimensional numerical vectors, i.e. lists of numerical values that always have the same size and semantic per dimension.
Remove columns of unknown type, for example columns that have no values (all the values are missing) and whose type is not specified by the user.
Allow automatic inference of the CATEGORICAL_SET type by applying tokenization. If not set, the inference code will still set the type to CATEGORICAL_SET if the (default) column guide asks for it.
Supported dataset formats.
Used in:
Minimum number of examples in a bin.
Specification of a discretized numerical column. A "discretized numerical" value "i" is encoded as an index (integer) between -1 (inclusive) and "n = boundaries.size()" (also inclusive):
- If i==-1, the value is missing.
- If i==0, the original numerical value is strictly lower than "boundaries.front()".
- If i==boundaries.size(), the original value is higher than or equal to "boundaries.back()".
- If i \in [1, boundaries.size()), the original value is in between "boundaries[i-1]" and "boundaries[i]".
Because encoding a numerical value into a discretized numerical value is lossy, the original numerical value cannot be recovered. In this case, the following logic is applied:
- If i==-1, the numerical value is "std::nan" (corresponding to a missing value).
- If i==0, the numerical value is "boundaries.front()-1".
- If i==boundaries.size(), the numerical value is "boundaries.back()+1".
- If i \in [1, boundaries.size()), the numerical value is "(boundaries[i-1]+boundaries[i])/2".
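The encode/decode logic above can be sketched in Python (function names are illustrative; YDF implements this in C++):

```python
import bisect
import math

def discretize(value, boundaries):
    """Encodes a numerical value into a discretized index: -1 for
    missing (NaN), 0 if value < boundaries[0], len(boundaries) if
    value >= boundaries[-1], otherwise i with
    boundaries[i-1] <= value < boundaries[i]."""
    if math.isnan(value):
        return -1
    return bisect.bisect_right(boundaries, value)

def undiscretize(index, boundaries):
    """Lossily decodes a discretized index back to a representative
    numerical value, following the rules above."""
    if index == -1:
        return float("nan")
    if index == 0:
        return boundaries[0] - 1
    if index == len(boundaries):
        return boundaries[-1] + 1
    return (boundaries[index - 1] + boundaries[index]) / 2

b = [0.0, 1.0, 2.0]
print(discretize(-0.5, b), discretize(0.5, b), discretize(5.0, b))
print(undiscretize(0, b), undiscretize(2, b), undiscretize(3, b))
```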
Used in:
Boundaries in between the bins. The number of bins is boundaries.size() + 1.
Number of unique numerical values before the discretization.
Maximum number of bins (at construction time). Defaults to 255 bins, i.e. 254 boundaries.
Minimum number of examples in a bin.
One example (also called an observation, record, or sample).
Used in:
Attribute values indexed by the attribute index defined in the dataspec.
Example index.
Attribute value.
Used in:
Note: This value will be loaded as a "DiscretizedNumericalIndex = uint16".
Value for multi-dimensional categorical attributes.
Used in:
Value for multi-dimensional numerical attributes.
Used in:
An ordered sequence of numerical vectors.
Used in:
Used in:
Internal linked version of the weight definition. The attributes and values are indexed according to the dataspec.
Used in:
Attribute index used to compute the weight.
Weight definition if the controlling attribute is a numerical attribute.
Weight definition if the controlling attribute is a categorical attribute.
Used in:
Index of "categorical_mapping". Maps a weight value to each categorical attribute value. See the dataspec for the mapping from attribute value string to attribute value index.
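The two weighting modes (numerical and categorical) can be sketched in Python; the function names and the example mapping are ours:

```python
def numerical_weight(attribute_value):
    """Numerical weighting: the weight is directly the attribute value."""
    return float(attribute_value)

def categorical_weight(value_index, categorical_mapping):
    """Categorical weighting: look up the weight associated with the
    categorical attribute value index (index 0 is the OOD value)."""
    return categorical_mapping[value_index]

# Example mapping: one weight per categorical value index.
weights = [1.0, 2.0, 0.5]
print(numerical_weight(2.5))           # 2.5
print(categorical_weight(1, weights))  # 2.0
```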
Used in:
(message has no fields)
Specification for types with multiple values.
Used in:
Maximum number of observed items.
Minimum number of observed items.
Note: Depending on the type of the column, observations with more values than "max_observed_size" or fewer values than "min_observed_size" might still be valid.
Used in:
(message has no fields)
Specification of a numerical column.
Used in:
Mean value (excluding NaN values).
Used in:
The length of the vectors.
Number of values (i.e., floats) seen.
Minimum and maximum number of vectors seen.
Options for the synthetic generation of datasets.
Next ID: 21
Number of examples in the dataset.
Name of the label column.
Name of the feature columns, where "{type}" is the short feature type (e.g. "num(erical)", "cat(egorical)") and "{index}" is the feature index (among other features of the same type).
Number of features by semantic. "num_categorical" and "num_categorical_set" are each used twice, for the string and integer representations, e.g. categorical_string, categorical_int.
Dictionary sizes.
Number of dimensions of "multidimensional_numerical" features.
If false, numerical values are represented as floats. If true, they are represented as integers.
If true, the value zero (0) of categorical and categorical set values (both for features and labels) is used to represent an out-of-vocabulary value (and the first real value is 1). If false, zero (0) is a categorical value like the others.
Average number of items in a categorical set feature.
Probability for a feature value to be missing.
How much noise to inject in the label. The problem can be perfectly solved with "label_noise_ratio=0", and cannot be solved better than random for "label_noise_ratio=1" (if there are no other sources of noise).
Seed used to initialize the random generator used to generate the dataset. If set to -1, the random generator is initialized using std::random_device.
Number of accumulators. Accumulators are internal structures used to generate the dataset. Increasing the value will increase the "conditional independence" of the dataset, i.e. having more tuples <FS1, FS2, X> such that "Label ⊥ FS1 | FS2=X", with FS1 and FS2 two sets of features. Decreasing the value will make the dataset more "naively independent", i.e. increase the tendency of "P(Label Fi | Fj) == P(Label Fi) if j!=i". The value should be odd and between 1 and the total number of features (i.e. the sum of "num_{numerical, categorical, ...}"). Even values will be rounded down. The exact use of the accumulators is described in "synthetic_dataset.h".
The task represented by the labels.
Default.
Number of examples to inject in each shard. Requires the dataset paths to be sharded (i.e. to end with @<number of shards>). Set to -1 to disable dataset sharding.
Used in:
Number of label classes. 2 => binary classification.
Is the label stored as a string or an integer?
Used in:
Name of the column containing the group index. In document/query scoring, the group would be the queries.
Number of examples in each group. The last group might have fewer examples if num_examples % group_size != 0.
Used in:
(message has no fields)
Tokenization parameters.
Used in:
How to convert a string into a list/set of symbols.
Separator characters. Used if splitter=SEPARATOR.
Splitting regular expression. Used if splitter=REGEX_MATCH.
Cast strings to lower case before tokenization.
Grouping of the tokens.
Used in:
Possible string tokenization algorithms.
Used in:
Split a string according to the user specified separator.
Split a string by extracting token using the user specified regular expression.
Split a string into individual characters. Does not remove spaces and non-printable characters.
Never split a string. Useful if CATEGORICAL_SET features should be avoided.
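The four splitting modes above can be sketched in Python. This is an illustration, not the YDF tokenizer: the mode names for the character and no-splitting cases are our guesses from the descriptions, and the default separator and regex are arbitrary:

```python
import re

def tokenize(text, splitter, separator=";", regex=r"[a-zA-Z0-9]+",
             to_lower_case=True):
    """Converts a string into a list of tokens, per the splitter mode."""
    if to_lower_case:
        text = text.lower()  # cast to lower case before tokenization
    if splitter == "SEPARATOR":
        return text.split(separator)       # user-specified separator
    if splitter == "REGEX_MATCH":
        return re.findall(regex, text)     # extract tokens by regex
    if splitter == "CHARACTER":
        return list(text)                  # keeps spaces/non-printables
    if splitter == "NO_SPLITTING":
        return [text]                      # whole string as one token
    raise ValueError(f"Unknown splitter: {splitter}")

print(tokenize("A;b;C", "SEPARATOR"))
print(tokenize("Hello, world", "REGEX_MATCH"))
```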
Used in:
Information about unstacked column. An unstacked column is a multi-dimensional column (e.g. an embedding) that has been split into multiple scalar columns.
Used in:
Name of the column that was unstacked.
Index of the first column containing the unstacked feature.
Number of unstacked elements.
Type of the columns.
Used in:
[Required] Name of the attribute that controls the weights of the examples.
The attribute is interpreted as a numerical value.
The attribute is interpreted as a categorical attribute. A weight is defined for each possible value.
Look up the following mapping to get the weight.
Used in:
Pair of categorical value and weight.
Used in:
[Required] A value to map to a corresponding weight.
[Required] The weight.
The weight is directly the numerical value. Note that for Ranking problems, the ranking is per group and all weights of the same group should be identical.
Used in:
(message has no fields)