package yggdrasil_decision_forests.dataset.proto

message BooleanSpec

data_spec.proto:232

Specification of a boolean column.

Used in: Column

message CategoricalGuide

data_spec.proto:419

Used in: ColumnGuide

message CategoricalGuide.OverrideMostFrequentItem

data_spec.proto:455

Used in: CategoricalGuide

message CategoricalSpec

data_spec.proto:152

Specification of a categorical column.

Used in: Column

message CategoricalSpec.InternalCategoricalSpecWithoutMap

data_spec.proto:201

This is an alternative to the CategoricalSpec without using a map. This message is for internal use only. It is binary compatible to CategoricalSpec. This message may be removed at any point without warning.

message CategoricalSpec.InternalCategoricalSpecWithoutMap.ItemsEntry

data_spec.proto:203

Used in: InternalCategoricalSpecWithoutMap

message CategoricalSpec.VocabValue

data_spec.proto:187

Possible value of a non-integerized categorical, categorical set, or categorical list attribute.

Used in: CategoricalSpec, InternalCategoricalSpecWithoutMap.ItemsEntry, model.distributed_decision_tree.dataset_cache.proto.PartialColumnShardMetadata.CategoricalColumn, model.distributed_decision_tree.dataset_cache.proto.WorkerRequest.ConvertPartialToFinalRawData.CategoricalString

message Column

data_spec.proto:90

Definition of a column in a dataset.

Used in: DataSpecification, metric.proto.EvaluationResults

message ColumnGuide

data_spec.proto:396

Used in: DataSpecificationGuide

enum ColumnType

data_spec.proto:63

Type of dataset columns.

Used in: Column, ColumnGuide, Unstacked

enum DType

data_spec.proto:130

Storage representation of a column.

Used in: Column

message DataSpecification

data_spec.proto:51

Specification of the columns of a dataset. Lists the available columns (including their name, type, and extra information, e.g. dictionaries).

Used in: model.distributed_decision_tree.dataset_cache.proto.WorkerRequest.SeparateDatasetColumns, model.distributed_gradient_boosted_trees.proto.WorkerWelcome, model.generic_worker.proto.Request.TrainModel, utils.model_analysis.proto.PredictionAnalysisResult, utils.model_analysis.proto.StandaloneAnalysisResult

message DataSpecificationAccumulator

data_spec.proto:475

Structure containing intermediary information for the computation of a DataSpecification.

message DataSpecificationAccumulator.Column

data_spec.proto:476

Used in: DataSpecificationAccumulator

message DataSpecificationGuide

data_spec.proto:349

Configuration for the automated "inference" logic of the data specification (see header for the definition of data specification). For example, the DataSpecificationGuide can express the following:
- The column called "feature_1" is NUMERICAL.
- The columns matching the regex "num_feature_.*" are NUMERICAL.
- Ignore the column called "feature_1".
- Ignore the columns matching the regex "num_feature_.*".
- Ignore the columns matching none of the set rules.
- The column called "feature_1" is a CATEGORICAL_SET and should be tokenized by commas.
- The column called "feature_1" is a CATEGORICAL and the categorical values seen less than 50 times should be ignored (considered out-of-bag).
- The CATEGORICAL and CATEGORICAL_SET column dictionaries should not have more than 1000 items.
- Columns that look BOOLEAN should be interpreted as NUMERICAL.
- Use the first 100'000 records in the dataset to infer the semantics of the columns.
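
The rule-based behaviour described above can be illustrated with a minimal Python sketch. It is not the YDF API: the rule table, the function name `resolve_column`, and the result strings "IGNORE" and "INFER" are hypothetical. It also assumes that a regex must match the whole column name and that the first matching rule wins, neither of which is stated on this page.

```python
import re

# Hypothetical rule table: (regex on the column name, semantic type).
# The special type "IGNORE" drops the column.
RULES = [
    (r"feature_1", "CATEGORICAL_SET"),
    (r"num_feature_.*", "NUMERICAL"),
    (r"debug_.*", "IGNORE"),
]


def resolve_column(name, rules=RULES, ignore_unmatched=True):
    """Returns the semantic assigned to `name` by the first matching rule.

    Columns matching no rule are either ignored or left to automatic
    inference ("INFER"), depending on `ignore_unmatched`.
    """
    for pattern, column_type in rules:
        # Assumption: the pattern must match the full column name.
        if re.fullmatch(pattern, name):
            return column_type
    return "IGNORE" if ignore_unmatched else "INFER"


for column in ["feature_1", "num_feature_3", "debug_id", "other"]:
    print(column, "->", resolve_column(column))
```

With `ignore_unmatched=False`, the column "other" would be passed to automatic inference instead of being dropped.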

Used in: example.proto.Request

enum DatasetFormat

formats.proto:23

Supported dataset formats.

message DiscretizedNumericalGuide

data_spec.proto:467

Used in: ColumnGuide

message DiscretizedNumericalSpec

data_spec.proto:269

Specification of a discretized numerical column. A "discretized numerical" value "i" is encoded as an integer index between -1 (inclusive) and "n = boundaries.size()" (also inclusive):
- If i==-1, the value is missing.
- If i==0, the original numerical value is strictly lower than "boundaries.front()".
- If i==boundaries.size(), the original value is greater than or equal to "boundaries.back()".
- If 1 <= i < boundaries.size(), the original value is between "boundaries[i-1]" and "boundaries[i]".

Because encoding a numerical value into a discretized numerical value is lossy, the original numerical value cannot be recovered. When a numerical value is needed, the following logic is applied:
- If i==-1, the numerical value is "std::nan" (corresponding to a missing value).
- If i==0, the numerical value is "boundaries.front()-1".
- If i==boundaries.size(), the numerical value is "boundaries.back()+1".
- If 1 <= i < boundaries.size(), the numerical value is "(boundaries[i-1]+boundaries[i])/2".
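
The encoding and the lossy reconstruction can be sketched in a few lines of Python. This is an illustration of the rules above, not the library's implementation. The function names `discretize` and `reconstruct` are hypothetical, and the sketch assumes that interior bins are half-open, i.e. a value equal to "boundaries[i-1]" falls in bin i, which is consistent with the stated handling of "boundaries.front()" and "boundaries.back()".

```python
import bisect
import math

MISSING = -1  # Index reserved for missing values.


def discretize(value, boundaries):
    """Maps a numerical value to its bin index given sorted boundaries."""
    if math.isnan(value):
        return MISSING
    # Number of boundaries <= value: 0 below the first boundary,
    # len(boundaries) at or above the last one.
    return bisect.bisect_right(boundaries, value)


def reconstruct(index, boundaries):
    """Returns the representative numerical value of a bin index."""
    if index == MISSING:
        return math.nan
    if index == 0:
        return boundaries[0] - 1
    if index == len(boundaries):
        return boundaries[-1] + 1
    # Interior bin: midpoint of the two surrounding boundaries.
    return (boundaries[index - 1] + boundaries[index]) / 2


boundaries = [0.0, 10.0, 20.0]
for v in [-5.0, 5.0, 10.0, 25.0, math.nan]:
    i = discretize(v, boundaries)
    print(v, "->", i, "->", reconstruct(i, boundaries))
```

Note that the round trip is not the identity: 5.0 and 7.5 both map to index 1 and are both reconstructed as 5.0.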

Used in: Column

message Example

example.proto:29

One example (also called observation, record, or sample).

Used in: utils.model_analysis.proto.PredictionAnalysisResult

message Example.Attribute

example.proto:47

Attribute value.

Used in: Example, utils.proto.PartialDependencePlotSet.PartialDependencePlot.Bin

message Example.CategoricalVector

example.proto:31

Value for multi-dimensional categorical attributes.

Used in: Attribute

message Example.NumericalVector

example.proto:35

Value for multi-dimensional numerical attributes.

Used in: Attribute

message Example.NumericalVectorSequence

example.proto:39

An ordered sequence of numerical vectors.

Used in: Attribute

message Example.NumericalVectorSequence.Vector

example.proto:40

Used in: NumericalVectorSequence

message LinkedWeightDefinition

weight.proto:60

Internal linked version of the weight definition. The attributes and values are indexed according to the dataspec.

Used in: model.proto.AbstractModel, model.proto.TrainingConfigLinking

message LinkedWeightDefinition.CategoricalWeight

weight.proto:74

Used in: LinkedWeightDefinition

message LinkedWeightDefinition.NumericalWeight

weight.proto:72

Used in: LinkedWeightDefinition

(message has no fields)

message MultiValuesSpec

data_spec.proto:221

Specification for types with multiple values.

Used in: Column

message NumericalGuide

data_spec.proto:461

Used in: ColumnGuide

(message has no fields)

message NumericalSpec

data_spec.proto:212

Specification of a numerical column.

Used in: Column

message NumericalVectorSequenceSpec

data_spec.proto:239

Used in: Column

message SyntheticDatasetOptions

synthetic_dataset.proto:21

Options for the generation of a synthetic dataset.

Next ID: 21

message SyntheticDatasetOptions.Classification

synthetic_dataset.proto:104

Used in: SyntheticDatasetOptions

message SyntheticDatasetOptions.Ranking

synthetic_dataset.proto:114

Used in: SyntheticDatasetOptions

message SyntheticDatasetOptions.Regression

synthetic_dataset.proto:112

Used in: SyntheticDatasetOptions

(message has no fields)

message Tokenizer

data_spec.proto:283

Tokenization parameters.

Used in: Column, TokenizerGuide

message Tokenizer.Grouping

data_spec.proto:311

Used in: Tokenizer

enum Tokenizer.Splitter

data_spec.proto:296

Possible string tokenization algorithms.

Used in: Tokenizer

message TokenizerGuide

data_spec.proto:463

Used in: ColumnGuide

message Unstacked

data_spec.proto:321

Information about an unstacked column. An unstacked column is a multi-dimensional column (e.g. an embedding) that has been split into multiple scalar columns.
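
As a rough illustration of what unstacking means, the following Python sketch splits a column of fixed-length vectors into one scalar column per dimension. The function name `unstack` and the "name_index" naming of the resulting columns are assumptions made for this example. The naming scheme actually used by the library is not shown on this page.

```python
def unstack(name, rows):
    """Splits a column of fixed-length vectors into scalar columns.

    Returns a dict mapping each generated column name to its list of
    values, one value per row.
    """
    if not rows:
        return {}
    size = len(rows[0])
    if any(len(row) != size for row in rows):
        raise ValueError("all vectors must have the same dimension")
    # Hypothetical naming scheme: "<name>_<dimension index>".
    return {f"{name}_{d}": [row[d] for row in rows] for d in range(size)}


embedding = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
print(unstack("emb", embedding))
```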

Used in: DataSpecification

message WeightDefinition

weight.proto:28

Used in: metric.proto.EvaluationOptions, model.proto.TrainingConfig

message WeightDefinition.CategoricalWeight

weight.proto:46

Solve the following mapping to get the weight.

Used in: WeightDefinition

message WeightDefinition.CategoricalWeight.Item

weight.proto:49

Used in: CategoricalWeight

message WeightDefinition.NumericalWeight

weight.proto:43

The weight is directly the numerical value. Note that for Ranking problems, the ranking is per group and all weights of the same group should be identical.
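
The two ways of deriving an example weight can be contrasted with a small Python sketch. The function names and the dictionary representation of the categorical mapping are hypothetical. The sketch only mirrors the descriptions on this page: a numerical weight is the attribute value itself, and a categorical weight is looked up in a value-to-weight mapping.

```python
def numerical_weight(value):
    """The weight is directly the numerical attribute value."""
    return float(value)


def categorical_weight(value, mapping):
    """The weight is looked up from the categorical attribute value.

    Raises KeyError if the value has no entry in the mapping.
    """
    return mapping[value]


# Hypothetical mapping from categorical value to weight.
weights_by_country = {"us": 1.0, "fr": 2.5}

print(numerical_weight(3))                           # 3.0
print(categorical_weight("fr", weights_by_country))  # 2.5
```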

Used in: WeightDefinition

(message has no fields)