package yggdrasil_decision_forests.utils.proto

message FoldGenerator

Configuration for the generation of training and testing folds.

oneof generator
- FoldGenerator.TrainTest train_test = 1
- FoldGenerator.CrossValidation cross_validation = 2
- FoldGenerator.TestOnOtherDataset test_on_other_dataset = 4
- FoldGenerator.NoTraining no_training = 5
- FoldGenerator.PrecomputedCrossValidation precomputed_cross_validation = 6
optional int64 seed = 3
Seed used to control the fold generation. The same seed is guaranteed to generate the same folds for a given dataset.

message FoldGenerator.CrossValidation

Split the dataset into n folds (n=10 by default). Then, for each subset of n-1 folds, train a model and evaluate it on the remaining fold. This method is called "cross-validation". Cross-validation is more expensive than "train and test" (n models need to be trained), but the results are more precise.

Used in: FoldGenerator

optional int32 num_folds = 1
If num_folds=0, a leave-one-out cross-validation is executed (i.e. "num_folds" is set to the number of examples in the dataset).
optional FoldGroup fold_group = 2
Defines the "group" membership of each example through an attribute name ("group_attribute" in FoldGroup). If specified, all the examples of the same group will appear in the same fold. In other words, each group will be used either entirely for training or entirely for testing.
Next ID: 3
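
The fold logic described above can be sketched in a few lines of Python. This is only an illustration of the idea (shuffle with a seed, deal examples into folds, hold out one fold at a time), not the library's actual algorithm. The function names and the default seed value are hypothetical.

```python
import random


def make_folds(num_examples, num_folds=10, seed=1234):
    """Assign example indices to folds. num_folds=0 means leave-one-out."""
    if num_folds == 0:
        num_folds = num_examples
    indices = list(range(num_examples))
    # A dedicated Random instance makes the result depend only on the seed.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    return [indices[k::num_folds] for k in range(num_folds)]


def train_test_splits(folds):
    """Yield (train, test) index lists, holding out one fold at a time."""
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test
```

With 10 examples and num_folds=5, each split trains on 8 examples and tests on 2. Calling make_folds twice with the same seed returns identical folds.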

message FoldGenerator.FoldGroup

Specify the "fold group" of each example. If specified, all the examples of the same group will appear in the same fold. In other words, each group will be used either entirely for training or entirely for testing. There should be at least as many groups as there are folds.

Used in: CrossValidation

optional string group_attribute = 1
Name of a categorical attribute that defines the "group" membership of each example.
Next ID: 2
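
A minimal sketch of group-aware fold assignment, assuming the group values are read from the "group_attribute" column into a Python list. The function name and the round-robin assignment are illustrative assumptions, not the library's implementation.

```python
import random


def make_group_folds(groups, num_folds, seed=1234):
    """groups[i] is the group value of example i. Returns one fold index per example."""
    distinct = sorted(set(groups))
    if len(distinct) < num_folds:
        raise ValueError("need at least as many groups as folds")
    random.Random(seed).shuffle(distinct)
    # Folds are assigned per group, so a group can never straddle two folds.
    fold_of_group = {g: k % num_folds for k, g in enumerate(distinct)}
    return [fold_of_group[g] for g in groups]
```

Because the fold is chosen per group rather than per example, every example sharing a group value lands in the same fold, which is the property the message describes.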

message FoldGenerator.NoTraining

Does not train a model; evaluates on the entire dataset. This solution only makes sense when the candidate methods are all defined as pre-computed predictions or pre-computed models.

Used in: FoldGenerator

(message has no fields)

message FoldGenerator.PrecomputedCrossValidation

Cross-validation of folds computed externally.

Used in: FoldGenerator

optional string fold_path = 1
Path ([type]:[path]) to a dataset containing a numerical column called "fold_idx". "fold_idx[i]" is an integer in [0, num_folds) defining the fold of the i-th example.
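
As an illustration, the following sketch turns a "fold_idx" column (already loaded into a Python list) into cross-validation splits. Inferring num_folds as the largest index plus one is an assumption made for this example, and the function name is hypothetical.

```python
def splits_from_fold_idx(fold_idx):
    """Yield (train, test) example indices for each precomputed fold."""
    num_folds = max(fold_idx) + 1  # assumption: folds are numbered 0..num_folds-1
    for k in range(num_folds):
        test = [i for i, f in enumerate(fold_idx) if f == k]
        train = [i for i, f in enumerate(fold_idx) if f != k]
        yield train, test
```

For fold_idx = [0, 1, 0, 2], the first split tests on examples 0 and 2 and trains on examples 1 and 3.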

message FoldGenerator.TestOnOtherDataset

Evaluate the candidate model on a separate dataset. The entire dataset specified in "TrainEvaluateCompareOptions" will be used for training. The entire dataset specified in "TestOnOtherDataset" will be used for testing.

Used in: FoldGenerator

optional string dataset_path = 1
Path ([type]:[path]) to the test dataset.

message FoldGenerator.TrainTest

Split the dataset into two folds. The first part will be used for training. The second part will be used for evaluation. This method is commonly called "Train and test" evaluation. This method is fast (only one model is trained) but the results are noisy (both with training noise and testing noise).

Used in: FoldGenerator

optional float test_ratio = 1
Ratio of the dataset used for testing. The remaining examples are used for training.
Next ID: 2
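
A minimal sketch of a seeded train/test split driven by test_ratio. The shuffling strategy, the rounding of the test size, the function name, and the default seed are assumptions made for illustration only.

```python
import random


def train_test_split(num_examples, test_ratio, seed=1234):
    """Return (train, test) index lists. The test part holds about test_ratio of the data."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    num_test = round(num_examples * test_ratio)
    num_train = num_examples - num_test
    # First part for training, second part for evaluation.
    return indices[:num_train], indices[num_train:]
```

With 10 examples and test_ratio=0.3, this yields 7 training examples and 3 test examples.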

message IntegerDistributionDouble

Represents the (discrete) probability distribution of a random variable with natural (i.e. integer greater than or equal to zero) support: counts[i]/sum is the probability of observation of i.

Used in: model.decision_tree.proto.LabelStatistics.Classification, model.decision_tree.proto.NodeClassifierOutput

repeated double counts = 1
[required]
optional double sum = 2
[required]
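
The relation counts[i]/sum can be illustrated with plain Python values standing in for the two fields. The function name and the error raised for an empty distribution are choices made for this sketch.

```python
def probabilities(counts, total):
    """Convert the 'counts' and 'sum' fields into per-value probabilities."""
    if total <= 0:
        raise ValueError("empty distribution: sum must be positive")
    return [c / total for c in counts]
```

For counts = [2, 6, 2] and sum = 10, the values 0, 1 and 2 have probabilities 0.2, 0.6 and 0.2.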

message IntegerDistributionFloat

Used in: model.proto.Prediction.Classification, PartialDependencePlotSet.PartialDependencePlot.LabelAccumulator

repeated float counts = 1
[required]
optional float sum = 2
[required]

message IntegerDistributionInt64

repeated int64 counts = 1
[required]
optional int64 sum = 2
[required]

message IntegersConfusionMatrixDouble

Confusion matrix between two integer distributions.

Used in: metric.proto.EvaluationResults.Classification, model.gradient_boosted_trees.proto.TrainingLogs.Entry

repeated double counts = 1
Contains nrow x ncol elements. Stored column by column (the row index varies fastest), i.e. the second element is counts[1,0]. [required]
optional double sum = 2
[required]
optional int32 nrow = 3
[required]
optional int32 ncol = 4
[required]
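
The layout of "counts" can be sketched as follows. This assumes that counts[r,c] denotes row r and column c, so that "the second element is counts[1,0]" implies a column-by-column layout. The helper names are hypothetical.

```python
def flat_index(row, col, nrow):
    """Position of cell (row, col) in the flat 'counts' field."""
    return row + col * nrow


def to_rows(counts, nrow, ncol):
    """Rebuild the matrix as a list of rows from the flat 'counts' field."""
    return [
        [counts[flat_index(r, c, nrow)] for c in range(ncol)]
        for r in range(nrow)
    ]
```

With nrow=2 and ncol=3, the flat list [1, 2, 3, 4, 5, 6] unpacks to the rows [1, 3, 5] and [2, 4, 6].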

message NormalDistributionDouble

Describes a 1D normal distribution.

Used in: model.decision_tree.proto.LabelStatistics.Regression, model.decision_tree.proto.LabelStatistics.RegressionWithHessian, model.decision_tree.proto.NodeRegressorOutput

optional double sum = 1
[required]
optional double sum_squares = 2
[required]
optional double count = 3
[required]
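
The three fields are sufficient statistics from which the usual parameters can be recovered. The sketch below assumes "sum" is the (possibly weighted) sum of the values, "sum_squares" the sum of their squares, and "count" the total weight. It computes the population variance. The function name is hypothetical.

```python
import math


def mean_and_std(total, sum_squares, count):
    """Recover mean and standard deviation from sum, sum_squares and count."""
    mean = total / count
    # E[x^2] - E[x]^2, clamped at zero to absorb floating-point round-off.
    variance = max(sum_squares / count - mean * mean, 0.0)
    return mean, math.sqrt(variance)
```

For the values 1, 2, 3, 4 (sum=10, sum_squares=30, count=4), the mean is 2.5 and the variance is 1.25.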

message PartialDependencePlotSet

Message for the metrics required to compute a partial dependence plot for multiple features or sets of features. This message is also used to store Conditional Expectancy Plots.

Used in: model_analysis.proto.AnalysisResult

repeated PartialDependencePlotSet.PartialDependencePlot pdps = 1

message PartialDependencePlotSet.PartialDependencePlot

Message for metrics required to compute a partial dependence plot for ONE feature or ONE set of features.

Used in: PartialDependencePlotSet

optional double num_observations = 1
repeated PartialDependencePlot.Bin pdp_bins = 3
repeated PartialDependencePlot.AttributeInfo attribute_info = 4
optional PartialDependencePlot.Type type = 5
The type of plot this represents.

message PartialDependencePlotSet.PartialDependencePlot.AttributeInfo

Used in: PartialDependencePlot

optional int32 num_bins_per_input_feature = 1
If this PartialDependencePlot represents a set of 3 features, num_bins_per_input_feature in each of the 3 attribute_info entries is the number of bins in that feature's space. Further, the product over i of attribute_info[i].num_bins_per_input_feature equals pdp_bins.size().
optional int32 attribute_idx = 2
repeated double num_observations_per_bins = 3
Distribution of the attribute for each of the bins.
repeated float numerical_boundaries = 4
Boundaries of the bins for numerical attributes.
optional AttributeInfo.Scale scale = 5
How to scale the axis when plotting this attribute. Only used for numerical attributes.
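
The relation between the per-attribute bin counts and the number of entries in pdp_bins can be checked with a one-line product. The function name is hypothetical, and the input list stands in for the num_bins_per_input_feature values of each attribute_info entry.

```python
import math


def expected_num_pdp_bins(num_bins_per_input_feature):
    """Number of pdp_bins implied by the bin count of each attribute."""
    return math.prod(num_bins_per_input_feature)
```

For a plot over three features with 3, 4 and 2 bins, pdp_bins is expected to hold 24 entries.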

enum PartialDependencePlotSet.PartialDependencePlot.AttributeInfo.Scale

Used in: AttributeInfo

UNIFORM = 0
LOG = 1

message PartialDependencePlotSet.PartialDependencePlot.Bin

Represents the metrics for a feature OR set of features at a particular value (Represented by attribute_values).

Used in: PartialDependencePlot

optional LabelAccumulator prediction = 1
optional LabelAccumulator ground_truth = 2
optional EvaluationAccumulator evaluation = 4
repeated dataset.proto.Example.Attribute center_input_feature_values = 3
The values used to represent the center of this bin. In case of a categorical feature, this stores the exact categorical value. In case of a numerical feature, this stores a value: (max_value - min_value)*(bin_number + 0.5)/num_bins + min_value.
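
The bin-center formula for numerical features translates directly to code. The function name is hypothetical. The arguments mirror the symbols used in the formula above.

```python
def bin_center(min_value, max_value, bin_number, num_bins):
    """Center of bin 'bin_number' when [min_value, max_value] is cut into num_bins equal bins."""
    return (max_value - min_value) * (bin_number + 0.5) / num_bins + min_value
```

With min_value=0, max_value=10 and num_bins=5, the first bin is centered at 1.0 and the last at 9.0.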

message PartialDependencePlotSet.PartialDependencePlot.EvaluationAccumulator

Represents the accumulation of evaluation metrics.

Used in: Bin

oneof prediction_value
- double sum_squared_error = 1
  For regression.
- double num_correct_predictions = 2
  For classification.

message PartialDependencePlotSet.PartialDependencePlot.LabelAccumulator

Represents the "sum" of a set of labels, either predicted by the model, or the ground truth.

Used in: Bin

oneof prediction_value
- IntegerDistributionFloat classification_class_distribution = 1
- double sum_of_regression_predictions = 2
  sum_of_regression_predictions should be normalized with num_observations to obtain the mean prediction.
- double sum_of_ranking_predictions = 3
  sum_of_ranking_predictions should be normalized with num_observations to obtain the mean prediction.
- double sum_of_anomaly_detection_predictions = 4
  sum_of_anomaly_detection_predictions should be normalized with num_observations to obtain the mean prediction.
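
The normalization mentioned for the sum_of_* fields can be sketched as below. It assumes that num_observations is the field of the enclosing PartialDependencePlot and that the per-bin sums have already been read into a list. The function name is hypothetical.

```python
def mean_predictions(bin_sums, num_observations):
    """Turn the accumulated sum of each bin into a mean prediction."""
    if num_observations <= 0:
        raise ValueError("num_observations must be positive")
    return [s / num_observations for s in bin_sums]
```

For per-bin sums of 10, 20 and 40 over 20 observations, the curve to plot is 0.5, 1.0 and 2.0.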

enum PartialDependencePlotSet.PartialDependencePlot.Type

Used in: PartialDependencePlot

UNKNOWN = 0
PDP = 1
CEP = 2

message ShardedMultiBitmapHeader

Header attached to an exported sharded multi-bitmap.

optional int32 bits_by_elements = 1
These fields are the same as the fields defined in "ShardedMultiBitmap".
optional uint64 num_elements = 2
optional uint64 max_num_element_in_shard = 3
optional uint64 num_shards = 4

package yggdrasil_decision_forests.utils.proto

message FoldGenerator

oneof generator

FoldGenerator.TrainTest train_test = 1

FoldGenerator.CrossValidation cross_validation = 2

FoldGenerator.TestOnOtherDataset test_on_other_dataset = 4

FoldGenerator.NoTraining no_training = 5

FoldGenerator.PrecomputedCrossValidation precomputed_cross_validation = 6

optional int64 seed = 3

message FoldGenerator.CrossValidation

optional int32 num_folds = 1

optional FoldGroup fold_group = 2

message FoldGenerator.FoldGroup

optional string group_attribute = 1

message FoldGenerator.NoTraining

message FoldGenerator.PrecomputedCrossValidation

optional string fold_path = 1

message FoldGenerator.TestOnOtherDataset

optional string dataset_path = 1

message FoldGenerator.TrainTest

optional float test_ratio = 1

message IntegerDistributionDouble

repeated double counts = 1

optional double sum = 2

message IntegerDistributionFloat

repeated float counts = 1

optional float sum = 2

message IntegerDistributionInt64

repeated int64 counts = 1

optional int64 sum = 2

message IntegersConfusionMatrixDouble

repeated double counts = 1

optional double sum = 2

optional int32 nrow = 3

optional int32 ncol = 4

message NormalDistributionDouble

optional double sum = 1

optional double sum_squares = 2

optional double count = 3

message PartialDependencePlotSet

repeated PartialDependencePlotSet.PartialDependencePlot pdps = 1

message PartialDependencePlotSet.PartialDependencePlot

optional double num_observations = 1

repeated PartialDependencePlot.Bin pdp_bins = 3

repeated PartialDependencePlot.AttributeInfo attribute_info = 4

optional PartialDependencePlot.Type type = 5

message PartialDependencePlotSet.PartialDependencePlot.AttributeInfo

optional int32 num_bins_per_input_feature = 1

optional int32 attribute_idx = 2

repeated double num_observations_per_bins = 3

repeated float numerical_boundaries = 4

optional AttributeInfo.Scale scale = 5

enum PartialDependencePlotSet.PartialDependencePlot.AttributeInfo.Scale

UNIFORM = 0

LOG = 1

message PartialDependencePlotSet.PartialDependencePlot.Bin

optional LabelAccumulator prediction = 1

optional LabelAccumulator ground_truth = 2

optional EvaluationAccumulator evaluation = 4

repeated dataset.proto.Example.Attribute center_input_feature_values = 3

message PartialDependencePlotSet.PartialDependencePlot.EvaluationAccumulator

oneof prediction_value

double sum_squared_error = 1

double num_correct_predictions = 2

message PartialDependencePlotSet.PartialDependencePlot.LabelAccumulator

oneof prediction_value

IntegerDistributionFloat classification_class_distribution = 1

double sum_of_regression_predictions = 2

double sum_of_ranking_predictions = 3

double sum_of_anomaly_detection_predictions = 4

enum PartialDependencePlotSet.PartialDependencePlot.Type

UNKNOWN = 0

PDP = 1

CEP = 2

message ShardedMultiBitmapHeader

optional int32 bits_by_elements = 1

optional uint64 num_elements = 2

optional uint64 max_num_element_in_shard = 3

optional uint64 num_shards = 4