Header for the random forest model.
Next ID: 7
Used in:
Number of shards used to store the nodes.
Number of trees.
Whether the votes of individual trees are distributions or winner-take-all.
Evaluation of the model, on the out-of-bag examples, during the training.
Variable importance measures.
Container used to store the trees' nodes.
Number of nodes trained and then pruned during the training. The classical random forest learning algorithm does not prune nodes.
Used in:
Individual random seeds used to train the trees. If specified, the number of seeds should be equal to "num_trees".
Next ID: 3
Used in:
Number of trees available in the model when evaluated.
Training configuration for the Random Forest algorithm.
Next ID: 17
Number of trees in the random forest.
Decision tree specific parameters.
Whether the votes of individual trees are distributions or winner-take-all. With winner_take_all_inference=true, each tree casts a vote for a single label value. With winner_take_all_inference=false, each tree casts a weighted vote for each label value. The original random forest implementation uses winner_take_all_inference=true (the default value). However, "winner_take_all_inference=false" often leads to better results and smaller models.
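As an illustration, a hedged text-format sketch (only field names quoted in this documentation are used; the enclosing training-configuration message is assumed and not shown):

  # Distribution (weighted) voting rather than the classical
  # winner-take-all voting; often yields better and smaller models.
  num_trees: 300
  winner_take_all_inference: false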
Computes and reports the OOB (out-of-bag) performance of the model during training. The added computing cost is relatively small.
Computes the importance of each variable (i.e. each input feature) during training. Computing the variable importance is expensive and can significantly slow down the training.
Number of times the dataset is shuffled for each tree when computing the variable importances. Increasing this number can significantly increase the training time (if "compute_oob_variable_importances:true") and improves the stability of the OOB variable importance metrics.
The out-of-bag evaluation is computed if one of the following conditions is true:
- This is the last tree of the model.
- The last OOB evaluation was computed more than "oob_evaluation_interval_in_seconds" seconds ago.
- The last OOB evaluation was computed more than "oob_evaluation_interval_in_trees" trees ago.
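A minimal text-format sketch combining the OOB fields named above (the enclosing training-configuration message is assumed and not shown):

  # Compute OOB metrics and permutation variable importances during
  # training. Re-evaluate OOB when the previous evaluation is more
  # than 10 trees or 20 seconds old, and always on the last tree.
  compute_oob_performances: true
  compute_oob_variable_importances: true
  oob_evaluation_interval_in_trees: 10
  oob_evaluation_interval_in_seconds: 20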
If true, each tree is trained on a separate dataset sampled with replacement from the original dataset. If false, all the trees are trained on the same dataset. Note: If bootstrap_training_dataset:false, OOB metrics are not available. bootstrap_training_dataset:true is the default value for Random Forest. bootstrap_training_dataset:false can be used to simulate related decision forest algorithms (e.g. "Extremely randomized trees" https://link.springer.com/content/pdf/10.1007%2Fs10994-006-6226-1.pdf).
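For example, a hedged sketch of simulating an Extremely-randomized-trees-like setup (same field name as above; note that OOB metrics then become unavailable):

  # Train every tree on the full, unsampled training dataset.
  # OOB metrics cannot be computed in this configuration.
  bootstrap_training_dataset: false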
If true, the training examples are sampled with replacement. If false, the training examples are sampled without replacement. Only used when "bootstrap_training_dataset=true". If false (sampling without replacement) and if "bootstrap_size_ratio=1" (default), all the examples are used to train all the trees (you probably do not want that).
Number of examples in each bootstrap, expressed as a ratio of the training dataset size.
If true, the "bootstrap_size_ratio" parameter is adapted dynamically such that "num_trees" trees are trained within the "maximum_training_duration" time. "bootstrap_size_ratio" can only be reduced, i.e. enabling this feature can only reduce the training time.
Maximum impact of the "adapt_bootstrap_size_ratio_for_maximum_training_duration" parameter.
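A hedged sketch of time-bounded training using the fields named above; where "maximum_training_duration" itself is set is not shown in this documentation and is assumed to be configured separately:

  # Ask for many trees, but let the learner shrink the bootstrap size
  # so training completes within the configured maximum training
  # duration. The ratio can only be reduced, never increased.
  num_trees: 1000
  bootstrap_size_ratio: 1.0
  adapt_bootstrap_size_ratio_for_maximum_training_duration: true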
Maximum total number of nodes in the model. If specified, and if the total number of nodes exceeds this limit, the training stops and the forest is truncated.
If set, and if "compute_oob_performances" is true, exports the out-of-bag predictions of the model on the training dataset to the file specified by "export_oob_prediction_path". Note that "export_oob_prediction_path" is a typed path, i.e. a path with a format prefix. The writer implementation for that format should be linked in the binary. For example, to export the predictions in the csv or tfrecord+tfe format, make sure that dataset:csv_example_writer or dataset:tf_example_io_tfrecord, respectively, is linked. Example: export_oob_prediction_path = "csv:/tmp/oob_predictions.csv"
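Restating the example from the description as a text-format sketch (the csv example writer must be linked in the binary, as noted above):

  # Export the per-example out-of-bag predictions to a CSV file.
  compute_oob_performances: true
  export_oob_prediction_path: "csv:/tmp/oob_predictions.csv"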
Fields used for the low-level and/or internal API. In most cases, the user should not care about these fields.