Meta-data of a checkpoint.
Training configuration for the Distributed Gradient Boosted Trees algorithm.
Classical training configuration for a GBT.
Hyper-parameters for the creation of the dataset cache.
How to read the dataset cache.
If true, workers will print training logs.
Dynamic balancing of the workload between workers, for the case where the speed of the workers is not uniform or not constant.
Interval between the creation of checkpoints. If one of the workers or the manager is rescheduled, all the training since the last checkpoint is lost. On the other hand, creating a checkpoint is expensive. The value "-1" disables checkpointing.
Defaults to 10 minutes.
Ratio of workers used for evaluation. The remaining workers are used for training. If no validation dataset is available, all the workers are used for training, independently of the value of "ratio_evaluation_workers". Validation and training run concurrently. Increase this value if the validation takes more time than the training (high average duration of the "EndIter" stage; see the training logs).
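The configuration fields above can be pictured as a single proto message. The following is a minimal sketch in proto2 syntax: the message name, field names, field numbers, and the encoding of the 10-minute default as 600 seconds are illustrative assumptions, not the actual Yggdrasil Decision Forests schema.

```proto
syntax = "proto2";

// Hypothetical distributed GBT training configuration; all identifiers
// and field numbers are assumptions for illustration.
message DistributedGbtConfig {
  // If true, workers print training logs.
  optional bool worker_logs = 1;
  // Dynamic balancing of the workload between workers.
  optional bool load_balancing = 2;
  // Seconds between checkpoints; -1 disables checkpointing.
  optional int32 checkpoint_interval_seconds = 3 [default = 600];
  // Ratio of workers used for evaluation (the rest train).
  optional float ratio_evaluation_workers = 4;
}
```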
Used in:
If true, the workers will simulate failures to test the checkpointing during training. The training will still complete eventually.
If true, all the workers run all the splits, i.e., they perform the same amount of computation. This option can be used to benchmark and detect slow workers. Note that each worker will then have a full copy of the dataset, i.e., the dataset is not distributed.
Evaluation. Can be partial, i.e., computed on a subset of the dataset.
Used in:
The order and semantics of the metrics are defined by the loss implementation.
Used in:
Number of evaluation fragments to make a full evaluation.
Evaluations indexed by "iter_idx".
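Schematically, a partial evaluation and the per-iteration indexing described above could be laid out as in the following sketch (proto2 syntax; all names and field numbers are hypothetical, not the library's actual schema).

```proto
syntax = "proto2";

// Hypothetical partial evaluation of the model on a subset of a dataset.
message Evaluation {
  // Metric values; their order and semantics are defined by the loss.
  repeated float metrics = 1 [packed = true];
  // Number of evaluation fragments that make a full evaluation.
  optional int32 num_fragments = 2;
}

// Hypothetical container of evaluations, indexed by "iter_idx".
message PendingEvaluations {
  map<int32, Evaluation> evaluations = 1;
}
```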
Used in:
Request messages of the workers. Unless stated otherwise, the messages are designed to be sent from the manager to one of the workers. A hypothetical sketch of the request envelope follows the list of request types below.
Computes the statistics of the labels, e.g., the number of elements in each class for classification. Worker type: Trainer
Sets the initial predictions (also called the bias) of the model. Worker type: Trainer & Evaluator
Starts the training of a new iteration, e.g., starts the training of a new tree if one tree is trained at each iteration. The workers return the statistics of the weak model labels. Worker type: Trainer
Finds the highest scoring splits for each of the model nodes in the current tree. Worker type: Trainer
Each worker will evaluate the split (on each example) based on the features it owns. Worker type: Trainer
Shares the split evaluation values between the workers. Once the split values are sharded, updates the node map, the tree structure, and the label statistics. Worker type: Trainer
Requests a subset of split values. This message is only used between workers. Worker type: Trainer
Finalize the current iteration. Worker type: Trainer & Evaluator
Restore an existing checkpoint. Worker type: Trainer & Evaluator
Create the training-worker-side checkpoint, possibly only part of it (for distributed checkpoint creation). Worker type: Trainer
First message sent to the worker by the manager. Sent for both new and resumed trainings. Worker type: Trainer
Create an evaluation worker side checkpoint. Worker type: Evaluator
If set, the worker must ensure that the following (and only the following) features are loaded in RAM (in the case where features are loaded in RAM), as they are used in the computation. Those values can differ from the features in the welcome message (which are the initial features owned by the worker).
Features not currently used in requests, but that will be used (or will stop being used) in the future. The worker is expected to load these features in the background, i.e., independently of the core computation.
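The request types listed above suggest an envelope message with one oneof field per request. The sketch below is a hypothetical reconstruction in proto2 syntax: every message name, field name, and field number is an assumption, and the empty payload messages merely stand in for the real ones (which carry the fields described above).

```proto
syntax = "proto2";

// Empty placeholders so the sketch compiles; the real payloads carry
// the fields described above.
message GetLabelStatistics {}
message SetInitialPredictions {}
message StartNewIter {}
message FindSplits {}
message EvaluateSplits {}
message ShareSplits {}
message GetSplitValue {}
message EndIter {}
message RestoreCheckpoint {}
message CreateCheckpoint {}
message StartTraining {}
message CreateEvaluationCheckpoint {}

// Hypothetical request envelope, sent from the manager to a worker
// (except "get_split_value", which travels between workers).
message WorkerRequest {
  oneof type {
    GetLabelStatistics get_label_statistics = 1;        // Trainer
    SetInitialPredictions set_initial_predictions = 2;  // Trainer & Evaluator
    StartNewIter start_new_iter = 3;                    // Trainer
    FindSplits find_splits = 4;                         // Trainer
    EvaluateSplits evaluate_splits = 5;                 // Trainer
    ShareSplits share_splits = 6;                       // Trainer
    GetSplitValue get_split_value = 7;                  // Worker to worker
    EndIter end_iter = 8;                               // Trainer & Evaluator
    RestoreCheckpoint restore_checkpoint = 9;           // Trainer & Evaluator
    CreateCheckpoint create_checkpoint = 10;            // Trainer
    StartTraining start_training = 11;                  // Trainer
    CreateEvaluationCheckpoint create_evaluation_checkpoint = 12;  // Evaluator
  }
  // Features the worker must have loaded in RAM for this request.
  repeated int32 features = 13 [packed = true];
  // Features to load (or unload) in the background for future requests.
  repeated int32 future_features = 14 [packed = true];
}
```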
Used in:
Range of examples to export. Note: the checkpoint only contains the prediction accumulator.
Used by the manager to keep track of the shards.
Used in:
Used in:
If true, the worker is expected to return the training loss.
Newly learned tree. Only available to evaluation workers.
If true, the evaluation worker is expected to return the validation evaluation immediately (i.e. not in the next iteration).
Used in:
Used in:
Used in:
List of features to test per weak learner and open nodes.
Used in:
Used in:
Used in:
Used in:
(message has no fields)
Used in:
Used in:
Used in:
Used in:
Used in:
Index of the iteration.
Unique identifier of the iteration. If the manager is rescheduled, the same iteration index can be started multiple times. However, the UID will change.
Seed used to initialize the random generator.
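For instance, the iteration identity described above could be carried as in this hypothetical fragment (proto2 syntax; names and numbers are assumptions).

```proto
syntax = "proto2";

// Hypothetical payload starting a new iteration.
message StartNewIter {
  // Index of the iteration; stable across manager rescheduling.
  optional int32 iter_idx = 1;
  // Changes every time the same iter_idx is (re)started, e.g. after the
  // manager is rescheduled.
  optional string iter_uid = 2;
  // Seed used to initialize the random generator.
  optional int64 seed = 3;
}
```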
Used in:
(message has no fields)
Used in:
Result message of the worker.
Each WorkerRequest leads to a WorkerResult with the same oneof field set. Keep the same indexing as in "WorkerRequest" for debugging purposes.
If true, indicates that the worker is missing information to complete the request and continue the training of the tree. This situation is caused by a rescheduling. This message is only possible for the messages related to the training of an individual tree, as snapshots are made between trees.
Duration of the computation, expressed in seconds. If the worker restarts during the computation, the duration of the last execution is used.
True if the pre-loading is still in progress.
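A result envelope mirroring the request envelope could look as follows. This is a hypothetical sketch (proto2 syntax): the names, field numbers, and placeholder payloads are assumptions.

```proto
syntax = "proto2";

// Placeholder result payloads; the real ones carry the fields above.
message GetLabelStatisticsResult {}
message EndIterResult {}

// Hypothetical result envelope; mirrors the oneof of WorkerRequest.
message WorkerResult {
  oneof type {
    // Same field numbers as in WorkerRequest, for debugging.
    GetLabelStatisticsResult get_label_statistics = 1;
    EndIterResult end_iter = 8;
    // ... one result message per request type ...
  }
  // The worker lost the in-tree state (e.g. after a rescheduling) and
  // cannot complete the request.
  optional bool request_restart_iter = 20;
  // Duration of the computation in seconds (last execution on restart).
  optional double runtime_seconds = 21;
  // True while the background feature pre-loading is still running.
  optional bool preloading_work_in_progress = 22;
}
```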
Used in:
Used in:
Any pending validation evaluation.
Used in:
Because the validation evaluation is asynchronous, there can be multiple validation evaluations corresponding to several previous iterations.
Used in:
(message has no fields)
Used in:
One for each weak model.
Used in:
Used in:
Used in:
Used in:
(message has no fields)
Used in:
(message has no fields)
Used in:
(message has no fields)
Used in:
One for each weak model.
Used in:
If specified, the worker loaded the dataset in memory.
"Welcome" proto message of the worker. The welcome message is received as an argument of the "Setup" method. All the workers have the same welcome message.
Location used by the manager and the workers to store intermediate data.
Location of the dataset cache i.e. the dataset indexed for fast training.
List of features owned by each training worker. "owned_features[i].features" are the features owned by the i-th worker.
Classical Yggdrasil training configuration.
Number of training workers. A fraction of the workers will only be used for training while the others will only be used for evaluation. Training workers have index "WorkerIdx() < num_train_workers" while evaluation workers have index "WorkerIdx() >= num_train_workers".
Validation dataset for each evaluation worker.
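Putting the fields above together, a hypothetical "Welcome" message might read as follows (proto2 syntax; names and numbers are assumptions, and the classical training configuration is reduced to a placeholder).

```proto
syntax = "proto2";

// Placeholder for the classical Yggdrasil training configuration.
message TrainingConfig {}

// Features owned by one training worker.
message FeatureSet {
  repeated int32 features = 1 [packed = true];
}

// Hypothetical "Welcome" message; identical for all workers.
message Welcome {
  // Location for intermediate manager/worker data.
  optional string work_directory = 1;
  // Location of the dataset cache (dataset indexed for fast training).
  optional string cache_path = 2;
  // owned_features[i].features: features owned by the i-th training worker.
  repeated FeatureSet owned_features = 3;
  // Classical Yggdrasil training configuration.
  optional TrainingConfig train_config = 4;
  // Workers with WorkerIdx() <  num_train_workers are trainers;
  // workers with WorkerIdx() >= num_train_workers are evaluators.
  optional int32 num_train_workers = 5;
  // Validation dataset, one entry per evaluation worker.
  repeated string validation_dataset_per_evaluator = 6;
}
```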
Used in: