package learning.adanets.phoenix.meta.proto.distillation_spec

optional int32 num_ramp_trials = 1
Number of trials to hit maximum_lambda. See: https://goto.google.com/churn-onepager
optional float minimum_lambda = 2
The minimum and maximum lambda allowed to balance the label loss and the distillation loss.
optional float maximum_lambda = 3
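
The three fields above describe a ramp from minimum_lambda to maximum_lambda over num_ramp_trials trials. The shape of that ramp is not stated here, so the following is only a minimal sketch that assumes a linear ramp clamped at both ends. The function name `ramped_lambda` and the `trial_id` argument are hypothetical, and how lambda then weights the label loss against the distillation loss is left out because this spec does not define it.

```python
def ramped_lambda(trial_id, num_ramp_trials, minimum_lambda, maximum_lambda):
    """Hypothetical linear ramp from minimum_lambda to maximum_lambda.

    Assumption: lambda grows linearly with the trial id and stays at
    maximum_lambda once num_ramp_trials trials have elapsed. The real
    schedule may differ.
    """
    if num_ramp_trials <= 0:
        # Nothing to ramp over, so use the upper bound directly.
        return maximum_lambda
    # Fraction of the ramp completed, clamped to [0, 1].
    fraction = min(max(trial_id / num_ramp_trials, 0.0), 1.0)
    return minimum_lambda + fraction * (maximum_lambda - minimum_lambda)


# Lambda at a few trial ids for a 10-trial ramp between 0.1 and 0.9.
schedule = [ramped_lambda(t, 10, 0.1, 0.9) for t in (0, 5, 10, 20)]
```

Under these assumptions the value starts at minimum_lambda, reaches maximum_lambda at trial num_ramp_trials, and stays there for later trials.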

Next id: 6

optional DistillationSpec.DistillationType distillation_type = 1
optional AdaptivelyBalancingLossesSpec balance_losses_spec = 5
Used when distillation_type is ADAPTIVELY_BALANCE_LOSSES.
optional int32 minimal_pool_size = 2
Start distilling at trial id `minimal_pool_size`. If distillation and intermixed ensemble search are both specified, this field is ignored and distillation happens immediately after an ensemble search trial. If distillation and nonadaptive, adaptive, or residual ensemble search are specified for the same run, each must use a different minimal_pool_size; otherwise, distillation is ignored.
optional float temperature = 3
The temperature of the softmax, used when the DistillationType applies a softmax: output = softmax(logits / temperature)
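
The formula output = softmax(logits / temperature) can be illustrated with a small standard-library sketch. The function name `softmax_with_temperature` and the sample logits are illustrative only and are not part of this spec.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtracting the maximum keeps exp() stable and does not change the result.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, temperature=1.0)
soft = softmax_with_temperature(logits, temperature=4.0)
```

A temperature above 1 flattens the distribution, so `soft` assigns less probability to the top class than `sharp` does while both still sum to 1.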

UNKNOWN_DISTILLATION_TYPE = 0
MSE_LOGITS = 1
Mean squared error between the teacher logits and the student logits. Labels are not used in this distillation mode.
MSE_SOFTMAX = 2
Mean squared error between the teacher predictions (softmax over its logits) and the student logits. Do not use with regression. Labels are not used in this distillation mode.
CROSS_ENTROPY = 3
Cross-entropy loss between the teacher predictions (softmax over its logits) and the student logits. Do not use with regression. Labels are not used in this distillation mode.
ADAPTIVELY_BALANCE_LOSSES = 5
Incrementally rely on the cross-entropy loss from distilling the teacher. For more info, see: https://goto.google.com/churn-onepager
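
The first three distillation types above can be sketched for a single example in plain Python. This is an illustration under stated assumptions, not the actual implementation: the function names are hypothetical, the MSE_SOFTMAX sketch assumes the student logits are also passed through a softmax before the comparison, and the CROSS_ENTROPY sketch uses the usual cross entropy with logits.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mse_logits(teacher_logits, student_logits):
    """MSE_LOGITS: mean squared error taken directly on the logits."""
    pairs = list(zip(teacher_logits, student_logits))
    return sum((t - s) ** 2 for t, s in pairs) / len(pairs)


def mse_softmax(teacher_logits, student_logits):
    """MSE_SOFTMAX sketch.

    Assumption: both teacher and student logits are converted to
    probabilities before the squared error is taken.
    """
    pairs = list(zip(softmax(teacher_logits), softmax(student_logits)))
    return sum((t - s) ** 2 for t, s in pairs) / len(pairs)


def cross_entropy(teacher_logits, student_logits):
    """CROSS_ENTROPY sketch: -sum(p_teacher * log_softmax(student))."""
    teacher_probs = softmax(teacher_logits)
    peak = max(student_logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in student_logits))
    student_log_probs = [x - log_norm for x in student_logits]
    return -sum(p * lp for p, lp in zip(teacher_probs, student_log_probs))


teacher = [2.0, 0.0]
student = [0.0, 2.0]
losses = {
    "MSE_LOGITS": mse_logits(teacher, student),
    "MSE_SOFTMAX": mse_softmax(teacher, student),
    "CROSS_ENTROPY": cross_entropy(teacher, student),
}
```

None of the three functions takes a label argument, which matches the note that labels are not used in these modes. In this sketch the cross entropy is smallest when the student logits reproduce the teacher distribution.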

message AdaptivelyBalancingLossesSpec