Used in:
Number of trials to hit maximum_lambda. See: https://goto.google.com/churn-onepager
The minimum and maximum lambda allowed to balance the label loss and the distillation loss.
Next id: 6
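The fields above give a lower bound, an upper bound, and a trial count for lambda, but this page does not state the schedule itself. A minimal sketch of one possible reading follows. It assumes a linear ramp from the minimum to the maximum lambda and a convex combination of the two losses. The function and parameter names (`ramp_lambda`, `balanced_loss`, `trials_to_max`) are hypothetical and do not come from the proto.

```python
def ramp_lambda(trial_id, min_lambda, max_lambda, trials_to_max):
    """Linearly ramp lambda from min_lambda to max_lambda.

    ASSUMPTION: the real schedule is not documented here; a linear ramp
    is used purely for illustration.
    """
    if trials_to_max <= 0:
        return max_lambda
    # Clamp progress to [0, 1] so lambda never leaves [min_lambda, max_lambda].
    fraction = min(max(trial_id / trials_to_max, 0.0), 1.0)
    return min_lambda + fraction * (max_lambda - min_lambda)


def balanced_loss(label_loss, distillation_loss, lam):
    """Weight the two losses with lambda.

    ASSUMPTION: a convex combination; the actual weighting may differ.
    """
    return (1.0 - lam) * label_loss + lam * distillation_loss
```

Under these assumptions, a larger lambda shifts weight from the label loss toward the distillation loss, and the clamp keeps lambda inside the configured bounds once the trial count is exceeded.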
Used in:
Used when `distillation_loss_type` is `ADAPTIVELY_BALANCE_LOSSES`.
Start distilling at trial id `minimal_pool_size`. If distillation and intermixed ensemble search are both specified, this field is ignored and distillation happens immediately after each ensemble search trial. If distillation is specified together with nonadaptive, adaptive, or residual ensemble search in the same run, each must use a different `minimal_pool_size`; otherwise, distillation is ignored.
The softmax temperature, used when the DistillationType is SOFTMAX: output = softmax(logits / temperature)
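The formula above can be illustrated with a short, dependency-free sketch. The function name `softmax_with_temperature` is hypothetical. Subtracting the maximum is a standard numerical-stability step that is not mentioned in the source and does not change the result.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Compute softmax(logits / temperature) for a list of floats."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating to avoid overflow.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

A temperature above 1 flattens the distribution, so the teacher's smaller probabilities carry more weight. A temperature below 1 sharpens it toward the largest logit.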
Used in:
Mean squared error between the teacher logits and the student logits. Labels are not used in this distillation mode.
Mean squared error between the teacher predictions (softmax over its logits) and the student logits. Do not use with regression. Labels are not used in this distillation mode.
Cross-entropy loss between the teacher predictions (softmax over its logits) and the student logits. Do not use with regression. Labels are not used in this distillation mode.
Incrementally rely on cross entropy loss from distilling the teacher. For more info, please see: https://goto.google.com/churn-onepager
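The first three loss types above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the actual implementation. All function names are hypothetical. The second loss follows the literal wording above and compares teacher probabilities against raw student logits. The cross-entropy applies a log-softmax to the student logits, which is the usual "cross entropy with logits" convention and is assumed here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mse_logits(teacher_logits, student_logits):
    """Mean squared error between teacher logits and student logits."""
    pairs = list(zip(teacher_logits, student_logits))
    return sum((t - s) ** 2 for t, s in pairs) / len(pairs)


def mse_softmax(teacher_logits, student_logits):
    """MSE between teacher probabilities and student logits (literal reading)."""
    pairs = list(zip(softmax(teacher_logits), student_logits))
    return sum((p - s) ** 2 for p, s in pairs) / len(pairs)


def cross_entropy(teacher_logits, student_logits):
    """Cross entropy of teacher probabilities against student logits.

    ASSUMPTION: the student logits pass through a log-softmax first.
    """
    peak = max(student_logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in student_logits))
    student_log_probs = [x - log_norm for x in student_logits]
    teacher_probs = softmax(teacher_logits)
    return -sum(p * lp for p, lp in zip(teacher_probs, student_log_probs))
```

None of the three functions takes ground-truth labels, which matches the note that labels are not used in these modes.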