Ensembling spec. Details in: go/phx-ensemble-search
Used in:
When ensembling, all previously trained architectures that are now being tried in the ensemble are frozen if this field is set to true. New architectures (those not trained in previous trials) are never frozen.
This number is added to the train steps when there are no training ops, for example in an average ensemble.
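As a rough illustration only (the helper and parameter names here are hypothetical, not part of the library), the interaction of the two settings above could look like:

```python
def plan_trial(base_train_steps, has_training_ops, extra_steps, freeze_previous):
    """Illustrative only: combine the freeze flag and the extra train steps.

    An average ensemble of frozen towers has no training ops, so the extra
    steps are added; otherwise the base number of train steps is used.
    """
    train_steps = base_train_steps if has_training_ops else base_train_steps + extra_steps
    return {"train_steps": train_steps, "freeze_previous_towers": freeze_previous}


print(plan_trial(base_train_steps=0, has_training_ops=False,
                 extra_steps=10, freeze_previous=True))
```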
This message is used for both ADAPTIVE_ENSEMBLE_SEARCH and RESIDUAL_ENSEMBLE_SEARCH, as their spec is the same.
Used in:
Increase the width of the ensemble every `increase_width_every` trials.
When distillation and ensemble search are specified for the same run, `minimal_pool_size` determines which phase of the run is used for distillation and which for ensemble search. For example, if the distillation pool size is 100 and the adaptive pool size is 200, then distillation happens on trials [100, 200), and adaptive ensemble search happens on trials [200, end).
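A minimal sketch of the phase split described above (assumed logic, not the library's implementation), using the pool sizes from the example:

```python
def phase_for_trial(trial_id, distillation_pool_size, ensemble_pool_size):
    """Map a trial id to a phase when distillation and ensemble search share a run.

    With distillation_pool_size=100 and ensemble_pool_size=200:
      trials [0, 100)   -> regular architecture search
      trials [100, 200) -> distillation
      trials [200, ...) -> adaptive ensemble search
    """
    if trial_id < distillation_pool_size:
        return "architecture_search"
    if trial_id < ensemble_pool_size:
        return "distillation"
    return "ensemble_search"


assert phase_for_trial(150, 100, 200) == "distillation"
assert phase_for_trial(250, 100, 200) == "ensemble_search"
```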
The type of ensemble.
Used in:
Weighted ensemble: uses a mixture weight on top of each learner's logits.
Average ensemble: averages the logits of the various learners.
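To make the two combining types concrete, here is a small numpy sketch (illustrative only; in the weighted case the library learns the mixture weights, whereas they are fixed here for brevity):

```python
import numpy as np

def average_ensemble(logits_per_learner):
    """AVERAGE ensemble: plain mean of the learners' logits."""
    return np.mean(logits_per_learner, axis=0)

def weighted_ensemble(logits_per_learner, mixture_weights):
    """WEIGHTED ensemble: a mixture weight per learner applied to its logits."""
    weights = np.asarray(mixture_weights).reshape(-1, 1, 1)
    return np.sum(weights * np.asarray(logits_per_learner), axis=0)

# Three learners, a batch of 2 examples, 4 classes.
logits = np.random.randn(3, 2, 4)
averaged = average_ensemble(logits)                     # shape (2, 4)
weighted = weighted_ensemble(logits, [0.5, 0.3, 0.2])   # shape (2, 4)
```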
The type of search for the ensemble.
Used in:
Adaptive: After T trials, we select the best architecture, A. During the next T trials (i.e., from Trial T+1 to Trial 2T), we search for an architecture A' that works well with A. In the next set of trials (i.e., from Trial 2T+1 to Trial 3T), we search for an architecture A'' that works well with both A and A', and so on. In this option, the new architectures (A', A'', ...) are trained directly on the labels (as opposed to RESIDUAL_ENSEMBLE_SEARCH). A rough sketch of this scheme follows the descriptions below.
NonAdaptive: First, we create a pool of good architectures. From this pool, we select the top performing candidates. Then, we form random groups of these candidates to see which architectures complement each other.
Residual: This case is identical to Adaptive, except the learners are trained from the ensemble loss (and not directly from their own loss against the labels).
Intermixed: This case is similar to NonAdaptive ensembling. After a pool of candidates is formed, we combine candidates randomly into groups of size w, where w is specified by the user. However, unlike NonAdaptive ensembling, this algorithm works cyclically: on every kth trial, it will try to form an ensemble. (Past ensembling trials are excluded from the pool of potential candidates.)
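As referenced above, a rough sketch of the Adaptive scheme (assumed logic, not the library's code): every finished block of T trials contributes its best learner to the frozen ensemble, and the next block searches for a learner that complements it.

```python
def frozen_ensemble_for_trial(trial_id, completed_trials, block_size):
    """Illustrative only: pick the best prior trial from each finished block.

    `completed_trials` maps trial_id -> validation loss (lower is better), and
    `block_size` is T from the description above. Trials [0, T) search for A,
    trials [T, 2T) search for A' on top of A, and so on. The Residual variant
    has the same structure, but new learners train on the ensemble loss
    instead of directly on the labels.
    """
    frozen = []
    for block in range(trial_id // block_size):
        block_trials = {
            t: loss for t, loss in completed_trials.items()
            if block * block_size <= t < (block + 1) * block_size
        }
        if block_trials:
            frozen.append(min(block_trials, key=block_trials.get))
    return frozen


losses = {t: 1.0 / (1 + t % 7) for t in range(30)}   # fake validation losses
print(frozen_ensemble_for_trial(trial_id=25, completed_trials=losses, block_size=10))
```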
Used in:
How many towers to ensemble together? Should be bigger than 1.
Every 5 trials (the default value), an architecture that is an ensemble of candidates will be tried.
When ensembling, `width` towers are chosen from the best performing `num_trials_to_consider` trials. Must be greater than 1.
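Combining the three fields above, a hedged sketch of how an intermixed ensembling trial could be scheduled and filled (illustrative only; not the library's implementation):

```python
import random

def intermixed_ensemble(trial_id, completed_trials, past_ensembling_trials,
                        width, try_every=5, num_trials_to_consider=10):
    """Return trial ids to ensemble on this trial, or None to search as usual.

    Every `try_every`-th trial (default 5), choose `width` towers at random
    from the best `num_trials_to_consider` prior trials, excluding trials that
    were themselves ensembles. `completed_trials` maps trial_id -> loss.
    """
    if trial_id % try_every != 0:
        return None
    candidates = {t: loss for t, loss in completed_trials.items()
                  if t not in past_ensembling_trials}
    best = sorted(candidates, key=candidates.get)[:num_trials_to_consider]
    if len(best) < width:
        return None
    return random.sample(best, width)
```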
Used in:
How many towers to ensemble together? Should be bigger than 1.
Start ensembling at trial id `minimal_pool_size`.
When ensembling, `width` towers are chosen from the best performing `num_trials_to_consider` trials after `minimal_pool_size` trials have finished. Must be greater than 1.
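For the non-adaptive spec, a short hedged sketch of the `minimal_pool_size` gate followed by the same top-`num_trials_to_consider` selection (illustrative only):

```python
import random

def nonadaptive_ensemble(completed_trials, width, num_trials_to_consider,
                         minimal_pool_size):
    """Illustrative only: once the pool is large enough, group top trials randomly.

    `completed_trials` maps trial_id -> validation loss (lower is better).
    """
    if len(completed_trials) < minimal_pool_size:
        return None  # keep growing the pool of candidates first
    best = sorted(completed_trials, key=completed_trials.get)[:num_trials_to_consider]
    return random.sample(best, min(width, len(best)))
```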