Next id: 2
Architecture of the various towers in the model.
If true, Phoenix will apply (step) exponential decay in some of the trials. The number of steps can be specified in the fields below.
The minimal number of times to apply decay when applying learning rate exponential decay.
The maximal number of times to apply decay when applying learning rate exponential decay.
Minimum rate for exponential learning rate decay.
Maximum rate for exponential learning rate decay.
If true, the trials will apply gradient clipping.
Minimal norm allowed when clipping.
Maximal norm allowed when clipping.
Allow L2 regularization on trainable variables.
Minimal rate (lambda) for the L2 regularization.
Maximal rate (lambda) for the L2 regularization.
Number of steps to warm up the learning rate.
Minimum and maximum dropout allowed. The study will explore discretely between the minimum and the maximum over 10 different values.
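Putting the learning-rate knobs together, a spec for these fields might look like the sketch below in proto text format. The field names are assumptions for illustration only; consult the actual .proto for the real ones.

    # Hypothetical learning spec; field names assumed, values illustrative.
    learning_spec {
      apply_exponential_decay: true
      min_decay_times: 1               # minimal number of decay applications
      max_decay_times: 5               # maximal number of decay applications
      min_decay_rate: 0.9
      max_decay_rate: 0.99
      apply_gradient_clipping: true
      min_gradient_norm: 1.0
      max_gradient_norm: 5.0
      apply_l2_regularization: true
      min_l2_regularization: 1e-5      # minimal lambda
      max_l2_regularization: 1e-2      # maximal lambda
      learning_rate_warmup_steps: 1000
      min_dropout: 0.0                 # explored over 10 discrete values
      max_dropout: 0.5
    }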
Parameters that apply only to the LinearModel search algorithm.
The L2 ridge regression penalty used to fit the linear model. Must be > 0.
Ignore very high-loss architectures by performing asymmetric outlier removal: trials with loss in the top 10% or with loss >= 10 * median are removed. Huge loss == SGD diverged; remove. Low loss == good model; keep!
With fewer than this many trials, just generate a random suggestion.
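To make the outlier rule concrete: if the median loss across completed trials is 0.5, every trial with loss >= 5.0 is dropped, in addition to the trials in the top 10% of losses. A minimal configuration sketch, with field names assumed rather than taken from the .proto:

    # Hypothetical LinearModel spec; field names assumed.
    linear_model {
      ridge_penalty: 0.1        # L2 ridge penalty; must be > 0
      trials_before_fit: 20     # below this many trials, suggest randomly
    }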
Controls how to align variable-depth architectures such that all architectures can be encoded as a fixed-length feature vector input.
Networks will be aligned at the last layer before outputs.
Networks will be aligned at the first layer after inputs.
Next id: 53
Below this depth, Phoenix will fill the network with random blocks. This depth is also used for the non-adaptive search, i.e., all networks have a fixed depth equal to minimum_depth.
Once the search exceeds this depth, we go to evolutionary mode, i.e., we mutate one block at a time.
The type of search you would like.
Submessage for SearchTypes that accept further configuration. Must be compatible with the value of search_type.
The number of blocks between reduction cells.
The reduction block to use after each cell.
Whether to only search for a cell, which is then replicated n times until the size of the architecture is equal to maximum_depth. If this option is false, each cell will be searched independently.
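For example, a cell-based search could be configured along these lines, using the three field names referenced in the algorithm notes further below (the block name, and whether reduction_block_type takes a string, are assumptions for this sketch):

    num_blocks_in_cell: 4        # a reduction block after every 4 blocks
    reduction_block_type: "DOWNSAMPLE_CONVOLUTION_3X3"  # assumed block name
    replicate_cell: true         # one searched cell, replicated up to maximum_depth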
To make the search statistically significant, we set thresholds below which the algorithm cannot increase complexity (i.e., go deeper). Example: [10, 30, 50] -- the algorithm must complete 10 trials at depth 1 to go to depth 2, and has to finish 30 trials to try depth 3, etc. If not set, the default is [5 * i for i in range(maximum_depth)].
The probability that the architecture will grow deeper if there is enough capacity. With probability 1 - increase_complexity_probability, evolutionary mode will be run instead. Setting increase_complexity_probability < 1.0 increases exploration (this is in effect an epsilon-greedy exploration strategy) and could enable the search algorithm to get out of local minima.
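For example, with the thresholds set to [10, 30, 50] and increase_complexity_probability = 0.9: once 10 trials at depth 1 have completed, each new suggestion grows to depth 2 with probability 0.9 and falls back to evolutionary mutation with probability 0.1.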
The following three probabilities are only used in ADAPTIVE_COORDINATE_DESCENT_GENETIC. The probability to mutate a block.
The probability of crossover. A new architecture is the result of mixing two of the best candidates.
The probability of generating a random architecture.
The number of best-performing completed trials to consider when increasing complexity (i.e., going deeper). E.g., if beam_size = 5, then the base for the new deeper tower will be randomly selected from the top 5 completed trials. Must be >= 1.
TODO(b/172564129): Find a way to figure this out from the head/dataset.
List of blocks to use in the search. Please use names defined in: model_search/blocks.py
User suggestions for architectures. These suggestions will be trained first. Once trained, Phoenix will switch to its specified search algorithm. Restrictions: 1. If you intend to run Phoenix solely to train the user-suggested architectures, then there are no limitations. 2. If you are using a search algorithm after the user_suggestions (i.e., the number of max trials is greater than the length of this repeated field), then your suggestions are going to be data points for the search algorithm. Therefore, they must be compatible with the search; namely, if you are using HARMONICA_SEARCH or NONADAPTIVE_RANDOM_SEARCH, then your suggestions must be of fixed depth, since these algorithms work in a fixed-depth space. The depth should be equal to PhoenixSpec.minimum_depth. 3. If you are using ADAPTIVE_COORDINATE_DESCENT, this algorithm searches over dynamic-depth architectures, so no restriction applies.
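For illustration, two fixed-depth suggestions (depth 3, suitable for minimum_depth = 3) might be written as below. The inner field name architecture and the block names are assumptions for this sketch; real block names must come from model_search/blocks.py.

    user_suggestions {
      architecture: "FIXED_CHANNEL_CONVOLUTION_32"
      architecture: "CONVOLUTION_3X3"
      architecture: "FLATTEN"
    }
    user_suggestions {
      architecture: "FIXED_CHANNEL_CONVOLUTION_64"
      architecture: "DOWNSAMPLE_CONVOLUTION_3X3"
      architecture: "PLATE_REDUCTION_FLATTEN"
    }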
Fixes the architecture to a given one so that when replay is on, no search is performed. The Phoenix-returned estimator will always give the same architecture. When replay is on, there is no point in using parameterized_train_and_eval, as replay also contains the hparams used to train every tower.
Whether to scale how long a trial trains for based on the architecture size. If enabled, each extra block of an architecture adds max_train_steps / max_depth number of steps to the training. This only affects training when using mutation search.
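For example, with max_train_steps = 10000 and a maximum depth of 5, each extra block adds 10000 / 5 = 2000 steps, so deeper architectures earn proportionally more training time.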
If provided, it means that the input function provided for Phoenix already created the lengths tensor for the recurrent problem. Phoenix will use the provided lengths tensor and will assume that we have only one sequential feature.
The name of the weight tensor in the features dict. Keep empty for no weights.
Apply dropout.
Spec for learning, e.g., apply_exponential_decay, etc.
Specifies how to create an ensemble inside Phoenix. Ensembling is currently not supported for TPU (b/122906465).
Specifies how to distill the ensemble that Phoenix creates. If both a DistillationSpec and an EnsemblingSpec are provided, the DistillationSpec will be ignored. TODO(b/172564129): Integrate distillation with ensembling.
Use synchronous optimization across workers.
Task spec for the various tasks. Please keep empty if you have one task.
The most important task when using multi_task_spec. The loss / predictions of the estimator are going to be set to this label.
If set to true, the losses from the multi-task spec above are summed into one loss, which is then optimized. Otherwise, each loss is trained separately (i.e., training alternates between optimizing the various losses).
NHWC or NCHW -- only applies to CNN problems.
TODO(b/172564129): Add the ability to specify a custom tower to use with the auxiliary head. Whether to use an auxiliary head for CNN tasks. If True, two kinds of auxiliary heads can be attached: 1. If the network contains a downsample or reduction block, the NASNet auxiliary head will be attached right before the block. 2. If the network contains a flatten block which is not connected to the logits, an auxiliary head will be attached to this flatten block. If neither of these two criteria hold, no auxiliary head will be attached.
The weight to use for the auxiliary head loss. Must be in [0, 1.0]. The weight for the loss of the main head will be set to 1 - auxiliary_head_loss_weight.
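For example, auxiliary_head_loss_weight = 0.4 yields a total loss of 0.6 * main_head_loss + 0.4 * auxiliary_head_loss.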
If true, when there is an embedding feature column, we use the embedding once for all towers in Phoenix. Otherwise, we create a different embedding for every tower. If in doubt, keep false.
Temperature added to the predict graph when exporting.
This option gives the user the ability to disallow Phoenix from adding a last dense layer to create logits that match the problem size. This option is to be used if the user can guarantee that the last block produces logits (i.e., an output tensor) with the appropriate size. If unsure, please keep as false.
Only applies to multi-task. If true, we don't infer the label tensor and label weight when passing them to the loss_fn and metric_fn; we pass the estimator label dict as is. Note that this requires the user to modify the default loss_fn/metric_fn to accept a dictionary and calculate the loss based on the tensors in that dictionary.
What does the input look like? Use DNN if you have many feature columns. Use CNN for plate tensors like images or speech (timeframes x freq. bands). Use RNN_ALL_ACTIVATIONS for language-model-like classification, i.e., assume a label for every input in the sequence. For a one-classification RNN (i.e., one label per sequence), use RNN_LAST_ACTIVATIONS.
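For instance, an image classification problem would be configured along these lines (treating problem_type as the field name is an assumption for this sketch):

    problem_type: CNN   # plate tensors: images, or timeframes x frequency bands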
Each trial will try to add or modify one block from the best trial so far.
Each trial will randomly choose blocks to construct the network.
The harmonica search: https://arxiv.org/pdf/1706.00764.pdf
An extension of the coordinate descent algorithm which does not search over reduction blocks. Instead, a reduction block is placed every num_blocks_in_cell blocks. For example, if num_blocks_in_cell = 4, then a reduction block will be added to the architecture after every 4 blocks. NOTE: Do not include reduction blocks in blocks_to_use, otherwise we will crash (e.g., pooling, downsample, reduction, flatten, etc.). NOTE: The following fields should be set if using this algorithm: num_blocks_in_cell, reduction_block_type, replicate_cell.
Fit a linear model predicting accuracy from block binary feature vector. Then take the argmax of that model, and possibly increase depth.
This message helps Phoenix to search for an architecture when we are optimizing for multiple tasks. The search is trying to find a good architecture for the shared model. We will call the logits of the shared model (shared across the various tasks) `search logits`. You can specify how to create `final logits` for the tasks given the search logits. I.e., you can specify what blocks to add above the shared model for the specific task -- this addition is currently fixed and not searchable. Next id: 8
Required. This name must match the name in the label dict in the model_fn of the estimator.
The number of classification classes -- use 2 for binary classification.
Additional blocks between the search logits and the final logits. Keep empty if you don't wish to add blocks between the search logits and the final logits for the task. A fully connected layer is added above the search logits if their dimension is not equal to the number of classes of the task.
Feature name for the weight for the label.
True if the weight feature is in the feature dict; otherwise (if it is in the label dict), false.
Whether or not to apply an activation before adding the appropriate architecture for the task specified in the architecture field above.
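Putting the task fields together, a hypothetical multi-task entry might look like the sketch below. Apart from architecture, which is referenced above, the field names and the block name are assumptions:

    multi_task_spec {
      label_name: "sentiment"               # must match the label dict key
      number_of_classes: 2                  # binary classification
      architecture: "FULLY_CONNECTED_64"    # extra block above the search logits
      weight_feature_name: "example_weight"
      weight_is_a_feature: true             # the weight lives in the feature dict
      apply_activation: true                # activation before the task blocks
    }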
Next id: 3
List of blocks to use in the architecture (stacked). Please use names (strings) defined in: model_search/blocks.py
Ability to override additional hparams in the trial. Keep empty to rely on tuner hparams instead.
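A hypothetical entry combining the two fields above (the field name architecture and the block names are assumed; real block names come from model_search/blocks.py):

    architecture: "FIXED_OUTPUT_FULLY_CONNECTED_128"
    architecture: "FIXED_OUTPUT_FULLY_CONNECTED_256"
    # hparams left empty here to rely on the tuner hparams instead.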