Activation function specs.
Used in:
Name of activation function.
Used in:
Active Sampling specifications message.
Used in:
The field name suffix '_micros' indicates that the integer value stored in the field is converted at runtime to a floating point number by dividing it by 1e6. _micros fields exist so that protos can be reliably encoded and compared without having to worry about floating point rounding and comparisons.
Used in:
The initial learning rate. Must be >= 0. A recommended starting value is 2000 (i.e. real value 0.002).
The ratio by which the learning rate decays per epoch of training. Must be >= 0. A recommended starting value is 5000 (i.e. real value 0.05).
Must be in real value range 0 < beta_1 < 1. A recommended starting value is 900000 (i.e. real value 0.9).
Must be in real value range 0 < beta_2 < 1. A recommended starting value is 999000 (i.e. real value 0.999).
The normalized gradient clip value. A recommended starting value is 5000000 (i.e. real value 5.0).
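A minimal sketch of the _micros convention described above: the recommended Adam starting values are stored as integers and divided by 1e6 at runtime. The dictionary keys are illustrative names only and are not taken from the schema. The two assertions show why integers are used: integer micros compare exactly, while the equivalent float arithmetic does not.

```python
# Sketch of the '_micros' convention: integers in the proto, floats at runtime.
# The keys below are illustrative names, not confirmed field names.
MICROS_PER_UNIT = 1_000_000


def from_micros(value_micros: int) -> float:
    """Convert an integer '_micros' value to its real floating point value."""
    return value_micros / MICROS_PER_UNIT


# Recommended starting values from the field comments, stored as integers.
recommended_adam_micros = {
    "initial_learning_rate": 2000,
    "learning_rate_decay_per_epoch": 5000,
    "beta_1": 900000,
    "beta_2": 999000,
    "normalized_gradient_clip": 5000000,
}

real_values = {name: from_micros(v) for name, v in recommended_adam_micros.items()}

# Integer micros compare exactly; the equivalent float arithmetic does not.
assert 100000 + 200000 == 300000
assert 0.1 + 0.2 != 0.3

for name, value in real_values.items():
    print(f"{name}: {value}")
```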
21
Used in:
4 Analyze the size of target benchmarks in number of instructions and token length.
Used in:
10 Map the number of memory vs. computational instructions from Grewe's features onto a 2-dimensional space.
Used in:
The specification of a training corpus.
Used in: , ,
The input contentfiles to the corpus. Shell variables are expanded. E.g. $HOME -> /home/<user>.
The ID of an already-cached corpus.
The path to a directory on the local system containing content files.
The path to a tarball archive file containing content files.
The path to a BigQuery database containing files.
The path to a database of encoded files.
A list of preprocessor passes to run on each contentfile in the corpus prior to training, in the order in which they are run.
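The ordering rule can be sketched as a simple left-to-right pipeline in which each pass consumes the previous pass's output. The two passes below are hypothetical, naive stand-ins (for instance, they ignore `//` inside string literals); they only illustrate that passes run in list order and that changing the order can change the result.

```python
from typing import Callable, List

# A preprocessor pass takes source text and returns transformed source text.
Preprocessor = Callable[[str], str]


def strip_line_comments(src: str) -> str:
    """Hypothetical pass: drop '//' comments (naive: ignores string literals)."""
    lines = []
    for line in src.split("\n"):
        idx = line.find("//")
        if idx != -1:
            line = line[:idx]
        lines.append(line.rstrip())
    return "\n".join(lines)


def remove_blank_lines(src: str) -> str:
    """Hypothetical pass: remove lines that contain only whitespace."""
    return "\n".join(line for line in src.split("\n") if line.strip())


def run_preprocessors(src: str, passes: List[Preprocessor]) -> str:
    """Apply each pass to the output of the previous one, in list order."""
    for preprocess in passes:
        src = preprocess(src)
    return src
```

With the input "// only a comment\nint b;\n", running the comment pass first yields "int b;", whereas running the blank-line pass first leaves an empty line behind, because the comment line only becomes blank after the blank-line pass has already run.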
Used in:
Encoding and data masking configuration for the sample corpus. The sampling type can be 'normal', 'online' or 'active'. 'normal': the corpus is pre-masked, then fed for predictions. 'online': a datapoint is requested from the original corpus and masked on the fly. 'active': same as 'online', but active learning is applied between sample and target features.
The schema for a corpus metafile. An instance of this proto is created with name META.pbtxt in the cache directory of each corpus.
Used in:
Represent a single training instance as a whole padded kernel or as an arbitrary statement sequence. Valid options are "kernel" or "statement".
When datapoints should be pre-processed for training/validation/sampling. 'pre': the raw corpus is masked and stored, then used. 'online': the raw corpus is stored, and during training/validation/sampling datapoints are pre-processed on the fly and provided to the model.
Use [START] and [END] meta tokens at the beginning and end of each sequence.
If datapoints are 'kernel', kernels longer than the sequence length are discarded. If true, they are kept and truncated instead.
Number of steps that constitute an epoch. Checkpoints and samples are taken once every epoch.
Select a value between 0 and 100. This percentage is used to split the dataset into training and validation sets. The validation set is not seen during training.
Single-token masks are BERT's default. Alternatively, use a hole token to represent an arbitrary number of hidden tokens.
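The discard-or-truncate rule for oversized kernels and the percentage-based train/validation split can be sketched as below. Two assumptions are made here: the split percentage is read as the share held out for validation, and the shuffle uses a fixed seed so the split is reproducible. The function and parameter names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Tokens = List[int]


def filter_kernels(kernels: Sequence[Tokens], sequence_length: int,
                   truncate_large_kernels: bool) -> List[Tokens]:
    """Drop kernels longer than sequence_length, or truncate them if requested."""
    kept: List[Tokens] = []
    for kernel in kernels:
        if len(kernel) <= sequence_length:
            kept.append(list(kernel))
        elif truncate_large_kernels:
            kept.append(list(kernel[:sequence_length]))
        # Otherwise the oversized kernel is discarded.
    return kept


def split_train_validation(datapoints: Sequence[Tokens], validation_percentage: float,
                           seed: int = 0) -> Tuple[List[Tokens], List[Tokens]]:
    """Shuffle deterministically, then hold out validation_percentage % of datapoints."""
    if not 0 <= validation_percentage <= 100:
        raise ValueError("validation_percentage must be between 0 and 100")
    shuffled = list(datapoints)
    random.Random(seed).shuffle(shuffled)
    num_validation = int(len(shuffled) * validation_percentage / 100)
    return shuffled[num_validation:], shuffled[:num_validation]
```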
Define a group of Databases that will be considered jointly.
Used in: , , , , , , , , , , , , , , , ,
Datapoint length threshold.
Dropout Specs.
Used in:
Input embeddings are usually very useful.
Used in:
Specification of app's evaluation pipeline.
Define workspace path
Define tokenizer path
Define a list of different evaluators to run.
ExpectedErrorReduction specification: the predictive head and a few parameters.
Used in:
20
Used in:
19
Used in:
Used in:
Max value that can be tokenized. Anything above that is UNK.
Threshold up to which values are tokenized individually (not represented as a range).
Range width of tokenized values above the singular threshold.
Sequence length of tokenized feature vectors.
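One plausible reading of the four feature-tokenizer settings above is sketched below: values up to the singular threshold get their own token, larger values share one token per range of fixed width, anything above the maximum becomes an unknown token, and the token list is padded or truncated to the sequence length. The bucket boundaries, the inclusive threshold, and the token spellings are assumptions for illustration, not the tokenizer's confirmed behaviour.

```python
from typing import List, Sequence

UNK, PAD = "[UNK]", "[PAD]"


def tokenize_value(value: int, singular_threshold: int, max_value: int,
                   range_width: int) -> str:
    """Map a non-negative integer feature value to a token string."""
    if value > max_value:
        return UNK                      # Above the maximum: unknown token.
    if value <= singular_threshold:
        return str(value)               # Small values get their own token.
    # Larger values share a token per range of width range_width,
    # starting right after singular_threshold.
    offset = (value - singular_threshold - 1) // range_width
    low = singular_threshold + 1 + offset * range_width
    high = min(low + range_width - 1, max_value)
    return f"[{low}-{high}]"


def tokenize_features(values: Sequence[int], singular_threshold: int, max_value: int,
                      range_width: int, sequence_length: int) -> List[str]:
    """Tokenize a feature vector, then pad or truncate it to sequence_length."""
    tokens = [tokenize_value(v, singular_threshold, max_value, range_width) for v in values]
    tokens = tokens[:sequence_length]
    return tokens + [PAD] * (sequence_length - len(tokens))
```

For example, with a threshold of 20, a maximum of 1000 and a range width of 10, the value 25 maps to "[21-30]" and 5000 maps to "[UNK]".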
8 Calculate and plot the average score of the top-K best samples per target from each DB group.
Used in:
22
Used in:
15 Generate CLSmith samples.
Used in:
17 Convert databases per db group to CSV for Grewe's predictive model.
Used in:
Define a message representing a CSV for Grewe's predictive model.
Used in: , ,
16 Calculate the top samples for each target per DB group and store them to CSV for Grewe's predictive model.
Used in:
Used in: , ,
If sequences contain holes, this is the upper bound of the possible hole length (lengths are drawn from [0, hole_length]).
Select distribution from which each hole length will be sampled.
Learning holes is a difficult task. Stage training to start with many single-token holes (equivalent to masks) and gradually move to fewer and increasingly longer holes.
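A sketch of how hole lengths might be drawn from [0, hole_length] and how staged training could widen that bound over time. The "uniform" and clamped "normal" options, and the linear schedule from 1 (mask-like holes) up to hole_length, are assumptions chosen for illustration; the schema does not specify which distributions or schedule BenchPress uses.

```python
import random


def sample_hole_length(rng: random.Random, hole_length: int,
                       distribution: str = "uniform",
                       mu: float = 0.0, sigma: float = 1.0) -> int:
    """Draw a hole length in the closed range [0, hole_length]."""
    if distribution == "uniform":
        return rng.randint(0, hole_length)       # randint is inclusive on both ends.
    if distribution == "normal":
        length = round(rng.gauss(mu, sigma))
        return max(0, min(hole_length, length))  # Clamp into [0, hole_length].
    raise ValueError(f"unknown distribution: {distribution}")


def staged_hole_bound(step: int, total_steps: int, hole_length: int) -> int:
    """Linearly grow the hole-length upper bound from 1 (mask-like) to hole_length."""
    fraction = min(1.0, step / max(1, total_steps))
    return max(1, round(1 + fraction * (hole_length - 1)))
```

During staged training, the bound returned by `staged_hole_bound` would be passed as `hole_length` to `sample_hole_length`, so early steps produce short holes and later steps allow longer ones.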
9 Per target benchmark, print the best candidates among each database group.
Used in:
23
Used in:
A BenchPress instance encapsulates all of the settings for training and sampling a language model.
Used in:
The path to the benchpress working directory. This directory stores persistent cache files, including the corpus, model checkpoints, and sampler outputs. If not provided, the default value of $HOME/.cache/benchpress is used.
Optionally, a github miner to scrape files with a requested specification.
The language model specification. Either the full description of a model, or the path to a pretrained_model, as created by the --export_tf_model option of benchpress.
The sampler specification.
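The working-directory fallback described above amounts to a one-line default. In this sketch, the function name is hypothetical, and `Path.home()` is assumed to resolve to the same location as `$HOME` (it reads `HOME` on POSIX systems).

```python
from pathlib import Path
from typing import Optional


def resolve_working_dir(working_dir: Optional[str] = None) -> Path:
    """Return the configured working directory, or the documented default."""
    if working_dir:
        return Path(working_dir)
    # Default described above: $HOME/.cache/benchpress
    return Path.home() / ".cache" / "benchpress"
```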
A collection of instances.
The path of the source file.
The string source code.
A string description of the status. Only set if status != OK.
Used in:
2 Calculate and plot the average score of the top-K best samples per target from each DB group.
Used in:
KMeans-type architecture. See the scikit-learn specs for the parameters below.
Used in:
kNN architecture.
Used in:
6 Plot LLVM-IR Instruction count distribution of given database groups.
Used in:
Layer wrapper to help with layer's ordering consistency.
Used in:
LayerNorm specs.
Used in:
The candidate vocabulary. This is the list of multicharacter tokens, including all those already in the vocabulary map below.
The derived vocabulary and its numerical mapping.
Used in:
The string to tokenize.
The tokenized string.
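One common way to use a candidate list of multicharacter tokens is greedy longest-match tokenization, followed by assigning an integer id to each distinct token to form the derived vocabulary. The sketch below assumes that approach; the function names are hypothetical, and any character not covered by a candidate becomes its own single-character token.

```python
from typing import Dict, Iterable, List


def greedy_tokenize(text: str, candidates: Iterable[str]) -> List[str]:
    """Split text by always taking the longest candidate token at each position.

    Characters not covered by any multicharacter candidate become
    single-character tokens.
    """
    vocab = set(candidates)
    max_len = max([1] + [len(tok) for tok in vocab])
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest possible piece first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if length == 1 or piece in vocab:
                tokens.append(piece)
                i += length
                break
    return tokens


def derive_vocabulary(tokens: Iterable[str]) -> Dict[str, int]:
    """Assign a stable integer id to every distinct token."""
    return {tok: idx for idx, tok in enumerate(sorted(set(tokens)))}
```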
Linear specs.
Used in:
1 Dump log file for all databases provided.
Used in:
MLP-type architecture.
Used in: ,
Used in: , ,
When selecting an index in the input tensor, the original BERT model gives an 80% chance of replacing it with a MASK, a 10% chance of replacing it with another random token, and a 10% chance of leaving it unchanged. Set True to enable this behavior. Otherwise, a selected index in the input is always replaced by a MASK.
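The 80/10/10 rule above can be sketched for a single selected position as follows. The flag name `bert_style` and the token spelling `[MASK]` are hypothetical; the probabilities are the ones stated in the field comment.

```python
import random
from typing import Sequence

MASK = "[MASK]"


def corrupt_selected_token(token: str, vocabulary: Sequence[str],
                           rng: random.Random, bert_style: bool) -> str:
    """Return the replacement for one position chosen for masked-LM prediction."""
    if not bert_style:
        return MASK                     # Plain behaviour: always use [MASK].
    r = rng.random()
    if r < 0.8:
        return MASK                     # 80%: replace with [MASK].
    if r < 0.9:
        return rng.choice(vocabulary)   # 10%: replace with a random vocabulary token.
    return token                        # 10%: keep the original token.


rng = random.Random(0)
vocabulary = ["a", "b", "c"]
outcomes = [corrupt_selected_token("x", vocabulary, rng, bert_style=True) for _ in range(10_000)]
print(outcomes.count(MASK) / len(outcomes))   # roughly 0.8
```

In a real vocabulary, the "random token" branch can occasionally pick the original token, so the effective keep rate is slightly above 10%.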
Used in: ,
Special category of a hole represented as a set of masks.
Select distribution from which each hole length will be sampled.
Learning holes is a difficult task. Stage training to start with many single-token holes (equivalent to masks) and gradually move to fewer and increasingly longer holes.
Used in:
The maximum length of a sample, as a number of tokens. The length of the Sampler.start_text counts towards this total.
3 Calculate and plot the minimum score from each DB group for every target benchmark.
Used in:
The specification of a benchpress model.
Used in: , ,
Records telemetry data about a single epoch of model training.
The time at which training of this epoch completed, in milliseconds since the Unix epoch.
The epoch which has just finished training, starting at one.
The wall time that it took to train the epoch.
The model's loss.
The schema for a model metafile. An instance of this proto is created with name META.pbtxt in the cache directory of each model.
12 Calculate and plot the average score of the top-K best samples per target from each DB group.
Used in:
Dummy selector, not used.
The specification of a benchpress language model.
Used in:
The size of the input embedding layer. Only required if backend == KERAS_SEQ. Must be > 0.
The type of neuron. Valid options are: {"lstm","rnn","gru"}.
The number of neurons in each layer of the network.
The total number of layers in the network.
If greater than zero, this adds a dropout layer after each layer of neurons with probability equal to this value / 1000000. E.g. a value of 2000 would insert a dropout with probability 0.002.
Size of the encoder layers and the pooler layer.
The fields below correspond to BERT parameters. Number of hidden layers in the Transformer encoder.
Number of attention heads for each attention layer in the Transformer encoder.
The size of the "intermediate" (i.e., feed-forward) layer in the Transformer encoder.
The non-linear activation function (function or string) in the encoder and pooler.
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
The dropout ratio for the attention probabilities.
The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048).
The vocabulary size of the `token_type_ids` passed into `BertModel`.
The stdev of the truncated_normal_initializer for initializing all weight matrices.
The epsilon used by the layer normalization layers.
Usage flag for feature vector encoding during training.
Concatenated raw features vector input length.
Embedding dimension of feature transformer input.
Dropout probability of the feature transformer encoder.
Threshold up to which every numerical value maps to a single token.
Maximum value allowed for feature numerical encoding.
Length of value range mapping to single token for feature tokenizer.
Number of attention heads for the feature transformer.
Size of the feature encoder's internal feed-forward layer.
Layer norm epsilon for the feature encoder.
Number of hidden layers for the feature encoder.
Used in:
Used in:
Used in: ,
7 Map the features of database groups on a PCA-2 reduced space.
Used in:
Plot configurations
Used in: , , , , , , , , , , , , , , , , , ,
The specification of a pre-training corpus.
Used in:
The input contentfiles to the corpus. Shell variables are expanded. E.g. $HOME -> /home/<user>.
The ID of an already-cached corpus.
The path to a directory on the local system containing content files.
The path to a tarball archive file containing content files.
The path to a BigQuery database containing files.
The path to a database of encoded files.
A list of preprocessor passes to run on each contentfile in the corpus prior to training, in the order in which they are run.
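The shell-variable expansion mentioned for the input contentfiles can be approximated with Python's `os.path.expandvars`. Whether BenchPress uses this exact call is an assumption, and `CORPUS_ROOT` is a hypothetical variable set only for the demonstration.

```python
import os


def expand_contentfiles(path: str) -> str:
    # os.path.expandvars replaces $VAR / ${VAR} with environment values and
    # leaves references to unset variables unchanged.
    return os.path.expandvars(path)


os.environ["CORPUS_ROOT"] = "/data/opencl"   # Hypothetical variable for the demo.
print(expand_contentfiles("$CORPUS_ROOT/kernels"))   # /data/opencl/kernels
```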
A preprocessor worker input.
The output of a preprocessor worker.
Used in:
Used in:
More verbose failure causes:
The elapsed time of each preprocessing job.
Committee is a list of models.
Used in:
repeated <ModelType> model_type = x;
Used in:
Used in: ,
The schema for a model metafile. An instance of this proto is created with name META.pbtxt in the cache directory of each model.
Used in:
The initial learning rate. Must be >= 0. A recommended starting value is 1000 (i.e. real value 0.001).
The ratio by which the learning rate decays per epoch of training. Must be >= 0. A recommended starting value is 0.
14 Calculate and plot the average score of the top-K best samples per target from each DB group.
Used in:
Dummy selector, not used.
13 Calculate and plot the average score of the top-K best samples per target from each DB group.
Used in:
Dummy selector, not used.
A generated sample. Instances of this proto are returned by a Model's Sample() method.
Sampling may be batched, so the sum of sample_time_ms over a range of samples may be much higher than the actual amount of time required to sample the set. This field contains the number of milliseconds between the last sample completing and this sample completing, so that by summing wall_time_ms, it is possible to get an accurate idea of the actual time taken to produce a set of samples.
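To see why wall_time_ms is the field to sum, consider a small simulation. It assumes that every sample in a batch completes at the same instant; `Sample` is a stand-in dataclass that models only the two timing fields, not the generated proto class.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Sample:
    """Stand-in for the Sample proto; only the two timing fields are modelled."""
    sample_time_ms: int
    wall_time_ms: int


def batched_samples(batch_durations_ms: Sequence[int], batch_size: int) -> List[Sample]:
    """Simulate batches whose samples all complete at the same instant."""
    samples: List[Sample] = []
    for duration in batch_durations_ms:
        for i in range(batch_size):
            # Every sample in the batch took `duration` ms, but only the first one
            # to complete advances the wall clock; the rest complete 0 ms later.
            samples.append(Sample(sample_time_ms=duration,
                                  wall_time_ms=duration if i == 0 else 0))
    return samples


samples = batched_samples([1000, 1200], batch_size=4)
print(sum(s.sample_time_ms for s in samples))  # 8800: overcounts batched time
print(sum(s.wall_time_ms for s in samples))    # 2200: actual elapsed time
```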
Specification of a new sample corpus to get feeds from.
Used in:
Criteria used for determining when to stop sampling.
Used in:
The specification of a benchpress sampler.
Used in: ,
The initial text to seed the language model with. Each sample will begin with this text.
Simple string
Sample from training set
Sample from validation set
Create set with new specs from original corpus
Specify a whole new corpus to encode and sample with new specs.
Sample live by getting input() from user.
The sampling batch size. TODO(cec): Always sample with max batch size.
The length of sampling sequences.
The sampling temperature. Must be >= 0. A recommended starting value is 1000000 (i.e. 1.0 in real values).
The criteria that determine when to terminate a sample, in the order in which they will be executed. Duplicates are allowed, for example if you would like to have symmetrical token depth counters for two pairs of tokens.
A message describing the experiment of this sampler.
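The ordered list of termination criteria can be sketched as checks that are evaluated in list order, where the sample stops as soon as one of them reports completion. The class and method names below are hypothetical. The depth counter treats a sample as complete once at least one left token has appeared and every left token has been matched by a right token. Two such counters with different token pairs can sit in the same list, which matches the duplicate use case described above.

```python
from typing import List, Protocol, Sequence


class TerminationCriterion(Protocol):
    def is_complete(self, tokens: Sequence[str]) -> bool: ...


class MaxTokenLength:
    """Stop once the sample reaches a maximum number of tokens."""

    def __init__(self, max_tokens: int) -> None:
        self.max_tokens = max_tokens

    def is_complete(self, tokens: Sequence[str]) -> bool:
        return len(tokens) >= self.max_tokens


class SymmetricalTokenDepth:
    """Stop once every opened left token has been closed by a right token."""

    def __init__(self, left: str, right: str) -> None:
        self.left, self.right = left, right

    def is_complete(self, tokens: Sequence[str]) -> bool:
        opened = tokens.count(self.left)
        return opened > 0 and opened == tokens.count(self.right)


def sample_is_complete(tokens: Sequence[str], criteria: List[TerminationCriterion]) -> bool:
    # any() evaluates criteria in list order and stops at the first True.
    return any(c.is_complete(tokens) for c in criteria)
```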
The schema for a sampler metafile. An instance of this proto is created with name META.pbtxt in the cache directory of each model that samples it.
Used in:
5 Plot token size relative distribution of given database groups.
Used in:
11 Collect CPU vs GPU labels for
Used in:
18
Used in:
Options used for training a benchpress language model.
Used in:
The number of epochs to train the network for.
The length of training sequences.
BERT only. Number of training steps.
BERT only. Number of pre-training steps.
BERT only. Number of warmup steps.
Maximum number of masked LM predictions per sequence.
Number of times to duplicate the input data (with different masks).
Masked LM probability.
Random seed for data generation.
If true, shuffle the order of contentfiles in the corpus between each training epoch.
The training batch size. Note that this is only a *requested* batch size, there may be cases where the runtime decides to modify this value. For example, when the corpus size is smaller than the batch size. Any changes to this value at runtime will be logged as errors.
In case of BERT model, a specific data generator is needed.
The optimizer configuration.
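The masked-LM options above (maximum predictions per sequence, duplication factor, masked LM probability, random seed) mirror the options of the original BERT pre-training data script. In that script, the number of masked positions per sequence is min(max_predictions_per_seq, max(1, round(len(tokens) * masked_lm_prob))). The sketch below reproduces that formula and draws an independent set of positions for each duplicate. It is an assumption that BenchPress computes these values the same way, and the helper names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

MASK = "[MASK]"


def num_to_predict(seq_len: int, masked_lm_prob: float, max_predictions_per_seq: int) -> int:
    """Number of positions to mask, following the BERT reference formula."""
    return min(max_predictions_per_seq, max(1, int(round(seq_len * masked_lm_prob))))


def duplicate_with_masks(tokens: Sequence[str], dupe_factor: int, masked_lm_prob: float,
                         max_predictions_per_seq: int,
                         seed: int) -> List[Tuple[List[str], List[int]]]:
    """Create dupe_factor copies of tokens, each with independently drawn mask positions."""
    rng = random.Random(seed)
    n = num_to_predict(len(tokens), masked_lm_prob, max_predictions_per_seq)
    copies = []
    for _ in range(dupe_factor):
        positions = sorted(rng.sample(range(len(tokens)), n))
        masked = list(tokens)
        for pos in positions:
            masked[pos] = MASK
        copies.append((masked, positions))
    return copies
```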
Used in: ,
Create new sets out of the original corpus, to use them for validation or sampling. Useful to test the model against different specs.
Instance of a single evaluation pass.
Used in:
Define all different evaluators supported.
The tokenizer to use to encode the corpus.
Used in: ,