Next Available Field: 4
Config for the BERT/SMITH based dual encoder.
This field must be set to supply the train/eval data.
Config for optimization, this field is required.
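Taken together, the three fields above (with Next Available Field: 4) suggest a top-level message of roughly the following shape. This is an illustrative sketch only; the message and field names below are assumptions, not the actual identifiers from the .proto file:

```proto
// Illustrative sketch: real message/field names may differ.
message DualEncoderConfig {
  // Config for the BERT/SMITH based dual encoder.
  EncoderConfig encoder_config = 1;
  // Train/eval data and settings; must be set.
  TrainEvalConfig train_eval_config = 2;
  // Config for optimization; required.
  OptimizationConfig opt_config = 3;
}
```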
Configuration for BERT-based or SMITH-based encoder. Next Available Field: 18
Used in:
The name of the model.
Which pretrained checkpoint to use. This field is required for fine-tuning.
Which checkpoint to use for the model prediction process.
Path to the BERT config file.
Path to the document-level BERT config file, which is only used in the SMITH model.
Path to the vocab file.
This is only used for the BERT model. The maximum total input sequence length after tokenization. Sequences longer than this will be truncated, and sequences shorter than this will be padded. Normally, this should be no larger than the one used in pretraining. This should be matched with the data generation settings.
Maximum number of masked LM predictions per sequence. Note that for the SMITH model, the maximum number of masked LM predictions per document is max_doc_length_by_sentence * max_predictions_per_seq.
This is only used for the SMITH model. The maximum number of tokens in a sentence.
This is only used for the SMITH model. The maximum number of sentences in a document.
This is only used for the SMITH model. The number of looped sentences in a document, used to control TPU memory usage. This number should be smaller than the setting of max_doc_length_by_sentence.
This is only used for the SMITH model. Whether to update the parameters in the sentence-level Transformers of the SMITH model.
This is only used for the SMITH model. The maximum number of sentences to be masked in each document.
This is only used for the SMITH model. If true, add the masked sentence LM loss into the total training loss.
The number of different labels in the classification task.
The type of document representation combining mode. It can be normal, sum_concat, mean_concat, or attention.
The size of the attention vector in the attention layer for combining the sentence level representations to generate the document level representations.
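A minimal SMITH encoder configuration in text format might look like the sketch below. Only max_predictions_per_seq and max_doc_length_by_sentence appear verbatim in the field descriptions above; every other field name and path is an illustrative assumption:

```textproto
# Illustrative sketch; field names other than max_predictions_per_seq and
# max_doc_length_by_sentence are assumptions, and paths are placeholders.
model_name: "smith_dual_encoder"
init_checkpoint: "/path/to/pretrained/model.ckpt"
vocab_file: "/path/to/vocab.txt"
max_predictions_per_seq: 5
max_sent_length_by_word: 32      # max tokens per sentence (SMITH only)
max_doc_length_by_sentence: 64   # max sentences per document (SMITH only)
loop_sent_number_per_doc: 32     # smaller than max_doc_length_by_sentence
num_labels: 2
doc_rep_combine_mode: "normal"   # normal | sum_concat | mean_concat | attention
```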
Configuration for a loss function. Next Available Field: 2
Used in:
Hyperparameters for the loss function. The amplifier used to scale the logits so that sigmoid(logits) is closer to 0 or 1. The default value is 6.0.
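The effect of the amplifier can be seen with a quick calculation: for a raw logit of 1.0, amplifying by the default factor of 6.0 pushes the sigmoid output much closer to 1.

```latex
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
\sigma(1.0) \approx 0.731, \qquad
\sigma(6.0 \times 1.0) = \sigma(6.0) \approx 0.998
```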
Definition of sections in WikiDoc pages. NextID: 3
Used in:
Proto to specify train/eval datasets and train/eval settings. Next Available Field: 13
Used in:
File patterns for train set, separated by commas if you have multiple files. This field is required.
File patterns for eval set, separated by commas if you have multiple files.
Total batch size for training.
Total batch size for evaluation.
Total batch size for prediction.
Maximum number of eval steps. This should be set according to the size of the eval data. During model pre-training, we can also use part of the training data for evaluation.
How often to save the model checkpoint.
How many steps to make in each estimator call.
This is set to true if we always want to evaluate the model with the eval or test data, even in pre-train mode, so that we know whether the model overfits the training data.
The weight used to compensate for class imbalance when there are more negative examples than positive ones.
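As an illustration, a train/eval configuration in text format could look like the following. All field names and paths here are assumptions for illustration; only the semantics come from the descriptions above:

```textproto
# Illustrative sketch; field names and paths are assumptions.
input_file_for_train: "/data/train-a-*.tfrecord,/data/train-b-*.tfrecord"
input_file_for_eval: "/data/eval-*.tfrecord"
train_batch_size: 32
eval_batch_size: 32
predict_batch_size: 32
max_eval_steps: 1000          # set according to the eval data size
save_checkpoints_steps: 1000  # how often to save the model checkpoint
iterations_per_loop: 100      # steps per estimator call
eval_with_eval_data: true     # evaluate on eval/test data even in pre-train mode
neg_to_pos_example_ratio: 1.0 # weight to compensate for extra negatives
```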
Definition of contents in a WikiDoc object. NextID: 7
Used in:
An id that uniquely identifies this document. The id can be generated based on the url of the document.
The url of the WikiDoc page.
The title of the WikiDoc page.
The description of the WikiDoc page.
The section contents of the WikiDoc page.
A list of image ids of images in the WikiDoc page.
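The six fields above (NextID: 3 for sections, NextID: 7 for the document) can be illustrated with a small text-format example. Field names and values are assumptions inferred from the descriptions:

```textproto
# Illustrative sketch; field names and values are assumptions.
id: 1234567890
url: "https://en.wikipedia.org/wiki/Example_page"
title: "Example page"
description: "A short description of the page."
section_contents: "Text of the first section ..."
section_contents: "Text of the second section ..."
image_ids: 42
image_ids: 43
```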
Definition of a pair of two WikiDoc objects. NextID: 10
An id that uniquely identifies this document pair. The id can be generated based on the urls of the document pair.
The classification label generated by machine. We set this as an int in case we would like to change the number of graded levels of this label.
The classification label generated by human.
The regression label generated by machine.
The regression label generated by human.
Two document objects with similarity labels.
The model predicted similarity score for this pair.
The raw human rating scores.
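A document pair could then be assembled as in the sketch below, which accounts for the nine fields implied by NextID: 10 (an id, four labels, two documents, a model score, and repeated human ratings). All field names are illustrative assumptions:

```textproto
# Illustrative sketch; field names are assumptions.
id: 9876543210
model_label_cat: 2          # machine-generated classification label (int)
human_label_cat: 1          # human-generated classification label
model_label_reg: 0.8        # machine-generated regression label
human_label_reg: 0.75       # human-generated regression label
doc_one { id: 1234567890 title: "Example page" }
doc_two { id: 1234567891 title: "Another example page" }
model_score: 0.83           # model-predicted similarity score for this pair
human_rating_scores: 3.0    # raw human rating scores
human_rating_scores: 4.0
```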