Configuration for the generation of training and testing folds.
Seed used to control the fold generation. The same seed is guaranteed to generate the same folds for a given dataset.
Split the dataset into n folds (n=10 by default). Then, for each subset of n-1 folds, train a model and evaluate it on the remaining fold. This method is called "cross-validation". Cross-validation is more expensive than "train and test" (n models need to be trained), but the results are more precise.
Used in:
If num_folds=0, a leave-one-out cross-validation is run (i.e. "num_folds" is set to the number of examples in the dataset).
"fold_group_attribute" is the name of an attribute that defines the "group" membership of each example. If "fold_group_attribute" is specified, all the examples of the same group will appear in the same fold. In other words, each group will be used either entirely for training or entirely for testing.
Next ID: 3
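The fold generation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the library's implementation: the function names `make_folds` and `train_test_splits`, the shuffling strategy, and the round-robin assignment are all hypothetical.

```python
import random


def make_folds(num_examples, num_folds=10, seed=1234):
    """Assign example indices to folds (hypothetical helper).

    num_folds=0 is treated as leave-one-out: one fold per example.
    """
    if num_folds == 0:
        num_folds = num_examples
    indices = list(range(num_examples))
    # A dedicated, seeded generator makes the folds reproducible.
    random.Random(seed).shuffle(indices)
    # Round-robin slicing keeps fold sizes within one example of each other.
    return [indices[f::num_folds] for f in range(num_folds)]


def train_test_splits(folds):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    for held_out, test in enumerate(folds):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test
```

Calling `make_folds(25, 5, seed=7)` twice returns identical folds, which is the property the seed is meant to provide; `make_folds(4, 0)` yields four single-example folds.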
Specify the "fold group" of each example. If specified, all the examples of the same group will appear in the same fold. In other words, each group will be used either entirely for training or entirely for testing. There should be at least as many groups as there are folds.
Used in:
Name of a categorical attribute that defines the "group" membership of each example.
Next ID: 2
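A minimal Python sketch of group-aware fold assignment follows. It assumes the group attribute has already been read into a list with one value per example; the helper name `make_group_folds` and the round-robin placement of groups are illustrative choices, not the library's algorithm.

```python
import random
from collections import defaultdict


def make_group_folds(groups, num_folds, seed=1234):
    """Build folds so that all examples sharing a group land in the same fold."""
    by_group = defaultdict(list)
    for example_idx, group in enumerate(groups):
        by_group[group].append(example_idx)
    # There should be at least as many groups as folds.
    if len(by_group) < num_folds:
        raise ValueError("need at least as many groups as folds")
    # Sort first so the seeded shuffle does not depend on insertion order.
    group_keys = sorted(by_group)
    random.Random(seed).shuffle(group_keys)
    folds = [[] for _ in range(num_folds)]
    for position, key in enumerate(group_keys):
        folds[position % num_folds].extend(by_group[key])
    return folds
```

Because whole groups are assigned at once, fold sizes can be uneven when group sizes differ.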
Do not train a model; evaluate on the entire dataset. This option only makes sense when the candidate methods are all defined as pre-computed predictions or pre-computed models.
Used in:
(message has no fields)
Cross-validation of folds computed externally.
Used in:
Path ([type]:[path]) to a dataset containing a numerical column called "fold_idx". "fold_idx[i]" is an integer in [0, num_folds) defining the fold of the i-th example.
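As an illustration, the "fold_idx" column can be turned into per-fold lists of example indices as sketched below. The sketch assumes the column has already been loaded into a Python list; reading the dataset from a [type]:[path] location is not shown, and `folds_from_fold_idx` is a hypothetical name.

```python
def folds_from_fold_idx(fold_idx, num_folds):
    """Group example indices by their externally computed fold index."""
    folds = [[] for _ in range(num_folds)]
    for example_idx, value in enumerate(fold_idx):
        fold = int(value)  # the column is numerical, so values may be floats
        if fold != value or not 0 <= fold < num_folds:
            raise ValueError(f"invalid fold_idx {value!r} for example {example_idx}")
        folds[fold].append(example_idx)
    return folds
```

For instance, `folds_from_fold_idx([0, 2, 1, 0, 2.0], 3)` places examples 0 and 3 in fold 0, example 2 in fold 1, and examples 1 and 4 in fold 2.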
Evaluate the candidate model on a separate dataset. The entire dataset specified in "TrainEvaluateCompareOptions" will be used for training. The entire dataset specified in "TestOnOtherDataset" will be used for testing.
Used in:
Path ([type]:[path]) to the test dataset.
Split the dataset into two folds. The first part will be used for training. The second part will be used for evaluation. This method is commonly called "train and test" evaluation. It is fast (only one model is trained), but the results are noisy (they are affected by both training noise and testing noise).
Used in:
Ratio of the dataset used for testing. The remaining examples are used for training.
Next ID: 2
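A train-and-test split driven by a test ratio could look like the sketch below. Whether the examples are shuffled before splitting is not stated here, so the seeded shuffle is an assumption, as are the function name and the rounding rule.

```python
import random


def train_test_split(num_examples, test_ratio, seed=1234):
    """Return (train_indices, test_indices) for a single split."""
    if not 0.0 <= test_ratio <= 1.0:
        raise ValueError("test_ratio must be in [0, 1]")
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)  # assumption: examples are shuffled
    num_test = round(num_examples * test_ratio)
    num_train = num_examples - num_test
    # The first part is used for training, the second part for evaluation.
    return indices[:num_train], indices[num_train:]
```

With 10 examples and a ratio of 0.3, three examples are held out for testing and seven remain for training.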
Represents the (discrete) probability distribution of a random variable with natural (i.e. integer greater than or equal to zero) support: counts[i]/sum is the probability of observing i.
Used in:
,[required]
[required]
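The relation between counts, sum, and probabilities can be illustrated as follows. The sketch takes the two values as plain Python arguments (named `counts` and `total` here; the names and the helper itself are hypothetical) rather than reading them from a message.

```python
def probabilities(counts, total):
    """Convert accumulated counts into probabilities: p(i) = counts[i] / total."""
    if total <= 0:
        raise ValueError("the distribution is empty")
    return [count / total for count in counts]
```

For example, counts of `[2.0, 6.0, 2.0]` with a sum of `10.0` correspond to probabilities 0.2, 0.6 and 0.2 for the values 0, 1 and 2.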
Used in:
,[required]
[required]
[required]
[required]
Confusion matrix between two integer distributions.
Used in:
Contains nrow x ncol elements, indexed column by column (column-major), i.e. the second element is counts[1,0]. [required]
[required]
[required]
[required]
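The layout in which the second stored element is counts[1,0] corresponds to column-major storage, where the row index varies fastest. A small sketch of the index arithmetic, with hypothetical helper names:

```python
def flat_index(row, col, nrow):
    """Position of counts[row, col] in a column-major flat array."""
    return row + col * nrow


def get_count(counts, row, col, nrow, ncol):
    """Read counts[row, col] from the flat array, with bounds checks."""
    if len(counts) != nrow * ncol:
        raise ValueError("counts must contain nrow x ncol elements")
    if not (0 <= row < nrow and 0 <= col < ncol):
        raise IndexError("row or column out of range")
    return counts[flat_index(row, col, nrow)]
```

With `nrow=3`, `flat_index(1, 0, 3)` is 1, so counts[1,0] is indeed the second element of the flat array.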
Describes a 1D normal distribution.
Used in:
, ,[required]
[required]
[required]
Message for the metrics required to compute a partial dependence plot for multiple features or sets of features. This message is also used to store Conditional Expectancy Plots.
Used in:
Message for metrics required to compute a partial dependence plot for ONE feature or ONE set of features.
Used in:
The type of plot this represents.
Used in:
If this PartialDependencePlot represents a set of 3 features, attribute_info.num_bins represents the number of bins in each of these feature spaces. Further, the product over i of attribute_info[i].num_bins_per_input_feature equals pdp_bins.size().
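The size constraint can be checked with a short sketch. It takes the per-feature bin counts as a plain list (standing in for the num_bins_per_input_feature values) and enumerates one bin per combination; the function names are hypothetical.

```python
import itertools
import math


def expected_num_pdp_bins(num_bins_per_input_feature):
    """Number of PDP bins implied by the per-feature bin counts."""
    return math.prod(num_bins_per_input_feature)


def enumerate_bin_combinations(num_bins_per_input_feature):
    """List every combination of per-feature bin indices."""
    ranges = [range(n) for n in num_bins_per_input_feature]
    return list(itertools.product(*ranges))
```

For three features with 10, 5 and 2 bins, the plot is expected to hold 10 * 5 * 2 = 100 bins, one per combination.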
Distribution of the attribute for each of the bins.
Boundaries of the bins for numerical attributes.
How to scale the axis when plotting this attribute. Only used for numerical attributes.
Used in:
Represents the metrics for a feature OR set of features at a particular value (represented by attribute_values).
Used in:
The values used to represent the center of this bin. In case of a categorical feature, this stores the exact categorical value. In case of a numerical feature, this stores a value: (max_value - min_value)*(bin_number + 0.5)/num_bins + min_value.
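The bin-center formula for numerical features translates directly into code. The function name and argument order below are illustrative.

```python
def bin_center(bin_number, num_bins, min_value, max_value):
    """Center of a bin when [min_value, max_value] is cut into num_bins equal bins."""
    if not 0 <= bin_number < num_bins:
        raise IndexError("bin_number out of range")
    return (max_value - min_value) * (bin_number + 0.5) / num_bins + min_value
```

For a feature ranging over [0, 8] with 4 bins, the centers are 1, 3, 5 and 7.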
Represents the accumulation of evaluation metrics.
Used in:
For regression.
For classification.
Represents the "sum" of a set of labels, either predicted by the model, or the ground truth.
Used in:
sum_of_regression_predictions should be normalized with num_observations to obtain the mean prediction.
sum_of_ranking_predictions should be normalized with num_observations to obtain the mean prediction.
sum_of_anomaly_detection_predictions should be normalized with num_observations to obtain the mean prediction.
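The normalization mentioned for the three sums is the same in each case. A minimal sketch, assuming the accumulated sum and num_observations have been read into plain numbers (the helper name is hypothetical):

```python
def mean_prediction(sum_of_predictions, num_observations):
    """Normalize an accumulated sum of predictions into a mean prediction."""
    if num_observations <= 0:
        raise ValueError("no observations were accumulated")
    return sum_of_predictions / num_observations
```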
Used in:
Header attached to an exported sharded multi-bitmap.
These fields are the same as the fields defined in "ShardedMultiBitmap".