Used in:
Configuration of the evaluation of a model. Describes how the evaluation should be done.
Used in:
Task of the model.
Evaluation configuration depending on the type of problem.
Fraction of predictions to sample. If no predictions need to be sampled (i.e. no part of the configuration requires them), this parameter is ignored and no predictions are sampled.
Number of bootstrapping samples used to evaluate metric confidence intervals and statistical tests (i.e. all the metrics ending with "[B]"). If <=0, bootstrapping estimation is disabled. Note: Bootstrapping is done on the sampled predictions (controlled by the "prediction_sampling" parameter). Note: Bootstrapping is an expensive computation. Therefore, for quick experimentation with modeling, bootstrapping can be temporarily reduced or disabled.
Weights of the examples. This field does not have to match the "weight_definition" used during model training. For example, weighting can be enabled for evaluation and disabled for training. Such a case is rare, however.
Force usage of the slow engine for predictions. This option is ignored by functions that are called with a fast engine.
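For illustration, a minimal C++ sketch of setting these options. The field names "prediction_sampling" and "num_bootstrapping_samples" follow the descriptions in this section, while the message path and the presence of generated setters with exactly these names are assumptions:

```cpp
// Minimal sketch (not the canonical API): configuring evaluation options.
// The message path "metric::proto::EvaluationOptions" is an assumption;
// the field names follow the descriptions above.
metric::proto::EvaluationOptions options;
options.set_prediction_sampling(0.5f);        // Sample ~50% of the predictions.
options.set_num_bootstrapping_samples(2000);  // Enable the "[B]" confidence intervals.
// options.set_num_bootstrapping_samples(0);  // <=0 disables bootstrapping for quick runs.
```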
Used in:
(message has no fields)
Used in:
Whether to compute the ROC metrics (and other metrics using the same type of computation, e.g. PR-AUC, P@R).
Maximum number of points in the ROC curve.
List of recall values (between 0 and 1) for the evaluation of precision at given recall.
List of precision values (between 0 and 1) for the evaluation of recall at given precision.
List of volume values (between 0 and 1) for the evaluation of precision at given volume.
List of false positive rates for the evaluation of recall at given false positive rates.
List of recall values for the evaluation of false positive rate at given recall.
Next ID: 8
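Continuing the sketch above, the operating-point lists could be requested as follows. Every field name in this snippet (the "classification" sub-message, "precisions_at_recall", "recalls_at_precision", "roc_enable") is a hypothetical name chosen to match the descriptions, not a confirmed identifier:

```cpp
// Hypothetical field names; only the semantics come from the descriptions above.
auto* classification = options.mutable_classification();
classification->set_roc_enable(true);             // Compute ROC / PR-AUC / P@R metrics.
classification->add_precisions_at_recall(0.90f);  // Report precision at 90% recall.
classification->add_recalls_at_precision(0.95f);  // Report recall at 95% precision.
```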
Used in:
Number of evaluated elements.
Rank cut-off at which Mean Reciprocal Rank is computed.
If false (default) and all the predictions (items) are in the same group (i.e. there is only one group), an error is raised.
Used in:
Whether to compute the regression plots (histograms of the ground truth, residuals and predictions, normality test of the residuals, conditional plots).
Used in:
(message has no fields)
Evaluation results of a model. This proto is generated by the "EvaluateLearner" or "model->Evaluate()" functions. For manual evaluation, this proto is best generated using the "InitializeEvaluation", "AddPrediction" and "FinalizeEvaluation" functions in "metric.h". This proto can be converted into human readable text with "AppendTextReport" or into html+plots with "SaveEvaluationInDirectory". The html version contains more information than the raw text. Individual metrics can be extracted using the utility methods defined in "metrics.h", e.g. "Accuracy()", "LogLoss()", "RMSE()".
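A minimal sketch of the manual workflow described above. The function names ("InitializeEvaluation", "AddPrediction", "FinalizeEvaluation", "AppendTextReport", "Accuracy", "RMSE") come from this description; their namespaces, exact signatures, and the extra arguments (label column spec, random generator) are assumptions and may differ from the real "metric.h":

```cpp
// Sketch only: manual evaluation as described above. "label_column" (the
// dataspec column of the label), "predictions", and "rnd" (a random engine
// used for prediction sampling) are placeholders assumed to be provided by
// the surrounding code.
metric::proto::EvaluationOptions options;
metric::proto::EvaluationResults evaluation;

metric::InitializeEvaluation(options, label_column, &evaluation);
for (const auto& prediction : predictions) {
  metric::AddPrediction(options, prediction, label_column, &rnd, &evaluation);
}
metric::FinalizeEvaluation(options, label_column, &evaluation);

// Human readable report and individual metric extraction.
std::string report;
metric::AppendTextReport(evaluation, &report);
const auto accuracy = metric::Accuracy(evaluation);  // Classification evaluations.
const auto rmse = metric::RMSE(evaluation);          // Regression evaluations.
```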
Used in:
Number of predictions (weighted by example weight).
Number of predictions (without weights).
Sampled predictions. Only sampled if necessary (e.g. if the ROC is computed).
Number of sampled predictions (weighted by example weight).
Task of the model.
Evaluation results depending on the type of problem.
The dataspec of the label column. This field can contain information such as the possible label values, the distribution of the label values, the string representation of the label values, etc.
Training time of the model. In case of cross-validation evaluation results, "training_duration_in_seconds" is the average training time of a single model.
Value of the loss function used to optimize the model. Not all machine learning algorithms optimize a loss function, and different loss functions can be compatible with a given task.
Number of folds used for the evaluation. The number of folds is 1 for train-and-test, and equal to the number of cross-validation folds in the case of cross-validation.
Users can use this field to store values for any custom metrics.
Next ID: 17
Used in:
(message has no fields)
Used in:
Confusion between the label and the predictions. Note that confusion tables are stored column major (which, admittedly, is confusing).
One-vs-other Receiver operating characteristic curve. Indexed by the categorical label value.
Sum of the log loss.
Accuracy of the model. If both "accuracy" and "confusion" are specified, they represent the same value.
Next ID: 6
Used in:
Fraction of examples where the example with the highest prediction is also the example with the highest relevance value.
Used in:
Sum of the squared errors. For regression only.
Sum of the labels.
Sum of the square of the labels.
Lower and upper bounds of the RMSE computed using non-parametric bootstrapping.
Sum of absolute value of the error.
Next ID: 7
Used in:
Note: In the case of multiple treatments, the "auuc" and "qini" are the example-weighted averages of the per-treatment AUUC and Qini. We use the implementation described in the work of Guelman ("Optimal personalized treatment learning models with insurance applications") or Betlei ("Treatment targeting by AUUC maximization with generalization guarantees").
Number of possible treatments. The treatment values (i.e. the value of the categorical column specifying the treatment) are in [1, num_treatments+1) with value "1" reserved for the control treatment. For example, in case of single-treatment vs control, "num_treatments=2" and the treatment value will be 1 (control) or 2 (treatment).
The Conditional Average Treatment Effect calibration metric (cate_calbration) computes the l2 expected calibration error of a binary treatment uplift model. Miscalibration is a phenomenon in which the magnitude of a treatment effect is overestimated due to overfitting on the CATE training data. Here we use the expected "l2 norm of the difference between 1) the predicted CATE and 2) an unbiased estimation of the observed CATE" over all uplift values. The metric value is greater than 0, with lower values being more desirable, i.e. "more calibrated". This metric is defined in equation (2.4) of the paper "Calibration Error for Heterogeneous Treatment Effects" by Xu et al. (https://arxiv.org/pdf/2203.13364.pdf).
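Schematically, the description above amounts to the following (a paraphrase of the prose, not a transcription of equation (2.4) of Xu et al.):

\mathrm{CateCalibrationError} = \mathbb{E}\left[\left(\hat{\tau}(x) - \tilde{\tau}(x)\right)^{2}\right]

where \hat{\tau}(x) is the predicted CATE (uplift) and \tilde{\tau}(x) an unbiased estimate of the observed CATE, with the expectation taken over the uplift values.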
Reference to a metric. "MetricAccessor" is used as a parameter of the function "GetMetric" to extract metric values from an evaluation results proto. Example: given a = EvaluationResults { classification { accuracy: 0.7 auc: 0.8 ap: 0.9 } } and b = MetricAccessor { classification {} }, GetMetric(a, b) -> 0.7.
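A hedged C++ sketch of the same example. "GetMetric", "MetricAccessor" and "EvaluationResults" are named above, while the namespaces, the use of text-format parsing, and the return type of "GetMetric" are assumptions:

```cpp
#include "google/protobuf/text_format.h"

// Sketch of the example above: build the accessor and extract the metric.
metric::proto::MetricAccessor accessor;
google::protobuf::TextFormat::ParseFromString("classification {}", &accessor);

// "evaluation" is an EvaluationResults proto with classification.accuracy == 0.7.
const auto value = metric::GetMetric(evaluation, accessor);  // -> 0.7
```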
Used in:
Used in:
(message has no fields)
Used in:
Used in:
(message has no fields)
Used in:
(message has no fields)
Used in:
Used in:
(message has no fields)
Used in:
(message has no fields)
Used in:
Used in:
(message has no fields)
Used in:
Used in:
Used in:
Used in:
Used in:
(message has no fields)
Used in:
Used in:
(message has no fields)
Used in:
(message has no fields)
Used in:
Used in:
(message has no fields)
Used in:
(message has no fields)
Used in:
Used in:
(message has no fields)
Used in:
(message has no fields)
Used in:
Estimated measure of a metric.
Used in:
Expected value.
Upper and lower 95% bounds estimated using bootstrapping.
A receiver operating characteristic curve.
Used in:
Points sorted with decreasing recall (i.e. increasing threshold).
Sum of the tp+fp+tn+fn of one element (this is the same for all elements). "sum" is equal to "count_predictions" if the ROC is computed without sampling (i.e. roc_prediction_sampling==1).
Area under the curve.
Precision/Recall AUC.
Average Precision.
Metric X evaluated under the constraint of a given metric Y value. These three fields have the same number of elements as the fields of the same name in "EvaluationOptions::Classification".
Lower and upper bounds of all metrics computed using non-parametric percentile bootstrapping. Only available if bootstrapping is enabled i.e. num_bootstrapping_samples>=1.
Used in:
True/False Positive/Negative.
Value of a metric X (e.g. recall) for a given other metric Y value (e.g. FPR).
Used in: