Options for aggregating multi-class / multi-label outputs. When used, the associated MetricsSpec metrics must be binary classification metrics (NOT multi-class classification metrics).
Used in:
Computes aggregate metrics by treating all examples as equal (i.e. flattens the prediction/label pairs across all classes and performs the computation as if they were separate examples in a binary classification problem). Micro averaging is typically used with multi-class outputs.
Computes aggregate metrics by treating all classes as equal (i.e. computes binary classification metrics separately for each class and then takes the average). This approach is good for cases where each class is equally important and/or the class label distribution is balanced. Macro averaging is typically used with multi-label outputs. If macro averaging is enabled without using top_k_list, class_weights must be configured in order to identify which classes the average will be computed over.
Computes aggregate metrics using macro averaging, but weights the classes during aggregation by the ratio of positive labels for each class. If weighted macro averaging is enabled without using top_k_list, class_weights must be configured in order to identify which classes the average will be computed over.
Weights to apply to classes during aggregation (only supported if top_k_list is not used). Each key corresponds to a class ID. For micro aggregation the weights are applied to each prediction/label pair; for macro aggregation the weights are applied to the overall metric computed for each class prior to aggregation. If class_weights are configured but some keys are not provided, their weights are assumed to be 0.0. This allows class_weights to be used to filter the classes used for aggregation. Note that for macro_average and weighted_macro_average, class_weights are required when top_k_list is not used. Also note that when used with weighted_macro_average, weights are applied in two forms (from the ratio of positive labels and from the values provided here), which may or may not be desired (i.e. setting all the weights to 1.0 is the most common configuration for weighted_macro_average).
Performs aggregation based on the classes with the top k predicted values for each value of top k provided. If not set then all classes are used. Note that unlike the top k used with binarization this truncates the list of classes to only the top k values (i.e. it does not set non-top k to -inf).
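As a hedged illustration, the two common configurations described above might look like the following in text format (field names are taken from the descriptions above; the inner values field of top_k_list follows RepeatedInt32Value and is an assumption):

  # Micro averaging: flatten all prediction/label pairs (optionally truncated to the top 2 classes).
  micro_average: true
  top_k_list { values: [2] }

  # Macro averaging over classes 0 and 1 only (class_weights doubles as a class filter).
  macro_average: true
  class_weights { key: 0 value: 1.0 }
  class_weights { key: 1 value: 1.0 }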
Aggregation types used with AggregationOptions.
Used in:
See AggregationOptions.micro_average.
See AggregationOptions.macro_average.
See AggregationOptions.weighted_macro_average.
For metrics which return an array of values.
Used in:
Exactly one of these fields, corresponding to the data type, should be set.
Used in:
The slice key for the metrics.
The cross slice key for the metrics.
Attribution keys and values.
Used in:
Attribution values keyed by feature key (e.g. 'age', etc).
Attribution keys uniquely identify aggregated attribution values.
Used in:
Attribution metric name (e.g. 'mean', 'total', etc)
Optional model name (if multi-model evaluation).
Optional output name (for multi-output models).
Optional sub key associated with attribution (class_id, etc).
If true, the metric is weighted by examples. If false, then the metric is not weighted by examples. If unset then it is unknown as to whether the metric was weighted by examples or not (i.e. the metrics were defined inside of a model). See MetricsSpecs.example_weighted for more information.
If true, this is a diff of attributions based on comparison with baseline.
Options for binarizing multi-class / multi-label outputs. When used, the associated MetricsSpec metrics must be binary classification metrics (NOT multi-class classification metrics).
Used in:
Creates binary classification metrics based on one-vs-rest for each value of class_id provided.
Creates binary classification metrics based on the kth predicted value for each value of k provided.
Creates binary classification metrics based on the top k predicted values for each value of top k provided. How this is computed is up to each metric implementation. However, the default implementation is such that for a given top k setting, the input prediction arrays will be updated to set the non-top k predictions to -inf before flattening the resulting array into a single binarized value. This makes top k well suited to calculations such as precision@k or recall@k, but may not be well suited for other binary classification metrics unless special handling is provided. Note that precision@k and recall@k can also be configured directly as multi-class classification metrics by setting top_k on the metric itself.
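As a hedged sketch, a binarization configuration that produces one-vs-rest metrics for three classes plus top-3 metrics might look like this (the field names class_ids and top_k_list, and the inner values field from RepeatedInt32Value, are assumptions based on the descriptions above):

  # One-vs-rest metrics for classes 0, 1, and 2, plus metrics over the top 3 predictions.
  class_ids { values: [0, 1, 2] }
  top_k_list { values: [3] }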
Represents a real value which could be a pointwise estimate, optionally with approximate bounds of some sort. For instance, for AUC, these bounds could be the upper and lower Riemann sum of the integral.
Used in:
The lower bound of the range.
The upper bound of the range.
Represents an exact value if lower_bound and upper_bound are unset; otherwise it is an approximate value. An approximate value should be within the range [lower_bound, upper_bound].
Optionally describe the methodology that was used to calculate the bounds.
Used in:
Used when calculating AUC: the bounds are the upper and lower Riemann sums of the integral.
Used to calculate confidence intervals using Poisson bootstrapping. For more details, please see: http://www.unofficialgoogledatascience.com/2015/08/an-introduction-to-poisson-bootstrap26.html
Used in:
Used in:
Used in:
Each MetricValue field within this message will be populated with the same value type as in MetricKeyAndValue.value. This has the effect of creating a set of parallel data structures which provide elementwise confidence intervals. For example, if the MetricKeyAndValue.value contains an ArrayValue, then each of these fields will also contain an ArrayValue in which the array element at a given index will represent the lower bound, upper bound, and standard error for the MetricKeyAndValue.value element at that same index.
Used in:
The confidence interval method to use for all metrics.
Used in:
Confusion matrix at thresholds.
Used in:
Matrices have different types of value representations: bounded, t-distribution, and double. 1. Bounded values will be provided if the matrices are calculated using bootstrapping (note: the confidence level is set to 95%). 2. T-distribution values will be provided if the matrices are calculated using bootstrapping and the confidence level isn't set; the user then configures the confidence level through the frontend to get the final confidence intervals. Both TDistributionValue and BoundedValue are supported for now, but BoundedValue will eventually be deprecated. 3. Double values are being deprecated.
Used in:
CrossSliceKey contains two slices which are compared with each other.
Used in:
Cross slice metric threshold.
Used in:
A list of cross slicing specs to apply the threshold to.
Used in:
Cross slicing specification.
Used in:
TensorFlow Model Analysis config settings.
Used in:
Model specifications for models used. Only one baseline is permitted.
A list of specs where each spec represents a way to slice the data. An empty config means slice on the overall data. Example usages:
- slicing_specs: {}
  Slice consisting of the overall data.
- slicing_specs: { feature_keys: ["country"] }
  Slices for all values in feature "country". For example, we might get slices "country:us", "country:jp", etc.
- slicing_specs: { feature_values: [{key: "country", value: "us"}] }
  Slice consisting of "country:us".
- slicing_specs: { feature_keys: ["country", "city"] }
  Slices for all values in feature "country" crossed with all values in feature "city" (note this may be expensive).
- slicing_specs: { feature_keys: ["country"] feature_values: [{key: "age", value: "20"}] }
  Slices for all values in feature "country" crossed with the value "age:20".
A list of cross slicing specs where each spec represents a pair of slices whose associated outputs should be compared. By default slices will be created for both slicing_spec and baseline_spec if they do not already exist in slicing_specs.
Metrics specifications.
Additional configuration options.
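Putting the pieces together, a minimal EvalConfig in text format might look like the following sketch (metric class names and feature names are illustrative only):

  model_specs { name: "candidate" label_key: "label" }
  model_specs { name: "baseline" label_key: "label" is_baseline: true }
  slicing_specs {}                              # overall slice
  slicing_specs { feature_keys: ["country"] }   # one slice per country value
  metrics_specs {
    metrics { class_name: "ExampleCount" }
    metrics { class_name: "AUC" }
  }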
Config and version.
Evaluation run containing config, version and input parameters. This should be structurally compatible with EvalConfigAndVersion such that a saved EvalRun can be read as an EvalConfigAndVersion.
Location of data used with evaluation run.
File format used with evaluation run.
Locations of model used with evaluation run.
Options for use of example weights in metric computations. These settings are only useful if an example weight key is being used.
Used in:
Set to true to enable weighted metrics. Setting weighted to false has no effect. If weighted is true but an example weight key was not provided, then a weight of 1.0 will be assumed (which is effectively the same as unweighted, but the metric keys will have weighted set to true).
Set to true to enable unweighted metrics. Setting unweighted to false has no effect.
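For example, to compute both weighted and unweighted variants of every metric in the associated spec, both flags can be enabled (a sketch; see MetricsSpec for where these options are attached):

  weighted: true
  unweighted: true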
Generic change threshold message.
Used in:
Let delta be determined as described in the comments for Direction below. If delta > absolute, fail the validation.
Let delta be determined as described in the comments for Direction below. If delta / X_old > relative, fail the validation.
Generic value threshold message. Fail the validation if the value does not lie in [lower_bound, upper_bound], both boundaries inclusive.
Used in:
Lower bound. Assumed to be -Infinity if not set.
Upper bound. Assumed to be +Infinity if not set.
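A hedged sketch combining both threshold types inside a metric threshold (the value_threshold / change_threshold field names and the wrapped { value: ... } form for the double fields are assumptions based on the TFMA config):

  value_threshold {
    lower_bound { value: 0.7 }   # fail if the metric falls below 0.7
  }
  change_threshold {
    direction: HIGHER_IS_BETTER
    absolute { value: -1e-10 }   # candidate must not be worse than the baseline
  }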
Metric configuration.
Used in:
Name of a class derived from either tf.keras.metrics.Metric or tfma.metrics.Metric.
Optional name of the module associated with class_name. If not set, the class will be searched for under tfma.metrics followed by tf.keras.metrics.
Optional JSON encoded config settings associated with the class. The config settings are used to initialize the metric based on its associated from_config method. Typically the values that are used will be the same as the **kwarg values passed to the __init__ method for the class. For ease of use the leading and trailing '{' and '}' brackets may be omitted. Example: '"name": "my_metric", "thresholds": [0.5]'
Optional threshold for model validation on all slices.
Optional thresholds for model validation using specific slices.
Optional thresholds for model validation across slices.
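As an illustrative sketch, a single metric configuration with a JSON config and a per-metric threshold might look like this (the 'AUC' class and its num_thresholds kwarg come from tf.keras.metrics and are examples only; the threshold field layout is assumed from the descriptions above):

  class_name: "AUC"
  config: '"num_thresholds": 1000'
  threshold {
    value_threshold { lower_bound { value: 0.8 } }
  }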
Used in:
A metric key uniquely identifies a metric.
Used in:
Name of the metric ('auc', etc).
Optional model name associated with metric (if multi-model evaluation).
Optional output name associated with metric (for multi-output models).
Optional sub key associated with metric.
Optional type of aggregation (if AggregationOptions used).
If true, the metric is weighted by examples. If false, then the metric is not weighted by examples. If unset then it is unknown as to whether the metric was weighted by examples or not (i.e. the metrics were defined inside of a model). See MetricsSpecs.example_weighted for more information.
If true, this metric is a diff metric based on a comparison with the baseline.
Used in:
Stores metric values in different types so that the frontend knows how to visualize the values based on their types.
Used in:
bounded_value is deprecated for use as a confidence interval container. Only use to encode non-CI bounds, such as approximation bounds.
This field will contain a generic message to be used to communicate any extra information, such as in a scenario when no data is aggregated for a small data slice due to privacy concerns.
The slice key for the metrics.
The cross slice key for the metrics.
Metric keys and values.
A map to store metrics. Currently we convert the post_export_metric provided by TFMA to its appropriate type for better visualization, and map all other metrics to DoubleValue type.
Used in:
When the `confidence_interval` field is populated, the `value` field will contain the point estimate.
Metrics specification.
Used in:
List of metric configurations.
Names of models (as defined by model_specs) the metrics should be calculated for. If this list is empty then all the names defined in the model_specs will be assumed else these metrics will only be computed for the model names provided.
Names of outputs the metrics should be calculated for (required for multi-output models). See comment under the ModelSpec.prediction_key on the difference between output_name and prediction_key.
Optional weights to use when aggregating across outputs. Output aggregation will only be performed when weights are configured and only between outputs that have a weight set. For example, assume metrics contains 'auc' and the following output information was configured:
  output_names = ['output_1', 'output_2', 'output_3']
  output_weights = {'output_1': 1.0, 'output_2': 1.0}
An 'auc' metric will be computed for each output along with an overall auc metric calculated as (1.0*(auc output_1) + 1.0*(auc output_2)) / (1.0+1.0).
Optional binarization options for converting multi-class / multi-label model outputs into outputs suitable for binary classification metrics.
Optional aggregation options for computing overall aggregate metrics for multi-class / multi-label model outputs. Aggregation options are computed separately from binarization options so both can be set safely at the same time.
Optional example weight options. If no options are provided then the metrics will be weighted by default provided at least one of the models and outputs associated with this spec has an example_weight_key configured, otherwise the metrics will be unweighted by default. If weighted is enabled for the metrics, but an example_weight_key is not associated with a given model or output, then those metrics will still be considered weighted just using a weight value of 1.0.
Optional query key for query/ranking based metrics.
Thresholds defined here are intended to be used for metrics that were saved with the model and computed by default without requiring a metric config. All other thresholds should be defined in the MetricConfig associated with the metric. Optional thresholds for model validation on all slices (keyed by the associated metric name - e.g. 'auc', etc).
Optional thresholds for model validation using specific slices (keyed by the associated metric name - e.g. 'auc', etc).
Optional thresholds for model validation across slices (keyed by the associated metric name - e.g. 'auc', etc).
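A hedged sketch of a multi-output MetricsSpec that uses output aggregation as described above (model, output, and metric names are illustrative):

  metrics { class_name: "AUC" }
  model_names: ["candidate"]
  output_names: ["output_1", "output_2"]
  output_weights { key: "output_1" value: 1.0 }
  output_weights { key: "output_2" value: 1.0 }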
Used in:
SliceKey of a given slice.
CrossSliceKey for cross slice validations.
All failures under a slice.
Model specification.
Used in:
Name used to distinguish different models when multiple instances are being evaluated. Note that this name is not necessarily the name of the model as seen by a trainer, etc. This name is more of an alias for both a model name and a particular version and/or format. For example, common names to use here might be "candidate" or "baseline" when referring to different versions of the same model that are being evaluated for the purpose of model validation. Note also that if only a single ModelSpec is used in the config, then no model_name will be set in any metrics keys that are output regardless of whether a name was provided here or not.
The type of the model that is being evaluated. Supported types include "tf_keras", "tf_estimator", "tf_lite", "tf_js", and "tf_generic". If unset, automatically detects whether the model_type is "tf_keras", "tf_estimator", or "tf_generic" based on whether the model loads as a keras model followed by whether or not the signature_name is set to "eval".
Optional name of signature to use for inference (e.g. "serving_default"). For estimator based EvalSavedModels, this must be set to "eval". If not set, then the default depends on the model_type. For "tf_keras" models the model itself will be used for inference. For models that support signatures ("tf_generic", etc) "predict" (if it exists) or "serving_default" will be assumed. For models that don't use signatures ("tf_lite", etc) this setting will be ignored.
Optional names of preprocessing functions to run in the order that they should be invoked. Preprocessing functions are used to transform the features into the form required for inference and metrics evaluation. The output from preprocessing can also be used for slicing. Preprocessing functions can be saved as signatures or as attributes on the saved model. If no names are provided, the names "transformed_features" and "transformed_labels" will be searched for. The output of a preprocessing function will override the feature with the same name for label or example weight extraction purposes. If a preprocessing function outputs a non-dict value, then it will be stored as a feature under the preprocessing function name itself. For example, if a function called "transformed_labels" outputs a single array value, then it will be associated with the feature name "transformed_labels". This name can be used when setting the "label_key" or in slicing configs.
Label key (single-output model). The key can identify either a transformed feature (see preprocessing_function_names) or a raw input feature. Use one of label_key or label_keys.
Label keys (multi-output model) keyed by output_name. If all the outputs for a multi-output model use the same key, then a single key may also be used. Use one of label_key or label_keys.
oneof not allowed with maps
Optional prediction key (single-output model). The prediction key is used to distinguish between different values when the output from the predict call is a dict instead of a single tensor. For estimator models this is always the case, and the prediction key is automatically inferred: the keys 'scores', 'logistic', 'predictions', and 'probabilities' are tried (in that order). For Keras models, outputs are typically not dicts, but if they are then the prediction key is not inferred and so MUST be specified. Note: for multi-class predictions, a prediction key needs to be specified so that the metrics can be computed correctly per class. The prediction key is also used in cases where the predictions are pre-calculated and stored alongside the features (i.e. a model is not used). In this case the prediction key referring to a key in the features dictionary must be provided. Use one of prediction_key or prediction_keys. Note that prediction_key is NOT the same as the output_name used in the MetricsSpec. The output_name refers to the name of an output for a multi-output model (for tf.Estimator this is called the "head", whereas for Keras the term output is used). Some outputs (typically tf.Estimator) are themselves made up of a dict of multiple tensors (e.g. 'classes', 'probabilities', etc). The prediction_key specifies which key in the output contains the prediction values (i.e. 'probabilities', etc). For example, a tf.Estimator model might output the following:
  {
    'head1': {
      'classes': classes_tensor,
      'class_ids': class_ids_tensor,
      'logits': logits_tensor,
      'probabilities': probabilities_tensor
    },
    'head2': {
      'classes': classes_tensor,
      'class_ids': class_ids_tensor,
      'logits': logits_tensor,
      'probabilities': probabilities_tensor
    }
  }
Here 'head1' or 'head2' would be the output_name, whereas 'probabilities' would be the prediction_key.
Optional prediction keys (multi-output model) keyed by output_name. Use one of prediction_key or prediction_keys. See comment under prediction_key on the difference between output_name and prediction_key.
oneof not allowed with maps
Optional example weight key (single-output model). The example_weight_key can identify either a transformed feature (see preprocessing_function_names) or raw input feature. Use one of example_weight_key or example_weight_keys.
Optional example weight keys (multi-output model) keyed by output_name. If all the outputs for a multi-output model use the same key, then a single key may also be used. Use one of example_weight_key or example_weight_keys.
oneof not allowed with maps
True if baseline model (otherwise candidate). Only one baseline is allowed per evaluation run.
Options for padding prediction and label arrays before feeding them to metrics. Predictions and labels may not have the same length (for example, the model may pad the predictions so that a batch of predictions is aligned, while labels are extracted from the input and are not padded), whereas metrics may require them to be of the same length. TFMA can pad the shorter one with the configured values.
Batch size used by the inference implementation. This batch size is only used for inference with this model. It does not affect the batch size of other models, and it does not affect the batch size used in the rest of the pipeline. This is implemented for the ServoBeamPredictionsExtractor and TfxBslPredictionsExtractor.
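Tying these fields together, a candidate/baseline pair of model specs might be written as follows (feature and signature names are illustrative):

  model_specs {
    name: "candidate"
    signature_name: "serving_default"
    label_key: "label"
    example_weight_key: "weight"
  }
  model_specs {
    name: "baseline"
    signature_name: "serving_default"
    label_key: "label"
    example_weight_key: "weight"
    is_baseline: true
  }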
Used in:
Entries are sorted in order of threshold.
Used in:
Only entries with non-zero num_weighted_examples are included. If the top prediction was less than the threshold, then the predicted_class_id will be set to -1. Entries are sorted in order of actual_class_id followed by predicted_class_id.
Used in:
Used in:
Entries are sorted in order of threshold.
Used in:
Only entries with non-zero values are included. Entries are sorted in order of actual_class_id followed by predicted_class_id.
Used in:
Additional configuration options.
Used in:
True to include metrics saved with the model(s) (where possible) when calculating metrics. Any metrics defined in metrics_specs will override the metrics defined in the model if there are overlapping names.
True to calculate confidence intervals.
Int value used to omit slices with example count < min_slice_size.
List of outputs that should not be written (e.g. 'metrics', 'plots', 'analysis', 'eval_config.json').
Options for padding prediction and label arrays before feeding them to metrics. Predictions and labels may not have the same length (for example, the model may pad the predictions so that a batch of predictions is aligned, while labels are extracted from the input and are not padded), whereas metrics may require them to be of the same length. TFMA can pad the shorter one with the configured values.
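A small sketch of these options in text format (the field names include_default_metrics, min_slice_size, and disabled_outputs, the wrapped { value: ... } form for the wrapper types, and the values field of RepeatedStringValue are assumptions based on the descriptions above):

  options {
    include_default_metrics { value: true }
    min_slice_size { value: 50 }
    disabled_outputs { values: "plots" }
  }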
Used in:
If neither field of the oneof is set, 0 will be used.
If neither field of the oneof is set, 0 will be used.
Used in:
A list of slicing specs to apply the threshold to. An empty SlicingSpec represents the overall slice. NOTE: these are only references to slice definitions, not new definitions. Slices must have been defined using EvalConfig.slicing_specs. See EvalConfig.slicing_specs for examples.
Used in:
Used in:
For calibration plot and prediction distribution.
For the AUC curve and AUPRC curve.
For multi-class confusion matrix.
For multi-label confusion matrix.
This field will contain a generic message to be used to communicate any extra information, such as in a scenario when no data is aggregated for a small data slice due to privacy concerns.
A plot key uniquely identifies a set of PlotData.
Used in:
Optional plot name associated with plot.
Optional model name associated with plot (if multi-model evaluation).
Optional output name associated with plot (for multi-output models).
Optional sub key associated with plot.
If true, the plot is weighted by examples. If false, then the plot is not weighted by examples. If unset then it is unknown as to whether the plot was weighted by examples or not. See MetricsSpecs.example_weighted for more information.
The slice key for the metrics.
The cross slice key for the metrics.
Plot keys and values.
The plot data. Deprecated; please use 'plots' instead.
Use this field instead of tfma_plots to support multiple plot evaluations in a single evaluator run. Note that each entry of TFMAPlotData should contain all plots for the same grouping, e.g. for the same head of a multi-head model or for the same class in the multi-class case. For example, the key can be of the form 'post_export_metrics/head_name' for a multi-head model.
Used in:
Repeated int32 value. Used to allow a default if no values are given.
Used in:
Repeated string value. Used to allow a default if no values are given.
Used in:
A single slice key.
Used in:
A slice key, which may consist of multiple single slice keys.
Used in:
Information about slices matched.
Used in:
Slicing specification.
Used in:
Feature keys to slice on. Note that the feature key can be either a transformed feature key (see ModelSpec.preprocessing_function_names) or a raw feature key parsed directly from the inputs. If a transformed feature key and raw feature key use the same name, the transformed feature will take precedence. Note also that while transformed features are associated with the models that processed them, when it comes to slicing all the unique values across all models will be used.
Feature values to slice on keyed by associated feature keys. The same caveats that apply to feature_keys with respect to feature transformations and raw features apply to feature_values as well (see feature_keys for more information). Note that strings representing ints and floats will be automatically converted to ints and floats respectively and will be compared against both the string versions and int or float versions of the associated features.
This config is an alternative to the config above. It must have the pattern:
  "SELECT STRUCT({feature_name} [AS {slice_key}]) [FROM example.feature_name [, example.feature_name, ... ] [WHERE ... ]]"
The "example.feature_name" inside the FROM statement is used to flatten the repeated fields. For non-repeated fields, you can directly write the config as follows: "SELECT STRUCT(non_repeated_feature_a, non_repeated_feature_b)". When executing, this SQL expression will be further wrapped as:
  "SELECT ARRAY({slice_keys_sql}) as slices FROM Examples as example"
The resulting output of the query will have the same number of rows as the input dataset. Each row will have only one column named "slices". Each row is a list, and each element in the list will be a list of ('key', 'value') tuples representing a slice. For example, a single row could be:
  [[('gender', 'male'), ('country', 'USA')], [('zip_code', '123456')]]
In the user's SQL statement, "example" is a keyword that binds to each input "row". The semantics of this variable will depend on the decoding of the input data to the Arrow representation (e.g., for tf.Example, each key is decoded to a separate column). Thus, structured data can be readily accessed by iterating/unnesting the fields of the "example" variable.
Example 1: slice_keys_sql = "SELECT STRUCT(gender) FROM example.gender"
- This is equivalent to the config: feature_keys=[gender]
- The slice key and value will be: (gender, {gender_value})
Example 2: slice_keys_sql = "SELECT STRUCT(gender, country) FROM example.gender, example.country WHERE country = 'USA'"
- This is equivalent to the config: feature_keys=[gender], feature_values={country: 'USA'}
- The slice key and value will be: (gender_x_country, {gender_value}_x_USA)
Example 3 (background positive subgroup negative): slice_keys_sql = "SELECT STRUCT('male' as bpsn) FROM example WHERE ('male' not in UNNEST(example.gender) and 1 in UNNEST(example.label)) or ('male' in UNNEST(example.gender) and 0 in UNNEST(example.label))"
- The slice key and value will be: (bpsn, male)
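For comparison, the first two SQL examples above can also be written directly with the feature_keys / feature_values fields, following the style of the examples under EvalConfig.slicing_specs (a sketch):

  slicing_specs { feature_keys: ["gender"] }
  slicing_specs {
    feature_keys: ["gender"]
    feature_values: [{key: "country", value: "USA"}]
  }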
A sub key identifies specialized sub-types of metrics and plots.
Used in:
Used with multi-class metrics to identify a specific class ID.
Used with multi-class metrics to identify the kth predicted value.
Used with multi-class and ranking metrics to identify top-k predicted values.
Represents a t-distribution, which includes the sample mean, sample standard deviation, and degrees of freedom of the samples. It is calculated when the evaluation runs on multiple samples, which by default are generated by the Poisson bootstrap method: http://www.unofficialgoogledatascience.com/2015/08/an-introduction-to-poisson-bootstrap26.html
Used in:
Sample Mean.
Sample Standard Deviation.
Number of degrees of freedom.
Represents the value of the data if calculated without bootstrapping. This field is deprecated: going forward, TDistributionValue will be removed from the oneof in MetricValue and the unsampled value will be populated in MetricValue.double_value.
The value will be converted into an error message if we do not know its type.
Used in:
Extra details about validation.
Used in:
Information about failure per metric.
Used in:
True if there are no metric validation failures or missing slices, else false.
True if failure due to missing thresholds.
Information about which threshold is blocking which metric.
Information about missing slices.
Information about missing cross slices.
Extra details about validation performed.
True if this run is rubberstamped. A rubberstamped validation is one in which there was no baseline model, so diff thresholds were ignored, but in which the non-diff thresholds are still checked.
Value at cutoffs, e.g. for precision@K or recall@K.
Used in:
Used in: