CheckpointWorkload is an artifact created by a trial during training.
UUID of the checkpoint.
The time the workload finished or was stopped.
The state of the checkpoint.
Dictionary of file paths to file sizes in bytes of all files in the checkpoint.
Total number of batches as of this workload's completion.
User defined metadata associated with the checkpoint.
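As a rough illustration of the file-path-to-size dictionary described above, the sketch below sums the sizes to get the total size of one checkpoint. The name `resources` and the sample paths are assumptions made for this example, not names taken from the message definition.

```python
def total_checkpoint_bytes(resources: dict[str, int]) -> int:
    """Sum the sizes (in bytes) of all files listed for one checkpoint."""
    return sum(resources.values())


# Hypothetical sample data: file path -> file size in bytes.
example_resources = {
    "model.pt": 1_048_576,
    "metadata.json": 512,
}

print(total_checkpoint_bytes(example_resources))
```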
Metrics report.
ID of the trial.
End time of when metric was reported.
Struct of the reported metrics.
Number of batches completed in the report.
If metric is archived.
ID of metric in table.
Run ID of trial when metric was reported.
Name of the metric group ("training", "validation", or anything else).
MetricsWorkload is a workload generating metrics.
The time the workload finished or was stopped.
Metrics.
Number of inputs processed.
Total number of batches as of this workload's completion.
The rendezvous info for the trial to rendezvous with sibling containers.
The rendezvous addresses of the other containers.
The container rank.
The slots for each address, respectively.
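Since the slots are listed "respectively" to the addresses, the two lists are positionally aligned. A minimal sketch of pairing them follows; the class name `RendezvousInfoSketch`, its field names, and the sample addresses are hypothetical stand-ins, not the generated protobuf types.

```python
from dataclasses import dataclass


@dataclass
class RendezvousInfoSketch:
    """Hypothetical stand-in for the rendezvous message described above."""
    addresses: list[str]  # rendezvous addresses of the other containers
    rank: int             # the container rank
    slots: list[int]      # slots[i] belongs to addresses[i]


def slots_by_address(info: RendezvousInfoSketch) -> dict[str, int]:
    """Pair each address with its slot count, checking the lists line up."""
    if len(info.addresses) != len(info.slots):
        raise ValueError("addresses and slots must have the same length")
    return dict(zip(info.addresses, info.slots))


# Sample values are made up for illustration.
info = RendezvousInfoSketch(
    addresses=["10.0.0.1:1734", "10.0.0.2:1734"],
    rank=0,
    slots=[4, 8],
)
print(slots_by_address(info))
```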
The current state of the trial. See `\dT+ trial_state` in the database.
The trial is in an unspecified state.
The trial is in an active state.
The trial is in a paused state.
The trial is canceled and is shutting down.
The trial is killed and is shutting down.
The trial is completed and is shutting down.
The trial is errored and is shutting down.
The trial is canceled and is shut down.
The trial is completed and is shut down.
The trial is errored and is shut down.
The trial is queued (waiting to be run, or job state is still queued). Queued is a substate of the Active state.
The trial is pulling the image. Pulling is a substate of the Active state.
The trial is preparing the environment after finishing pulling the image. Starting is a substate of the Active state.
The trial's allocation is actively running. Running is a substate of the Active state.
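The substate relationship above (Queued, Pulling, Starting, and Running all count as Active) can be sketched as a small helper. The enum member names here are illustrative assumptions; the listing above gives only the descriptions, not the identifiers.

```python
from enum import Enum, auto


class TrialState(Enum):
    """Illustrative names only; the real identifiers are not shown above."""
    UNSPECIFIED = auto()
    ACTIVE = auto()
    PAUSED = auto()
    STOPPING_CANCELED = auto()
    STOPPING_KILLED = auto()
    STOPPING_COMPLETED = auto()
    STOPPING_ERROR = auto()
    CANCELED = auto()
    COMPLETED = auto()
    ERROR = auto()
    QUEUED = auto()
    PULLING = auto()
    STARTING = auto()
    RUNNING = auto()


# The four states documented as substates of Active.
ACTIVE_SUBSTATES = frozenset({
    TrialState.QUEUED,
    TrialState.PULLING,
    TrialState.STARTING,
    TrialState.RUNNING,
})


def is_active(state: TrialState) -> bool:
    """True for the Active state itself and for any of its substates."""
    return state is TrialState.ACTIVE or state in ACTIVE_SUBSTATES
```

A caller that only cares whether a trial is still making progress can test `is_active(state)` instead of enumerating the substates itself.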
A Trial is a set of workloads exploring a determined set of hyperparameters.
The id of the trial.
The id of the parent experiment.
The time the trial was started.
The time the trial ended if the trial is stopped.
The current state of the trial.
Number of times the trial restarted.
Trial hyperparameters.
The current processed batches.
Best validation.
Latest validation.
Best checkpoint.
The last reported state of the trial runner (harness code).
The wall clock time is all active time of the cluster for the trial, inclusive of everything (restarts, initialization, etc.), in seconds.
UUID of checkpoint that this trial started from.
Id of the first task associated with this trial. This field is deprecated since trials can have multiple tasks.
The sum of sizes of all resources in all checkpoints for the trial.
The count of checkpoints.
Summary metrics.
Task IDs of tasks associated with this trial. The length of task_ids will always be greater than or equal to one when TaskID is sent. In some cases, for example CompareTrial, a reduced Trial object is sent without TaskID or TaskIDs filled in. The first element of task_ids will be the same as task_id. task_ids is sorted ascending by task_run_id.
Signed searcher metrics value.
Number of days to retain logs for.
Metadata associated with the trial (based on the metadata stored in the run).
Log Policy Matched.
Signals to the experiment the trial early exited.
The reason for the exit.
The reason for an early exit.
Zero-value (not allowed).
Indicates the trial exited due to an invalid hyperparameter.
Indicates the trial exited due to an invalid hyperparameter in the trial init.
Metrics from the trial over some duration of training.
The trial associated with these metrics.
The trial run associated with these metrics.
The number of batches trained on when these metrics were reported.
The client-reported time associated with these metrics.
The metrics for this bit of training, including:
- avg_metrics: metrics reduced over the reporting period.
- batch_metrics: (optional) per-batch metrics.
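To make the shape of this field concrete, here is a hedged sketch that treats the metrics as a plain dictionary with an `avg_metrics` entry and an optional `batch_metrics` list. The helper name `read_metric`, the metric name `loss`, and the sample values are assumptions for illustration only.

```python
from typing import Any


def read_metric(metrics: dict[str, Any], name: str) -> tuple[Any, list[Any]]:
    """Return the reduced value for `name` plus any per-batch values.

    batch_metrics is optional, so a missing key yields an empty list.
    """
    reduced = metrics["avg_metrics"].get(name)
    per_batch = [b[name] for b in metrics.get("batch_metrics", []) if name in b]
    return reduced, per_batch


# Made-up sample payload with both parts present.
example_metrics = {
    "avg_metrics": {"loss": 0.25},
    "batch_metrics": [{"loss": 0.3}, {"loss": 0.2}],
}

print(read_metric(example_metrics, "loss"))
```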
TrialProfilerMetricLabels are the labels for a single series, where a series is defined as all metrics sharing a distinct set of labels.
The ID of the trial.
The name of the metric.
The agent ID associated with the metric.
The GPU UUID associated with the metric.
The type of the metric.
To distinguish the different categories of metrics.
Zero-value (not allowed).
For systems metrics, like GPU utilization or memory.
For timing metrics, like how long a backwards pass or getting a batch from the dataloader took.
For other miscellaneous metrics.
TrialProfilerMetricsBatch is a batch of trial profiler metrics. A batch contains metrics pertaining to a single series. The values, batches, and timestamps fields are equal-length arrays, with each index corresponding to one reading.
The measurement for a reading, repeated for the batch of metrics.
The batch at which a reading occurred, repeated for the batch of metrics.
The timestamp at which a reading occurred, repeated for the batch of metrics.
The labels for this series.
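Because values, batches, and timestamps are parallel arrays, a consumer typically zips them back into per-reading tuples. The following is a minimal sketch under that assumption; the function name `readings` and the sample data are hypothetical, and timestamps are shown as plain strings for simplicity.

```python
def readings(values: list[float], batches: list[int], timestamps: list[str]):
    """Zip the three parallel arrays into (value, batch, timestamp) tuples.

    Raises ValueError if the arrays are not all the same length, since each
    index is expected to describe exactly one reading.
    """
    if not (len(values) == len(batches) == len(timestamps)):
        raise ValueError("values, batches and timestamps must be equal length")
    return list(zip(values, batches, timestamps))


# Made-up sample series with two readings.
sample = readings(
    [0.5, 0.7],
    [10, 20],
    ["2024-01-01T00:00:00Z", "2024-01-01T00:00:05Z"],
)
print(sample)
```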
The metadata pertaining to the current running task for a trial.
The state of the trial runner.
Denotes a connection between a given trial and a checkpoint or model_version.
ID of the trial.
UUID of the checkpoint.
Source `id` for the model which generated the checkpoint (if applicable).
Source `version` in the model_version version field which generated the checkpoint (if applicable).
Type for this trial_source_info.
TrialSourceInfoType is the type of the TrialSourceInfo, which serves as a link between a trial and a checkpoint or model version.
The type is unspecified.
"Inference" Trial Source Info Type, used for batch inference.
"Fine Tuning" Trial Source Info Type, used in model hub.