A 'device' is a physical entity in the system and comprises several resources.
Used in:
The name of the device.
The id of this device, unique in a single trace.
The resources on this device, keyed by resource_id.
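The Device fields above can be sketched as a plain Python structure; the field names here are assumptions inferred from the descriptions, not the actual proto definition:

```python
# Hypothetical sketch of a Device record; field names are assumptions
# based on the descriptions above, not the actual proto definitions.
device = {
    "name": "TPU:0",
    "device_id": 3,                 # unique within a single trace
    "resources": {                  # keyed by resource_id
        0: {"name": "step", "resource_id": 0},
        1: {"name": "op", "resource_id": 1},
    },
}
```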
Result proto for HloExtraInfoMap.
Used in:
A map from HLO name to HloExtraInfo.
Result proto for HloExtraInfo.
Used in:
Category of the HLO op given by the compiler.
The long name of the HLO that includes the dimensions.
The per-TPU-core batch size inferred from this HLO.
Result proto for host-dependent job information.
Used in:
The ID of the host that the job was run on.
The command line used to run the job.
The start time of the job on this host.
Result proto for host-independent job information.
Used in:
The change-list number of this build.
The time of this build.
The target of this build.
The types of host operations that are tracked.
Invalid host op.
Each host op type has two parts: (1) the stage where the op happens and (2) the op name. stage = Input Data Producer, op = Get Next Batch.
stage = Input Data Producer, op = Session Run.
stage = Input Data Producer, op = Forward Batch.
stage = Infeed Thread, op = Get Next Batch.
stage = Infeed Thread, op = Session Run.
stage = Infeed Thread, op = Forward Batch.
stage = Outfeed Thread, op = Get Next Batch.
stage = Outfeed Thread, op = Session Run.
stage = Outfeed Thread, op = Forward Batch.
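The enumeration above is a cross product of three stages and three op names, plus the invalid value. A small sketch, assuming the numeric values simply follow the listed order:

```python
from itertools import product

# Hypothetical reconstruction of the host-op enumeration described above;
# the numeric values are an assumption based on the listed order.
STAGES = ["Input Data Producer", "Infeed Thread", "Outfeed Thread"]
OPS = ["Get Next Batch", "Session Run", "Forward Batch"]

host_ops = {0: ("Invalid", None)}  # invalid host op
for value, (stage, op) in enumerate(product(STAGES, OPS), start=1):
    host_ops[value] = (stage, op)
```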
Used in:
Map from core id to HostOpsPerTpuStep.
Used in:
Map from hostname to a map from core id to HostOpsPerTpuStep.
Result proto for the host ops per TPU step.
Used in:
Whether the data in this message is valid.
The current TPU step number.
The beginning time of the current TPU step on the device in picoseconds.
The ending time of the current TPU step on the device in picoseconds.
For each possible host operation, maps to the difference between the TPU step number that the host op targets and the current TPU step number. The key is the HostOp; the value is the step difference.
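A minimal illustration of the step-difference map, with made-up op names and step numbers:

```python
# Hypothetical illustration of the step-difference map described above:
# for each host op, the targeted TPU step number minus the current step.
current_step = 12
host_op_target_steps = {        # op names here are made up
    "infeed.get_next_batch": 14,
    "outfeed.session_run": 11,
}

step_differences = {op: target - current_step
                    for op, target in host_op_target_steps.items()}
# a positive difference means the host op runs ahead of the device
```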
Result proto for the host ops for all TPU steps.
Used in:
A sequence of records with one for each TPU step. Each record is a map from hostname to a map from core id to HostOpsPerTpuStep.
Result proto for looping-related metrics.
Used in:
The total iteration time in nanoseconds.
The total number of iterations.
The total computation time in nanoseconds.
The total number of computations.
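From these four totals, per-iteration averages and a computation fraction can be derived; a small sketch with made-up values:

```python
# Illustrative derived looping metrics, assuming the field meanings above.
total_iteration_time_ns = 4_000_000
num_iterations = 200
total_computation_time_ns = 3_000_000
num_computations = 200

avg_iteration_ns = total_iteration_time_ns / num_iterations
compute_fraction = total_computation_time_ns / total_iteration_time_ns
```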
Result proto for OpMetricsDb.
Used in:
A collection of OpMetricsResults.
The total host infeed-enqueue duration in picoseconds.
The total of the difference between the start times of two consecutive infeed-enqueues (per host) in picoseconds.
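The second aggregate can be computed from per-host enqueue start times; a sketch with illustrative data:

```python
# Sketch of the "difference between start times of two consecutive
# infeed-enqueues" aggregate, per host, in picoseconds (data is made up).
enqueue_starts_ps = {"host0": [100, 250, 430], "host1": [90, 300]}

total_gap_ps = sum(
    later - earlier
    for starts in enqueue_starts_ps.values()
    for earlier, later in zip(starts, starts[1:])
)
# host0 gaps: 150 + 180; host1 gap: 210
```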
Result proto for OpMetrics.
Used in:
True if this OP is executed on the device; False if it is executed on the host.
Name of this OP.
Rank of this OP.
The starting time in cycles of the last instance of this OP executed.
The ending time in cycles of the last instance of this OP executed.
If this OP (say A) is an immediate child of another OP (say B), this field stores the sum of the durations in microseconds of A inside B. If A appears more than once in B, the durations of all of A's appearances are added together. This sum is reset after the self-time of B is calculated so that it can be reused for a new parent OP.
Number of times this OP occurred.
Total time in microseconds spent in this OP (accumulated over all of its occurrences).
Total self time in microseconds spent in this OP (accumulated over all of its occurrences).
The total self time as a fraction of sum of all OP's total self time on the host.
Cumulative total self time in fraction on the host.
The total self time as a fraction of sum of all OP's total self time on the device.
Cumulative total self time in fraction on the device.
Total number of FLOPs incurred by this OP.
Total number of bytes accessed by this OP.
Total time in microseconds that special hw unit 1 is occupied by this OP.
Total time in microseconds that special hw unit 2 is occupied by this OP.
Total memory stall time in microseconds.
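How the self-time fractions and cumulative fractions above relate can be sketched as follows (op names and times are made up, and sorting by descending self time is an assumption):

```python
# Sketch of deriving per-OP self-time fractions and cumulative fractions.
ops = [("matmul", 60.0), ("conv", 30.0), ("add", 10.0)]  # (name, self time us)
total = sum(t for _, t in ops)

cumulative = 0.0
rows = []
for name, self_time in sorted(ops, key=lambda x: -x[1]):
    fraction = self_time / total
    cumulative += fraction
    rows.append((name, fraction, cumulative))
# the final cumulative fraction reaches 1.0 (within floating-point error)
```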
A 'resource' generally is a specific computation component on a device. These can range from threads on CPUs to specific arithmetic units on hardware devices.
Used in:
The name of the resource.
The id of the resource. Unique within a device.
Result proto for RunEnvironment (the run environment of a profiling session).
Used in:
Number of hosts used.
The type of TPU used.
The number of TPU cores used.
The per-TPU-core batch size.
Host-independent job information.
Host-dependent job information.
Result proto for a StepDatabase.
Used in:
A map from core_id to StepSequenceResult.
Result proto for StepInfo.
Used in:
The (micro) step number.
The step duration in picoseconds.
The infeed duration in picoseconds. Can turn into a map if we want a variable number of ops.
The start time of this step in picoseconds.
The waiting time within this step in picoseconds.
The time spent on cross-replica-sum in picoseconds.
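Assuming the infeed, waiting, and cross-replica-sum durations are disjoint parts of the step (an assumption, not stated above), the remaining compute time can be derived; a sketch with made-up values:

```python
# Illustrative breakdown of one step; all values in picoseconds (made up).
step_duration_ps = 1_000_000
infeed_duration_ps = 150_000
wait_duration_ps = 50_000
crs_duration_ps = 100_000    # cross-replica-sum

# Assumes the three components are disjoint within the step.
compute_ps = (step_duration_ps - infeed_duration_ps
              - wait_duration_ps - crs_duration_ps)
```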
Result proto for a sequence of steps.
Used in:
A sequence of StepInfoResults.
The TPUEmbeddingConfiguration contains specification of TPU Embedding lookups and gradient updates separate from the TF Graph.
num_hosts is the number of host CPU systems in the training/inference job. Each embedding table must be sharded into num_hosts separate Variables, placed separately on the num_hosts CPU devices in the cluster. Sharding will be performed equivalently to the 'div' sharding_strategy option of embedding_lookup() and embedding_lookup_sparse().
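The 'div' sharding strategy assigns contiguous row ranges to shards, with the first num_rows % num_hosts shards receiving one extra row; this matches the documented behavior of embedding_lookup's 'div' partition strategy. A sketch of that assignment:

```python
def div_shard(num_rows, num_hosts):
    # 'div' sharding: contiguous row ranges; the first (num_rows % num_hosts)
    # shards get one extra row each.
    base, extra = divmod(num_rows, num_hosts)
    shards, start = [], 0
    for i in range(num_hosts):
        size = base + (1 if i < extra else 0)
        shards.append(range(start, start + size))
        start += size
    return shards
```

For example, 13 rows over 5 hosts split as [0-2], [3-5], [6-8], [9-10], [11-12].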
The total number of TensorNodes. This is equal to num_hosts times the number of TensorNodes attached to each host.
The number of training examples per TensorNode.
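Combining these fields, a global batch size can be derived; a small sketch with assumed values (the derivation itself is an inference from the descriptions above):

```python
# Illustrative relationship between hosts, TensorNodes, and batch size.
num_hosts = 4
tensor_nodes_per_host = 2
num_tensor_nodes = num_hosts * tensor_nodes_per_host

batch_size_per_tensor_node = 16
global_batch_size = num_tensor_nodes * batch_size_per_tensor_node
```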
Used in:
Used in:
model_mode specifies whether the model is to be run in training or inference. In inference mode, gradient updates to embedding tables are not performed.
Used in:
Each Embedding
Used in:
Name of the embedding table. This will be used to name Variables in the Tensorflow Graph.
Number of rows of the embedding table. The Variable created to hold the learned embedding table values will have shape (num_rows, width).
Width of the embedding table. The Variable created to hold the learned embedding table values will have shape (num_rows, width).
Number of distinct embedding activation vectors per training example produced by lookups into this table during model evaluation. For each table, the Graph will receive an activations Tensor of shape (batch_size * table.num_features, table.width). For example, num_features = 1 produces equivalent behavior to a single tf.nn.embedding_lookup() call. In the case of 'multivalent' embeddings, (i.e. tf.nn.embedding_lookup_sparse()) which compute weighted averages of embedding table rows, num_features is the number of vectors produced after averaging. In sequence models num_features is typically equal to the sequence length, since each sequence element must be represented separately to the convolutional or recurrent network.
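The activations shape described above, computed for illustrative values:

```python
# The Graph receives an activations Tensor of shape
# (batch_size * num_features, width); values here are made up.
batch_size = 32     # training examples per TensorNode
num_features = 3    # e.g. a sequence length of 3
width = 128

activations_shape = (batch_size * num_features, width)
```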
Result proto for TfStatsHelper.
The result for the TF-metric database.
The result for the HLO-metric database.
The result for the step database.
The result for the looping-related metrics.
The result for the HloExtraInfoMap.
Overall matrix unit utilization in percentage.
The run environment of this profiling session.
The result for the host operations.
A map from core ID to name.
Describes the geometry of a TPU mesh.
The dimensions of the TPU topology, in cores. Typically, this is a 3D topology [x, y, core], where the major dimensions correspond to TPU chips, and the minor dimension describes the number of cores on a multicore chip.
Number of TensorFlow tasks in the cluster.
Number of TPU devices per task.
A flattened rank 3 int32 array with shape [num_tasks, num_tpu_devices_per_task, len(mesh_shape)]. `tasks` is the number of tasks in the TPU cluster, `devices` is the number of TPU devices per task, and the minor dimension corresponds to a position in the TPU mesh topology. Each entry [task, device, axis] gives the `axis`-th coordinate in the topology of a task/device pair.
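Indexing the flattened array can be sketched as follows (the coordinates are made up):

```python
# Sketch of indexing the flattened rank-3 device_coordinates array.
num_tasks, devices_per_task, mesh_rank = 2, 2, 3
# Flattened [task, device, axis] -> coordinate; coordinates are made up.
device_coordinates = [
    0, 0, 0,  0, 0, 1,   # task 0, devices 0 and 1
    1, 0, 0,  1, 0, 1,   # task 1, devices 0 and 1
]

def coordinate(task, device, axis):
    # Row-major flattening of the [task, device, axis] index.
    return device_coordinates[(task * devices_per_task + device) * mesh_rank + axis]
```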
A 'Trace' contains metadata for the individual traces of a system.
The devices that this trace has information about. Maps from device_id to more data about the specific device.
All trace events captured in the profiling period.
Used in:
The id of the device that this event occurred on. The full dataset should have this device present in the Trace object.
The id of the resource that this event occurred on. The full dataset should have this resource present in the Device object of the Trace object. A resource_id is unique on a specific device, but not necessarily within the trace.
The name of this trace event.
The timestamp at which this event occurred (in picoseconds since tracing started).
The duration of the event in picoseconds if applicable. Events without duration are called instant events.
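Hypothetical TraceEvent-like records as Python dicts (field names are assumptions); per the description above, an event without a duration is an instant event:

```python
# Hypothetical TraceEvent-like records; field names are assumptions.
events = [
    {"device_id": 1, "resource_id": 0, "name": "step",
     "timestamp_ps": 100, "duration_ps": 5000},
    {"device_id": 1, "resource_id": 0, "name": "marker",
     "timestamp_ps": 200},  # no duration: an instant event
]

def is_instant(event):
    # Events without a duration are instant events.
    return "duration_ps" not in event
```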