Measurements of an operation (or an aggregated set of operations). Metrics are always "total" (inclusive of any children) rather than "self".
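To illustrate the "total" versus "self" distinction, here is a minimal sketch assuming a hypothetical `TreeNode` type that stores only each node's own ("self") time. The node's "total" time is then its self time plus the totals of all its descendants. The class, field names, and instruction names are illustrative and are not the generated proto classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TreeNode:
    # Hypothetical stand-in for a profile entry; not the generated proto class.
    name: str
    self_time_ps: int  # time spent in this node alone, excluding children
    children: List["TreeNode"] = field(default_factory=list)


def total_time_ps(node: TreeNode) -> int:
    """Inclusive ("total") time: the node's own time plus all descendants' time."""
    return node.self_time_ps + sum(total_time_ps(c) for c in node.children)


root = TreeNode("%fusion.1", 100, [TreeNode("%multiply.5", 40), TreeNode("%add.2", 60)])
print(total_time_ps(root))  # 200
```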
Floating-point computations performed by this operation, as a fraction of peak core FLOPS * program time. This representation has useful properties:
- it is proportional to the number of floating-point operations performed
- utilization is flops / time
- wasted potential flops is proportional to (time - flops)
- it does not reveal the peak core FLOPS of the hardware
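The properties listed above can be checked numerically. This sketch assumes that "time" is likewise expressed as a fraction of program time, so that flops and time share the same unit (peak FLOPS * program time). The function names and all numbers are hypothetical.

```python
def flops_fraction(op_flops: float, peak_flops_per_s: float, program_time_s: float) -> float:
    """FLOPs performed, as a fraction of peak core FLOPS * program time."""
    return op_flops / (peak_flops_per_s * program_time_s)


def time_fraction(op_time_s: float, program_time_s: float) -> float:
    """Op time as a fraction of program time (assumed normalization for 'time')."""
    return op_time_s / program_time_s


# Made-up numbers: 1 TFLOP/s peak, 2 s program, op ran 0.5 s and did 2.5e11 FLOPs.
flops_frac = flops_fraction(2.5e11, 1e12, 2.0)
time_frac = time_fraction(0.5, 2.0)
utilization = flops_frac / time_frac  # achieved fraction of peak while the op ran
wasted = time_frac - flops_frac       # unused potential, in units of peak FLOPS * program time
print(flops_frac, time_frac, utilization, wasted)  # 0.125 0.25 0.5 0.125
```

Note that `flops_frac` alone (0.125) does not reveal the 1 TFLOP/s peak. Recovering the peak also requires the absolute FLOP count and the program time.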
The memory bandwidth used to load operands, as a fraction of theoretical memory bandwidth on the specific hardware. Index into the array using the MemBwType enum.
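A minimal sketch of how such a per-memory-type fraction could be derived from raw bytes accessed and elapsed core time. Both arrays are indexed by a memory-type enum. The `MemBwType` members below (`HBM`, `SRAM`) are hypothetical placeholders, not the real enum's values, and the peak bandwidths are made up. The formula bytes / (time * peak bandwidth) is an assumption about how the fraction is normalized.

```python
from enum import IntEnum
from typing import List, Sequence


class MemBwType(IntEnum):
    # Hypothetical members for illustration only; the real enum defines its own values.
    HBM = 0
    SRAM = 1


def bandwidth_fractions(
    bytes_accessed: Sequence[float],
    time_ps: float,
    peak_bytes_per_s: Sequence[float],
) -> List[float]:
    """Fraction of theoretical bandwidth used, per memory type (indexed by MemBwType)."""
    time_s = time_ps * 1e-12  # picoseconds -> seconds
    if time_s <= 0:
        return [0.0 for _ in bytes_accessed]
    return [b / (time_s * peak) for b, peak in zip(bytes_accessed, peak_bytes_per_s)]


# 1 microsecond of core time; peak bandwidths are made-up numbers.
fracs = bandwidth_fractions([500_000, 2_000_000], 1_000_000, [1e12, 1e13])
print([round(f, 6) for f in fracs])   # [0.5, 0.2]
print(round(fracs[MemBwType.HBM], 6))  # 0.5
```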
The raw stats below are aggregated across all occurrences.
Elapsed core-time in picoseconds.
Total floating-point operations performed.
Total FLOPs, normalized to the device's peak bf16 (default) bandwidth.
Total bytes accessed for each memory type. Index into array using MemBwType enum.
Number of executions.
Average "accumulated" time in picoseconds that the operation took.
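Because the raw stats are aggregated across all occurrences, each execution simply adds to the running totals. This sketch assumes the average time is the aggregated core time divided by the number of executions. The `RawStats` class and its field names are hypothetical and do not correspond to the generated message.

```python
from dataclasses import dataclass


@dataclass
class RawStats:
    # Hypothetical accumulator mirroring the raw stats described above.
    time_ps: int = 0      # elapsed core time, summed over all occurrences
    flops: int = 0        # floating-point operations, summed over all occurrences
    occurrences: int = 0  # number of executions

    def add_execution(self, time_ps: int, flops: int) -> None:
        """Fold one execution of the operation into the aggregate."""
        self.time_ps += time_ps
        self.flops += flops
        self.occurrences += 1

    @property
    def avg_time_ps(self) -> float:
        # Assumption: average = aggregated time / number of executions.
        return self.time_ps / self.occurrences if self.occurrences else 0.0


stats = RawStats()
for t, f in [(100, 10), (300, 30)]:
    stats.add_execution(t, f)
print(stats.occurrences, stats.time_ps, stats.avg_time_ps)  # 2 400 200.0
```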
An entry in the profile tree (an instruction, or a set of instructions).
Name of the node; its semantics depend on the node's contents.
May be omitted, e.g. for fused instructions.
Subject to pruning.
Details about what this node represents.
Total number of children before pruning.
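Because children are subject to pruning, the child count recorded before pruning is the only way to know how many entries were dropped. This minimal sketch assumes a simple "keep the N most expensive children at every level" rule. The actual pruning policy is not described here, and `ProfileNode` and `prune` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProfileNode:
    # Hypothetical stand-in for a profile-tree entry, not the generated proto class.
    name: str
    time_ps: int
    children: List["ProfileNode"] = field(default_factory=list)
    num_children: int = 0  # total number of children before pruning


def prune(node: ProfileNode, max_children: int) -> None:
    """Keep the max_children most expensive children at every level, in place."""
    node.num_children = len(node.children)  # record the count before dropping any
    node.children = sorted(node.children, key=lambda c: c.time_ps, reverse=True)[:max_children]
    for child in node.children:
        prune(child, max_children)


root = ProfileNode("root", 60, [ProfileNode("a", 10), ProfileNode("b", 30), ProfileNode("c", 20)])
prune(root, max_children=2)
print([c.name for c in root.children], root.num_children)  # ['b', 'c'] 3
```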
A category of XLA instructions. name is a descriptive string, like "data formatting".
(message has no fields)
A single XLA instruction. name is the unique instruction id, like "%multiply.5".
Opcode, e.g. %multiply.
The instruction expression, e.g. %multiply = [shape]multiply(operand1, operand2).
Provenance op name, e.g. a TF op name or a JAX op name.
Describes the physical memory layout of the instruction's primary input. For example, for a convolution this analyzes the image and ignores the kernel.
The physical data layout, from most-minor to most-major dimensions.
Size of the data in this dimension.
Data must be padded to a multiple of alignment.
What the dimension represents, e.g. "spatial".
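Each dimension carries a size and an alignment, so the padded footprint of the layout follows from rounding every size up to a multiple of its alignment. This sketch computes that footprint and the fraction of the padded buffer that holds real data. The `Dimension` class, the helper functions, and the sample values (including the "feature" semantics string) are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Dimension:
    size: int        # size of the data in this dimension
    alignment: int   # data must be padded to a multiple of this
    semantics: str   # what the dimension represents, e.g. "spatial"


def padded_size(dim: Dimension) -> int:
    """Round size up to the next multiple of alignment (alignment <= 1 means no padding)."""
    align = max(dim.alignment, 1)
    return -(-dim.size // align) * align  # integer ceiling division


def padding_efficiency(dims: List[Dimension]) -> float:
    """Fraction of the padded buffer that holds real data."""
    real = padded = 1
    for d in dims:
        real *= d.size
        padded *= padded_size(d)
    return real / padded if padded else 1.0  # empty buffer: nothing is wasted


# Dimensions listed from most-minor to most-major; values are made up.
layout = [Dimension(3, 128, "feature"), Dimension(10, 8, "spatial")]
print([padded_size(d) for d in layout])  # [128, 16]
print(padding_efficiency(layout))        # 0.0146484375
```

In this made-up layout, a small most-minor dimension with a large alignment dominates the waste. Only 30 of the 2048 padded elements hold data.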
Profile is the top-level data that summarizes a program.
Root of a profile broken down by instruction category.
Root of a profile broken down by program.
Device type.
Exclude idle ops.