A database of HloStats records.
All HloStats records, one for each HLO operation.
There is one HloStatsRecord for each HLO operation profiled. Next ID: 40
The rank by self time
program_id for this op
The HLO category name.
The HLO expression.
The framework op name (TF Op, JAX Op)
Number of occurrences of the operation.
Total "accumulated" time in microseconds that the operation took. If this operation has any child operations, the "accumulated" time includes the time spent inside the children.
Average "accumulated" time in microseconds that each occurrence of the operation took.
Total "self" time in microseconds that the operation took. If this operation has any child operations, the "self" time excludes the time spent inside the children.
Average "self" time in microseconds that each occurrence of the operation took.
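The relationship between "self" and "accumulated" time can be illustrated with a minimal Python sketch. The `OpNode` class, its field names, and the sample timings are hypothetical and are not part of the HloStats schema; the sketch only assumes that accumulated time is self time plus the accumulated time of all children, and that averages divide a total by the number of occurrences.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OpNode:
    """Hypothetical profiled operation (illustration only, not the proto)."""
    name: str
    self_time_us: float
    occurrences: int = 1
    children: List["OpNode"] = field(default_factory=list)


def accumulated_time_us(op: OpNode) -> float:
    # Accumulated time = own self time + time spent inside all children.
    return op.self_time_us + sum(accumulated_time_us(c) for c in op.children)


def average_us(total_us: float, occurrences: int) -> float:
    # Average per occurrence; guard against a record with zero occurrences.
    return total_us / occurrences if occurrences else 0.0


# Made-up example: a fusion op that ran 4 times and has two children.
fusion = OpNode("fusion.1", self_time_us=40.0, occurrences=4, children=[
    OpNode("multiply.2", self_time_us=25.0),
    OpNode("add.3", self_time_us=15.0),
])

total = accumulated_time_us(fusion)
avg_total = average_us(total, fusion.occurrences)
avg_self = average_us(fusion.self_time_us, fusion.occurrences)
```

With these sample numbers, the accumulated total is 80 microseconds while the self total stays at 40 microseconds, so ranking by self time and ranking by accumulated time can order operations differently.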
Percentage of the total "accumulated" time that was caused by DMA stall.
Total floating-point operations (FLOPs) performed per second normalized to the bf16 peak capacity.
Total floating-point operations per second for the op.
Number of total bytes (including both read and write) accessed per second.
Number of bytes accessed from HBM (including both read and write) per second.
Number of bytes read from CMEM per second.
Number of bytes written to CMEM per second.
Number of bytes read from VMEM per second.
Number of bytes written to VMEM per second.
Overall operational intensity in FLOP/Byte.
Operational intensity based on HBM in FLOP/Byte.
Operational intensity based on CMEM read in FLOP/Byte.
Operational intensity based on CMEM write in FLOP/Byte.
Operational intensity based on VMEM read in FLOP/Byte.
Operational intensity based on VMEM write in FLOP/Byte.
Operational intensity based on the bottleneck resource in FLOP/Byte.
Whether this operation is "Compute", "HBM", "CMEM Read", or "CMEM Write" bound, according to the Roofline Model.
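As a rough illustration of how the operational-intensity fields and the bound label relate, the sketch below applies the textbook Roofline Model: operational intensity is the FLOP rate divided by a byte rate, and the binding resource is the one with the lowest attainable FLOP rate. The function names, the peak values, and the sample rates are all assumptions made for this example; the profiler's actual classification logic is not described here and may differ.

```python
def operational_intensity(flop_rate: float, bytes_per_second: float) -> float:
    """FLOP/Byte; infinite when no bytes move through that memory."""
    if bytes_per_second > 0:
        return flop_rate / bytes_per_second
    return float("inf")


def roofline_bound(flop_rate, byte_rates, peak_flop_rate, peak_bandwidths):
    """Return (bound_name, ceilings) under a textbook roofline model.

    byte_rates and peak_bandwidths are dicts keyed by memory name,
    in bytes per second. Peak bandwidths are assumed to be positive.
    """
    # The compute roof is the peak FLOP rate of the device.
    ceilings = {"Compute": peak_flop_rate}
    for name, rate in byte_rates.items():
        # Each memory roof: peak bandwidth * operational intensity.
        ceilings[name] = peak_bandwidths[name] * operational_intensity(flop_rate, rate)
    # The lowest ceiling is the bottleneck resource.
    bound = min(ceilings, key=ceilings.get)
    return bound, ceilings


# Made-up measurements for one op (FLOP/s and bytes/s).
flop_rate = 2e12
byte_rates = {"HBM": 1e12, "CMEM Read": 5e11, "CMEM Write": 0.0}

# Made-up hardware peaks; real values depend on the accelerator.
peak_flop_rate = 1e14
peak_bandwidths = {"HBM": 1e12, "CMEM Read": 2e12, "CMEM Write": 2e12}

bound, ceilings = roofline_bound(flop_rate, byte_rates, peak_flop_rate, peak_bandwidths)
bottleneck_intensity = operational_intensity(flop_rate, byte_rates[bound]) if bound != "Compute" else None
```

With these sample numbers the HBM ceiling is the lowest, so the op would be labeled "HBM" bound, and the intensity based on the bottleneck resource equals the HBM intensity of 2 FLOP/Byte.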
Whether this operation is for HLO or Framework rematerialization.
Whether this op is for outside compilation.
Whether this op is autotuned.
FLOPs for the record.
Bytes accessed for the record
Information about the corresponding source code.