/////////////////////// Global data requests
Unregisters a global allocation. If the handle given is not currently allocated, a NOT_FOUND status is returned.
(message has no fields)
Deconstructs a tuple. Returns a newly created GlobalDataHandle for each element in the tuple.
Unpack requests that a global data handle, with a tuple shape, has global data handles created for each of its constituent members. This is the equivalent of the "destructuring assignment" present in various programming languages.
Requests the shape of the referenced global data.
Requests the statistics of the given computation.
Loads a variable number of values with a given element type from ColumnIO.
Describes the path of the ColumnIO tablet to load.
Describes the field to load within the ColumnIO tablet.
Individual element shape, excluding rows.
Warning: ColumnIO does not support random-access, so use offset with caution in performance-critical scenarios.
Maximum number of elements (with shape element_shape) to load.
If more than one item is requested (via limit > 1), then this request attribute zips together the produced vectors.
Transfers the given global data to the client in the form of a Literal.
This optional field directs the service to return the literal in this layout. A shape is used to hold the layout to accommodate tuples.
Transfers the given literal to the server to be stored in a global allocation, which is returned.
Transfers the given literal to the Infeed buffer of the device.
(message has no fields)
Transferred literal from the Outfeed buffer of the device.
This optional field directs the service to return the literal in this layout. A shape is used to hold the layout to accommodate tuples.
Resets the device, clearing all existing state on the device.
(message has no fields)
Computes the value of a constant expression. The request contains the computation graph for the constant expression.
A LiteralProto is returned directly for this request.
Requests one or more device handles from the target. The returned device handles can be used to specify the device on which to execute computations or transfer data.
Creates a channel handle that can be used to transfer data between two computations via a pair of Send and Recv instructions.
Compiles the provided computation into an executable. Returns the handle of the executable.
The graph to be compiled.
Options that affect how XLA compiles code to service this request.
The layouts of the input arguments. If not set, the default layout will be used. Although the real arguments are not needed in compilation, the layouts of the arguments can affect the compilation.
The handle to the executable.
Invokes the provided executable with the provided global data passed as immutable arguments. The request contains the handle to the executable. Returns global data output and execution timing.
The shape and layout of the arguments must be the same as those of the executable's parameters.
Invokes the provided list of computations in parallel with the provided global data for each computation. Returns a list of global data output and execution timing.
Waits until the given execution (asynchronously launched) is complete, and returns the global data output.
Serialization of BufferAllocation.
Used in:
Assigned represents a single LogicalBuffer that is assigned to this BufferAllocation.
Used in:
Serialization of BufferAssignment.
Used in:
Alias represents a source LogicalBuffer, and the buffer location that aliases it.
Used in:
Handle given to a user to represent a channel between two computations via a Send and Recv instruction pair. Channels are unbuffered, so Send instructions will be blocked until the data is transferred.
Used in:
Used in:
Invalid primitive type to serve as default.
A channel for sending data between devices.
A channel for sending data from the device to the host. Can only be used with a Send operation.
A channel for sending data from the host to the device. Can only be used with a Recv operation.
Used in:
If true, uses the lower triangle of `a`. If false, uses the upper triangle of `a`.
Statistics of a computation.
Used in:
The number of floating point operations in the computation.
The number of transcendental operations (e.g., exp) in the computation.
Used in:
The number of the dimension that represents batch in the input.
The number of the dimension that represents features in the input.
The dimension numbers for the spatial dimensions that the window moves through in the input.
The number of the dimension that represents input features in the convolutional kernel (rhs).
The number of the dimension that represents output features in the convolutional kernel (rhs).
The dimension numbers for the spatial dimensions that the window moves through in the kernel (rhs). window.strides(0) is the stride in the kernel_spatial_dimensions(0) dimension.
The number of the dimension that represents batch in the output.
The number of the dimension that represents features in the output.
The dimension numbers for the spatial dimensions that the window moves through in the output.
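For orientation, a conventional NHWC-input / HWIO-kernel / NHWC-output convolution would map onto these dimension-number fields roughly as in the Python sketch below; the record and its values are illustrative only, not taken from the proto.

  # Hypothetical NHWC input, HWIO kernel, NHWC output convolution expressed as
  # ConvolutionDimensionNumbers-style fields (illustrative values only).
  nhwc_hwio_dnums = dict(
      input_batch_dimension=0,
      input_feature_dimension=3,
      input_spatial_dimensions=[1, 2],    # H, W of the input
      kernel_input_feature_dimension=2,   # I of HWIO
      kernel_output_feature_dimension=3,  # O of HWIO
      kernel_spatial_dimensions=[0, 1],   # H, W of the kernel
      output_batch_dimension=0,
      output_feature_dimension=3,
      output_spatial_dimensions=[1, 2],   # H, W of the output
  )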
Debugging options for XLA. These options may change at any time - there are no guarantees about backward or forward compatibility for these fields.
Used in:
Show addresses of HLO ops in graph dump.
Instrument the computation to collect per-HLO cycle counts.
List of HLO passes to disable/enable. These names must exactly match the pass names as specified by the HloPassInterface::name() method. At least one of xla_disable_hlo_passes and xla_enable_hlo_passes_only must be empty.
Disables all HLO passes. Note that some passes are necessary for correctness, and the invariants that must be satisfied by "fully optimized" HLO are different for different devices and may change over time. The only "guarantee", such as it is, is that if you compile XLA and dump the optimized HLO for some graph, you should be able to run it again on the same device with the same build of XLA.
Numerical optimization level for the XLA compiler backend; the specific interpretation of this value is left to the backends.
Embed the compiler IR as a string in the executable.
Eliminate implicit broadcasts when lowering user computations to HLO instructions; use explicit broadcast instead.
When generating calls to Eigen in the CPU backend, use multi-threaded Eigen mode.
Path to directory with cuda/ptx tools and libraries.
Enable flush-to-zero semantics in the GPU backend.
Disable multi-streaming in the GPU backend.
If true, in LLVM-based backends, emit !alias.scope metadata in generated IR.
If true, in LLVM-based backends, emit !noalias metadata in the generated IR.
If true, in LLVM-based backends, emit !invariant.load metadata in the generated IR.
If true, a set of expensive LLVM optimization passes will not be run.
Options for inserting reduce-precision operations for numerical experimentation. This is a repeated field, as we may want to have multiple passes with different parameters.
This is used by ClientLibraryTestBase::ComputeAndCompare*. If true, the computation will run n! times with all permutations of layouts for the output shape in rank n. For example, with a 3D shape, all permutations of the set {0, 1, 2} are tried.
This is used by ClientLibraryTestBase::ComputeAndCompare*. If true, the computation will run for all permutations of layouts of all input arguments. For example, with 2 input arguments in 2D and 4D shapes, the computation will run 2! * 4! times.
Assign colors based on sharding information when generating the Graphviz HLO graph.
If true, the GPU backend is free to use cudnn for HLO batch normalization ops.
Generate calls to MKL-DNN in the CPU backend.
Maximum kernel unroll factor for the GPU backend.
When true, "unsafe" mathematical optimizations are enabled. These transformations include but are not limited to: - Reducing the precision of operations (e.g. using an approximate sin function, or transforming x/y into x * (1/y)). - Assuming that operations never produce or consume NaN or +/- Inf (this behavior can be adjusted using xla_cpu_fast_math_allow_{nans|infs}). - Assuming that +0 and -0 are indistinguishable.
When xla_cpu_enable_fast_math is true then this controls whether we allow operations to produce NaNs. Ignored when xla_cpu_enable_fast_math is false.
When xla_cpu_enable_fast_math is true then this controls whether we allow operations to produce infinities. Ignored when xla_cpu_enable_fast_math is false.
When xla_cpu_enable_fast_math is true then this controls whether we forbid using the reciprocal of an argument in place of division. Ignored when xla_cpu_enable_fast_math is false.
When xla_cpu_enable_fast_math is true then this controls whether we forbid approximating calculations for functions. Ignored when xla_cpu_enable_fast_math is false.
When true we lower the Minimum and Maximum hlos in the GPU backend such that Min(NotNaN, NaN) = Min(NaN, NotNaN) = NotNaN. In other words, if this flag is true we don't propagate NaNs through Min and Max.
Allows XLA to increase the output precision of floating point operations.
Crashes the program when any kind of verification fails, instead of just logging the failures. One example is cross checking of convolution results among different algorithms.
Disable GEMM and Convolution auto-tuning.
Force the host platform to pretend that there are this many host "devices". All these devices are backed by the same threadpool. Defaults to 1. Setting this to anything other than 1 can increase overhead from context switching, but we let the user override this behavior to help run tests on the host that run models in parallel across multiple devices.
If set to true XLA:GPU invokes `ptxas` with -O0 (default is -O3).
Enable fast math with eigen in the HLO evaluator.
Temporary option to allow support for both the R1 and the scalar index versions of DynamicSlice and DynamicUpdateSlice. Only used for testing.
Option to emit a target-specific marker to indicate the start of a training step. The location of the marker (if any) is determined by the option value.
Directory to dump into.
If specified, will only dump modules which match this regexp.
If this flag is specified, will also dump HLO before and after passes that match this regular expression. Set to .* to dump before/after all passes.
Specifies the format that HLO is dumped in. Multiple of these may be specified.
Dump HLO graphs as HTML (DOT rendered to SVG and inlined in HTML).
If true, every time an HLO module is run, we will dump an HloSnapshot (essentially, a serialized module plus its inputs) to the --xla_dump_to directory.
Paths to files with ptx code.
Blacklist for cuDNN convolutions.
Extra options to pass to the compilation backend (e.g. LLVM); specific interpretation of these values is left to the backend.
Used in:
Generate a step marker at the program entry. This handles the case where each step is done by one or multiple program execution(s). Only the first program will be tagged for generating a step marker at the program entry. This is the default.
Generate a step marker at each iteration of the top level while loop, which is assumed to be a training loop.
Generate a step marker at each iteration of the second level while loops, which is assumed to be a training or eval loop.
No step marker generated.
DeviceAssignmentProto is a serialized form of DeviceAssignment class, which represents the device ids assigned to a set of replicated computations. See xla::DeviceAssignment class comment for more details.
Used in:
Each logical computation runs on replica_count physical devices. ComputationDevice represents the device ids assigned to the replicas.
Used in:
Handle given to a user that represents a replicated virtual device. Each replicated device represents N physical devices for execution where N is the number of replicas.
Used in:
The number of model-parallel virtual devices that communicate via XLA Send/Recv instructions.
Used in:
The dimension numbers that represent the 'lhs' contracting dimensions.
The dimension numbers that represent the 'rhs' contracting dimensions.
The dimension numbers that represent the 'lhs' batch dimensions.
The dimension numbers that represent the 'rhs' batch dimensions.
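As an illustration, an ordinary [m, k] x [k, n] matrix multiply contracts lhs dimension 1 against rhs dimension 0 and has no batch dimensions; a hypothetical Python sketch of the corresponding field values (not taken from the proto):

  # Plain matrix multiply expressed as DotDimensionNumbers-style fields
  # (illustrative values only).
  matmul_dnums = dict(
      lhs_contracting_dimensions=[1],
      rhs_contracting_dimensions=[0],
      lhs_batch_dimensions=[],
      rhs_batch_dimensions=[],
  )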
Used in:
A list of bindings which indicates that the `target_dim_num` in the subshape `target_param_index` of parameter `target_param_num` is a dynamic dimension and its real dynamic size is represented by `dynamic_param_index` in parameter `dynamic_param_num`. As an example, imagine we have a program:

  ENTRY main {
    a = f32[] parameter(0)
    b = f32[10] parameter(1)
    ROOT root = (f32[], f32[10]) tuple(%a, %b)
  }

Let's say 'b' (param index 1) is a dynamic shape whose input has an upper bound of 10 and real size is determined at runtime. 'a' represents the real size of b's first dimension. In this case, the fields are set in the following way:

  dynamic_param_num = 1
  dynamic_param_index = {}
  target_param_num = 0
  target_param_index = {}
  target_param_dim = 0
Used in:
TODO(b/118493728): Remove this and ExecuteGraphParallelRequest and replace the uses with calls to Compile and Execute.
Used in:
Options that affect how XLA compiles and runs code to service this request.
Used as response type in: XlaService.Execute
Used as field type in:
Handle given to a user that represents an execution that the user launched asynchronously on the device.
Used in:
These settings control how XLA compiles and/or runs code. Not all settings will have an effect on every platform. When adding new fields, keep in mind that boolean fields default to false.
Used in:
This optional field's layout is used as a hint when storing the output of this computation. Subsequent transfers of this output array to the client may be faster when using this layout. We use a Shape here to accommodate computations that return a tuple.
Used to seed random-number generators used in this computation. If this is 0, we generate a seed ourselves. TODO(b/32083678): Changing the seed unnecessarily forces a recompilation.
This optional field specifies a particular set of devices to run the computation on. The computation will be partitioned across these devices. If not provided, the default device will be chosen.
Number of replicas of the computation to run. If zero, uses the default number of replicas for the XLA service.
This optional field specifies the device assignment if known at compile time.
Profile data from the execution of a computation.
Used in:
Whether the executable was read from the compilation cache.
The time in milliseconds spent to compile the computation. This is only set if the executable was not read from the compilation cache (compilation_cache_hit == false).
The number of cycles spent for the computation. This does not include the time taken for the data transfers between the host and the device. This is a target-dependent field and only used for debugging purposes.
The time in nanoseconds spent for the computation, without data transfer.
The time in nanoseconds spent for the entire computation, including the result data transfer time. Current implementation does not spend any cycles for the input data transfer since the memory is initialized with the proper values before the execution.
The size of the binary code in the executable.
Whether this profile was drawn from a cache of profiles instead of from execution on the hardware.
Used in:
Forward FFT; complex in, complex out.
Inverse FFT; complex in, complex out.
Forward real FFT; real in, fft_length / 2 + 1 complex out.
Inverse real FFT; fft_length / 2 + 1 complex in, fft_length real out.
A format specifies the method used by a layout to store an array in memory.
Used in:
TODO(b/120869032): Rename this to FORMAT_NONE or something else which better corresponds to its meaning.
The default layout, with exactly one storage location per element.
A sparsely encoded layout, providing only the index/value pairs of non-zero elements.
Describes the dimension numbers for a gather operation. See https://www.tensorflow.org/performance/xla/operation_semantics#gather for more details.
Used in:
"Window indices" is a term for a set of indices that index into the interior of a dynamic-slice from the input tensor, the starting indices for which were computed from output_gather_dims (see the operation semantic for how this is defined) and the start_indices tensor. The window indices for a specific output index Out is computed as: i = 0 for (k : [0, input_tensor_shape.rank)) window_indices[k] = if k in collapsed_slice_dims then 0 else Out[offset_dims[i++]]
This is interpreted as a map from i to start_index_map[i]. It transforms the gather index looked up from the start_indices tensor into the starting index in the input space.
The dimension in the start_indices input that contains the starting indices.
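A minimal Python sketch of the window-index pseudocode above, assuming Out is given as a sequence of output coordinates; the function and argument names are illustrative, not part of the proto.

  def window_indices(out_index, input_rank, collapsed_slice_dims, offset_dims):
      # Follows the pseudocode: collapsed dimensions contribute index 0,
      # other dimensions read the output coordinate at the next offset dim.
      indices = []
      i = 0
      for k in range(input_rank):
          if k in collapsed_slice_dims:
              indices.append(0)
          else:
              indices.append(out_index[offset_dims[i]])
              i += 1
      return indices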
Handle given to a user that represents a globally accessible allocation. Contrast this against a ComputationDataHandle, which is not globally accessible, since it only exists within a specific computation.
Used in:
A trace of a HeapSimulator run.
Used in:
The trace includes a list of events, where each event describes one action performed by the heap simulator.
Used in:
The id of the LogicalBuffer that the event applies to.
The HloInstruction that the simulation was processing that caused this event to occur, identified by its computation and instruction name. E.g. buffers defined by instruction A are allocated when processing A.
The id of the canonical LogicalBuffer that the buffer shares with. Only set for SHARE_WITH events.
Used in:
A memory region was allocated for the buffer.
A memory region was freed for the buffer.
A buffer was shared with another (canonical) buffer. This is similar to ALLOC, except that instead of allocating a new region of memory, the memory region of the canonical buffer is directly re-used. Multiple buffers may share with the same canonical buffer. The lifetime of the canonical buffer is extended to the union of all lifetimes.
Serialization of HloComputation.
Used in:
The array of instructions is always in a valid dependency order, where operands appear before their users.
The id of this computation.
The id of the root of the computation.
Used in:
The following proto describes a pair of an aliased input (described by parameter number and a ShapeIndex of the parameter) and an output (described by a ShapeIndex of the root instruction). For example:

  entry = {
    output_shape_index={1},
    parameter_number=0,
    parameter_shape_index={1, 2},
  }

This entry indicates that the first parameter's {1, 2} element is aliased with the {1} element of the root instruction.
Used in:
ShapeIndex of the root hlo.
Number of the parameter in entry computation.
ShapeIndex of the parameter instruction.
The kind of alias to be setup.
Used in:
Define a UNDEFINED_ALIAS equal to zero to get around the default-0 proto3 behavior and missing has_*() APIs.
An alias set up by the user as a must-alias. A use setting USER_ALIAS expects the designated output to be dropped over the given input parameter number+index.
An alias set up by the compiler as part of its optimizations.
Serialization of HloInstruction. Next ID: 68
Used in:
Literal, only present for kConstant.
Parameter number is only present for kParameter.
Fusion state, only present for kFusion.
Index for kGetTupleElement.
Dimensions present for some operations that require reshaping or broadcasting, including Reshape, Reduce, ReduceWindow, and Reverse.
Describes the window in a windowed operation such as convolution.
Describes the dimension numbers used for a convolution.
The number of feature groups. Used for a convolution. Must be a divisor of the input feature dimension and output feature dimension. If not specified, it will use a default value of 1.
The bit sizes for a reduce-precision operation.
Describes the [start, start + size) range size for a dynamic slice ('start' is specified dynamically in the second operand of the operation).
The padding configuration that describes the edge padding and interior padding of this pad instruction. Only set for pad instructions.
Outfeed configuration information, only present for kOutfeed.
The distribution requested for random number generation. Only present for kRng.
A small float number added to the variance to avoid divide-by-zero error. Only present for kBatchNormTraining.
An integer value representing the index of the feature dimension. Only present for kBatchNormTraining.
Represents a unique identifier for each Send/Recv instruction pair or optionally for collective instructions (AllReduce, CollectivePermute, AllToAll). Non-positive channel_id is equivalent to no channel id.
The string representation of the infeed configuration.
Name of an external target (e.g., global symbol) to call, only present for kCustomCall.
Shape of outfeed request.
Describes the dimension numbers used for a dot operation
FFT type (FFT, IFFT, etc).
FFT length.
Comparison direction only used for kCompare.
Gather dimension numbers.
Compute Host.
The id of this instruction.
Backend configuration for the instruction. Has backend-specific meaning.
Cross replica op fields.
Deprecated, but keeping it for backward compatibility. Use channel_id. Non-positive all_reduce_id is equivalent to no all_reduce_id.
Whether this Send/Recv instruction transfers data to/from the host. Only present for Send and Recv instructions and their SendDone and RecvDone partners.
Whether this Sort instruction should be stable.
Precision configuration for the instruction. Has backend-specific meaning.
Collective permute field.
Sharding for kDomain instructions.
For custom call this indicates that the layouts are constrained. If constrain_layout is true then the 'shape' field must contain a layout, and 'operand_shapes_with_layout' must contain a shape with layout for each operand.
Options for TriangularSolve
Options for Cholesky
Describes how parameters behave with regards to replicas.
If set, the given instruction is run in parallel on e.g. multiple CPU cores. The outermost dimension gets split up into outer_dimension_partitions[0] pieces, the next-outermost dim gets split into outer_dimension_partitions[1] pieces, etc. It's illegal to partition a dimension into more shards than there are elements in that dimension.
Whether the kCustomCall instruction has side-effects, only present for kCustomCall.
The delta value for kRngGetAndUpdateState.
Specifies if the gather/scatter indices are guaranteed to be sorted by the caller.
Describes the [begin, end) index range and stride for slices.
Used in:
An abstraction representing a set of HLO modules built to run concurrently across different devices.
Serialization of HloModule.
Used in:
The array of computations is always in a valid dependency order, where callees appear before their callers.
The host program shape (with layout) of the entry computation.
The id of this module.
The schedule for this module.
Describes alias information between inputs and outputs.
Describes how to pretty-print a profile counter array gathered for a specific HloModule.
HloComputationInfos for every HloComputation in the HloModule.
The size of the profile counters array we will pretty-print.
Maps extra metric name to the index into the profile counters array.
Name of the entry computation.
Pretty-printer information about an HloComputation.
Used in:
The index into the profile counters array for the HloComputation corresponding to this HloComputationInfo.
HloInstructionInfos for every HloInstruction in the HloComputation corresponding to this HloComputationInfo.
Pretty-printer information about an HloInstruction.
Used in:
Metrics computed by HloCostAnalysis.
The index into the profile counters array for the HloInstruction corresponding to this HloInstructionInfo.
Grouping message that contains all of the information above.
Used in:
Options for the HLO insert-reduce-precision-operations pass.
Used in:
Exponent and mantissa bit counts for the reduced precision.
Operations matching these opcodes should be suffixed with reduce-precision operations.
Operations with names containing these substrings should be suffixed with reduce-precision operations.
Where and when the reduce-precision operations will be added.
Used in:
Add reduce-precision operations to the inputs of selected instructions. This is done before any optimization occurs.
Add reduce-precision operations to the outputs of selected instructions. This is done before any optimization occurs.
After operation-fusion occurs, add reduce-precision operations to the outputs of any selected instructions that have not been fused into fusion instructions.
After operation-fusion occurs, add reduce-precision operations to the inputs of any fusion instructions that contain operations matching the selection criteria.
After operation-fusion occurs, add reduce-precision operations to the outputs of any fusion instructions that contain operations matching the selection criteria.
Serialization of an HLO schedule. An HLO schedule contains a total order of instructions for each non-fusion computation in the module.
Used in:
Map from computation id to sequence.
Used in:
Encapsulates HloProto together with the arguments, result, and execution_platform. This message is used for purposes such as analysis/replay/file-storage.
Used in:
The hlo graph.
The arguments passed to the graph.
The result of the graph.
The name of the platform used to run the graph.
A layout describes how the array is placed in (1D) memory space. This includes the minor-to-major ordering of dimensions within a shape. Clients must specify the layouts of input Literals to the computation. Layouts specified in interior operations which take Shapes (for example, Convert) are ignored. See the XLA documentation for more information on shapes and layouts. LINT.IfChange
Used in:
The method used to store the data in memory. The format determines which of the other fields are used by the layout.
Sequence of dimension numbers, from minor (fastest varying index) to major (slowest varying index). This field is required.
The maximum number of elements that can be stored for SPARSE formats. This can be used to determine the maximum size in bytes of arrays stored in memory. This field must be unset unless the format is SPARSE.
A sequence of tiles, starting from the tile that's applied first to the Shape. TODO(b/119839262): implement tiling in each backend or add Unimplemented error.
Bit size of each element. If the size is bigger than what the element type requires, the value is stored in the least significant bits and the additional most significant bits are filled with 0's. TODO(b/119839262): implement in each backend or add Unimplemented error.
Memory space where this array resides. The integer field is interpreted in a backend-specific manner.
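To make the minor_to_major field above concrete, a small Python sketch that derives element strides for a dense (non-sparse, untiled) layout; the helper name is hypothetical.

  def strides_from_minor_to_major(dims, minor_to_major):
      # The first entry of minor_to_major is the fastest-varying dimension.
      strides = [0] * len(dims)
      running = 1
      for dim in minor_to_major:
          strides[dim] = running
          running *= dims[dim]
      return strides

  # For a 2x3 array, minor_to_major = [1, 0] (row-major) gives strides [3, 1];
  # minor_to_major = [0, 1] (column-major) gives strides [1, 2].
  print(strides_from_minor_to_major([2, 3], [1, 0]))  # [3, 1]
  print(strides_from_minor_to_major([2, 3], [0, 1]))  # [1, 2]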
Literals are used when the server and client need to exchange materialized data / results. Literals are also used to describe constants used in computations. Transfers to/from the client are encoded in literal form, and the structure of the repeated fields is implied by the shape.
Used in:
Stored as interleaved real, imag floats.
Stored as interleaved real, imag doubles.
The F16s, BF16s, U16s and S16s are encoded in little endian byte order
Next = 19
Serialization of LogicalBuffer.
Used in:
The location where the buffer is defined.
Location represents an instruction and its shape index, which uniquely identifies a point where a buffer is needed.
Used in:
NOTE: module_name isn't necessary, since all LogicalBuffers are associated with a single HloModule.
Symbolization metadata for HLO Instructions. This metadata is used for debugging XLA code generation, as well as performance profiling of XLA-generated executables.
Used in:
The framework op name that generated this XLA op. Frameworks that build on top of XLA should mirror the names of their ops back to users by specifying the op_type. In this way, even if the framework's "ops" are implemented as multiple XLA HLO Ops, they can be grouped appropriately. (e.g. if a SoftMax layer is emitted into XLA as multiple ops, then each op should have the op_type be "SoftMax".)
The user-specified name of the op. This name is often unique within a computation. Note: some frameworks add auto-generated names if the user does not provide one.
Indicate a file and line that this op is associated to in a user's program. e.g. it could be the file and line of user code that generated the op.
Used in:
The shape of the sharded tile.
The shape of the tile assignment tensor - this must be the same rank as tile_shape and the product of its dimensions must equal tile_assignment_devices.size().
Flattened list of device IDs. The order of flattening is the same as used by IndexUtil::MultiToLinearIndex(tile_assignment_shape).
If type == TUPLE, the sub-shardings, one per leaf node in the tuple shape, in pre-order. The tuple shape could be nested; here we store just a flattened list of all leaves in the tuple shape. Note that the tuple shape is not stored here; shardings do not store the shapes to which they are applied, this is inferred from the instruction this sharding gets attached to.
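A small Python sketch of how a tile index could be mapped to a device id from these fields, assuming the flattening is plain row-major (last dimension fastest varying), which corresponds to the default layout; names are illustrative only.

  def device_for_tile(tile_index, tile_assignment_dims, tile_assignment_devices):
      # Linearize the tile index in row-major order, then look up the device.
      linear = 0
      for dim_size, idx in zip(tile_assignment_dims, tile_index):
          linear = linear * dim_size + idx
      return tile_assignment_devices[linear]

  # Example: a 2x2 tile assignment over devices [0, 1, 2, 3];
  # tile (1, 0) maps to device 2.
  print(device_for_tile((1, 0), [2, 2], [0, 1, 2, 3]))  # 2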
Used in:
This sharding is replicated across all devices (implies maximal, all other fields are unused).
This sharding is maximal - one device runs the entire operation.
This sharding is a tuple - only the tuple_shardings field is valid.
None of the above; tile_shape and tile_assignment are both used.
Describes the padding configuration for Pad operation. The padding amount on both edges as well as between the elements are specified for each dimension.
Used in:
The padding configuration for all dimensions.
Describes the padding configuration for a dimension.
Used in:
Padding amount on the low-end (next to the index 0). May be negative.
Padding amount on the high-end (next to the highest index). May be negative.
Padding amount between the elements. May not be negative.
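To make the arithmetic concrete, the resulting size of one padded dimension can be computed as in this sketch (hypothetical helper, not part of the proto): interior padding is inserted between elements, and edge padding is added on each side.

  def padded_extent(base_size, low, high, interior):
      # Interior padding goes between elements; negative edge padding removes
      # elements from the corresponding end.
      spaced = base_size + max(base_size - 1, 0) * interior
      return spaced + low + high

  # Padding a length-4 dimension with low=1, high=2, interior=1 gives 4 + 3 + 1 + 2 = 10.
  print(padded_extent(4, 1, 2, 1))  # 10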
Describes whether all data-parallelism replicas will receive the same parameter data at each buffer.
Used in:
A list of boolean values for the flattened leaf buffers. Each value indicates whether the corresponding leaf buffer is replicated. If this field is empty, it means no buffer is replicated. Otherwise, the number of elements in this field must match the number of leaf buffers in the HLO instruction's shape.
Used to indicate the precision configuration. It has backend specific meaning.
Used in:
Used in:
Primitive types are the individual values that can be held in rectangular multidimensional arrays. A description of the rectangular multidimensional array dimensions / primitive type is given by Shape, below.
Used in:
Invalid primitive type to serve as default.
Predicates are two-state booleans.
Signed integral values of fixed width.
Unsigned integral values of fixed width.
Floating-point values of fixed width. Note: if f16s are not natively supported on the device, they will be converted to f16 from f32 at arbitrary points in the computation.
Truncated 16 bit floating-point format. This is similar to IEEE's 16 bit floating-point format, but uses 1 bit for the sign, 8 bits for the exponent and 7 bits for the mantissa.
Complex values of fixed width.
Paired F32 (real, imag), as in std::complex<float>.
Paired F64 (real, imag), as in std::complex<double>.
A tuple is a polymorphic sequence; e.g. a shape that holds different sub-shapes. They are used for things like returning multiple values from a computation; e.g. a computation that returns weights and biases may have a signature that results in a tuple like (f32[784x2000], f32[2000]). If a shape proto has the tuple element type, it may not have any entries in the dimensions field.
An opaque type used for passing context-specific data to a custom operation. Shapes of this primitive type will have empty dimensions and tuple_shapes fields. (OPAQUE would be a better name for this identifier, but that conflicts with a macro defined in windows.h.)
A token type threaded between side-effecting operations. Shapes of this primitive type will have empty dimensions and tuple_shapes fields.
Shape of the parameters and output of a computation (like a traditional function signature).
Used in:
Used in:
Creates a uniform-distribution-generated random number on the semi-open interval [parameter[0], parameter[1]).
Creates a normal-distribution-generated random number with mean parameter[0] and standard deviation parameter[1].
Describes the replica groups in a cross replica op (e.g., all-reduce and all-to-all).
Used in:
The ids of the replicas that belong to the same group. The ordering of the ids matters in some ops (e.g., all-to-all).
Describes the dimension numbers for a scatter operation. All the fields are similar to the corresponding fields in GatherDimensionNumbers. Differences are noted below.
Used in:
The set of dimensions in the updates shape that are window dimensions.
The set of window dimensions that must be inserted into the updates shape.
A shape describes the number of dimensions in the array, the size of each dimension, and the primitive component type. Tuples are a special case in that they have rank zero and have tuple_shapes defined. See the XLA documentation for more information on shapes and layouts. LINT.IfChange
Used in:
The element type for this shape.
The size (number of elements) for each dimension, or an upper bound on the size if the dimension is dynamic. In XLA, dimensions are numbered from 0 to N-1 for an N-dimensional array. The first element of 'dimensions' is the size of dimension 0, the second element is the size of dimension 1, and so forth. Empty list indicates a scalar. If the respective element in 'is_dimension_dynamic' is true then the value in this field represents an upper bound on the size of the dimension.
For tuples only, the shapes of constituent shapes in the tuple sequence.
The layout used to back this shape.
For arrays, this indicates whether or not each dimension is dynamically-sized. The number of elements in this repeated field should be zero (indicating that no dimensions are dynamic) or equal to the number of elements in the 'dimensions' field.
Describes the source target pair in the collective permute op.
Used in:
Describes a tile used in tiling-based layout. Refer to g3doc/third_party/tensorflow/compiler/xla/g3doc/layout_with_tiling.md for details about tiling-based layout.
Used in:
Number of elements in each dimension of the tile. It's ordered from the most major dimension of the tile to the most minor dimension of the tile. The dimensions correspond to a suffix of the dimensions of the shape being tiled.
Used in:
If true, solves ax = b. If false, solves xa = b.
If true, 'a' is lower triangular. If false, 'a' is upper triangular.
If true, the diagonal elements of 'a' are assumed to be 1 and not accessed.
Should we transpose or use the adjoint of 'a'?
Used in:
Don't transpose 'a'.
Transpose 'a'.
Complex conjugate and transpose 'a'.
A backend-config for kWhile loops that stores the loop's trip count, if it is known. This is useful for backends that can implement a `for i in 0..N` loop more efficiently than a `while` loop. For example, on GPUs, we can implement a `for i in 0..N` loop by enqueueing the kernels for the loop body N times, whereas implementing a `while` loop requires a host-device sync on each iteration.
This indirection lets us distinguish between known-trip-count == 0 and unknown-trip-count.
Used in:
Describes the windowing in an operation such as convolution. The window is moved across a base area and for each position of the window a computation is performed. The field below describes the window and the movement of the window across a base area.
Used in:
Used in:
The size of the window in this dimension. For a rectangle, this would be the width or height.
The stride at which the window moves across the base area in this dimension. In other words, this is the spacing between different positions of the window in this dimension.
If positive, means the amount of padding to add to the base area at the low end of this dimension; if negative, its negative means the number of elements removed from the low end of this dimension. For example, in the horizontal dimension of a rectangle, this would be the number of padding values to pad on the left, given that indices increase when going right. The actual padding value depends upon the context. Convolution pads with zeros. ReduceWindow and SelectAndScatter pads with the reduce function's init value.
As padding_low, but on the high end of this dimension. For example, in the horizontal dimension of a rectangle, this would be the number of values to pad on the right, given that indices increase when going right.
Dilation factor of the sliding window in this dimension. A dilation factor of 1 means no dilation. window_dilation - 1 no-op entries ("holes") are implicitly placed between each kernel element. This value may not be less than 1. See documentation for convolution.
Dilation factor of the base area in this dimension. A dilation factor of 1 means no dilation. base_dilation - 1 no-op entries ("holes") are implicitly placed between each base area element. This value may not be less than 1. See documentation for convolution.
Window reversal means that this dimension was logically reversed before the operation.
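Putting the per-dimension fields together, the number of window positions along one dimension follows the usual windowed-operation size arithmetic; the sketch below uses hypothetical names and is not taken from the proto.

  def output_extent(base_size, size, stride, padding_low, padding_high,
                    window_dilation=1, base_dilation=1):
      # Dilate the base area and the window, apply edge padding, then count
      # how many stride steps fit.
      dilated_base = (base_size - 1) * base_dilation + 1 if base_size > 0 else 0
      padded_base = dilated_base + padding_low + padding_high
      dilated_window = (size - 1) * window_dilation + 1
      if padded_base < dilated_window:
          return 0
      return (padded_base - dilated_window) // stride + 1

  # A 10-element base, size-3 window, stride 2, no padding or dilation: 4 positions.
  print(output_extent(10, 3, 2, 0, 0))  # 4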