package tensorflow.tpu

service TpuCompilationCacheServiceExternal

tpu_compilation_cache.proto:37

message AdadeltaParameters

optimization_parameters.proto:262

https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adadelta
https://github.com/tensorflow/tensorflow/blob/6b6471f3ffb7f1fefe42d814aa5fb9ab7a535b58/tensorflow/core/kernels/training_ops.cc#L933

Used in: OptimizationParameters

message AdagradMomentumParameters

optimization_parameters.proto:101

This optimizer combines the Adagrad and Momentum update rules.

  accum(new) = beta2 == 1.0 ? accum(old) + grad^2 : beta2 * accum(old) + (1 - beta2) * grad^2
  accum_with_exponent = (accum(new) + epsilon)^(-1.0 / exponent)
  mom_accum(new) = momentum * mom_accum(old) + accum_with_exponent
  update = use_nesterov ? momentum * mom_accum(new) + accum_with_exponent : mom_accum(new)
  var(new) = var(old) - lr * grad * update

Algorithm described in https://arxiv.org/abs/2002.11803.
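The update rules for this message can be sketched in plain Python for a single scalar element. This is only an illustration of the formulas in the comment, not the TPU implementation; the function name and argument layout are hypothetical, and the hyperparameters are passed explicitly because the proto defaults are not stated here.

```python
def adagrad_momentum_step(var, accum, mom_accum, grad, *, lr, momentum,
                          beta2, epsilon, exponent, use_nesterov=False):
    """Apply one Adagrad-Momentum update to a scalar element.

    Hypothetical helper that mirrors the documented formulas.
    Returns the new (var, accum, mom_accum).
    """
    # Plain Adagrad accumulation when beta2 == 1.0, otherwise an
    # exponential moving average of the squared gradient.
    if beta2 == 1.0:
        accum = accum + grad * grad
    else:
        accum = beta2 * accum + (1.0 - beta2) * grad * grad

    accum_with_exponent = (accum + epsilon) ** (-1.0 / exponent)
    mom_accum = momentum * mom_accum + accum_with_exponent

    if use_nesterov:
        update = momentum * mom_accum + accum_with_exponent
    else:
        update = mom_accum

    var = var - lr * grad * update
    return var, accum, mom_accum
```

With exponent = 2 and momentum = 0, the update reduces to the familiar Adagrad step grad / sqrt(accum + epsilon) scaled by lr.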

Used in: OptimizationParameters

message AdagradParameters

optimization_parameters.proto:84

https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adagrad
https://github.com/tensorflow/tensorflow/blob/6b6471f3ffb7f1fefe42d814aa5fb9ab7a535b58/tensorflow/core/kernels/training_ops.cc#L1634

Used in: OptimizationParameters

(message has no fields)

message AdamParameters

optimization_parameters.proto:193

The Adam optimizer does not implement hyper-parameter update due to hardware limitations; use the dynamic learning rate feature instead, setting the learning rate to:

  user learning_rate * sqrt(1 - beta2^t) / (1 - beta1^t)

Here, t is the current timestep.

https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adam
https://github.com/tensorflow/tensorflow/blob/ab51450c817674c8ff08a7ae4f8ac50cdc4bed8b/tensorflow/python/training/adam.py#L32

Note that the code by default implements the lazy version of Adam (https://www.tensorflow.org/api_docs/python/tf/contrib/opt/LazyAdamOptimizer) unless the use_non_lazy_adam parameter is set, in which case it implements the normal version of Adam that updates all parameters in the embedding table, even for entries that are not used in the current minibatch (https://www.tensorflow.org/api_docs/python/tf/contrib/opt/AdamOptimizer).

If use_non_lazy_adam is enabled, gradient accumulation is also required to be enabled in order to get correct results; a warning will be printed otherwise (which may change to an error in the future).

If use_sum_inside_sqrt is set, the Adam variable update formula will be changed from m / (sqrt(v) + epsilon) to m / sqrt(v + epsilon**2); this option improves the performance of TPU training and is not expected to harm model quality.
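As a rough illustration of the two formulas in this comment, the sketch below computes the bias-corrected learning rate that the caller would feed through the dynamic learning rate mechanism, and the two variants of the update direction. Both function names are hypothetical; only the arithmetic comes from the text above.

```python
import math


def adam_dynamic_learning_rate(learning_rate, beta1, beta2, t):
    """Bias-corrected learning rate for timestep t (t starts at 1)."""
    return learning_rate * math.sqrt(1.0 - beta2 ** t) / (1.0 - beta1 ** t)


def adam_update_direction(m, v, epsilon, use_sum_inside_sqrt=False):
    """Update direction for first moment m and second moment v."""
    if use_sum_inside_sqrt:
        # Variant selected by use_sum_inside_sqrt.
        return m / math.sqrt(v + epsilon ** 2)
    return m / (math.sqrt(v) + epsilon)
```

The two directions agree when epsilon is zero and differ only slightly when epsilon is small relative to sqrt(v).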

Used in: OptimizationParameters

message AssignParameters

optimization_parameters.proto:404

Optimizer that just sets the variable to the value of the gradient. To be correct, this requires either gradient accumulation (to sum the values of a computed expression across the samples) or to deduplicate IDs within a single host (to assign the value from an arbitrary sample).

Used in: OptimizationParameters

(message has no fields)

message BoundedAdagradParameters

optimization_parameters.proto:119

Algorithm in http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf.

Used in: OptimizationParameters

message CenteredRmsPropParameters

optimization_parameters.proto:230

https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/RMSprop
https://github.com/tensorflow/tensorflow/blob/6b6471f3ffb7f1fefe42d814aa5fb9ab7a535b58/tensorflow/core/kernels/training_ops.cc#L4358

Used in: OptimizationParameters

message ClippingLimits

optimization_parameters.proto:8

Used in: OptimizationParameters, SimulatedQuantization

enum CompilationCacheFetchTarget

tpu_compilation_cache_common.proto:20

Target type for compilation cache fetch operation.

Used in: GetTpuProgramRequest

message CompilationResultProto

compilation_result.proto:13

Describes the result of a TPU compilation. This is also used as TPU compilation result status payload. URI: "type.googleapis.com/tensorflow.tpu.CompilationResultProto"

enum CompilationResultProto.ErrorCode

compilation_result.proto:21

Used in: CompilationResultProto

message DynamicLearningRate

optimization_parameters.proto:41

Dynamic learning rate specification in the TPUEmbeddingConfiguration. The actual learning rates are provided as a scalar input list to the SendTPUEmbeddingGradients Op indexed by their tag specified through the following proto.

Used in: LearningRate

message FrequencyEstimatorParameters

optimization_parameters.proto:360

Estimator for the frequency of updates to a lookup table. It maintains an array (tf.Variable) D, where each element records the average number of global steps between two consecutive batches that hit the corresponding bucket. Once an item with bucket id i is sampled, D[i] is updated by:

  D[i] <- D[i] * (1 - tau) + delta[i] * tau,

where tau is a learning rate between 0 and 1 (exclusive), and

  delta[i] = current global step - last step i is sampled.

The estimated frequency (sampling rate in a batch) is thus 1 / D[i].

Elements in D are initialized with a large value max_delta. delta[i] will also be capped by this value.

The exact sequence of operations used in the optimizer is shown below. last_hit_step[i] is a tf.Variable that holds the last global step at which i was sampled.

  delta = global_step - last_hit_step[i]
  clipped_delta = min(delta, params.max_delta)
  is_outlier = (delta >= params.outlier_threshold * D[i])
  D[i] <- is_outlier ? clipped_delta : D[i] * (1 - params.tau) + clipped_delta * params.tau
  last_hit_step[i] <- global_step
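The sequence of operations above can be sketched in Python, with ordinary lists standing in for the tf.Variable arrays D and last_hit_step. The function name is hypothetical, and the keyword arguments are assumed to correspond to the params.tau, params.max_delta and params.outlier_threshold values named in the comment.

```python
def frequency_estimator_step(D, last_hit_step, i, global_step, *,
                             tau, max_delta, outlier_threshold):
    """Update D[i] and last_hit_step[i] in place after bucket i is sampled."""
    delta = global_step - last_hit_step[i]
    clipped_delta = min(delta, max_delta)
    # The outlier test uses the unclipped delta, as in the documented sequence.
    is_outlier = delta >= outlier_threshold * D[i]
    if is_outlier:
        D[i] = clipped_delta
    else:
        D[i] = D[i] * (1.0 - tau) + clipped_delta * tau
    last_hit_step[i] = global_step


# Example: one bucket, initialized to max_delta, hit at global step 10.
D = [100.0]
last_hit_step = [0]
frequency_estimator_step(D, last_hit_step, 0, 10,
                         tau=0.5, max_delta=100, outlier_threshold=10)
estimated_frequency = 1.0 / D[0]
```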

Used in: OptimizationParameters

message FtrlParameters

optimization_parameters.proto:155

https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Ftrl
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41159.pdf
https://github.com/tensorflow/tensorflow/blob/6b6471f3ffb7f1fefe42d814aa5fb9ab7a535b58/tensorflow/core/kernels/training_ops.cc#L2646

The hyperparameters for FTRL are the same as for the Keras implementation, with some additions. The "beta" parameter matches the behavior described in the second link above; "beta" / (2 * learning rate) should be added to "l2" to get equivalent behavior in the other TensorFlow implementations of this optimizer.

When the multiply_linear_by_lr field is set to true, a modified formula is used for FTRL that treats the "linear" accumulator as being pre-multiplied by the learning rate (i.e., the accumulator named "linear" actually stores "linear * learning_rate"). Other than checkpoint compatibility, this is mathematically equivalent for a static learning rate; for a dynamic learning rate, it is nearly the same as long as the learning rate does not change quickly. The benefit of setting multiply_linear_by_lr to true is that the modified formula handles zero and near-zero learning rates without producing NaNs, improving flexibility for learning rate ramp-up.
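The "beta" adjustment described above is simple arithmetic; a minimal sketch follows. The helper name is hypothetical, and it only computes the l2 value one would pass to an implementation that has no "beta" parameter, under the assumption of a fixed, non-zero learning rate.

```python
def equivalent_l2(l2, beta, learning_rate):
    """Return l2 + beta / (2 * learning_rate), per the note on "beta"."""
    if learning_rate == 0.0:
        raise ValueError("learning_rate must be non-zero for this conversion")
    return l2 + beta / (2.0 * learning_rate)
```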

Used in: OptimizationParameters

message GetTpuProgramResponseExternal.Blob

tpu_compilation_cache.proto:24

Used in: GetTpuProgramResponseExternal

message GradientAccumulationStatus

optimization_parameters.proto:410

Status of using gradient accumulation (doing two passes over the input gradients: one to accumulate them into a temporary array and another to apply them using the actual optimization algorithm). The extra message is to wrap the enum for scoping.

(message has no fields)

enum GradientAccumulationStatus.Status

optimization_parameters.proto:412

If UNSPECIFIED (default), gradient accumulation is ENABLED.

Used in: OptimizationParameters

message HotIdReplicationConfiguration

optimization_parameters.proto:477

Configuration proto for hot ID optimization. This is an experimental feature that is currently disabled (by default).

Used in: OptimizationParameters

enum HotIdReplicationConfiguration.Status

optimization_parameters.proto:480

Whether to enable or disable hot ID optimization. If UNSPECIFIED (default), hot ID optimization is DISABLED.

Used in: HotIdReplicationConfiguration

message LearningRate

optimization_parameters.proto:72

Source of learning rate to use.

Used in: OptimizationParameters

message LowDimensionalPackingStatus

optimization_parameters.proto:457

There is one important limitation for this HBM packing though. When only a subset of rows in an 8-float chunk are accessed on a particular step, the adjoining rows in the same chunk are updated with zero gradients on the backward pass even if they are not touched. This is an artifact of the packing implementation. This operation is NOT functionally correct for optimizers where zero gradients change the embeddings/slot-variable values, e.g., momentum-based optimizers. Hence, this HBM packing cannot be enabled for embedding tables with such optimizers. The TPU software automatically recognizes that a zero gradient can modify state and turns off the low dimensional embedding packing in that scenario.

However, for optimizers where a zero gradient is a NoOp, such as SGD, Adagrad, and FTRL, this packing optimization can be used. However, there are some important considerations:

* Clipping limits: The initial values for such embeddings should fall within the clipping limits specified in the optimization parameters. Otherwise, a zero gradient will cause the embeddings to be clipped. This changes state and hence, is not a NoOp.
* FTRL: The embedding vector is computed directly from the values of the accumulator and linear slot variables. Hence, the initial embedding values should match that computed from the initial values of the accumulator and linear slot variables. Note that in nearly all cases, the linear value is initialized to zero; this corresponds to an embedding value of zero.

Performance: The TPU has to perform additional work when low dimensional packing is enabled. In certain situations when the vocabulary size is small, it may not make sense to turn on this packing since the total memory usage due to padding is extremely low. Hence, the TPU software automatically turns off the packing optimization in such scenarios.

(message has no fields)

enum LowDimensionalPackingStatus.Status

optimization_parameters.proto:468

If UNSPECIFIED (default), the low dimension packing status is DISABLED. This can change in the future.

If ENABLED, the low dimension packing is enabled only if the following three additional conditions are true:

* The optimizer treats the zero gradient as a NoOp.
* The embedding dimension is 1, 2, or 4.
* The vocabulary size is large enough to avoid performance issues.

If DISABLED, the low dimension packing is always disabled.

Used in: OptimizationParameters

message MdlAdagradLightParameters

optimization_parameters.proto:241

Variant of algorithm in http://proceedings.mlr.press/v44/shamir15.pdf

Used in: OptimizationParameters

message MomentumParameters

optimization_parameters.proto:207

https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/SGD
https://github.com/tensorflow/tensorflow/blob/6b6471f3ffb7f1fefe42d814aa5fb9ab7a535b58/tensorflow/core/kernels/training_ops.cc#L3068

Used in: OptimizationParameters

message OnlineYogiParameters

optimization_parameters.proto:291

The online Yogi optimizer does not implement hyper-parameter update; use the dynamic learning rate feature instead, setting the learning rate to:

  user learning_rate * sqrt(1 - beta2^t) / (1 - beta1^t)

Here, t is the current timestep.

https://papers.nips.cc/paper/8186-adaptive-methods-for-nonconvex-optimization.pdf plus some extensions based on FTRL.

Note that the code by default implements the lazy version of online Yogi.

Used in: OptimizationParameters

message OptimizationParameters

optimization_parameters.proto:488

Used in: TPUEmbeddingConfiguration.TableDescriptor

message PaddingMap

dynamic_padding.proto:9

A mapping between the dynamic shape dimension of an input and the arg that represents the real shape.

Used in: TPUCompileMetadataProto

message ProximalAdagradParameters

optimization_parameters.proto:273

https://www.tensorflow.org/api_docs/python/tf/compat/v1/train/ProximalAdagradOptimizer
https://github.com/tensorflow/tensorflow/blob/6b6471f3ffb7f1fefe42d814aa5fb9ab7a535b58/tensorflow/core/kernels/training_ops.cc#L1961

Used in: OptimizationParameters

message ProximalYogiParameters

optimization_parameters.proto:315

The proximal Yogi optimizer does not implement hyper-parameter update; use the dynamic learning rate feature instead, setting the learning rate to:

  user learning_rate * sqrt(1 - beta2^t) / (1 - beta1^t)

Here, t is the current timestep.

https://papers.nips.cc/paper/8186-adaptive-methods-for-nonconvex-optimization.pdf plus some extensions based on FTRL.

Note that the code by default implements the lazy version of proximal Yogi.

Used in: OptimizationParameters

message RmsPropParameters

optimization_parameters.proto:218

https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/RMSprop
https://github.com/tensorflow/tensorflow/blob/6b6471f3ffb7f1fefe42d814aa5fb9ab7a535b58/tensorflow/core/kernels/training_ops.cc#L4229

Used in: OptimizationParameters

message SimulatedQuantization

optimization_parameters.proto:26

Configuration for simulated quantization; simulated quantization is used to reduce training/serving skew when the serving variables are quantized. The same quantization operations are executed during training to minimize differences with serving.

Simulated quantization inserts the following operations on the forward pass after gathering the embedding vector from HBM. The backward pass operations are unchanged.

  clipped_val = clip(input, clipping_limits)
  quantum = clipping_limits.range() / (num_buckets - 1)
  quantized_val = floor((clipped_val - clipping_limits.lower()) / quantum + .5)
  return quantized_val * quantum + clipping_limits.lower()
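The forward-pass operations above can be sketched for a single float value as follows. The function name and the lower/upper arguments are hypothetical stand-ins for the clipping limits, and the sketch assumes that clipping_limits.range() means upper minus lower.

```python
import math


def simulated_quantize(value, lower, upper, num_buckets):
    """Clip value to [lower, upper] and snap it to one of num_buckets levels."""
    clipped_val = min(max(value, lower), upper)
    # Assumption: range() is the width of the clipping interval.
    quantum = (upper - lower) / (num_buckets - 1)
    quantized_val = math.floor((clipped_val - lower) / quantum + 0.5)
    return quantized_val * quantum + lower
```

For example, with limits [0, 1] and 5 buckets the quantum is 0.25, so an input of 0.3 maps to 0.25 and any input above 1 maps to 1.0.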

Used in: OptimizationParameters

message StateVariableSpecification

optimization_parameters.proto:563

Specification of an optimization algorithm's state variables (both the main value vector and any extra accumulators, etc.). This proto is only used internally by the TPU software and is not exposed directly to the TF model.

message StateVariableSpecification.FillWithConstant

optimization_parameters.proto:576

A state variable that should be filled with a constant and normally hidden from users (used for intermediate gradients being accumulated, for example).

Used in: StateVariableSpecification

message StateVariableSpecification.UserDefined

optimization_parameters.proto:569

A normal state variable that should be saved and restored in checkpoints and used as an input or output to non-debug TensorFlow ops.

Used in: StateVariableSpecification

(message has no fields)

message StochasticGradientDescentParameters

optimization_parameters.proto:136

https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/SGD
https://github.com/tensorflow/tensorflow/blob/6b6471f3ffb7f1fefe42d814aa5fb9ab7a535b58/tensorflow/core/kernels/training_ops.cc#L629

Used in: OptimizationParameters

(message has no fields)

message TPUCompileMetadataProto

compile_metadata.proto:16

This is an experimental proto used in the TF/XLA bridge to store metadata to a compile op (e.g. _TPUCompileMlir). TODO(lyandy): Deprecate proto once generic metadata proto is created.

Used in: TpuCompilationRequestProto

message TPUCompileMetadataProto.Arg

compile_metadata.proto:18

Description of the types and shapes of the arguments to a computation.

Used in: TPUCompileMetadataProto

enum TPUCompileMetadataProto.Arg.EnableXlaSharding

compile_metadata.proto:38

Used in: Arg

enum TPUCompileMetadataProto.Arg.Kind

compile_metadata.proto:19

Used in: Arg

message TPUCompileMetadataProto.Retval

compile_metadata.proto:73

Description of the return values from a computation.

Used in: TPUCompileMetadataProto

message TPUCompileOptions

compile_metadata.proto:146

Stable protobuf for TPU compilation options, suitable for persistent storage. This proto needs to be backward compatible under maintenance. TODO(timshen): investigate and migrate other options from TPUCompileMetadataProto.

Used in: TPUCompileMetadataProto

enum TPUCompileOptions.Precision

compile_metadata.proto:147

Used in: TPUCompileOptions

message TPUEmbeddingConfiguration

tpu_embedding_configuration.proto:7

message TPUEmbeddingConfiguration.FeatureDescriptor

tpu_embedding_configuration.proto:111

Description of different input features.

Used in: TPUEmbeddingConfiguration

enum TPUEmbeddingConfiguration.Mode

tpu_embedding_configuration.proto:30

Mode: whether the embedding layer program is run for inference (forward pass only), training (both forward and backward passes), or the backward pass only.

Used in: TPUEmbeddingConfiguration

enum TPUEmbeddingConfiguration.ShardingStrategy

tpu_embedding_configuration.proto:60

Sharding strategy of the embedding tables among the hosts.

If the sharding_strategy is "mod", each id is assigned to host "id % num_hosts". For instance, 13 ids are split across 5 hosts as: [[0, 5, 10], [1, 6, 11], [2, 7, 12], [3, 8], [4, 9]].

If the sharding_strategy is "div", ids are assigned to hosts in a contiguous manner. In this case, 13 ids are split across 5 hosts as: [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10], [11, 12]].

In both strategies, if the number of hosts does not evenly divide the id space, each of the first "table_descriptor.vocabulary_size % num_hosts" hosts is assigned one more id. This partitioning strategy exactly follows that of the embedding_lookup TensorFlow function at tensorflow/python/ops/embedding_ops.py.
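The two strategies can be reproduced with a few lines of Python. The helper below is a hypothetical illustration, not the TensorFlow implementation; it returns one list of ids per host and matches the 13-id, 5-host examples given in the comment.

```python
def shard_ids(vocabulary_size, num_hosts, strategy):
    """Return a list with one list of ids per host."""
    if strategy == "mod":
        # Host h receives every id with id % num_hosts == h.
        return [list(range(h, vocabulary_size, num_hosts))
                for h in range(num_hosts)]
    if strategy == "div":
        # Contiguous ranges; the first `extra` hosts get one more id.
        base, extra = divmod(vocabulary_size, num_hosts)
        shards, start = [], 0
        for h in range(num_hosts):
            size = base + (1 if h < extra else 0)
            shards.append(list(range(start, start + size)))
            start += size
        return shards
    raise ValueError("strategy must be 'mod' or 'div'")


print(shard_ids(13, 5, "mod"))  # [[0, 5, 10], [1, 6, 11], [2, 7, 12], [3, 8], [4, 9]]
print(shard_ids(13, 5, "div"))  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10], [11, 12]]
```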

Used in: TPUEmbeddingConfiguration

message TPUEmbeddingConfiguration.SpmdSharding

tpu_embedding_configuration.proto:136

SPMD (Single Program Multiple Data) sharding configuration for TPUEmbedding. When model parallelism is used on the TensorCore, the number of cores per replica must be passed to TPUEmbedding so that the right shapes can be computed in the TF/XLA bridge.

Used in: TPUEmbeddingConfiguration

message TPUEmbeddingConfiguration.TableDescriptor

tpu_embedding_configuration.proto:9

Description of the various embedding tables.

Used in: TPUEmbeddingConfiguration

message TPUEmbeddingError

tpu_embedding_configuration.proto:152

A placeholder message that is used to define a unique Status payload URL for TPU embedding errors.

(message has no fields)

message TPUHardwareFeature

topology.proto:8

Describes features of a TPU.

Used in: TopologyProto

enum TPUHardwareFeature.EmbeddingFeature

topology.proto:10

Embedding feature of a TPU.

Used in: TPUHardwareFeature

message TopologyProto

topology.proto:26

Describes the geometry of a TPU mesh.

message TpuCompilationRequestProto

tpu_compile.proto:27

TPU compilation request for compiling computations into XLA HLO IR and building TPU programs.

message TpuCompilationUidAndIndex

tpu_compilation_cache_common.proto:27

Used in: GetTpuProgramRequest

message UserDefinedProgramParameters

optimization_parameters.proto:395

A user-defined optimizer. The contained HLO program must take the following arguments in the following order:

1. gradients
2. table weights
3. slot variables
4. an optional scalar input that is passed in via the dynamic learning rate mechanism

It must return/end in a tuple op that contains the following values in the following order:

1. new table values
2. new slot variable value

The program must have shape (1,1) with dtype float32 throughout and only use HLO operations that operate elementwise (e.g., no reduce, no variables, no control flow, and no broadcasting outside of the single scalar input). The HLO program should be written as if it were a dense update. It will be called on each row that needs an update and will be applied elementwise.
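To make the argument and return ordering concrete, here is a Python analogy rather than actual HLO. It assumes a single slot variable (an Adagrad-style accumulator) and uses the optional scalar as the learning rate; both function names are hypothetical, and the real program would be expressed as elementwise HLO operations.

```python
def adagrad_like_program(gradient, weight, accumulator, learning_rate):
    """Elementwise update: arguments and results follow the documented order.

    Assumes the new accumulator is strictly positive.
    """
    new_accumulator = accumulator + gradient * gradient
    new_weight = weight - learning_rate * gradient / (new_accumulator ** 0.5)
    return new_weight, new_accumulator


def apply_to_row(program, gradients, weights, slots, learning_rate):
    """Apply the scalar program to every element of one embedding row."""
    results = [program(g, w, s, learning_rate)
               for g, w, s in zip(gradients, weights, slots)]
    new_weights = [r[0] for r in results]
    new_slots = [r[1] for r in results]
    return new_weights, new_slots
```

Writing the update as a scalar function and mapping it over the row reflects the requirement that the program contain no reductions or cross-element broadcasting.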

Used in: OptimizationParameters