package tensorflow.tpu

service TpuCompilationCacheServiceExternal

tpu_compilation_cache.proto:37

message AdadeltaParameters

optimization_parameters.proto:207

https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adadelta https://github.com/tensorflow/tensorflow/blob/6b6471f3ffb7f1fefe42d814aa5fb9ab7a535b58/tensorflow/core/kernels/training_ops.cc#L933

Used in: OptimizationParameters

message AdagradParameters

optimization_parameters.proto:60

https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adagrad https://github.com/tensorflow/tensorflow/blob/6b6471f3ffb7f1fefe42d814aa5fb9ab7a535b58/tensorflow/core/kernels/training_ops.cc#L1634

Used in: OptimizationParameters

(message has no fields)

message AdamParameters

optimization_parameters.proto:138

The Adam optimizer does not implement hyper-parameter update due to hardware limitations; use the dynamic learning rate feature instead, setting the learning rate to:

  user learning_rate * sqrt(1 - beta2^t) / (1 - beta1^t)

Here, t is the current timestep.

https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Adam
https://github.com/tensorflow/tensorflow/blob/ab51450c817674c8ff08a7ae4f8ac50cdc4bed8b/tensorflow/python/training/adam.py#L32

Note that the code by default implements the lazy version of Adam (https://www.tensorflow.org/api_docs/python/tf/contrib/opt/LazyAdamOptimizer) unless the use_non_lazy_adam parameter is set, in which case it implements the normal version of Adam that updates all parameters in the embedding table, even for entries that are not used in the current minibatch (https://www.tensorflow.org/api_docs/python/tf/contrib/opt/AdamOptimizer). If use_non_lazy_adam is enabled, gradient accumulation is also required to be enabled in order to get correct results; a warning will be printed otherwise (which may change to an error in the future).

If use_sum_inside_sqrt is set, the Adam variable update formula will be changed from m / (sqrt(v) + epsilon) to m / sqrt(v + epsilon**2); this option improves the performance of TPU training and is not expected to harm model quality.
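The two formulas in the AdamParameters description can be sketched in plain Python. This is only an illustration of the arithmetic, not the TPU implementation; the function names adam_effective_learning_rate and adam_update_direction are hypothetical, and scalar inputs are assumed.

```python
import math


def adam_effective_learning_rate(user_lr, beta1, beta2, t):
    """Bias-corrected rate: user_lr * sqrt(1 - beta2^t) / (1 - beta1^t).

    t is the current timestep and is assumed to start at 1.
    """
    if t < 1:
        raise ValueError("t must be >= 1")
    return user_lr * math.sqrt(1.0 - beta2 ** t) / (1.0 - beta1 ** t)


def adam_update_direction(m, v, epsilon, use_sum_inside_sqrt=False):
    """Scalar update direction for first moment m and second moment v."""
    if use_sum_inside_sqrt:
        # Variant selected by use_sum_inside_sqrt.
        return m / math.sqrt(v + epsilon ** 2)
    # Default formula.
    return m / (math.sqrt(v) + epsilon)
```

A caller would compute adam_effective_learning_rate once per step and feed the result through the dynamic learning rate input, since the optimizer itself does not apply this correction.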

Used in: OptimizationParameters

message BoundedAdagradParameters

optimization_parameters.proto:67

Algorithm in http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf.

Used in: OptimizationParameters

message CenteredRmsPropParameters

optimization_parameters.proto:175

https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/RMSprop https://github.com/tensorflow/tensorflow/blob/6b6471f3ffb7f1fefe42d814aa5fb9ab7a535b58/tensorflow/core/kernels/training_ops.cc#L4358

Used in: OptimizationParameters

message ClippingLimits

optimization_parameters.proto:8

Used in: OptimizationParameters

enum CompilationCacheFetchTarget

tpu_compilation_cache_common.proto:20

Target type for compilation cache fetch operation.

Used in: GetTpuProgramRequest

message CompilationResultProto

compilation_result.proto:11

Describes the result of a TPU compilation.

message DynamicLearningRate

optimization_parameters.proto:17

Dynamic learning rate specification in the TPUEmbeddingConfiguration. The actual learning rates are provided as a scalar input list to the SendTPUEmbeddingGradients Op indexed by their tag specified through the following proto.

Used in: LearningRate

message FrequencyEstimatorParameters

optimization_parameters.proto:305

Estimator for the frequency of updates to a lookup table. It maintains an array (tf.Variable) D, where each element records the average number of global steps between two consecutive batches that hit the corresponding bucket. Once an item with bucket id i is sampled, D[i] is updated by:

  D[i] <- D[i] * (1 - tau) + delta[i] * tau,

where tau is a learning rate between 0 and 1 (exclusive), and delta[i] = current global step - last step i is sampled.

The estimated frequency (sampling rate in a batch) is thus 1 / D[i].

Elements in D are initialized with a large value max_delta. delta[i] will also be capped by this value.

The exact sequence of operations used in the optimizer is shown below. last_hit_step[i] is a tf.Variable that holds the last global step at which i was sampled.

  delta = global_step - last_hit_step[i]
  clipped_delta = min(delta, params.max_delta)
  is_outlier = (delta >= params.outlier_threshold * D[i])
  D[i] <- is_outlier ? clipped_delta : D[i] * (1 - params.tau) + clipped_delta * params.tau
  last_hit_step[i] <- global_step
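The sequence of operations above translates directly into a small Python sketch. Plain lists stand in for the tf.Variable arrays D and last_hit_step, and the function name frequency_estimator_step is hypothetical; this only illustrates the update rule, not the TPU code.

```python
def frequency_estimator_step(d, last_hit_step, i, global_step,
                             tau, max_delta, outlier_threshold):
    """Apply one estimator update for bucket i, mutating d and last_hit_step."""
    delta = global_step - last_hit_step[i]
    clipped_delta = min(delta, max_delta)
    # The outlier test uses the unclipped delta, as in the description.
    is_outlier = delta >= outlier_threshold * d[i]
    if is_outlier:
        d[i] = clipped_delta
    else:
        d[i] = d[i] * (1 - tau) + clipped_delta * tau
    last_hit_step[i] = global_step
    return d[i]


# Example: one bucket, initialized with max_delta = 100.
d = [100.0]
last_hit_step = [0]
frequency_estimator_step(d, last_hit_step, 0, global_step=10,
                         tau=0.5, max_delta=100, outlier_threshold=5)
```

After the example call, the estimated sampling rate for the bucket is 1 / d[0].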

Used in: OptimizationParameters

message FtrlParameters

optimization_parameters.proto:105

https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/Ftrl
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41159.pdf
https://github.com/tensorflow/tensorflow/blob/6b6471f3ffb7f1fefe42d814aa5fb9ab7a535b58/tensorflow/core/kernels/training_ops.cc#L2646

The hyperparameters for FTRL are the same as for the Keras implementation, with some additions. The "beta" parameter matches the behavior described in the second link above; "beta" / (2 * learning rate) should be added to "l2" to get equivalent behavior in the other TensorFlow implementations of this optimizer.

When the multiply_linear_by_lr field is set to true, a modified formula is used for FTRL that treats the "linear" accumulator as being pre-multiplied by the learning rate (i.e., the accumulator named "linear" actually stores "linear * learning_rate"). Other than checkpoint compatibility, this is mathematically equivalent for a static learning rate; for a dynamic learning rate, it is nearly the same as long as the learning rate does not change quickly. The benefit of setting multiply_linear_by_lr to true is that the modified formula handles zero and near-zero learning rates without producing NaNs, improving flexibility for learning rate ramp-up.

The allow_zero_accumulator parameter changes some internal formulas to allow zero and near-zero accumulator values at the cost of some performance; this only needs to be set if you are using an initial accumulator value of zero, which is uncommon.
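The "beta" adjustment described for FtrlParameters amounts to a one-line conversion. The helper below is a hypothetical sketch of that arithmetic, assuming a static, strictly positive learning rate; it is not part of any TensorFlow API.

```python
def equivalent_l2(l2, beta, learning_rate):
    """l2 value for an implementation without "beta": l2 + beta / (2 * lr)."""
    if learning_rate <= 0:
        raise ValueError("learning_rate must be positive")
    return l2 + beta / (2.0 * learning_rate)
```

For a dynamic learning rate the adjusted value changes whenever the rate changes, so it would have to be recomputed each time.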

Used in: OptimizationParameters

message GetTpuProgramResponseExternal.Blob

tpu_compilation_cache.proto:24

Used in: GetTpuProgramResponseExternal

message GradientAccumulationStatus

optimization_parameters.proto:353

Status of using gradient accumulation (doing two passes over the input gradients: one to accumulate them into a temporary array and another to apply them using the actual optimization algorithm). The extra message is to wrap the enum for scoping.

(message has no fields)

enum GradientAccumulationStatus.Status

optimization_parameters.proto:355

If UNSPECIFIED (default), gradient accumulation is ENABLED.

Used in: OptimizationParameters

message HotIdReplicationConfiguration

optimization_parameters.proto:364

Configuration proto for hot ID optimization. This is an experimental feature that is currently disabled by default.

Used in: OptimizationParameters

enum HotIdReplicationConfiguration.Status

optimization_parameters.proto:367

Whether to enable or disable hot ID optimization. If UNSPECIFIED (default), hot ID optimization is DISABLED.

Used in: HotIdReplicationConfiguration

message LearningRate

optimization_parameters.proto:48

Source of learning rate to use.

Used in: OptimizationParameters

message MdlAdagradLightParameters

optimization_parameters.proto:186

Variant of algorithm in http://proceedings.mlr.press/v44/shamir15.pdf

Used in: OptimizationParameters

message MomentumParameters

optimization_parameters.proto:152

https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/SGD https://github.com/tensorflow/tensorflow/blob/6b6471f3ffb7f1fefe42d814aa5fb9ab7a535b58/tensorflow/core/kernels/training_ops.cc#L3068

Used in: OptimizationParameters

message OnlineYogiParameters

optimization_parameters.proto:236

The online Yogi optimizer does not implement hyper-parameter update; use the dynamic learning rate feature instead, setting the learning rate to:

  user learning_rate * sqrt(1 - beta2^t) / (1 - beta1^t)

Here, t is the current timestep.

https://papers.nips.cc/paper/8186-adaptive-methods-for-nonconvex-optimization.pdf plus some extensions based on FTRL.

Note that the code by default implements the lazy version of online Yogi.

Used in: OptimizationParameters

message OptimizationParameters

optimization_parameters.proto:375

Used in: TPUEmbeddingConfiguration.TableDescriptor

message PaddingMap

dynamic_padding.proto:9

A mapping between the dynamic shape dimension of an input and the arg that represents the real shape.

Used in: TPUCompileMetadataProto

message ProximalAdagradParameters

optimization_parameters.proto:218

https://www.tensorflow.org/api_docs/python/tf/compat/v1/train/ProximalAdagradOptimizer https://github.com/tensorflow/tensorflow/blob/6b6471f3ffb7f1fefe42d814aa5fb9ab7a535b58/tensorflow/core/kernels/training_ops.cc#L1961

Used in: OptimizationParameters

message ProximalYogiParameters

optimization_parameters.proto:260

The proximal Yogi optimizer does not implement hyper-parameter update; use the dynamic learning rate feature instead, setting the learning rate to:

  user learning_rate * sqrt(1 - beta2^t) / (1 - beta1^t)

Here, t is the current timestep.

https://papers.nips.cc/paper/8186-adaptive-methods-for-nonconvex-optimization.pdf plus some extensions based on FTRL.

Note that the code by default implements the lazy version of proximal Yogi.

Used in: OptimizationParameters

message RmsPropParameters

optimization_parameters.proto:163

https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/RMSprop https://github.com/tensorflow/tensorflow/blob/6b6471f3ffb7f1fefe42d814aa5fb9ab7a535b58/tensorflow/core/kernels/training_ops.cc#L4229

Used in: OptimizationParameters

message StateVariableSpecification

optimization_parameters.proto:436

Specification of an optimization algorithm's state variables (both the main value vector and any extra accumulators, etc.). This proto is only used internally by the TPU software and is not exposed directly to the TF model.

message StateVariableSpecification.FillWithConstant

optimization_parameters.proto:462

A state variable that should be filled with a constant and normally hidden from users (used for intermediate gradients being accumulated, for example).

Used in: StateVariableSpecification

message StateVariableSpecification.UserDefined

optimization_parameters.proto:442

A normal state variable that should be saved and restored in checkpoints and used as an input or output to non-debug TensorFlow ops.

Used in: StateVariableSpecification

message StochasticGradientDescentParameters

optimization_parameters.proto:82

https://www.tensorflow.org/api_docs/python/tf/keras/optimizers/SGD https://github.com/tensorflow/tensorflow/blob/6b6471f3ffb7f1fefe42d814aa5fb9ab7a535b58/tensorflow/core/kernels/training_ops.cc#L629

Used in: OptimizationParameters

(message has no fields)

message TPUCompileMetadataProto

compile_metadata.proto:16

This is an experimental proto used in the TF/XLA bridge to store metadata to a compile op (e.g. _TPUCompileMlir). TODO(lyandy): Deprecate proto once generic metadata proto is created.

Used in: TpuCompilationRequestProto

message TPUCompileMetadataProto.Arg

compile_metadata.proto:18

Description of the types and shapes of the arguments to a computation.

Used in: TPUCompileMetadataProto

enum TPUCompileMetadataProto.Arg.EnableXlaSharding

compile_metadata.proto:38

Used in: Arg

enum TPUCompileMetadataProto.Arg.Kind

compile_metadata.proto:19

Used in: Arg

message TPUCompileMetadataProto.Retval

compile_metadata.proto:69

Description of the return values from a computation.

Used in: TPUCompileMetadataProto

message TPUEmbeddingConfiguration

tpu_embedding_configuration.proto:8

enum TPUEmbeddingConfiguration.Mode

tpu_embedding_configuration.proto:27

Mode. Whether the embedding layer program should be run for inference (forward pass only), training (both forward and backward passes), or only the backward pass.

Used in: TPUEmbeddingConfiguration

enum TPUEmbeddingConfiguration.ShardingStrategy

tpu_embedding_configuration.proto:57

Sharding strategy of the embedding tables among the hosts.

If the sharding_strategy is "mod", each id is assigned to host "id % num_hosts". For instance, 13 ids are split across 5 hosts as: [[0, 5, 10], [1, 6, 11], [2, 7, 12], [3, 8], [4, 9]].

If the sharding_strategy is "div", ids are assigned to hosts in a contiguous manner. In this case, 13 ids are split across 5 hosts as: [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10], [11, 12]].

In both strategies, if the number of hosts does not evenly divide the size of the id space, each of the first "table_descriptor.vocabulary_size % num_hosts" hosts will be assigned one more id.

This partitioning strategy exactly follows that in the embedding_lookup TensorFlow function at tensorflow/python/ops/embedding_ops.py.
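The two strategies can be illustrated with a short Python sketch that reproduces the 13-id, 5-host example. The function name shard_ids is hypothetical and the code only mirrors the assignment rule described here; it is not the TensorFlow implementation.

```python
def shard_ids(vocabulary_size, num_hosts, strategy):
    """Return a list with one list of ids per host."""
    if strategy == "mod":
        # Host h receives every id with id % num_hosts == h.
        return [list(range(h, vocabulary_size, num_hosts))
                for h in range(num_hosts)]
    if strategy == "div":
        # Contiguous ranges; the first `extra` hosts get one more id.
        base, extra = divmod(vocabulary_size, num_hosts)
        shards, start = [], 0
        for h in range(num_hosts):
            size = base + (1 if h < extra else 0)
            shards.append(list(range(start, start + size)))
            start += size
        return shards
    raise ValueError("strategy must be 'mod' or 'div'")


mod_shards = shard_ids(13, 5, "mod")
div_shards = shard_ids(13, 5, "div")
```

With 13 ids and 5 hosts, 13 % 5 = 3, so the first three hosts hold three ids each and the remaining two hold two each under either strategy.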

Used in: TPUEmbeddingConfiguration

message TPUEmbeddingConfiguration.TableDescriptor

tpu_embedding_configuration.proto:10

Description of the various embedding tables.

Used in: TPUEmbeddingConfiguration

message TPUEmbeddingOutputLayout

tpu_embedding_output_layout.proto:19

Used in: TPUEmbeddingConfiguration

message TPUEmbeddingOutputLayout.EmbeddingOutputTensor

tpu_embedding_output_layout.proto:71

Format information for a single output tensor.

Used in: TPUEmbeddingOutputLayout

message TPUEmbeddingOutputLayout.FeatureDescriptor

tpu_embedding_output_layout.proto:41

Description of the output placement for one feature.

Used in: TableDescriptor

message TPUEmbeddingOutputLayout.OutputLocation

tpu_embedding_output_layout.proto:23

Location of one copy of the feature's data.

Used in: FeatureDescriptor

message TPUEmbeddingOutputLayout.TableDescriptor

tpu_embedding_output_layout.proto:49

Description of the output placement for features of one table.

Used in: TPUEmbeddingOutputLayout

message TPUEmbeddingOutputLayout.TwoDOutputTensor

tpu_embedding_output_layout.proto:61

Size and layout information for 2-D tensors.

Used in: EmbeddingOutputTensor

message TopologyProto

topology.proto:8

Describes the geometry of a TPU mesh.

message TpuCompilationRequestProto

tpu_compile.proto:27

TPU compilation request for compiling computations into XLA HLO IR and building TPU programs.

message TpuCompilationUidAndIndex

tpu_compilation_cache_common.proto:27

Used in: GetTpuProgramRequest

message UserDefinedProgramParameters

optimization_parameters.proto:340

A user-defined optimizer. The contained HLO program must take the following arguments in the following order:
1. gradients
2. table weights
3. slot variables
4. an optional scalar input that is passed in via the dynamic learning rate mechanism.

It must return/end in a tuple op that contains the following values in the following order:
1. new table values
2. new slot variable value

The program must have shape (1,1) with dtype float32 throughout and only use HLO ops that operate elementwise (e.g., no reduce, no variables, no control flow and no broadcasting outside of the single scalar input). The HLO program should be written as if it were a dense update. It will be called on each row that needs an update and will be applied elementwise.

Used in: OptimizationParameters