Coordination Service defines a TensorFlow service that controls and coordinates distributed execution in a cluster of multiple tasks. The service keeps track of the cluster configuration and the state of cluster members or the leader depending on the role of the current task. The distributed runtime leverages this service to coordinate and perform cluster initialization, check the health of tasks, and propagate error messages to the cluster.
Blocks until all (or a subset of) tasks are at the barrier or the barrier fails. `barrier_id` should be unique across barriers. Once the barrier has passed or failed, subsequent calls will not block, and immediately respond with the previous response.
The first WaitAtBarrier() call received by the service for a particular barrier id is special in that it determines the barrier deadline based on timeout duration. However, if subsequent calls by different agents specify a different set of `tasks` for the same `barrier_id`, the barrier will fail instantly. If no tasks are specified (default), the barrier will block for all the connected tasks.
Possible service errors:
- DeadlineExceeded: Timed out waiting for specified tasks at the barrier. Deadline is determined by the server timestamp when it receives the first WaitAtBarrier() + timeout duration.
- Cancelled: One of the tasks called CancelBarrier().
- Aborted: Service is shutting down.
- Internal: Any participating task is in ERROR state.
- InvalidArgument: (1) Conflicting tasks specified by different agents for the same barrier, (2) one of the participating tasks is not in the cluster, or (3) task making the request is not included in the list of participating tasks.
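The barrier rules above can be modeled in a few lines. This is a minimal in-memory sketch, not the service implementation: `BarrierModel`, its method names, the `PENDING` status, and the string status codes are all hypothetical. Timestamps are passed in explicitly instead of being read from a clock, and the model assumes an explicit task list (it does not cover the "no tasks specified" default).

```python
class BarrierModel:
    """Toy model of one barrier's bookkeeping (hypothetical, not the real service)."""

    def __init__(self):
        self.tasks = None      # participating tasks, fixed by the first call
        self.deadline = None   # server time of the first call + its timeout
        self.arrived = set()
        self.result = None     # cached final status once passed or failed

    def wait(self, caller, tasks, now, timeout):
        # A finished barrier never blocks again; it replays its result.
        if self.result is not None:
            return self.result
        tasks = frozenset(tasks)
        if self.tasks is None:
            # Only the first call determines the task set and the deadline.
            self.tasks = tasks
            self.deadline = now + timeout
        if tasks != self.tasks or caller not in self.tasks:
            self.result = "INVALID_ARGUMENT"
        elif now > self.deadline:
            self.result = "DEADLINE_EXCEEDED"
        else:
            self.arrived.add(caller)
            if self.arrived == self.tasks:
                self.result = "OK"
        # "PENDING" stands in for a call that would still be blocked.
        return self.result or "PENDING"
```

Note that the timeout passed by later callers is ignored, which mirrors the statement that the first call alone sets the deadline.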
Request and response messages for generic sync barriers.
Denotes list of tasks that will wait for the barrier. If unspecified, it implies that the entire cluster is participating in the barrier.
Task that is making the request.
(message has no fields)
Aborts the barrier if it is ongoing. Current and future WaitAtBarrier() calls with the same id will return a CANCELLED error status.
Possible service errors:
- FailedPrecondition: Barrier has already been passed.
Request and response messages for cancelling generic sync barriers.
Task that is making the request.
(message has no fields)
Delete configuration key-value. If is_directory is set in request, recursively clean up all key-values under the path specified by `key`.
Request and response messages for deleting configuration key-value data. When is_directory is true, delete key-values recursively under `key`.
(message has no fields)
Get configuration key-value. The request blocks until the key-value data becomes available (i.e., set by a task in the cluster).
Request and response messages for getting configuration key-value data.
Same as GetKeyValue, but returns all values that have keys which are prefixed with the directory key.
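The directory-style lookup and the recursive delete described above can be illustrated with a plain dictionary. This is a sketch under stated assumptions, not the service's storage code: `KeyValueModel` and its methods are hypothetical names, and it assumes that a key is "under" a directory when it starts with the directory key followed by `/`.

```python
class KeyValueModel:
    """Toy key-value store with Unix-path-like keys (hypothetical)."""

    def __init__(self):
        self._data = {}

    def insert(self, key, value):
        self._data[key] = value

    def get_dir(self, directory):
        # Assumption: directory membership is a "<directory>/" prefix match.
        prefix = directory.rstrip("/") + "/"
        return {k: v for k, v in self._data.items() if k.startswith(prefix)}

    def delete(self, key, is_directory=False):
        # Remove the exact key, if present.
        self._data.pop(key, None)
        if is_directory:
            # Recursively remove everything under the path given by `key`.
            for child in list(self.get_dir(key)):
                del self._data[child]
```

With this prefix rule, a key such as `/cfgx` is not treated as part of the directory `/cfg`.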
Get the state of a remote task. Specifically, RPC returns a CoordinatedTaskState, and if the task is in an error status, returns a non-OK error code, non-empty error message and error payload.
Request and response messages for getting state of a remote task.
Heartbeat message from a task to the coordination service. A heartbeat is sent from a task to refresh its timestamp on the leader so that it does not become stale. The RPC responds immediately after refreshing the timestamp on the leader.
Request and response messages for sending heartbeats.
If there are failures in the cluster, use additional metadata in the response to broadcast the error code and message to other tasks.
Insert configuration key-value that will be accessible to all cluster tasks. The key can be formatted as Unix file path with hierarchy. The coordination service key-value store should only be used for cluster configuration data.
Request and response messages for inserting configuration key-value data.
(message has no fields)
Register a task with the coordination service so that the service starts to track the liveness of the task. The RPC blocks and returns only when the task has registered successfully or an error occurs during registration.
Request and response messages for registering a task to the cluster leader. A task is uniquely represented by its `job_name`, `task_id` and `incarnation`. Leader responds with its `incarnation` to identify a leader process.
Report task error to coordination service. RPC sets the service-side task state to error, and propagates the error to other tasks in the cluster.
Request and response messages for reporting errors to service instance.
(message has no fields)
Report error to the task. RPC sets the receiving instance of coordination service agent to error state permanently. TODO(b/195990880): Consider splitting this into a different RPC service.
Request and response messages for reporting errors to task.
(message has no fields)
Disconnects task from the service if it is in an ERROR state, thereby allowing it to reconnect via RegisterTask() in the future.
Request and response messages for resetting a task state in the service.
(message has no fields)
Disconnects task from the service. If `shutdown_barrier_timeout_in_ms` is specified in the config, blocks until all tasks reach the barrier before disconnecting together. If the barrier times out, tasks at the barrier will still disconnect, while an error is reported to tasks that did not reach the barrier on time.
Request and response messages for disconnecting a task from the service.
(message has no fields)
Get configuration key-value. The request does not block, but returns an error if the requested key does not exist.
Wait for all tasks in the cluster to be up and running. The RPC responds only when all tasks have registered or an error occurs.
Request and response messages for waiting for all tasks.
All local device attributes on the request sender.
All devices in the cluster.
EventListener: Receives Event protos, e.g., from debugged TensorFlow runtime(s).
Client(s) can use this RPC method to send the EventListener Event protos. The Event protos can hold information such as: 1) intermediate tensors from a debugged graph being executed, which can be sent from DebugIdentity ops configured with grpc URLs. 2) GraphDefs of partition graphs, which can be sent from special debug ops that get executed immediately after the beginning of the graph execution.
Send a collection of source code files being debugged.
A collection of source code files.
Send the tracebacks of a TensorFlow execution call.
Data on the traceback of a debugged call, e.g., a Session.run() call, or the execution of an eager operation.
A key for the call. For example, for graph execution, this is a key consisting of the names of the fed and fetched tensors.
Traceback stack for the origin of the call event. For graph execution, this is the stack of the Session.run() call. For eager execution, this is the stack of the Python line that invokes the execution of the eager op.
Keeps track of the mapping from integer IDs in `origin_stack` to actual string values (e.g., file paths, function names).
Traceback for the graph (if any) involved in the call.
Version of the graph in `graph_traceback` (if any).
The ProfileAnalysis service provides an entry point for profiling TPUs and for serving profiled data to TensorBoard through gRPC.
Enumerate existing sessions and return available profile tools.
Auxiliary error_message.
If successful, the returned session information is stored here.
Retrieve specific tool's data for specific session.
The place where we will read profile data. We will normally use MODEL_DIR/plugins/profile as the repository root.
Which host the data is associated with. If empty, data from all hosts are aggregated.
Which tool.
Tool-specific parameters, e.g. TraceViewer's viewport.
Auxiliary error_message.
Output format. e.g. "json" or "proto" or "blob"
TODO(jiesun): figure out whether to put bytes or oneof tool specific proto.
Starts a profiling session and blocks until it completes. The TPUProfileAnalysis service delegates this to the TPUProfiler service, populates the profiled data in the repository, then returns the status to the caller.
The place where we will dump profile data. We will normally use MODEL_DIR/plugins/profile as the repository root.
Host or host:port; the port will be ignored.
Auxiliary error_message.
Whether all hosts returned an empty trace.
The ProfilerService service retrieves performance information about the programs running on connected devices over a period of time.
Collects profiling data and returns user-friendly metrics.
Next-ID: 4
Duration for which to profile between each update.
Indicates the level at which we want to monitor. Currently, two levels are supported: Level 1: An ultra lightweight mode that captures only some utilization metrics. Level 2: More verbose than level 1. Collects utilization metrics, device information, step time information, etc. Do not use this option if the TPU host is being very heavily used.
True to display timestamp in monitoring result.
Next-ID: 11
Properly formatted string data that can be directly returned back to user.
A collection of monitoring results for each field shown in data.
Starts a profiling session, blocks until it completes, and returns data.
Next-ID: 8
Data payload for each required tool.
When we write profiling data directly to repository directory, we need a way to figure out whether the captured trace is empty.
Signal to terminate the Profile RPC for an ongoing profiling session. The Profile RPC will return successfully and prematurely, without timing out. This is used by programmatic mode to end the session in workers.
Which session id to terminate.
(message has no fields)
Used in: ,
Total number of bytes requested
Total number of bytes allocated if known
Name of the allocator used
Identifier of the allocated buffer if known
Set if this tensor only has one remaining reference
Address of the allocation.
An allocation/de-allocation operation performed by the allocator.
Used in: ,
The timestamp of the operation.
Number of bytes allocated, or de-allocated if negative.
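Because a record carries a timestamp and a signed byte count, the number of live bytes over time is a running sum of the records in timestamp order. The helper below is a small sketch; the `(timestamp, alloc_bytes)` tuple layout and the function name are assumptions made for illustration, not the proto's field names.

```python
from itertools import accumulate


def live_bytes_timeline(records):
    """Return [(timestamp, live_bytes)] from (timestamp, alloc_bytes) pairs.

    A negative alloc_bytes value represents a de-allocation.
    """
    ordered = sorted(records)  # tuples sort by timestamp first
    running = accumulate(delta for _, delta in ordered)
    return [(ts, live) for (ts, _), live in zip(ordered, running)]
```

Taking `max(live for _, live in timeline)` over the result gives the peak usage seen in the recorded window.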
Used in:
These are per-node allocator memory stats.
The bytes that are not deallocated.
The allocation and deallocation timeline.
These are snapshots of the overall allocator memory stats. The number of live bytes currently allocated by the allocator.
Used to specify and override the default API & behavior in the generated code for client languages, from what you would get from the OpDef alone. There will be a set of ApiDefs that are common to all client languages, and another set per client language. The per-client-language ApiDefs will inherit values from the common ApiDefs which it can either replace or modify. We separate the API definition from the OpDef so we can evolve the API while remaining backwards compatible when interpreting old graphs. Overrides go in an "api_def.pbtxt" file with a text-format ApiDefs message. WARNING: Be *very* careful changing the API for any existing op -- you can change the semantics of existing code. These changes may need to wait until a major release of TensorFlow to avoid breaking our compatibility promises.
Used in:
Name of the op (in the OpDef) to specify the API for.
If this op is deprecated, set deprecation message to the message that should be logged when this op is used. The message should indicate alternative op to use, if any.
Major version when the op will be deleted. For example, set this value to 2 if the op API should be removed in TensorFlow 2.0 and deprecated in versions before that.
List of original in_arg names to specify new argument order. arg_order should be either empty (to keep the current order) or the same length as in_arg.
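The arg_order rule can be expressed as a small check. This is an illustrative sketch only: `apply_arg_order` is a hypothetical helper, and it additionally assumes that a non-empty arg_order must be a permutation of the in_arg names, which is stricter than the length rule stated above.

```python
def apply_arg_order(in_args, arg_order):
    """Return the argument names in API order.

    An empty arg_order keeps the current order. Otherwise it is assumed
    to list every original in_arg name exactly once.
    """
    if not arg_order:
        return list(in_args)
    if sorted(arg_order) != sorted(in_args):
        raise ValueError("arg_order must contain exactly the in_arg names")
    return list(arg_order)
```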
One-line human-readable description of what the Op does.
Additional, longer human-readable description of what the Op does.
Modify an existing/inherited description by adding text to the beginning or end.
Used in:
Change the name used to access this arg in the API from what is used in the GraphDef. Note that these names in `backticks` will also be replaced in the summary & description fields.
Note: this will replace any inherited arg doc. There is no current way of modifying arg descriptions (other than replacing them entirely) as can be done with op descriptions.
Description of the graph-construction-time configuration of this Op. That is to say, this describes the attr fields that will be specified in the NodeDef.
Used in:
Change the name used to access this attr in the API from what is used in the GraphDef. Note that these names in `backticks` will also be replaced in the summary & description fields.
Specify a new default value to use for this attr. This default will be used when creating new graphs, as opposed to the default in the OpDef, which will be used when interpreting old GraphDefs.
Note: this will replace any inherited attr doc. There is no current way of modifying attr descriptions as can be done with op descriptions.
If you specify any endpoint, this will replace all of the inherited endpoints. The first endpoint should be the "canonical" endpoint, and should not be deprecated (unless all endpoints are deprecated).
Used in:
Name should be either like "CamelCaseName" or "Package.CamelCaseName". Client-language-specific ApiDefs may use a snake_case convention instead of CamelCase.
Set if this endpoint is deprecated. If set to true, a message suggesting to use a non-deprecated endpoint instead will be printed. If all endpoints are deprecated, set deprecation_message in ApiDef instead.
Major version when an endpoint will be deleted. For example, set this value to 2 if the endpoint should be removed in TensorFlow 2.0 and deprecated in versions before that.
Used in:
Normally this is "VISIBLE" unless you are inheriting a different value from another ApiDef.
Publicly visible in the API.
Do not include this op in the generated API. If visibility is set to 'SKIP', other fields are ignored for this op.
Hide this op by putting it into an internal namespace (or whatever is appropriate in the target language).
An asset file def for a single file or a set of sharded files with the same name.
Used in:
The tensor to bind the asset filename to.
The filename within an assets directory. Note: does not include the path prefix, i.e. directories. For an asset at /tmp/path/vocab.txt, the filename would be "vocab.txt".
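In other words, the stored filename is the final path component. A one-line sketch using the standard library (the helper name is hypothetical):

```python
import posixpath


def asset_filename(full_path):
    """Strip the directory prefix, keeping only the file name."""
    return posixpath.basename(full_path)
```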
Protocol buffer representing the value for an attr used to configure an Op. Comment indicates the corresponding attr type. Only the field matching the attr type may be filled.
Used in: , , , , , , , , , ,
"string"
"int"
"float"
"bool"
"type"
"shape"
"tensor"
any "list(...)"
"func" represents a function. func.name is a function's name or a primitive op's name. func.attr.first is the name of an attr defined for that function. func.attr.second is the value for that attr in the instantiation.
This is a placeholder only used in nodes defined inside a function. It indicates the attr value will be supplied when the function is instantiated. For example, let us suppose a node "N" in function "FN". "N" has an attr "A" with value placeholder = "foo". When FN is instantiated with attr "foo" set to "bar", the instantiated node N's attr A will have been given the value "bar".
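The placeholder substitution in the example above can be sketched as a dictionary lookup performed at instantiation time. The representation here is an assumption for illustration: attrs are plain Python values, and a placeholder is modeled as `{"placeholder": name}`; `instantiate_attrs` is a hypothetical helper, not a TensorFlow function.

```python
def instantiate_attrs(node_attrs, instantiation_attrs):
    """Replace placeholder attrs with the values supplied at instantiation."""
    resolved = {}
    for name, value in node_attrs.items():
        if isinstance(value, dict) and "placeholder" in value:
            # Look up the value the caller supplied for this placeholder name.
            resolved[name] = instantiation_attrs[value["placeholder"]]
        else:
            resolved[name] = value
    return resolved
```

For node "N" with attr "A" set to placeholder "foo", instantiating with "foo" set to "bar" gives "A" the value "bar", while non-placeholder attrs pass through unchanged.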
Used in:
"list(string)"
"list(int)"
"list(float)"
"list(bool)"
"list(type)"
"list(shape)"
"list(tensor)"
"list(attr)"
Used in:
TODO(b/189530096): Support autotune maps for more ops.
Used in:
Legacy and unused in new data; superseded by AlgorithmProto.
Used in: ,
Legacy and unused in new data; superseded by AlgorithmProto.
Used in: ,
Used in:
Algorithm wrote memory outside its output buffers.
Algorithm gave a different result from a reference algorithm.
Algorithm was rejected for failing to run or for known bugs.
Used in:
For failure_kind == WRONG_RESULT, this field indicates the reference configuration that we compared against. Note that the reference algorithm isn't always correct. However, empirically it's more correct, as it's "algo 0", less fancy than the compared one.
Used in: ,
Records all auto-tuning results per algorithm.
stream_executor::DeviceDescription::pci_bus_id.
Matches DeviceAttributes
Used in:
Device name.
Device type, e.g. 'CPU' or 'GPU'.
Memory capacity in bytes.
The physical description of this device.
Used in:
Each unit test or benchmark in a test or benchmark run provides some set of information. Here we provide some reasonable keys one would expect to see, with optional key/value pairs for things we haven't considered. This BenchmarkEntry should be emitted by each unit test or benchmark reporter.
Used in:
The name of the specific benchmark or test (e.g. BM_AdjustContrast_gpu_B_W_H)
If a benchmark, how many iterations it was run for
Total cpu time used for all iterations (in seconds)
Total wall time used for all iterations (in seconds)
Throughput (in MB/s)
Generic map from result key to value.
Metric name, value and expected range. This can include accuracy metrics typically used to determine whether the accuracy test has passed
Used in:
A protobuf to represent tf.BoundedTensorSpec.
Used in:
Used in:
opt, dbg, etc
CC compiler flags, if known
Bazel compilation options, if known
Describes the metadata related to a checkpointed tensor.
The tensor dtype and shape.
The binary content of the tensor lies in: File "shard_id": bytes [offset, offset + size).
The CRC32C checksum of the tensor bytes.
Iff present, this entry represents a partitioned tensor. The previous fields are interpreted as follows: "dtype", "shape": describe the full tensor. "shard_id", "offset", "size", "crc32c": all IGNORED. This information for each slice can be looked up in its own BundleEntryProto, keyed by each "slice_name".
Special header that is associated with a bundle. TODO(zongheng,zhifengc): maybe in the future, we can add information about which binary produced this checkpoint, timestamp, etc. Sometimes, these can be valuable debugging information. And if needed, these can be used as defensive information ensuring reader (binary version) of the checkpoint and the writer (binary version) must match within certain range, etc.
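For a non-partitioned entry, locating the tensor content amounts to slicing the half-open byte range [offset, offset + size) out of the shard identified by "shard_id". The sketch below models shards as an in-memory mapping from shard id to bytes; that mapping and the function name are assumptions for illustration, and checksum verification is deliberately left out.

```python
def read_tensor_bytes(shard_files, shard_id, offset, size):
    """Return bytes [offset, offset + size) of the given shard."""
    data = shard_files[shard_id]
    if offset < 0 or size < 0 or offset + size > len(data):
        raise ValueError("entry points outside the shard")
    return data[offset:offset + size]
```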
Number of data files in the bundle.
Versioning of the tensor bundle format.
An enum indicating the endianness of the platform that produced this bundle. A bundle can only be read by a platform with matching endianness. Defaults to LITTLE, as most modern platforms are little-endian. Affects the binary tensor data bytes only, not the metadata in protobufs.
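The reason endianness matters for the raw tensor bytes can be seen with the standard `struct` module: the same float32 value serializes to reversed byte sequences, and reading one layout as the other yields a different number. This is a generic illustration, not code from the bundle reader.

```python
import struct

# The float32 value 1.0 has the bit pattern 0x3F800000.
little = struct.pack("<f", 1.0)  # least significant byte first
big = struct.pack(">f", 1.0)     # most significant byte first

# Interpreting little-endian bytes as big-endian gives a different value.
misread = struct.unpack(">f", little)[0]
```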
Used in:
Containers to hold repeated fundamental values.
Used in:
Used in:
How fast are these cpus?
Additional cpu information. For example, Intel Ivybridge with HyperThreading (24 cores) dL1:32KB dL2:256KB dL3:30MB
What kind of cpu scaling is enabled on the host. Examples include "performance", "ondemand", "conservative", "mixed".
Cache sizes (in bytes), e.g. "L2": 262144 (for 256KB)
Used in:
Defines a subgraph in another `GraphDef` as a set of feed points and nodes to be fetched or executed. Compare with the arguments to `Session::Run()`.
Used in:
Tensors to be fed in the callable. Each feed is the name of a tensor.
Fetches. A list of tensor names. The caller of the callable expects a tensor to be returned for each fetch[i] (see RunStepResponse.tensor). The order of specified fetches does not change the execution order.
Target Nodes. A list of node names. The named nodes will be run by the callable but their outputs will not be returned.
Options that will be applied to each run.
Tensors to be connected in the callable. Each TensorConnection denotes a pair of tensors in the graph, between which an edge will be created in the callable.
The Tensor objects fed in the callable and fetched from the callable are expected to be backed by host (CPU) memory by default. The options below allow changing that - feeding tensors backed by device memory, or returning tensors that are backed by device memory. The maps below map the name of a feed/fetch tensor (which appears in 'feed' or 'fetch' fields above), to the fully qualified name of the device owning the memory backing the contents of the tensor.
For example, creating a callable with the following options:
  CallableOptions {
    feed: "a:0"
    feed: "b:0"
    fetch: "x:0"
    fetch: "y:0"
    feed_devices: {
      "a:0": "/job:localhost/replica:0/task:0/device:GPU:0"
    }
    fetch_devices: {
      "y:0": "/job:localhost/replica:0/task:0/device:GPU:0"
    }
  }
means that the Callable expects:
- The first argument ("a:0") is a Tensor backed by GPU memory.
- The second argument ("b:0") is a Tensor backed by host memory.
and of its return values:
- The first output ("x:0") will be backed by host memory.
- The second output ("y:0") will be backed by GPU memory.
FEEDS: It is the responsibility of the caller to ensure that the memory of the fed tensors will be correctly initialized and synchronized before it is accessed by operations executed during the call to Session::RunCallable(). This is typically ensured by using the TensorFlow memory allocators (Device::GetAllocator()) to create the Tensor to be fed. Alternatively, for CUDA-enabled GPU devices, this typically means that the operation that produced the contents of the tensor has completed, i.e., the CUDA stream has been synchronized (e.g., via cuCtxSynchronize() or cuStreamSynchronize()).
By default, RunCallable() will synchronize the GPU stream before returning fetched tensors on a GPU device, to ensure that the values in those tensors have been produced. This simplifies interacting with the tensors, but potentially incurs a performance hit. If this option is set to true, the caller is responsible for ensuring that the values in the fetched tensors have been produced before they are used. The caller can do this by invoking `Device::Sync()` on the underlying device(s), or by feeding the tensors back to the same Session using `feed_devices` with the same corresponding device name.
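The host-by-default rule for feeds and fetches can be written as a lookup with a fallback. This is a sketch of the mapping logic only; `memory_placement` and the `"host"` marker are hypothetical names, and nothing here touches a real Session.

```python
HOST = "host"  # hypothetical marker for host (CPU) memory


def memory_placement(tensor_names, device_map):
    """Map each feed or fetch name to the device backing its memory.

    Names missing from device_map fall back to host memory.
    """
    return {name: device_map.get(name, HOST) for name in tensor_names}
```

Applied to the example options, `"a:0"` and `"y:0"` resolve to the GPU device while `"b:0"` and `"x:0"` resolve to host memory.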
Used in:
Name of captured tensor
Name of concrete function which contains the computed graph tensor.
Input for the CheckpointReader fuzz test.
Protocol buffer representing the checkpoint state.
Path to the most-recent model checkpoint.
Paths to all not-yet-deleted model checkpoints, sorted from oldest to newest. Note that the value of model_checkpoint_path should be the last item in this list.
Unix timestamps corresponding to all_model_checkpoint_paths, indicating when each checkpoint was created.
Unix timestamp indicating the creation time for the last preserved checkpoint.
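The relationships between these fields lend themselves to a consistency check. The sketch below is hypothetical (`check_checkpoint_state` is not a TensorFlow function) and assumes that, because the paths are sorted from oldest to newest, the matching timestamps should be non-decreasing.

```python
def check_checkpoint_state(model_checkpoint_path, all_paths, timestamps):
    """Return a list of human-readable problems; empty means consistent."""
    problems = []
    if all_paths and all_paths[-1] != model_checkpoint_path:
        problems.append("most recent checkpoint is not the last path")
    if timestamps and len(timestamps) != len(all_paths):
        problems.append("timestamps do not match paths one-to-one")
    if list(timestamps) != sorted(timestamps):
        # Assumption: oldest-to-newest paths imply ordered timestamps.
        problems.append("timestamps are not sorted oldest to newest")
    return problems
```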
Used as request type in: grpc.WorkerService.CleanupAll
A list of container names. If 'container' is not empty, releases resources in the given containers in all devices. If 'container' is empty, releases resources in the default container in all devices.
Used as response type in: grpc.WorkerService.CleanupAll
(message has no fields)
Used as request type in: grpc.WorkerService.CleanupGraph
Used as response type in: grpc.WorkerService.CleanupGraph
(message has no fields)
Used as request type in: grpc.MasterService.CloseSession
Used as field type in:
REQUIRED: session_handle must be returned by a CreateSession call to the same master service.
Used as response type in: grpc.MasterService.CloseSession
Used as field type in:
(message has no fields)
Defines a TensorFlow cluster as a set of jobs.
Used in: ,
The jobs that comprise the cluster.
Defines the device filters for jobs in a cluster.
Used in:
Code location information: A stack trace with host-name information. Instead of encoding the detailed stack trace, this proto refers to IDs of stack frames stored as `StackFrameWithId` protos.
Used in: ,
Host name on which the source files are located.
ID to a stack frame, each of which is pointed to by a unique ID. The ordering of the frames is consistent with Python's `traceback.extract_tb()`.
CollectionDef should cover most collections. To add a user-defined collection, do one of the following:
1. For simple data types, such as string, int, float:
     tf.add_to_collection("your_collection_name", your_simple_value)
   strings will be stored as bytes_list.
2. For Protobuf types, there are three ways to add them:
   1) tf.add_to_collection("your_collection_name", your_proto.SerializeToString())
        collection_def {
          key: "user_defined_bytes_collection"
          value {
            bytes_list {
              value: "queue_name: \"test_queue\"\n"
            }
          }
        }
   or
   2) tf.add_to_collection("your_collection_name", str(your_proto))
        collection_def {
          key: "user_defined_string_collection"
          value {
            bytes_list {
              value: "\n\ntest_queue"
            }
          }
        }
   or
   3) any_buf = any_pb2.Any()
      tf.add_to_collection("your_collection_name", any_buf.Pack(your_proto))
        collection_def {
          key: "user_defined_any_collection"
          value {
            any_list {
              value {
                type_url: "type.googleapis.com/tensorflow.QueueRunnerDef"
                value: "\n\ntest_queue"
              }
            }
          }
        }
3. For Python objects, implement to_proto() and from_proto(), and register them in the following manner:
     ops.register_proto_function("your_collection_name", proto_type, to_proto=YourPythonObject.to_proto, from_proto=YourPythonObject.from_proto)
   These functions will be invoked to serialize and de-serialize the collection. For example,
     ops.register_proto_function(ops.GraphKeys.GLOBAL_VARIABLES, proto_type=variable_pb2.VariableDef, to_proto=Variable.to_proto, from_proto=Variable.from_proto)
Used in:
AnyList is used for collecting Any protos.
Used in:
BytesList is used for collecting strings and serialized protobufs. For example: collection_def { key: "trainable_variables" value { bytes_list { value: "\n\017conv1/weights:0\022\024conv1/weights/Assign \032\024conv1/weights/read:0" value: "\n\016conv1/biases:0\022\023conv1/biases/Assign\032 \023conv1/biases/read:0" } } }
Used in:
FloatList is used for collecting float values.
Used in:
Int64List is used for collecting int, int64 and long values.
Used in:
NodeList is used for collecting nodes in graph. For example:
  collection_def {
    key: "summaries"
    value {
      node_list {
        value: "input_producer/ScalarSummary:0"
        value: "shuffle_batch/ScalarSummary:0"
        value: "ImageSummary:0"
      }
    }
  }
Used in:
Used in:
Submitted changelist.
Hash of intermediate change between hash/changelist and what was tested. Not used if the build is from a commit without modifications.
Changelist tested if the change list is not already submitted.
Supplies one or more device names as members of the group identified by group_key. Service will respond when all group_size devices become known. All devices in group must have same type.
Used as request type in: grpc.WorkerService.CompleteGroup
Gives the complete membership of the group identified by group_key.
Used as response type in: grpc.WorkerService.CompleteGroup
number of distinct tasks hosting the devices
Supplies data about one collective op belonging to the instance identified by instance_key. Service will respond when all group_size ops have become known. Most of the data being sent is for correctness checking, to ensure that all ops in the instance share common attributes.
Used as request type in: grpc.WorkerService.CompleteInstance
Confirms that every op in the instance has consistently declared itself. Also gives the source_rank in case of broadcast.
Used as response type in: grpc.WorkerService.CompleteInstance
Metadata for CompositeTensorVariant, used when serializing as Variant. We define a new message here (rather than directly using TypeSpecProto for the metadata string) to retain flexibility to change the metadata encoding to support additional features.
Used in: ,
Protocol buffer representing a CondContext object.
Used in:
Name of the context.
Name of the pred tensor.
Name of the pivot tensor.
Branch prediction. 0 or 1.
Values and external values in control flow context.
Contexts contained inside this context (e.g. nested conds).
Session configuration parameters. The system picks appropriate values for fields that are not set.
Used in: , ,
Map from device type name (e.g., "CPU" or "GPU" ) to maximum number of devices of that type to use. If a particular device type is not found in the map, the system picks an appropriate number.
The execution of an individual op (for some op types) can be parallelized on a pool of intra_op_parallelism_threads. 0 means the system picks an appropriate number. If you create an ordinary session, e.g., from Python or C++, then there is exactly one intra op thread pool per process. The first session created determines the number of threads in this pool. All subsequent sessions reuse/share this one global pool.
There are notable exceptions to the default behavior described above:
1. There is an environment variable for overriding this thread pool, named TF_OVERRIDE_GLOBAL_THREADPOOL.
2. When connecting to a server, such as a remote `tf.train.Server` instance, then this option will be ignored altogether.
Nodes that perform blocking operations are enqueued on a pool of inter_op_parallelism_threads available in each process. 0 means the system picks an appropriate number. Negative means all operations are performed in caller's thread. Note that the first Session created in the process sets the number of threads for all future sessions unless use_per_session_threads is true or session_inter_op_thread_pool is configured.
If true, use a new set of threads for this session rather than the global pool of threads. Only supported by direct sessions. If false, use the global threads created by the first session, or the per-session thread pools configured by session_inter_op_thread_pool. This option is deprecated. The same effect can be achieved by setting session_inter_op_thread_pool to have one element, whose num_threads equals inter_op_parallelism_threads.
This option is experimental - it may be replaced with a different mechanism in the future. Configures session thread pools. If this is configured, then RunOptions for a Run call can select the thread pool to use. The intended use is for when some session invocations need to run in a background pool limited to a small number of threads:
- For example, a session may be configured to have one large pool (for regular compute) and one small pool (for periodic, low priority work); using the small pool is currently the mechanism for limiting the inter-op parallelism of the low priority work. Note that it does not limit the parallelism of work spawned by a single op kernel implementation.
- Using this setting is normally not needed in training, but may help some serving use cases.
- It is also generally recommended to set the global_name field of this proto, to avoid creating multiple large pools. It is typically better to run the non-low-priority work, even across sessions, in a single large pool.
Assignment of Nodes to Devices is recomputed every placement_period steps until the system warms up (at which point the recomputation typically slows down automatically).
When any filters are present sessions will ignore all devices which do not match the filters. Each filter can be partially specified, e.g. "/job:ps" "/job:worker/replica:3", etc.
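A partially specified filter can be understood as a set of required name components. The sketch below is one plausible reading, not the runtime's matching code: it assumes a device matches a filter when every `key:value` component in the filter appears with the same value in the device name, and all function names are hypothetical.

```python
def parse_spec(spec):
    """Split "/job:worker/replica:3" into {"job": "worker", "replica": "3"}."""
    parts = {}
    for part in spec.strip("/").split("/"):
        key, _, value = part.partition(":")
        parts[key] = value
    return parts


def matches(device_name, device_filter):
    device = parse_spec(device_name)
    return all(device.get(k) == v for k, v in parse_spec(device_filter).items())


def visible_devices(devices, filters):
    """With no filters every device is kept; otherwise keep any-filter matches."""
    if not filters:
        return list(devices)
    return [d for d in devices if any(matches(d, f) for f in filters)]
```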
Options that apply to all GPUs.
Whether soft placement is allowed. If allow_soft_placement is true, an op will be placed on CPU if:
1. there is no GPU implementation for the op, or
2. no GPU devices are known or registered, or
3. it needs to co-locate with reftype input(s) which are from CPU.
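The three fallback conditions combine with a logical OR. The function below is a simplified decision sketch for an op that requested a GPU; its name and parameters are hypothetical, and raising an error when soft placement is disabled is an assumption about the strict case rather than something stated above.

```python
def place_op(has_gpu_kernel, gpu_available, colocated_with_cpu_ref_input,
             allow_soft_placement):
    """Pick "GPU" or "CPU" for an op that asked for a GPU (toy model)."""
    needs_cpu = (
        not has_gpu_kernel
        or not gpu_available
        or colocated_with_cpu_ref_input
    )
    if not needs_cpu:
        return "GPU"
    if allow_soft_placement:
        return "CPU"
    # Assumption: without soft placement the request cannot be satisfied.
    raise ValueError("cannot place op on GPU and soft placement is disabled")
```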
Whether device placements should be logged.
Options that apply to all graphs.
Global timeout for all blocking operations in this session. If non-zero, and not overridden on a per-operation basis, this value will be used as the deadline for all blocking operations.
Options that apply when this session uses the distributed runtime.
Optional list of all workers to use in this session.
If true, any resources such as Variables used in the session will not be shared with other sessions. However, when clusterspec propagation is enabled, this field is ignored and sessions are always isolated.
When true, WorkerSessions are created with device attributes from the full cluster. This is helpful when a worker wants to partition a graph (for example during a PartitionedCallOp).
Everything inside Experimental is subject to change and is not subject to API stability guarantees in https://www.tensorflow.org/guide/version_compat.
Used in:
Task name for group resolution.
Which executor to use. The default executor will be used if this is an empty string or "DEFAULT".
Guidance to formatting of large RecvBuf fields for transfer. Any positive value sets the max chunk size. 0 defaults to 4096. Any negative value indicates no max, i.e. one chunk only.
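The three cases of the chunk-size rule (positive, zero, negative) can be written as a short function. This is a minimal sketch rather than the runtime's implementation; `effective_max_chunk_bytes` and `split_into_chunks` are hypothetical names introduced only for illustration.

```python
from typing import List, Optional

DEFAULT_MAX_CHUNK_BYTES = 4096  # the default the field description gives for 0


def effective_max_chunk_bytes(configured: int) -> Optional[int]:
    """Map the configured value to a chunk limit; None means 'no maximum'."""
    if configured > 0:
        return configured
    if configured == 0:
        return DEFAULT_MAX_CHUNK_BYTES
    return None  # any negative value: one chunk only


def split_into_chunks(payload: bytes, configured: int) -> List[bytes]:
    """Split a payload according to the rule above (illustrative only)."""
    limit = effective_max_chunk_bytes(configured)
    if limit is None or not payload:
        return [payload]
    return [payload[i:i + limit] for i in range(0, len(payload), limit)]
```

Under this reading, a 10000-byte buffer with the value left at 0 is split into three chunks, and any negative value sends it as a single chunk.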
If true, and supported by the platform, the runtime will attempt to use NUMA affinity where applicable. One consequence will be the existence of as many CPU devices as there are available NUMA nodes.
If true, make collective op execution order sequential and deterministic for potentially concurrent collective instances.
If true, use NCCL for CollectiveOps. This feature is highly experimental.
In the following, session state means the value of a variable, elements in a hash table, or any other resource, accessible by worker sessions held by a TF server. When ClusterSpec propagation is enabled, the value of isolate_session_state is ignored when deciding whether to share session states in a TF server (for backwards compatibility reasons). - If share_session_state_in_clusterspec_propagation is true, the session states are shared. - If share_session_state_in_clusterspec_propagation is false, session states are isolated. When clusterspec propagation is not used, the value of share_session_state_in_clusterspec_propagation is ignored when deciding whether to share session states in a TF server. - If isolate_session_state is true, session states are isolated. - If isolate_session_state is false, session states are shared. TODO(b/129330037): Add a single API that consistently treats isolate_session_state and ClusterSpec propagation.
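The decision table described above reduces to a single function. The sketch below restates the documented rules; it is not the server's code, and `session_state_is_shared` is a hypothetical helper name.

```python
def session_state_is_shared(
    clusterspec_propagation_enabled: bool,
    isolate_session_state: bool,
    share_session_state_in_clusterspec_propagation: bool,
) -> bool:
    """Return True if session states are shared in the TF server."""
    if clusterspec_propagation_enabled:
        # isolate_session_state is ignored in this mode.
        return share_session_state_in_clusterspec_propagation
    # share_session_state_in_clusterspec_propagation is ignored in this mode.
    return not isolate_session_state
```

Writing it this way makes the asymmetry visible: each flag is consulted in exactly one of the two modes.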
If using a direct session, disable spinning while waiting for work in the thread pool. This may result in higher latency for completing ops, but in the case where there is a lot of spinning may result in lower CPU usage.
This was promoted to a non-experimental API. Please use ConfigProto.share_cluster_devices_in_session instead.
Metadata about the session. If set, this can be used by the runtime and the Ops for debugging, monitoring, etc. NOTE: This is currently used and propagated only by the direct session.
If true, the session may treat the graph as being static for optimization purposes. If this option is set to true when a session is created, the full GraphDef must be passed in a single call to Session::Create(), and Session::Extend() may not be supported.
This field will eventually be deprecated and replaced by mlir_bridge_rollout (b/166038521). Whether to enable the MLIR-based TF->XLA bridge. This is a replacement for the existing bridge, and is not ready for production usage yet. If this option is set to true when a session is created, MLIR is used to perform the set of graph transformations to put the graph in a form that can be executed with delegation of some computations to an accelerator. This builds on the model of XLA where a subset of the graph is encapsulated and attached to a "compile" operation, whose result is fed to an "execute" operation. The kernel for these operations is responsible for lowering the encapsulated graph to a particular device.
This field is under development; for now use enable_mlir_bridge (b/166038521). Whether to enable the MLIR-based TF->XLA bridge.
Whether to enable the MLIR-based graph optimizations. This will become a part of the standard TensorFlow graph optimization pipeline; currently it is only used for gradual migration and for testing new passes that are replacing existing optimizations in Grappler.
If true, the session will not store an additional copy of the graph for each subgraph. If this option is set to true when a session is created, the `RunOptions.output_partition_graphs` options must not be set.
Minimum number of batches run through the XLA graph before XLA fusion autotuner is enabled. Default value of zero disables the autotuner. The XLA fusion autotuner can improve performance by executing a heuristic search on the compiler parameters.
Whether runtime execution uses TFRT.
Whether functional control flow op lowering should be disabled. This is useful when executing within a portable runtime where control flow op kernels may not be loaded due to selective registration.
Provides a hint to XLA auto clustering to prefer forming a single large cluster that encompasses most of the graph.
Distributed coordination service configurations.
An enum that describes the state of the MLIR bridge rollout.
Used in:
If this field is left unspecified, the MLIR bridge may be selectively enabled on a per graph basis.
Enabling the MLIR bridge enables it for all graphs in this session.
Disabling the MLIR bridge disables it for all graphs in this session.
Enable the MLIR bridge on a per graph basis based on an analysis of the features used in the graph. If the features used by the graph are supported by the MLIR bridge, the MLIR bridge will be used to run the graph.
Enable the MLIR bridge in a fallback mode on a per graph basis based on an analysis of the features used in the graph. Running the MLIR bridge in fallback mode means that it is executed and, on success, commits all of its changes to the TF graph. On failure it commits nothing and lets the old bridge process the TF graph.
Container for any kind of control flow context. Any other control flow contexts that are added below should also be added here.
Used in:
Used in:
Used in:
This is the underlying data structure of class ConvParameters, which are used as the keys in cuDNN autotuning maps for retrieving corresponding cuDNN algorithms. This is used as a serialization format for saving/loading autotuning databases.
Used in:
data_format corresponds to type TensorFormat in third_party/tensorflow/core/util/tensor_format.h.
A string uniquely identifying a particular GPU model, e.g. V100 vs RTX 2080.
The version number of ConvParameters class. Offline autotune results whose version number is different from the runtime's version number (defined in ConvParameters::kVersion) will be rejected and ignored by LoadSerializedAutotuneMaps. This ensures that we will not load out-of-date autotune results.
This stores the information for fused convolution operations where an activation and a side input might follow the convolution.
Used in:
If true, this proto corresponds to a FusedConvBiasActivation operation implemented in the contrib library, otherwise this proto corresponds to the FusedConv operation implemented in the core library. Compared with FusedConv, FusedConvBiasActivation supports more types of activation function (including no activation) as well as the side_input. For now they have same type of keys in autotune maps, but the semantics of some fields (like padding) are different. So we add this field to distinguish them. TODO(b/177365158) Remove this field once these two operations are merged.
A convolution. Currently it's only used for logging. In the future, we may want to use it in the API as well.
result = conv_scale * conv(...) + side_value_scale * side_value. side_value is an arbitrary buffer if activation is not none. Otherwise, it has to be the result buffer (using its old values).
Represents a job type and the number of tasks under this job. For example, ("worker", 20) implies that there will be 20 worker tasks.
Used in:
Represents a remote worker task, specified by job name and task id.
Used in:
Represents the state of a remote worker
Used in:
TASKSTATE_UNSPECIFIED is an invalid state that indicates a bug.
TASKSTATE_UNINITIALIZED is an agent-only state. While the agent is disconnected, the service has no way of knowing if the task is initialized/uninitialized.
Used in:
Coordination service configuration parameters. The system picks appropriate values for fields that are not set.
Used in:
Type of coordination service implementation to enable. For example, setting the service type as "standalone" starts a service instance on the leader task to provide the coordination services such as heartbeats and consistent key-value store.
Address where the coordination service instance is hosted.
Whether to enable the health check mechanism.
Maximum wait time for all members in the cluster to be registered.
Heartbeat timeout. If a task does not record a heartbeat within this time window, it is considered disconnected. Note: This is also used as a grace period to accept any heartbeats after the agent has disconnected, to account for the lag time between the service recording the state change and the agent stopping heartbeats.
Denotes how long to wait for all coordination agents to reach the barriers (after the first shutdown request) before disconnecting together. If set to 0, no barrier is imposed upon shutdown and each worker can disconnect individually.
If set, agents do not make an explicit Shutdown() call. The service will only find out about the disconnected agent via stale heartbeats. Used for testing.
The list of jobs which are recoverable. If a task in this list fails, it will not propagate error to other tasks. If empty, no jobs will be recoverable and every task failure will cause error propagation to other tasks.
Used in:
Status payload for all coordination service errors. Note: an empty proto may be set if the error is triggered by the task's own agent calls (i.e. not propagated by the service from another remote task).
Used in:
If true, error is reported via the agent API by the user (and not an internal service error).
Denotes which task hit the error. If unset, the error originated from the same task that is processing this error.
Used in:
Total cost of this graph, typically used for balancing decisions.
Used in:
Aggregated cost value.
Aggregated cost dimension (e.g. 'memory', 'compute', 'network').
Used in:
The name of the node. Names are globally unique.
The device of the node. Can be empty if the node is mapped to the default partition or partitioning hasn't been run yet.
The id of the node. Node ids are only unique inside a partition.
Temporary memory used by this node.
Persistent memory used by this node.
Estimate of the computational cost of this node, in microseconds.
Analytical estimate of the computational cost of this node, in microseconds.
Analytical estimate of the memory access cost of this node, in microseconds.
If true, the output is permanent: it can't be discarded, because this node is part of the "final output". Nodes may depend on final nodes.
Ids of the control inputs for this node.
Are the costs inaccurate?
Inputs of this node. They must be executed before this node can be executed. An input is a particular output of another node, specified by the node id and the output index.
Used in:
Outputs of this node.
Used in:
If >= 0, the output is an alias of an input. Note that an alias input may itself be an alias. The algorithm will therefore need to follow those pointers.
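Following alias pointers until a non-aliased output is reached can be sketched as a loop. The `Node` class below is a simplified stand-in for the cost-graph node described above (its field names are assumptions made for this example), and `resolve_alias` is a hypothetical helper, not part of TensorFlow.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    # Each input is (id of the producing node, output index on that node).
    inputs: List[Tuple[int, int]] = field(default_factory=list)
    # One entry per output: index of the aliased input, or negative if none.
    output_alias_ports: List[int] = field(default_factory=list)


def resolve_alias(nodes: Dict[int, Node], node_id: int,
                  output_index: int) -> Tuple[int, int]:
    """Follow alias pointers and return the (node id, output index) that owns the buffer."""
    seen = set()
    while True:
        if (node_id, output_index) in seen:
            raise ValueError("alias cycle detected")
        seen.add((node_id, output_index))
        port = nodes[node_id].output_alias_ports[output_index]
        if port < 0:
            return node_id, output_index  # not an alias: this is the owner
        # The aliased input is itself an output of another node; keep following.
        node_id, output_index = nodes[node_id].inputs[port]
```

The cycle check is a defensive addition for the sketch; the description above only states that aliases may chain.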
Used in:
Only valid if <is_set>.
Used in:
Used as request type in: grpc.MasterService.CreateSession
Used as field type in:
The initial graph definition.
Configuration options.
The target string used from the client's perspective.
Used as response type in: grpc.MasterService.CreateSession
Used as field type in:
The session handle to be used in subsequent calls for the created session. The client must arrange to call CloseSession with this returned session handle to close the session.
The initial version number for the graph, to be used in the next call to ExtendSession.
Used as request type in: grpc.WorkerService.CreateWorkerSession
Sessions are identified by a given handle.
Defines the configuration of a TensorFlow worker.
If true, any resources such as Variables used in the session will not be shared with other sessions.
The device attributes of all the devices in the cluster.
The master task name from which the request is sent.
The incarnation ID of the master task local CPU device. If the target worker already has a WorkerSession created previously with the same master task name but a different incarnation, it usually indicates that the previous master failed before deleting the WorkerSession on the worker. To prevent memory leaks, the worker should garbage collect the old WorkerSessions.
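The garbage-collection rule for stale WorkerSessions can be modelled with a small registry. This is only a sketch of the idea under stated assumptions: `WorkerSessionRegistry` is a hypothetical class, and it assumes collection happens eagerly at the moment a session arrives from the same master task name with a different incarnation.

```python
from typing import Dict, List, Tuple


class WorkerSessionRegistry:
    """Toy model of a worker's session table, keyed by session handle."""

    def __init__(self) -> None:
        # session handle -> (master task name, master incarnation)
        self._sessions: Dict[str, Tuple[str, int]] = {}

    def create(self, session_handle: str, master_task: str,
               master_incarnation: int) -> List[str]:
        """Register a session and return the handles that were collected."""
        stale = [
            handle
            for handle, (task, incarnation) in self._sessions.items()
            if task == master_task and incarnation != master_incarnation
        ]
        for handle in stale:
            del self._sessions[handle]  # the previous master is presumed dead
        self._sessions[session_handle] = (master_task, master_incarnation)
        return stale

    def handles(self) -> List[str]:
        return sorted(self._sessions)
```

Sessions from other master tasks, or from the same master with an unchanged incarnation, are left untouched in this model.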
Used as response type in: grpc.WorkerService.CreateWorkerSession
(message has no fields)
Protocol buffer representing a CriticalSection.
Name of the critical section handle.
Protocol buffer representing a CriticalSection execution.
Name of the critical section handle.
Whether this operation requires exclusive access to its resources, (i.e., no other CriticalSections may request the same resources).
Used in:
Used in:
Unknown data class, used (implicitly) for legacy data. Will not be processed by data ingestion pipelines.
Scalar time series. Each `Value` for the corresponding tag must have `tensor` set to a rank-0 tensor of type `DT_FLOAT` (float32).
Tensor time series. Each `Value` for the corresponding tag must have `tensor` set. The tensor value is arbitrary, but should be small to accommodate direct storage in database backends: an upper bound of a few kilobytes is a reasonable rule of thumb.
Blob sequence time series. Each `Value` for the corresponding tag must have `tensor` set to a rank-1 tensor of bytestring dtype.
Used in:
Not a legal value for DataType. Used to indicate a DataType field has not been set.
Data types that all computation devices are expected to be capable of supporting.
Single-precision complex
Quantized int8
Quantized uint8
Quantized int32
Float32 truncated to 16 bits. Only for cast ops.
Quantized int16
Quantized uint16
Double-precision complex
Arbitrary C++ data types
Do not use! These are only for parameters. Every enum above should have a corresponding value below (verified by types_test).
An Event related to the debugging of a TensorFlow program.
Timestamp in seconds (with microsecond precision).
Step of training (if available).
Metadata related to this debugging data.
The content of a source file.
A stack frame (filename, line number and column number, function name and code string) with ID.
The creation of an op within a graph (e.g., a FuncGraph compiled from a Python function).
Information about a debugged graph.
Execution of an op or a Graph (e.g., a tf.function).
A graph execution trace: Contains information about the intermediate tensors computed during the graph execution.
The ID of the graph (i.e., FuncGraph) executed here: applicable only to the execution of a FuncGraph.
A device on which debugger-instrumented ops and/or tensors reside.
Metadata about the debugger and the debugged TensorFlow program.
Used in:
Version of TensorFlow.
Version of the DebugEvent file format. Has a format of "debug.Event:<number>", e.g., "debug.Event:1".
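A reader that wants to validate this field could parse it as follows. This is a minimal sketch; `parse_debug_event_version` is a hypothetical helper, and the strict full-string match is an assumption made for the example.

```python
import re

# "debug.Event:<number>", e.g. "debug.Event:1"
_FILE_VERSION_PATTERN = re.compile(r"debug\.Event:(\d+)")


def parse_debug_event_version(file_version: str) -> int:
    """Return the numeric part of a DebugEvent file version string."""
    match = _FILE_VERSION_PATTERN.fullmatch(file_version)
    if match is None:
        raise ValueError(f"not a DebugEvent file version: {file_version!r}")
    return int(match.group(1))
```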
A unique ID for the current run of tfdbg. A run of tfdbg is defined as a TensorFlow job instrumented by tfdbg. Multiple hosts in a distributed TensorFlow job instrumented by tfdbg have the same ID.
Options for initializing DebuggerState in TensorFlow Debugger (tfdbg).
Used in:
Debugging options
Caller-specified global step count. Note that this is distinct from the session run count and the executor step count.
Whether the total disk usage of tfdbg is to be reset to zero in this Session.run call. This is used by wrappers and hooks such as the local CLI ones to indicate that the dumped tensors are cleaned up from the disk after each Session.run.
Option for watching a node in TensorFlow Debugger (tfdbg).
Used in:
Name of the node to watch. Use "*" for wildcard. But note: currently, regex is not supported in general.
Output slot to watch. The semantics of output_slot == -1 is that all outputs of the node will be watched (i.e., a wildcard). Other negative values of output_slot are invalid and will lead to errors currently.
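The wildcard and error cases for the output slot can be expressed as one function. This is an illustrative sketch only; `watched_slots` is a hypothetical helper, and raising `ValueError` stands in for whatever error the runtime actually reports.

```python
from typing import List


def watched_slots(output_slot: int, num_outputs: int) -> List[int]:
    """Expand an output_slot setting into the list of watched output indices."""
    if output_slot == -1:
        return list(range(num_outputs))  # wildcard: watch every output
    if output_slot < 0:
        raise ValueError(f"invalid output_slot: {output_slot}")
    return [output_slot]
```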
Name(s) of the debugging op(s). One or more probes on a tensor, e.g., {"DebugIdentity", "DebugNanCount"}.
URL(s) for debug target(s). Supported URL formats are: - file:///foo/tfdbg_dump: Writes out Event content to file /foo/tfdbg_dump. Assumes all directories can be created if they don't already exist. - grpc://localhost:11011: Sends an RPC request to an EventListener service running at localhost:11011 with the event. - memcbk:///event_key: Routes tensors to clients using the callback registered with the DebugCallbackRegistry for event_key. Each debug op listed in debug_ops will publish its output tensor (debug signal) to all URLs in debug_urls. N.B. Session::Run() supports concurrent invocations of the same inputs (feed keys), outputs and target nodes. If such concurrent invocations are to be debugged, the callers of Session::Run() must use distinct debug_urls to make sure that the streamed or dumped events do not overlap among the invocations. TODO(cais): More visible documentation of this in g3docs.
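The three URL formats listed above differ only in scheme and in which part of the URL carries the target. The sketch below separates them using Python's standard `urllib.parse`; `classify_debug_url` is a hypothetical helper and does not reflect how the debugger itself dispatches URLs.

```python
from typing import Tuple
from urllib.parse import urlparse


def classify_debug_url(url: str) -> Tuple[str, str]:
    """Return (kind, target) for a debug URL in one of the documented formats."""
    parsed = urlparse(url)
    if parsed.scheme == "file":
        return "file", parsed.path              # dump directory or file path
    if parsed.scheme == "grpc":
        return "grpc", parsed.netloc            # host:port of the EventListener
    if parsed.scheme == "memcbk":
        return "memcbk", parsed.path.lstrip("/")  # callback registry key
    raise ValueError(f"unsupported debug URL: {url!r}")
```

A caller debugging concurrent Session::Run() invocations could use such a helper to check that no two invocations were given the same target.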
Do not error out if debug op creation fails (e.g., due to dtype incompatibility). Instead, just log the failure.
A device on which ops and/or tensors are instrumented by the debugger.
Used in:
Name of the device.
A debugger-generated ID for the device. Guaranteed to be unique within the scope of the debugged TensorFlow program, including single-host and multi-host settings. TODO(cais): Test the uniqueness guarantee in multi-host settings.
A debugger-instrumented graph.
Used in:
An ID for the graph. This can be used to look up graph names. Generated by the debugger.
Name of the graph (if available).
Names of the instrumented ops. This can be used to look up op name based on the numeric-summary tensors (2nd column).
Original (uninstrumented) GraphDef (if available).
An encoded version of a GraphDef. This graph may include the debugger-inserted ops.
IDs of the immediate enclosing context (graph), if any.
Used in:
The host name on which a source code file is located.
Path to the source code file.
The timestamp at which the source code file was last modified.
Byte size of the file.
Line-by-line content of the source code file.
Used as request type in: grpc.WorkerService.DeleteWorkerSession
Sessions are identified by a given handle.
Used as response type in: grpc.WorkerService.DeleteWorkerSession
(message has no fields)
Used as request type in: grpc.WorkerService.DeregisterGraph
The session_handle used when registering the graph. If session_handle is empty, a single global namespace is used.
Set to true if `CreateWorkerSession` was called for `session_handle`.
REQUIRED: graph_handle must be returned by a RegisterGraph call to the same WorkerService.
TODO(mrry): Optionally add summary stats for the graph.
Used as response type in: grpc.WorkerService.DeregisterGraph
(message has no fields)
Used in:
Fully specified name of the device within a cluster.
String representation of device_type.
Memory capacity of device in bytes.
Platform-specific data about device that may be useful for supporting efficient data transfers.
A device is assigned a global unique number each time it is initialized. "incarnation" should never be 0.
String representation of the physical device that this device maps to.
A physical device ID for use in XLA DeviceAssignments, unique across clients in a multi-client setup. Set to -1 if unavailable, non-negative otherwise.
Used in:
Optional bus locality of device. Default value of 0 means no specific locality. Specific localities are indexed from 1.
Optional NUMA locality of device.
Optional local interconnect links to other devices.
Used in:
Device type (CPU, GPU, ...)
Vendor (Intel, nvidia, ...)
Model (Haswell, K40, ...)
Core frequency in MHz
Number of cores
Version of the tools and libraries used with this device (e.g. gcc 4.9, cudnn 5.1)
Number of registers per core.
L1 cache size in bytes
L2 cache size in bytes
L3 cache size in bytes
Shared memory size per multiprocessor in bytes. This field is applicable to GPUs only.
Memory size in bytes
Memory bandwidth in KB/s
Used in:
Its key is thread id.
Represents a Python dict keyed by `str`. The comment on Unicode from Value.string_value applies analogously.
Used in:
Used in:
Protocol buffer representing an event that happened during the execution of a Brain model.
Used as request type in: EventListener.SendEvents
Used as field type in:
Timestamp of the event.
Global step of the event.
An event file was started, with the specified version. This is used to identify the contents of the record IO files easily. Current version is "brain.Event:2". All versions start with "brain.Event:".
An encoded version of a GraphDef.
A summary was generated.
The user output a log message. This was theoretically used by the defunct tensorboard_logging module, which has since been removed; this field is now deprecated and should not be used.
The state of the session which can be used for restarting after crashes.
The metadata returned by running a session.run() call.
An encoded version of a MetaGraphDef.
Reply message from EventListener to the client, i.e., to the source of the Event protocol buffers, e.g., debug ops inserted by a debugged runtime to a TensorFlow graph being executed.
Used as response type in: EventListener.SendEvents, EventListener.SendSourceFiles, EventListener.SendTracebacks
New tensor value to override the current tensor value with.
TODO(cais): Make use of this field to implement overriding of tensor value during debugging.
Used in:
Used in:
This message is parallel to Example, but with additional fields to test unknown fields handling in example_proto_fast_parsing_test.cc.
Data relating to the eager execution of an op or a Graph. For an op that generates N output tensors (N >= 0), only one Execution proto will be used to describe the execution event.
Used in:
Op type (e.g., "MatMul"). In the case of a Graph, this is the name of the Graph.
Number of output tensors.
The graph that's executed: applicable only to the eager execution of a FuncGraph.
IDs of the input tensors (if available).
IDs of the output tensors (if available). If specified, must have the same length as tensor_protos.
Type of the tensor value encapsulated in this proto.
Output Tensor values in the type described by `tensor_value_type`. The length of this should match `num_outputs`.
Stack trace of the eager execution.
Debugger-generated IDs of the devices on which the output tensors reside. To look up details about the device (e.g., name), cross-reference this field with the DebuggedDevice messages.
Options specific to the execution of a single step.
Used in:
Used as request type in: grpc.MasterService.ExtendSession
Used as field type in:
REQUIRED: session_handle must be returned by a CreateSession call to the same master service.
REQUIRED: The nodes to be added to the session's graph. If any node has the same name as an existing node, the operation will fail with ILLEGAL_ARGUMENT.
REQUIRED: The version number of the graph to be extended. This will be tested against the current server-side version number, and the operation will fail with FAILED_PRECONDITION if they do not match.
TODO(mrry): Return something about the operation?
Used as response type in: grpc.MasterService.ExtendSession
Used as field type in:
The new version number for the extended graph, to be used in the next call to ExtendSession.
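The version handshake used by ExtendSession is a form of optimistic concurrency control: the client sends the version it believes is current, and the server rejects the call if it is stale. The toy model below illustrates the checks described for the request and response. `SessionGraphModel` and both exception classes are hypothetical, and incrementing the version by one per extension, as well as checking the version before the node names, are assumptions of this sketch.

```python
from typing import Iterable, Set


class FailedPreconditionError(Exception):
    """Stand-in for the FAILED_PRECONDITION status."""


class IllegalArgumentError(Exception):
    """Stand-in for the ILLEGAL_ARGUMENT status named in the text."""


class SessionGraphModel:
    """Toy server-side view of a session graph and its version number."""

    def __init__(self, node_names: Iterable[str], initial_version: int = 0) -> None:
        self.node_names: Set[str] = set(node_names)
        self.version = initial_version

    def extend(self, new_node_names: Iterable[str],
               current_graph_version: int) -> int:
        """Add nodes if the caller's version matches; return the new version."""
        if current_graph_version != self.version:
            raise FailedPreconditionError(
                f"expected version {self.version}, got {current_graph_version}")
        new_names = list(new_node_names)
        clashes = self.node_names.intersection(new_names)
        if clashes:
            raise IllegalArgumentError(f"duplicate node names: {sorted(clashes)}")
        self.node_names.update(new_names)
        self.version += 1  # assumption: each successful extension bumps by one
        return self.version
```

A client would keep the returned version and pass it as the current version in its next ExtendSession call.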
Containers for non-sequential data.
Used in:
Each feature can be exactly one kind.
Used in:
Containers for sequential data. A FeatureList contains lists of Features. These may hold zero or more Feature values. FeatureLists are organized into categories by name. The FeatureLists message contains the mapping from name to FeatureList.
Used in:
Used in:
Map from feature name to feature list.
Used in:
Map from feature name to feature.
Protocol buffer representing a SavedModel Fingerprint. If there are multiple MetaGraphDefs in the SavedModel, the FingerprintDef corresponds to the first one.
Hash of the graph_def, referred to as a "checksum".
Hash of regularized graph_def.
Hash of the regularized (sorted) SignatureDefs.
Hash of the regularized SavedObjectGraph.
Hash of the checkpoint.
Version specification of the fingerprint.
Used in:
Used in:
Highly experimental and very likely to change. This encoding uses tags instead of dedicated messages for regularity. In particular the encoding imposes no restrictions on what the parameters of any type should be, which in particular needs to be true for type symbols.
Used in:
The principal type represented by this object. This may be a concrete type (Tensor, Dataset), a type variable (used for dependent types), or a type symbol (Any, Union). See FullTypeId for details.
Literal values of this type object, if the type admits one. For example, a type variable admits a string attribute - its name. Shape-related types may admit int attributes - their static shape values. Fields for more data types to be added as needed.
TODO(mdan): list/tensor, map? Need to reconcile with TFT_RECORD, etc.
Experimental. Represents the complete type information of a TensorFlow value.
Used in:
The default represents an uninitialized value.
Type variables may serve as placeholders for any other type ID in type templates. Examples: TFT_DATASET[TFT_VAR["T"]] is a Dataset returning a type indicated by "T". TFT_TENSOR[TFT_VAR["T"]] is a Tensor of an element type indicated by "T". TFT_TENSOR[TFT_VAR["T"]], TFT_TENSOR[TFT_VAR["T"]] are two tensors of identical element types. TFT_TENSOR[TFT_VAR["P"]], TFT_TENSOR[TFT_VAR["Q"]] are two tensors of independent element types.
Wildcard type. Describes a parameter of unknown type. In TensorFlow, that can mean either a "Top" type (accepts any type), or a dynamically typed object whose type is unknown in context. Important: "unknown" does not necessarily mean undeterminable!
The algebraic product type. This is an algebraic type that may be used just for logical grouping. Not to be confused with TFT_TUPLE, which describes a concrete object of several elements. Example: TFT_DATASET[TFT_PRODUCT[TFT_TENSOR[TFT_INT32], TFT_TENSOR[TFT_FLOAT64]]] is a Dataset producing two tensors, an integer one and a float one.
Represents a named field, with the name stored in the attribute. Parametrization: TFT_NAMED[<type>]{<name>} * <type> is the type of the field * <name> is the field name, as string (though it can theoretically be an int as well) Example: TFT_RECORD[ TFT_NAMED[TFT_TENSOR[TFT_INT32]]{'foo'}, TFT_NAMED[TFT_TENSOR[TFT_FLOAT32]]{'bar'}, ] is a structure with two fields, an int tensor "foo" and a float tensor "bar".
Template definition. Expands the variables by repeating a template as arguments of container. Parametrization: TFT_FOR_EACH[<container_type>, <template>, <expansions>] * <container_type> is the type of the container that the template will be expanded into * <template> is any type definition that potentially contains type variables * <expansions> is a TFT_VAR and may include more types in the future Example: TFT_FOR_EACH[ TFT_PRODUCT, TFT_TENSOR[TFT_VAR["t"]], TFT_VAR["t"] ] will substitute a T = TFT_INT32 to TFT_PRODUCT[TFT_TENSOR[TFT_INT32]] and a T = (TFT_INT32, TFT_INT64) to TFT_PRODUCT[TFT_TENSOR[TFT_INT32], TFT_TENSOR[TFT_INT64]].
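The expansion performed by TFT_FOR_EACH can be sketched with nested tuples standing in for type expressions. This is an illustration of the substitution idea only, not the type-inference code: the tuple encoding `(type_id, *args)`, and the helpers `substitute` and `expand_for_each`, are assumptions made for this example.

```python
from typing import Sequence, Tuple

# A type expression is encoded as (type_id, *args); a variable is ("TFT_VAR", name).
TypeExpr = Tuple


def substitute(template: TypeExpr, var_name: str, replacement: TypeExpr) -> TypeExpr:
    """Replace every occurrence of TFT_VAR[var_name] inside the template."""
    if template == ("TFT_VAR", var_name):
        return replacement
    if isinstance(template, tuple):
        head, args = template[0], template[1:]
        return (head,) + tuple(substitute(a, var_name, replacement) for a in args)
    return template  # plain attribute such as a variable name string


def expand_for_each(container: str, template: TypeExpr, var_name: str,
                    bindings: Sequence[TypeExpr]) -> TypeExpr:
    """Repeat the template once per binding, as arguments of the container."""
    return (container,) + tuple(
        substitute(template, var_name, binding) for binding in bindings)
```

Binding the variable to the pair (TFT_INT32, TFT_INT64) with the template TFT_TENSOR[TFT_VAR["t"]] yields a TFT_PRODUCT of two tensors, matching the second example in the description.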
Callable types describe functions and ops. Parametrization: TFT_CALLABLE[<arg type>, <return type>] * <arg type> is the type of the arguments; TFT_PRODUCT represents multiple arguments. * <return type> is the return type; TFT_PRODUCT represents multiple return values (that means that callables returning multiple things don't necessarily return a single tuple). Example: TFT_CALLABLE[ TFT_ANY, TFT_PRODUCT[TFT_TENSOR[TFT_INT32], TFT_TENSOR[TFT_FLOAT64]], ] is a callable with unspecified (for now) input arguments, and two return values of type tensor.
The usual Tensor. This is a parametric type. Parametrization: TFT_TENSOR[<element type>, <shape type>] * <element type> is currently limited to one of the element types defined below. * <shape type> is not yet defined, and may only be TFT_UNKNOWN for now. A TFT_SHAPE type will be defined in the future. Example: TFT_TENSOR[TFT_INT32, TFT_UNKNOWN] is a Tensor of int32 element type and unknown shape. TODO(mdan): Define TFT_SHAPE and add more examples.
Array (or tensorflow::TensorList in the variant type registry). Note: this is not to be confused with the deprecated `TensorArray*` ops which are not supported by FullType. This type represents a random-access list whose elements can be described by a single type. Although immutable, Array is expected to support efficient mutation semantics (i.e. element update) in the user-facing API. The element type may be generic or even TFT_ANY for a heterogeneous list. Parametrization: TFT_ARRAY[<element type>] * <element type> may be any concrete type. Examples: TFT_ARRAY[TFT_TENSOR[TFT_INT32]] is a TensorArray holding int32 Tensors of any shape. TFT_ARRAY[TFT_TENSOR[TFT_UNKNOWN]] is a TensorArray holding Tensors of mixed element types. TFT_ARRAY[TFT_UNKNOWN] is a TensorArray holding any element type. TFT_ARRAY[] is equivalent to TFT_ARRAY[TFT_UNKNOWN]. TFT_ARRAY[TFT_ARRAY[]] is an array of arrays (of unknown types).
Optional (or tensorflow::OptionalVariant in the variant type registry). This type represents a value that may either hold an element of a single specified type, or nothing at all. Parametrization: TFT_OPTIONAL[<element type>] * <element type> may be any concrete type. Examples: TFT_OPTIONAL[TFT_TENSOR[TFT_INT32]] is an Optional holding an int32 Tensor of any shape.
Literal types describe compile-time constant values. Literal types may also participate in dependent types. Parametrization: TFT_LITERAL[<value type>]{<value>} * <value type> may be any concrete type that can hold <value> * <value> is the type's attribute, and holds the actual literal value Examples: TFT_LITERAL[TFT_INT32]{1} is the compile-time constant 1.
Encoding types describe a value of a certain type, encoded as a different type. Parametrization: TFT_ENCODED[<encoded type>, <encoding type>] * <encoded type> may be any type * <encoding type> may be any type Examples: TFT_ENCODED[TFT_INT32, TFT_STRING] is an integer encoded as string.
The bool element type. TODO(mdan): Quantized types, legacy representations (e.g. ref)
Integer element types.
Floating-point element types.
Complex element types. TODO(mdan): Represent as TFT_COMPLEX[TFT_DOUBLE] instead?
The string element type.
Datasets created by tf.data ops and APIs. Datasets have generator/iterable semantics, that is, one can construct an iterator from them. Like Array, they are considered to return elements that can be described by a single type. Unlike Array, they do not support random access or mutation, and can potentially produce an infinite number of elements. A dataset can produce logical structures (e.g. multiple elements). This is expressed using TFT_PRODUCT. Parametrization: TFT_DATASET[<element type>]. * <element type> may be a concrete type or a type symbol. It represents the data type of the elements produced by the dataset. Examples: TFT_DATASET[TFT_TENSOR[TFT_INT32]] is a Dataset producing single int32 Tensors of unknown shape. TFT_DATASET[TFT_PRODUCT[TFT_TENSOR[TFT_INT32], TFT_TENSOR[TFT_FLOAT32]]] is a Dataset producing pairs of Tensors, one integer and one float. Note: The high ID number is to prepare for the eventuality that Datasets will be supported by user types in the future.
A ragged tensor created by tf.ragged ops and APIs. Parametrization: TFT_RAGGED[<element_type>].
Iterators created by tf.data ops and APIs. Very similar to Datasets, except they are mutable. Parametrization: TFT_ITERATOR[<element type>]. * <element type> may be a concrete type or a type symbol. It represents the data type of the elements produced by the dataset.
A mutex lock tensor, produced by tf.raw_ops.MutexLock. Unlike strict execution models, where ownership of a lock is denoted by "running after the lock has been acquired", in non-strict mode, lock ownership is in the true sense: "the op argument representing the lock is available". Mutex locks are the dynamic counterpart of control dependencies. TODO(mdan): Properly document this thing. Parametrization: TFT_MUTEX_LOCK[].
The equivalent of a Tensor with DT_VARIANT dtype, kept here to simplify translation. This type should not normally appear after type inference. Note that LEGACY_VARIANT != ANY: TENSOR[INT32] is a subtype of ANY, but is not a subtype of LEGACY_VARIANT.
A function can be instantiated when the runtime can bind every attr with a value. When a GraphDef has a call to a function, it must have a binding for every attr defined in the signature. TODO(zhifengc): * device spec, etc.
The definition of the function's name, arguments, return values, attrs etc.
Attributes specific to this function definition.
Unique IDs for each resource argument, used to track aliasing resources. If Argument A and Argument B alias each other, then resource_arg_unique_ids[A.index] == resource_arg_unique_ids[B.index]. If this field is empty, none of the arguments could alias; otherwise, every resource argument should have an entry in this field. When instantiated, the unique IDs will be attached to the _Arg nodes' "_resource_arg_unique_id" attribute.
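The aliasing rule can be sketched as follows. This is an illustration only: the field is modeled as a plain Python dict from argument index to unique ID, and `may_alias` is a hypothetical helper, not a TensorFlow function.

```python
def may_alias(resource_arg_unique_ids: dict, a_index: int, b_index: int) -> bool:
    """Return True if two resource arguments are recorded as aliases.

    Assumption: the field is modeled as {argument index: unique ID}.
    """
    if not resource_arg_unique_ids:
        # An empty field means none of the arguments could alias.
        return False
    return resource_arg_unique_ids[a_index] == resource_arg_unique_ids[b_index]


# Arguments 0 and 1 share a unique ID, so they alias; argument 3 does not.
ids = {0: 7, 1: 7, 3: 9}
print(may_alias(ids, 0, 1))
print(may_alias(ids, 0, 3))
```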
By convention, "op" in node_def is resolved by consulting with a user-defined library first. If not resolved, "func" is assumed to be a builtin op.
A mapping from the output arg names from `signature` to the outputs from `node_def` that should be returned by the function.
A mapping from control output names from `signature` to node names in `node_def` which should be control outputs of this function.
Attributes for function arguments. These attributes are the same set of valid attributes as to _Arg nodes.
A library is a set of named functions.
Represents `FunctionSpec` used in `Function`. This represents a function that has been wrapped as a TensorFlow `Function`.
Full arg spec from inspect.getfullargspec().
Whether this represents a class method.
The input signature, if specified.
Whether the function should be compiled by XLA. The public interface to `tf.function` uses an optional boolean to represent three distinct states for this field. Unfortunately, proto3 removes the ability to explicitly check for the presence or absence of a field, so we instead map to an enum. See `tf.function` for details.
e.g. "Tesla K40c"
Final entry in output of "nvidia-smi -L"
e.g. "0000:04:00.0"
Fraction of the available GPU memory to allocate for each process. 1 means to allocate all of the GPU memory, 0.5 means the process allocates up to ~50% of the available GPU memory. GPU memory is pre-allocated unless the allow_growth option is enabled. If greater than 1.0, uses CUDA unified memory to potentially oversubscribe the amount of memory available on the GPU device by using host memory as a swap space. Accessing memory not available on the device will be significantly slower as that would require memory transfer between the host and the device. Options to reduce the memory requirement should be considered before enabling this option as this may come with a negative performance impact. Oversubscription using the unified memory requires Pascal class or newer GPUs and it is currently only supported on the Linux operating system. See https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-requirements for the detailed requirements.
If true, the allocator does not pre-allocate the entire specified GPU memory region, instead starting small and growing as needed.
The type of GPU allocation strategy to use. Allowed values: "": The empty string (default) uses a system-chosen default which may change over time. "BFC": A "Best-fit with coalescing" algorithm, simplified from a version of dlmalloc.
Delay deletion of up to this many bytes to reduce the number of interactions with gpu driver code. If 0, the system chooses a reasonable default (several MBs).
A comma-separated list of GPU ids that determines the 'visible' to 'virtual' mapping of GPU devices. For example, if TensorFlow can see 8 GPU devices in the process, and one wanted to map visible GPU devices 5 and 3 as "/device:GPU:0", and "/device:GPU:1", then one would specify this field as "5,3". This field is similar in spirit to the CUDA_VISIBLE_DEVICES environment variable, except it applies to the visible GPU devices in the process. NOTE: 1. The GPU driver provides the process with the visible GPUs in an order which is not guaranteed to have any correlation to the *physical* GPU id in the machine. This field is used for remapping "visible" to "virtual", which means this operates only after the process starts. Users are required to use vendor specific mechanisms (e.g., CUDA_VISIBLE_DEVICES) to control the physical to visible device mapping prior to invoking TensorFlow. 2. In the code, the ids in this list are also called "platform GPU id"s, and the 'virtual' ids of GPU devices (i.e. the ids in the device name "/device:GPU:<id>") are also called "TF GPU id"s. Please refer to third_party/tensorflow/core/common_runtime/gpu/gpu_id.h for more information.
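The 'visible' to 'virtual' remapping in the example above can be sketched in a few lines. This is a plain-Python illustration of the mapping only; `virtual_to_visible` is a hypothetical helper and does not touch any GPU or TensorFlow API.

```python
def virtual_to_visible(visible_device_list: str) -> dict:
    """Map virtual device names to visible GPU ids, in list order.

    Illustrative only: an empty string yields an empty mapping here,
    which is a simplification of the "not set" case.
    """
    ids = [int(part) for part in visible_device_list.split(",") if part.strip()]
    return {f"/device:GPU:{virtual}": visible for virtual, visible in enumerate(ids)}


# Visible GPUs 5 and 3 become /device:GPU:0 and /device:GPU:1 respectively.
mapping = virtual_to_visible("5,3")
print(mapping)
```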
In the event polling loop sleep this many microseconds between PollEvents calls, when the queue is not empty. If value is not set or set to 0, gets set to a non-zero default.
This field is deprecated and ignored.
Force all tensors to be gpu_compatible. On a GPU-enabled TensorFlow, enabling this option forces all CPU tensors to be allocated with Cuda pinned memory. Normally, TensorFlow will infer which tensors should be allocated as pinned memory. But in cases where the inference is incomplete, this option can significantly speed up cross-device memory copies, as long as the tensors fit in memory. Note that this option should not be enabled by default for unknown or very large models: all Cuda pinned memory is unpageable, so having too much pinned memory might negatively impact overall host system performance.
Everything inside experimental is subject to change and is not subject to API stability guarantees in https://www.tensorflow.org/guide/version_compat.
The multi virtual device settings. If empty (not set), it will create a single virtual device on each visible GPU, according to the settings in "visible_device_list" above. Otherwise, the number of elements in the list must be the same as the number of visible GPUs (after "visible_device_list" filtering if it is set), and the string represented device names (e.g. /device:GPU:<id>) will refer to the virtual devices and have the <id> field assigned sequentially starting from 0, according to the order of the virtual devices determined by device_ordinal and the location in the virtual device list.
For example,
  visible_device_list = "1,0"
  virtual_devices { memory_limit: 1GB memory_limit: 2GB }
  virtual_devices { memory_limit: 3GB memory_limit: 4GB }
will create 4 virtual devices as:
  /device:GPU:0 -> visible GPU 1 with 1GB memory
  /device:GPU:1 -> visible GPU 1 with 2GB memory
  /device:GPU:2 -> visible GPU 0 with 3GB memory
  /device:GPU:3 -> visible GPU 0 with 4GB memory
but
  visible_device_list = "1,0"
  virtual_devices { memory_limit: 1GB memory_limit: 2GB device_ordinal: 10 device_ordinal: 20 }
  virtual_devices { memory_limit: 3GB memory_limit: 4GB device_ordinal: 10 device_ordinal: 20 }
will create 4 virtual devices as:
  /device:GPU:0 -> visible GPU 1 with 1GB memory (ordinal 10)
  /device:GPU:1 -> visible GPU 0 with 3GB memory (ordinal 10)
  /device:GPU:2 -> visible GPU 1 with 2GB memory (ordinal 20)
  /device:GPU:3 -> visible GPU 0 with 4GB memory (ordinal 20)
NOTE: 1. It's invalid to set both this and "per_process_gpu_memory_fraction" at the same time. 2. Currently this setting is per-process, not per-session. Using different settings in different sessions within the same process will result in undefined behavior.
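The ordering in both examples can be reproduced with a small sort. The sketch below is plain Python, not TensorFlow code. It assumes that devices are ordered by device_ordinal first (defaulting to 0 when unset), then by position in "visible_device_list", then by position within the virtual device list; `assign_virtual_devices` and the dict layout are hypothetical.

```python
def assign_virtual_devices(visible_device_list, virtual_devices):
    """Return (device name, visible GPU id, memory limit) in assigned order.

    Assumed sort key: (device_ordinal, position of the visible GPU in
    visible_device_list, position inside that GPU's virtual device list).
    """
    entries = []
    for pos, (visible_id, vdev) in enumerate(zip(visible_device_list, virtual_devices)):
        limits = vdev["memory_limit"]
        # Missing ordinals are treated as 0 for every virtual device (assumption).
        ordinals = vdev.get("device_ordinal") or [0] * len(limits)
        for idx, (limit, ordinal) in enumerate(zip(limits, ordinals)):
            entries.append({"key": (ordinal, pos, idx), "gpu": visible_id, "limit": limit})
    entries.sort(key=lambda e: e["key"])
    return [(f"/device:GPU:{i}", e["gpu"], e["limit"]) for i, e in enumerate(entries)]


# First example: no ordinals, so order follows the visible device list.
plain = assign_virtual_devices(
    [1, 0],
    [{"memory_limit": ["1GB", "2GB"]}, {"memory_limit": ["3GB", "4GB"]}],
)
# Second example: ordinals 10 and 20 interleave the two visible GPUs.
with_ordinals = assign_virtual_devices(
    [1, 0],
    [
        {"memory_limit": ["1GB", "2GB"], "device_ordinal": [10, 20]},
        {"memory_limit": ["3GB", "4GB"], "device_ordinal": [10, 20]},
    ],
)
for row in with_ordinals:
    print(row)
```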
If true, uses CUDA unified memory for memory allocations. If per_process_gpu_memory_fraction option is greater than 1.0, then unified memory is used regardless of the value for this field. See comments for per_process_gpu_memory_fraction field for more details and requirements of the unified memory. This option is useful to oversubscribe memory if multiple processes are sharing a single GPU while individually using less than 1.0 per process memory fraction.
If > 1, the number of device-to-device copy streams to create for each GPUDevice. Default value is 0, which is automatically converted to 1.
If non-empty, defines a good GPU ring order on a single worker based on device interconnect. This assumes that all workers have the same GPU topology. Specify as a comma-separated string, e.g. "3,2,1,0,7,6,5,4". This ring order is used by the RingReducer implementation of CollectiveReduce, and serves as an override to automatic ring order generation in OrderTaskDeviceMap() during CollectiveParam resolution.
If true then extra work is done by GPUDevice and GPUBFCAllocator to keep track of when GPU memory is freed and when kernels actually complete so that we can know when a nominally free memory chunk is really not subject to pending use.
Parameters for GPUKernelTracker. By default no kernel tracking is done. Note that timestamped_allocator is only effective if some tracking is specified. If kernel_tracker_max_interval = n > 0, then a tracking event is inserted after every n kernels without an event.
If kernel_tracker_max_bytes = n > 0, then a tracking event is inserted after every series of kernels allocating a sum of memory >= n. If one kernel allocates b * n bytes, then one event will be inserted after it, but it will count as b against the pending limit.
If kernel_tracker_max_pending > 0 then no more than this many tracking events can be outstanding at a time. An attempt to launch an additional kernel will stall until an event completes.
BFC Allocator can return an allocated chunk of memory up to 2x the requested size. For virtual devices with tight memory constraints, and proportionately large allocation requests, this can lead to a significant reduction in available memory. The threshold below controls when a chunk should be split if the chunk size exceeds requested memory size. It is expressed as a fraction of total available memory for the tf device. For example setting it to 0.05 would imply a chunk needs to be split if its size exceeds the requested memory by 5% of the total virtual device/gpu memory size.
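The arithmetic behind the threshold can be sketched as below. This is an illustration of the rule as described, not the allocator's implementation: `should_split` is a hypothetical helper, and treating the comparison as strictly greater-than is an assumption.

```python
def should_split(chunk_bytes: int, requested_bytes: int,
                 total_bytes: int, threshold_fraction: float) -> bool:
    """Split when the chunk exceeds the request by more than fraction * total.

    Assumption: strict comparison; the exact boundary behavior is not
    specified in the description above.
    """
    return chunk_bytes - requested_bytes > threshold_fraction * total_bytes


# With a 1000-byte device and a 0.05 threshold, the slack limit is 50 bytes.
print(should_split(200, 100, 1000, 0.05))  # slack of 100 bytes
print(should_split(140, 100, 1000, 0.05))  # slack of 40 bytes
```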
When true, use CUDA cudaMallocAsync API instead of TF gpu allocator.
By default, BFCAllocator may sleep when it runs out of memory, in the hopes that another thread will free up memory in the meantime. Setting this to true disables the sleep; instead we'll OOM immediately.
Configuration for breaking down a visible GPU into multiple "virtual" devices.
Per "virtual" device memory limit, in MB. The number of elements in the list is the number of virtual devices to create on the corresponding visible GPU (see "virtual_devices" below). If empty, it will create a single virtual device taking all available memory from the device. For the concept of "visible" and "virtual" GPU, see the comments for "visible_device_list" above for more information.
Priority values to use with the virtual devices. Use the cuda function cudaDeviceGetStreamPriorityRange to query for valid range of values for priority. On a P4000 GPU with cuda 10.1, the priority range reported was 0 for least priority and -1 for greatest priority. If this field is not specified, then the virtual devices will be created with the default. If this field has values set, then the size of this must match with the above memory_limit_mb.
Virtual device ordinal number determines the device ID of the device. A virtual device with a lower ordinal number always receives a smaller device id. The physical device id and location in the virtual device list are used to break ties.
Used as request type in: grpc.WorkerService.GetStatus
(message has no fields)
Used as response type in: grpc.WorkerService.GetStatus
Request for next agreed-upon step_id for the specified graph_keys. This is used to enable multiple graphs containing nodes from a common collective instance to coordinate using the same step_ids.
Used as request type in: grpc.WorkerService.GetStepSequence
Next valid step_ids for one or more graph_keys.
Used as response type in: grpc.WorkerService.GetStepSequence
GradientDef defines the gradient function of a function defined in a function library. A gradient function g (specified by gradient_func) for a function f (specified by function_name) must satisfy the following: The function 'f' must be a numerical function which takes N inputs and produces M outputs. Its gradient function 'g' takes N + M inputs and produces N outputs. I.e. if we have (y1, y2, ..., y_M) = f(x1, x2, ..., x_N), then g is (dL/dx1, dL/dx2, ..., dL/dx_N) = g(x1, x2, ..., x_N, dL/dy1, dL/dy2, ..., dL/dy_M), where L is a scalar-valued function of (x1, x2, ..., x_N) (e.g., the loss function). dL/dx_i is the partial derivative of L with respect to x_i.
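As a worked instance of this contract, take N = 2 and M = 2. The functions below are hypothetical examples written in plain Python, chosen only to show the shape of the signatures: f takes 2 inputs and returns 2 outputs, and g takes 2 + 2 inputs and returns 2 outputs via the chain rule.

```python
def f(x1, x2):
    """Example function with N = 2 inputs and M = 2 outputs."""
    y1 = x1 * x2
    y2 = x1 + x2
    return y1, y2


def g(x1, x2, dL_dy1, dL_dy2):
    """Gradient function for f: N + M = 4 inputs, N = 2 outputs.

    Chain rule: dL/dx_i = sum_j dL/dy_j * dy_j/dx_i.
    """
    dL_dx1 = dL_dy1 * x2 + dL_dy2 * 1.0
    dL_dx2 = dL_dy1 * x1 + dL_dy2 * 1.0
    return dL_dx1, dL_dx2


# With L = y1 + 2 * y2, the upstream gradients are dL/dy1 = 1 and dL/dy2 = 2.
grads = g(3.0, 4.0, 1.0, 2.0)
print(grads)
```

Here L = x1 * x2 + 2 * (x1 + x2), so dL/dx1 = x2 + 2 and dL/dx2 = x1 + 2, which agrees with what g returns at (3, 4).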
The function name.
The gradient function's name.
This stores all the source code file names and can be indexed by the `file_index`.
This maps a node name to a stack trace in the source code. The map key is a mangling of the containing function and op name with syntax: op.name '@' func_name For ops in the top-level graph, the func_name is the empty string. Note that op names are restricted to a small number of characters which exclude '@', making it impossible to collide keys of this form. Function names accept a much wider set of characters. It would be preferable to avoid mangling and use a tuple key of (op.name, func_name), but this is not supported with protocol buffers.
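The key mangling can be sketched as below. The helper names `mangle` and `demangle` are hypothetical; the only behavior taken from the description is the `op.name '@' func_name` syntax and the fact that op names cannot contain '@'.

```python
def mangle(op_name: str, func_name: str = "") -> str:
    """Build a map key of the form op.name '@' func_name."""
    return f"{op_name}@{func_name}"


def demangle(key: str):
    """Split a key back into (op_name, func_name).

    Splitting at the first '@' is safe because op names exclude '@',
    even if the function name itself contains one.
    """
    op_name, _, func_name = key.partition("@")
    return op_name, func_name


print(mangle("Dense/MatMul_1", "my_fn"))
print(mangle("x"))  # op in the top-level graph: empty function name
print(demangle("a@f@g"))
```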
This represents a file/line location in the source code.
File name index, which can be used to retrieve the file name string from `files`. The value should be between 0 and (len(files)-1)
Line number in the file.
Column number in the file line.
Name of the function that contains the file line.
Source code contained in this file line.
This represents a stack trace which is an ordered list of `FileLineCol`.
Each line in the stack trace.
Represents the graph of operations
Compatibility versions of the graph. See core/public/version.h for version history. The GraphDef version is distinct from the TensorFlow version, and each release of TensorFlow will support a range of GraphDef versions.
Deprecated single version field; use versions above instead. Since all GraphDef changes before "versions" was introduced were forward compatible, this field is entirely ignored.
"library" provides user-defined functions. Naming: * library.function.name are in a flat namespace. NOTE: We may need to change it to be hierarchical to support different orgs. E.g., { "/google/nn", { ... }}, { "/google/vision", { ... }} { "/org_foo/module_bar", { ... }} map<string, FunctionDefLib> named_lib; * If node[i].op is the name of one function in "library", node[i] is deemed as a function call. Otherwise, node[i].op must be a primitive operation supported by the runtime. Function call semantics: * The callee may start execution as soon as some of its inputs are ready. The caller may want to use the Tuple() mechanism to ensure all inputs are ready at the same time. * The consumer of return values may start executing as soon as the return values the consumer depends on are ready. The consumer may want to use the Tuple() mechanism to ensure the consumer does not start until all return values of the callee function are ready.
Data relating to an execution of a Graph (e.g., an eager execution of a FuncGraph). The values of the intermediate tensors computed in the graph are recorded in this proto. A graph execution may correspond to one or more pieces of `GraphExecutionTrace`, depending on whether the instrumented tensor values are summarized in an aggregated or separate fashion.
Unique ID of the context that the executed op(s) belong to (e.g., a compiled concrete tf.function).
Name of the op (applicable only in the case of the `FULL_TENSOR` trace level).
Output slot of the tensor (applicable only in the case of the `FULL_TENSOR` trace level).
Type of the tensor value encapsulated in this proto.
Tensor value in the type described by `tensor_value_type`. This tensor may summarize the value of a single intermediate op of the graph, or those of multiple intermediate tensors.
Name of the device that the op belongs to.
The creation of an op in a TensorFlow Graph (e.g., FuncGraph in TF2).
Type of the op (e.g., "MatMul").
Name of the op (e.g., "Dense/MatMul_1").
Name of the graph that the op is a part of (if available).
Unique ID of the graph (generated by debugger). This is the ID of the immediately-enclosing graph.
Name of the device that the op is assigned to (if available).
Names of the input tensors to the op.
Number of output tensors emitted by the op.
The unique ID for code location (stack trace) of the op's creation.
Unique IDs for the output tensors of this op.
If true, use control flow to schedule the activation of Recv nodes. (Currently ignored.)
Options controlling how graph is optimized.
The number of steps to run before returning a cost model detailing the memory usage and performance of each node of the graph. 0 means no cost model.
The number of steps to skip before collecting statistics for the cost model.
Annotate each Node with Op output shape data, to the extent it can be statically inferred.
Only place the subgraphs that are run, rather than the entire graph. This is useful for interactive graph building, where one might produce graphs that cannot be placed during the debugging process. In particular, it allows the client to continue work in a session after adding a node to a graph whose placement constraints are unsatisfiable.
If true, transfer float values between processes as bfloat16.
If > 0, record a timeline every this many steps. EXPERIMENTAL: This currently has no effect in MasterSession.
Options that control the type and amount of graph rewriting. Not currently configurable via the public Python API (i.e. there is no API stability guarantee if you import RewriterConfig explicitly).
Protocol buffer representing a handle to a tensorflow resource. Handles are not valid across executions, but can be serialized back and forth from within a single run.
Input Node parameters of transferred graph
Destination of graph transfer
Serialization format for histogram module in tsl/lib/histogram/histogram.h
Parallel arrays encoding the bucket boundaries and the bucket values. bucket(i) is the count for the bucket i. The range for a bucket is: i == 0: -DBL_MAX .. bucket_limit(0) i != 0: bucket_limit(i-1) .. bucket_limit(i)
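The bucket range rule can be sketched as a lookup over the limits array. This is a plain-Python illustration, not the histogram library itself: `bucket_range` is a hypothetical helper, and `sys.float_info.max` stands in for DBL_MAX.

```python
import sys

DBL_MAX = sys.float_info.max  # Python's largest finite float, standing in for DBL_MAX


def bucket_range(bucket_limit, i):
    """Return the (lower, upper) range covered by bucket i.

    i == 0: -DBL_MAX .. bucket_limit[0]
    i != 0: bucket_limit[i - 1] .. bucket_limit[i]
    """
    lower = -DBL_MAX if i == 0 else bucket_limit[i - 1]
    return lower, bucket_limit[i]


# Parallel arrays: limits[i] is the upper edge of the bucket counted in counts[i].
limits = [1.0, 10.0, DBL_MAX]
counts = [4.0, 2.0, 1.0]
for i, count in enumerate(counts):
    print(bucket_range(limits, i), count)
```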
Defines a single job in a TensorFlow cluster.
The name of this job.
Mapping from task ID to "hostname:port" string. If the `name` field contains "worker", and the `tasks` map contains a mapping from 7 to "example.org:2222", then the device prefix "/job:worker/task:7" will be assigned to "example.org:2222".
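The relationship between the job name, the task map, and device prefixes can be sketched as below. `device_prefixes` is a hypothetical helper in plain Python; the job is modeled as a name plus a dict from task ID to address.

```python
def device_prefixes(job_name: str, tasks: dict) -> dict:
    """Map each device prefix "/job:<name>/task:<id>" to its "hostname:port"."""
    return {
        f"/job:{job_name}/task:{task_id}": address
        for task_id, address in sorted(tasks.items())
    }


# Task 7 of job "worker" is served by example.org:2222.
prefixes = device_prefixes("worker", {7: "example.org:2222"})
print(prefixes)
```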
Defines the device filters for tasks in a job.
The name of this job.
Mapping from task ID to task device filters.
Must match the name of an Op.
Type of device this kernel runs on.
Names of the Op's input_/output_args that reside in host memory instead of device memory.
This allows experimental kernels to be registered for an op that won't be used unless the user specifies a "_kernel" attr with value matching this.
Prioritization of kernel amongst different devices. By default we assume priority is 0. The higher the priority the better. By default (i.e. if this is not set), we prefer GPU kernels over CPU.
Name of an attr from the Op.
A list of values that this kernel supports for this attr. Like OpDef.AttrDef.allowed_values, except for kernels instead of Ops.
A collection of KernelDefs
Message for a configuration key-value pair. The key is structured like a Unix file system path, with multiple levels of directory names separated by slash ('/') characters.
Used as request type in: grpc.MasterService.ListDevices
Optional: session_handle must be returned by a CreateSession call to the same master service. When session_handle is empty, the ClusterSpec provided when the master was started is used to compute the available devices. If the session_handle is provided but not recognized, an error is returned. Finally, if a valid session_handle is provided, the cluster configuration for that session is used when computing the response.
Used as response type in: grpc.MasterService.ListDevices
Represents a Python list.
Protocol buffer used for logging messages to the events file. This was theoretically used by the defunct tensorboard_logging module, which has been removed; this message is now deprecated and should not be used.
Note: The logging level 10 cannot be named DEBUG. Some software projects compile their C/C++ code with -DDEBUG in debug builds. So the C++ code generated from this file should not have an identifier named DEBUG.
Out-of-band request to begin or end logging, or to retrieve logs for particular steps.
Used as request type in: grpc.WorkerService.Logging
If true, RPC logging will be enabled.
If true, RPC logging will be disabled.
If true, discard any saved logging data (for all steps).
When set, requests all saved log data pertaining to the step. Any log data retrieved is eliminated from the store and cannot be retrieved again.
Used as response type in: grpc.WorkerService.Logging
Host name of machine that ran the benchmark.
Unique serial number of the machine.
Additional platform information.
CPU Information.
Other devices that are attached and relevant (e.g. GPUInfo).
Devices accessible to the test (e.g. as given by list_local_devices).
Used as request type in: grpc.MasterService.MakeCallable
REQUIRED: session_handle must be returned by a CreateSession call to the same master service.
Options that define the behavior of the created callable.
Unique identifier for this request. Every MakeCallableRequest must have a unique request_id, and retried MakeCallableRequest must have the same request_id. If request_id is zero, retry detection is disabled.
Used as response type in: grpc.MasterService.MakeCallable
A handle to the created callable.
Message for managing the response cache maintained on the sender side. Currently only used by the gRPC worker service.
(message has no fields)
Some of the data from AllocatorStats
A directory of regions in a memmapped file.
A message that describes one region of memmapped file.
Total virtual memory in bytes
Immediately available memory in bytes
Process-unique step id.
Name of the operation making the allocation.
Number of bytes in the allocation.
Address of the allocation.
Id of the tensor buffer being allocated, used to match to a corresponding deallocation.
Name of the allocator used.
Process-unique step id.
Name of the operation making the deallocation.
Id of the tensor buffer being deallocated, used to match to a corresponding allocation.
Name of the allocator used.
True if the deallocation is queued and will be performed later, e.g. for GPU lazy freeing of buffers.
Process-unique step id.
Handle describing the feeds and fetches of the step.
Process-unique step id.
Name of the kernel making the allocation as set in GraphDef, e.g., "affine2/weights/Assign".
Allocated tensor details.
Id of the tensor buffer being deallocated, used to match to a corresponding allocation.
Name of the allocator used.
Process-unique step id.
Name of the kernel producing an output as set in GraphDef, e.g., "affine2/weights/Assign".
Index of the output being set.
Output tensor details.
For memory tracking.
Protocol buffer containing the following, which are necessary to restart training or run inference: MetaInfoDef, GraphDef, SaverDef, CollectionDef, TensorInfo, and SignatureDef. It can be used to serialize/de-serialize memory objects necessary for running computation in a graph when crossing the process boundary. It can be used for long term storage of graphs, cross-language execution of graphs, etc.
GraphDef.
SaverDef.
collection_def: Map from collection name to collections. See CollectionDef section for details.
signature_def: Map from user supplied key for a signature to a single SignatureDef.
Asset file def to be used with the defined graph.
Extra information about the structure of functions and stateful objects.
Meta information regarding the graph to be exported. To be used by users of this protocol buffer to encode information regarding their meta graph.
User specified Version string. Can be the name of the model and revision, steps this model has been trained to, etc.
A copy of the OpDefs used by the producer of this graph_def. Descriptions and Ops not used in graph_def are stripped out.
A serialized protobuf. Can be the time this meta graph is created, or modified, or name of the model.
User supplied tag(s) on the meta_graph and included graph_def. MetaGraphDefs should be tagged with their capabilities or use-cases. Examples: "train", "serve", "gpu", "tpu", etc. These tags enable loaders to access the MetaGraph(s) appropriate for a specific use-case or runtime environment.
The __version__ string of the tensorflow build used to write this graph. This will be populated by the framework, which will overwrite any user supplied value.
The __git_version__ string of the tensorflow build used to write this graph. This will be populated by the framework, which will overwrite any user supplied value.
A flag to denote whether default-valued attrs have been stripped from the nodes in this graph_def.
FunctionDef name to aliases mapping.
Metric name
Metric value
The minimum acceptable value for the metric if specified
The maximum acceptable value for the metric if specified
A list of attr names and their values. The whole list is attached with a string name. E.g., MatMul[T=float].
A pair of tensor name and tensor values.
Name of the tensor.
The client can populate a TensorProto using a `tensorflow::Tensor`, or directly using the protobuf field accessors. The client specifies whether the returned tensor values should be filled into the typed tensor fields (float_val, int_val, etc.) or encoded in a compact form in tensor.tensor_content.
Represents Python's namedtuple.
Records the creation of a new replay session. We record the device listing here to capture the state of the cluster.
The name given to this operator. Used for naming inputs, logging, visualization, etc. Unique within a single GraphDef. Must match the regexp "[A-Za-z0-9.][A-Za-z0-9_>./]*".
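The naming rule can be checked with Python's `re` module. The sketch below uses the regular expression exactly as quoted above; `is_valid_node_name` is a hypothetical helper, and whether the runtime applies additional checks is not covered here.

```python
import re

# Pattern copied from the field description above.
NODE_NAME_RE = re.compile(r"[A-Za-z0-9.][A-Za-z0-9_>./]*")


def is_valid_node_name(name: str) -> bool:
    """Return True if the whole name matches the documented pattern."""
    return NODE_NAME_RE.fullmatch(name) is not None


print(is_valid_node_name("Dense/MatMul_1"))
print(is_valid_node_name("_hidden"))  # a leading underscore is not allowed
print(is_valid_node_name("a@b"))      # '@' is not in the allowed set
```

Using `fullmatch` rather than `match` matters here: `match` would accept any name that merely starts with a valid prefix.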
The operation name. There may be custom parameters in attrs. Op names starting with an underscore are reserved for internal use.
Each input is "node:src_output" with "node" being a string name and "src_output" indicating which output tensor to use from "node". If "src_output" is 0 the ":0" suffix can be omitted. Regular inputs may optionally be followed by control inputs that have the format "^node".
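The input string format can be sketched with a small parser. `parse_input` is a hypothetical helper in plain Python; it returns a `(node, src_output, is_control)` tuple and uses `None` as the output index for control inputs, which is a choice made for this illustration.

```python
def parse_input(spec: str):
    """Parse "node:src_output", "node", or "^node" into a tuple.

    Returns (node name, output index or None, is_control).
    """
    if spec.startswith("^"):
        # Control input: no tensor is consumed, only ordering is enforced.
        return spec[1:], None, True
    node, sep, index = spec.rpartition(":")
    if not sep:
        # No ":" present, so the ":0" suffix was omitted.
        return spec, 0, False
    return node, int(index), False


print(parse_input("a:1"))
print(parse_input("a"))
print(parse_input("^a"))
```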
A (possibly partial) specification for the device on which this node should be placed. The expected syntax for this string is as follows:
  DEVICE_SPEC ::= PARTIAL_SPEC
  PARTIAL_SPEC ::= ("/" CONSTRAINT) *
  CONSTRAINT ::= ("job:" JOB_NAME) | ("replica:" [1-9][0-9]*) | ("task:" [1-9][0-9]*) | ("device:" [A-Za-z]* ":" ([1-9][0-9]* | "*") )
Valid values for this string include:
  * "/job:worker/replica:0/task:1/device:GPU:3" (full specification)
  * "/job:worker/device:GPU:3" (partial specification)
  * "" (no specification)
If the constraints do not resolve to a single device (or if this field is empty or not present), the runtime will attempt to choose a device automatically.
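A device string of this form can be broken into its constraints with a few lines of Python. This is a lenient sketch, not TensorFlow's parser: `parse_device_spec` is a hypothetical helper, values are kept as strings, and the numeric parts of the grammar are not validated.

```python
def parse_device_spec(spec: str) -> dict:
    """Split a (possibly partial) device spec into its constraints."""
    constraints = {}
    for part in filter(None, spec.split("/")):
        key, _, value = part.partition(":")
        if key not in ("job", "replica", "task", "device") or not value:
            raise ValueError(f"unrecognized constraint: {part!r}")
        if key == "device":
            # "device:GPU:3" carries both a device type and an id (or "*").
            device_type, _, device_id = value.partition(":")
            constraints["device_type"] = device_type
            constraints["device_id"] = device_id
        else:
            constraints[key] = value
    return constraints


print(parse_device_spec("/job:worker/replica:0/task:1/device:GPU:3"))
print(parse_device_spec("/job:worker/device:GPU:3"))
print(parse_device_spec(""))
```

An empty result corresponds to the "no specification" case, where the runtime is left to choose a device.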
Operation-specific graph-construction-time configuration. Note that this should include all attrs defined in the corresponding OpDef, including those with a value matching the default -- this allows the default to change and makes NodeDefs easier to interpret on their own. However, if an attr with a default is not specified in this list, the default will be used. The "names" (keys) must match the regexp "[a-z][a-z0-9_]+" (and one of the names from the corresponding OpDef's attr field). The values must have a type matching the corresponding OpDef attr's type field. TODO(josh11b): Add some examples here showing best practices.
This stores debug information associated with the node.
The complete type of this node. Experimental and subject to change. Currently, the field only contains the return types of the node. That will extend in the future to contain the entire signature of the node, as a function type.
Opaque string inserted into error messages created by the runtime. This is intended to store the list of names of the nodes from the original graph that this node was derived from. For example if this node, say C, was the result of a fusion of 2 nodes A and B, then 'original_node' would be {A, B}. This information can be used to map errors originating at the current node to some top level source code.
This is intended to store the list of names of the functions from the original graph that this node was derived from. For example if this node, say C, was the result of a fusion of node A in function FA and node B in function FB, then `original_funcs` would be {FA, FB}. If the node is in the top level graph, the `original_func` is empty. This information, with the `original_node_names`, can be used to map errors originating at the current node to some top level source code.
Time/size stats recorded for a single execution of a graph node.
TODO(tucker): Use some more compact form of node identity than the full string name. Either all processes should agree on a global id (cost_id?) for each node, or we should use a hash of the name.
Output sizes recorded for a single execution of a graph node.
Represents None.
(message has no fields)
Defines an operation. A NodeDef in a GraphDef specifies an Op by using the "op" field, which should match the name of an OpDef.
Op names starting with an underscore are reserved for internal use. Names should be CamelCase and match the regexp "[A-Z][a-zA-Z0-9>_]*".
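The two naming rules can be combined into one check. The sketch below is plain Python using the regular expression as quoted above; `classify_op_name` and its three labels are hypothetical and exist only for this illustration.

```python
import re

# Pattern copied from the field description above.
OP_NAME_RE = re.compile(r"[A-Z][a-zA-Z0-9>_]*")


def classify_op_name(name: str) -> str:
    """Label an op name as "internal", "public", or "invalid"."""
    if name.startswith("_"):
        # Leading underscore: reserved for internal use.
        return "internal"
    return "public" if OP_NAME_RE.fullmatch(name) else "invalid"


print(classify_op_name("MatMul"))
print(classify_op_name("_Arg"))
print(classify_op_name("matMul"))  # must start with an uppercase letter
```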
Description of the input(s).
Description of the output(s).
Named control outputs for this operation. Useful only for composite operations (i.e. functions) which want to name different control outputs.
Optional deprecation based on GraphDef versions.
One-line human-readable description of what the Op does.
Additional, longer human-readable description of what the Op does.
True if the operation is commutative ("op(a,b) == op(b,a)" for all inputs)
If is_aggregate is true, then this operation accepts N >= 2 inputs and produces 1 output all of the same type. Should be associative and commutative, and produce output with the same shape as the input. The optimizer may replace an aggregate op taking input from multiple devices with a tree of aggregate ops that aggregate locally within each device (and possibly within groups of nearby devices) before communicating. TODO(josh11b): Implement that optimization.
for things like add
Ops are marked as stateful if their behavior depends on some state beyond their input tensors (e.g. variable reading op) or if they have a side-effect (e.g. printing or asserting ops). Equivalently, stateless ops must always produce the same output for the same input and have no side-effects. By default Ops may be moved between devices. Stateful ops should either not be moved, or should only be moved if that state can also be moved (e.g. via some sort of save / restore). Stateful ops are guaranteed to never be optimized away by Common Subexpression Elimination (CSE).
for things like variables, queue
By default, all inputs to an Op must be initialized Tensors. Ops that may initialize tensors for the first time should set this field to true, to allow the Op to take an uninitialized Tensor as input.
for Assign, etc.
Indicates whether the op implementation uses distributed communication. If True, the op is allowed to return errors for network disconnection and trigger TF network failure handling logic.
For describing inputs and outputs.
Used in:
Name for the input/output. Should match the regexp "[a-z][a-z0-9_]*".
Human readable description.
Describes the type of one or more tensors that are accepted/produced by this input/output arg. The only legal combinations are: * For a single tensor: either the "type" field is set or the "type_attr" field is set to the name of an attr with type "type". * For a sequence of tensors with the same type: the "number_attr" field will be set to the name of an attr with type "int", and either the "type" or "type_attr" field will be set as for single tensors. * For a sequence of tensors, the "type_list_attr" field will be set to the name of an attr with type "list(type)".
if specified, attr must have type "type"
if specified, attr must have type "int"
If specified, attr must have type "list(type)", and none of type, type_attr, and number_attr may be specified.
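The legal combinations listed above can be expressed as a small classifier. This is a sketch under stated assumptions: the fields are modeled as plain strings (empty meaning unset), "either ... or" is read as exactly one of the two, and `ArgTypeFields` and `classify_arg` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ArgTypeFields:
    """Hypothetical stand-in for the type-related fields of an arg."""
    type: str = ""            # concrete type name; empty means unset
    type_attr: str = ""       # name of an attr with type "type"
    number_attr: str = ""     # name of an attr with type "int"
    type_list_attr: str = ""  # name of an attr with type "list(type)"


def classify_arg(arg: ArgTypeFields) -> str:
    """Return which legal combination the fields form, or raise ValueError."""
    if arg.type_list_attr:
        if arg.type or arg.type_attr or arg.number_attr:
            raise ValueError(
                "type_list_attr excludes type, type_attr and number_attr")
        return "sequence with per-element types"
    if bool(arg.type) == bool(arg.type_attr):
        raise ValueError("exactly one of type and type_attr must be set")
    if arg.number_attr:
        return "sequence with one shared type"
    return "single tensor"
```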
The handle data for resource inputs.
For inputs: if true, the inputs are required to be refs. By default, inputs can be either refs or non-refs. For outputs: if true, outputs are refs, otherwise they are not.
Experimental. Full type declaration for this argument. The full type specification combines type, type_attr, type_list_attr, etc. into a unified representation. This declaration may contain non-concrete types (for example, Tensor<TypeVar<'T'>> is a valid type declaration). Note: this is a transient field. The long-term aim is to represent the entire OpDef as a single type: a callable. In that context, this field is just the type of a single argument.
Description of the graph-construction-time configuration of this Op. That is to say, this describes the attr fields that will be specified in the NodeDef.
Used in:
A descriptive name for the argument. May be used, e.g. by the Python client, as a keyword argument name, and so should match the regexp "[a-z][a-z0-9_]+".
One of the type names from attr_value.proto ("string", "list(string)", "int", etc.).
A reasonable default for this attribute if the user does not supply a value. If not specified, the user must supply a value.
Human-readable description.
For type == "int", this is a minimum value. For "list(___)" types, this is the minimum length.
The set of allowed values. Has type that is the "list" version of the "type" field above (uses the "list" field of AttrValue). If type == "type" or "list(type)" above, then the "type" field of "allowed_values.list" has the set of allowed DataTypes. If type == "string" or "list(string)", then the "s" field of "allowed_values.list" has the set of allowed strings.
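The minimum and allowed-values constraints described above could be enforced roughly as follows. This is a minimal sketch: values are plain Python objects rather than AttrValue protos, `minimum=None` stands for "no minimum", and `check_attr_value` is a hypothetical helper.

```python
def check_attr_value(attr_type, value, minimum=None, allowed_values=None):
    """Raise ValueError if `value` violates the attr's constraints."""
    is_list = attr_type.startswith("list(")
    if minimum is not None:
        # For "int" the minimum bounds the value; for lists, the length.
        if attr_type == "int" and value < minimum:
            raise ValueError(f"value {value} is below minimum {minimum}")
        if is_list and len(value) < minimum:
            raise ValueError(f"list length {len(value)} is below {minimum}")
    constrained = ("type", "list(type)", "string", "list(string)")
    if allowed_values is not None and attr_type in constrained:
        for item in (value if is_list else [value]):
            if item not in allowed_values:
                raise ValueError(f"{item!r} is not an allowed value")
```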
Information about version-dependent deprecation of an op
Used in:
First GraphDef version at which the op is disallowed.
Explanation of why it was deprecated and what to use instead.
Description of an operation as well as the parameters expected to impact its performance.
Used in:
The operation name. There may be custom parameters in attrs.
Custom parameters impacting the behavior of the op.
Optional description of the op outputs
Device on which the operation is run.
Information about the session configs.
Input data types, shapes and values if known.
Used in:
A collection of OpDefs
Used in:
Performance data for tensorflow operations
Used in:
The op
Information about the session configs.
The node name (optional). Makes it easier to associate the performance data with a specific graph node.
Temporary memory used by this node (in bytes).
Time it takes to run the op (in nanoseconds).
Analytical compute cost (in nanoseconds).
Analytical memory access cost (in nanoseconds).
Percentage of theoretical compute performance.
Percentage of theoretical memory performance.
Expected execution time, modeled using one of 2 possible distributions.
Memory usage data for a tensorflow operation.
Used in:
The output information may have memory usage and output shapes.
Temp and persistent memory allocated by this node.
A collection of OpPerformance data points.
Options passed to the graph optimizer
Used in:
If true, optimize the graph using common subexpression elimination. Note: the optimization Level L1 will override this setting to true. So in order to disable common subexpression elimination the opt_level has to be set to L0.
If true, perform constant folding optimization on the graph. Note: the optimization Level L1 will override this setting to true. So in order to disable constant folding the opt_level has to be set to L0.
Constant folding optimization replaces tensors whose values can be predetermined, with constant nodes. To avoid inserting too large constants, the size of each constant created can be limited. If this value is zero, a default limit of 10 MiB will be applied. If constant folding optimization is disabled, this value is ignored.
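The zero-means-default rule for the size limit can be sketched as below. The helper names are hypothetical, and whether the real comparison is inclusive is an assumption of this sketch.

```python
DEFAULT_MAX_FOLDED_BYTES = 10 * 1024 * 1024  # 10 MiB, per the description


def effective_fold_limit(max_folded_constant_in_bytes: int) -> int:
    """Resolve the configured limit, treating 0 as 'use the default'."""
    if max_folded_constant_in_bytes == 0:
        return DEFAULT_MAX_FOLDED_BYTES
    return max_folded_constant_in_bytes


def may_fold(constant_bytes: int, max_folded_constant_in_bytes: int) -> bool:
    """Assumption: a constant exactly at the limit is still allowed."""
    return constant_bytes <= effective_fold_limit(max_folded_constant_in_bytes)
```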
If true, perform function inlining on the graph.
Overall optimization level. The actual optimizations applied will be the logical OR of the flags that this level implies and any flags already set.
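The "logical OR" behaviour can be made concrete with a small sketch. It assumes only the two flags that the descriptions tie to the level (common subexpression elimination and constant folding); the dictionary layout and `effective_optimizations` are hypothetical.

```python
# Flags implied by each level: L1 turns on CSE and constant folding,
# L0 implies nothing.
_IMPLIED_BY_LEVEL = {
    "L1": {"cse": True, "constant_folding": True},
    "L0": {"cse": False, "constant_folding": False},
}


def effective_optimizations(opt_level, do_cse, do_constant_folding):
    """OR the explicitly set flags with those implied by the level."""
    implied = _IMPLIED_BY_LEVEL[opt_level]
    return {
        "cse": do_cse or implied["cse"],
        "constant_folding": do_constant_folding or implied["constant_folding"],
    }
```

This also shows why an explicit `False` cannot disable an optimization at L1: the level's implied flag wins the OR.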
CPU code will be autoclustered only if global_jit_level >= ON_1 and either: - this flag is true, or - TF_XLA_FLAGS contains --tf_xla_cpu_global_jit=true.
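The condition above reads as a simple boolean expression. The sketch below assumes ON_1 has the numeric value 1 and that TF_XLA_FLAGS is a whitespace-separated flag string passed in by the caller (for instance from `os.environ.get("TF_XLA_FLAGS", "")`); `cpu_autoclustering_enabled` is a hypothetical helper.

```python
ON_1 = 1  # assumed numeric value of the ON_1 jit level


def cpu_autoclustering_enabled(global_jit_level: int,
                               cpu_global_jit: bool,
                               tf_xla_flags: str = "") -> bool:
    """Evaluate the documented condition for CPU autoclustering."""
    flag_in_env = "--tf_xla_cpu_global_jit=true" in tf_xla_flags.split()
    return global_jit_level >= ON_1 and (cpu_global_jit or flag_in_env)
```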
Control the use of the compiler/jit. Experimental.
Used in: ,
Default setting ("off" now, but later expected to be "on")
The following settings turn on compilation, with higher values being more aggressive. Higher values may reduce opportunities for parallelism and may use more memory. (At present, there is no distinction, but this is expected to change.)
Optimization level
Used in:
L1 is the default level. Optimization performed at L1 : 1. Common subexpression elimination 2. Constant folding
No optimizations
Represents a (key, value) pair.
Used in:
Used as request type in: grpc.MasterService.PartialRunSetup
Used as field type in:
REQUIRED: session_handle must be returned by a CreateSession call to the same master service.
Tensors to be fed in future steps.
Fetches. A list of tensor names. The caller expects a tensor to be returned for each fetch[i] (see RunStepResponse.tensor), for corresponding partial RunStepRequests. The order of specified fetches does not change the execution order.
Target Nodes. A list of node names. The named nodes will be run in future steps, but their outputs will not be fetched.
Unique identifier for this request. Every PartialRunSetupRequest must have a unique request_id, and retried PartialRunSetupRequest must have the same request_id. If request_id is zero, retry detection is disabled.
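The request_id contract (unique per request, repeated on retry, zero disables detection) can be sketched with a set of seen ids. `RetryDetector` is a hypothetical class; what a real server does once a retry is detected is not specified here.

```python
class RetryDetector:
    """Tracks request ids to tell first attempts from retries."""

    def __init__(self):
        self._seen = set()

    def is_retry(self, request_id: int) -> bool:
        if request_id == 0:
            return False  # zero disables retry detection
        if request_id in self._seen:
            return True
        self._seen.add(request_id)
        return False
```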
Used as response type in: grpc.MasterService.PartialRunSetup
Used as field type in:
The unique handle corresponding to the ongoing partial run call setup by the invocation to PartialRunSetup. This handle may be passed to RunStepRequest to send and receive tensors for this partial run.
Used in:
e.g. '64bit'
e.g. 'ELF'
e.g. 'i386'
e.g. '3.13.0-76-generic'
e.g. 'Linux'
e.g. '#120-Ubuntu SMP Mon Jan 18 15:59:10 UTC 2016'
Next ID: 11
Used in: ,
Some option defaults differ from the proto3 default values. This version is used to determine whether to use the option's default value instead of the proto3 default value.
Device type to profile/trace: (version >= 1) DeviceType::UNSPECIFIED: All registered device profilers will be enabled. DeviceType::CPU: only CPU will be profiled. DeviceType::GPU: only CPU/GPU will be profiled. DeviceType::TPU: only CPU/TPU will be profiled. DeviceType::PLUGGABLE_DEVICE: only CPU/pluggable devices with profilers will be profiled.
We don't collect the dataset ops by default for better trace-viewer scalability. The caller can manually set this field to include the ops.
Levels of host tracing: (version >= 1) - Level 0 is used to disable host traces. - Level 1 enables tracing of only user instrumented (or default) TraceMe. - Level 2 enables tracing of all level 1 TraceMe(s) and instrumented high level program execution details (expensive TF ops, XLA ops, etc). This is the default. - Level 3 enables tracing of all level 2 TraceMe(s) and more verbose (low-level) program execution details (cheap TF ops, etc).
Levels of device tracing: (version >= 1) - Level 0 is used to disable device traces. - Level 1 is used to enable device traces. - More levels might be defined for specific device for controlling the verbosity of the trace.
Whether to enable tracing of Python function calls. Enabling it adds runtime overhead. Default off. (version >= 1)
Whether to serialize hlo_proto when XLA is used. (version >= 1)
The local profiler starts profiling at this Unix timestamp in nanoseconds.
The local profiler collects `duration_ms` milliseconds of data. If the value is 0, profiling continues until interrupted.
Directory to save profile data to. No-op when empty.
Used in:
Next-ID: 9
Used as request type in: ProfilerService.Profile
Used as field type in:
In future, the caller will be able to customize when profiling starts and stops. For now, it collects `duration_ms` milliseconds worth of data.
The maximum number of events to return. By default (value 0), return all events.
Names of the required profiling tools, such as "input_pipeline_analyzer".
Specifies the requirement for each tool.
Optional profiling options that control how a TF session will be profiled.
The place where we will dump profile data. We will normally use MODEL_DIR/plugins/profile/ as the repository root.
The user provided profile session identifier.
The hostname of the system where the profile should happen. We use it as an identifier in part of our output filename.
Used in:
Which tool data is available for consumption.
Used in:
The file name with which this data is associated (e.g. "input_pipeline.json", "cluster_xxx.memory_viewer.json").
The data payload (likely json) for the specific tool.
Used in:
Type of profiling responses.
Percentage of time when device is idle.
TPU matrix unit utilization percentage.
Average step time in milliseconds.
Minimum step time in milliseconds.
Maximum step time in milliseconds.
Average infeed percentage.
Minimum infeed percentage.
Maximum infeed percentage.
Represents the different types of responses from the profiling service.
Used in:
No result is returned from the profiling service.
Only device utilization is available.
Both device utilization and device idle time are available.
Device utilization, device idle time, step time, and infeed percentage are all available.
Protocol buffer representing a QueueRunner.
Queue name.
A list of enqueue operations.
The operation to run to close the queue.
The operation to run to cancel the queue.
A list of exception types considered to signal a safely closed queue if raised during enqueue operations.
Used in:
If true, always use RPC to contact the session target. If false (the default option), TensorFlow may use an optimized transport for client-master communication that avoids the RPC stack. This option is primarily used for testing the RPC stack.
The compression algorithm to be used. One of "deflate", "gzip".
If compression_algorithm is set, the compression level to be used. From 0 (no compression), up to 3.
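The two compression fields constrain each other as described above. A minimal validation sketch, assuming an empty algorithm string means "unset" and that the level is then ignored; `validate_compression` is a hypothetical helper.

```python
_SUPPORTED_ALGORITHMS = ("deflate", "gzip")


def validate_compression(algorithm: str, level: int) -> None:
    """Raise ValueError if the compression settings are inconsistent."""
    if not algorithm:
        return  # level only matters when an algorithm is set
    if algorithm not in _SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    if not 0 <= level <= 3:
        raise ValueError(f"level must be in [0, 3], got {level}")
```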
Setting cache_rpc_response to true enables sender-side caching of responses for RecvTensorAsync and RecvBufAsync, allowing the receiver to retry requests. This is only necessary when the network fabric is experiencing a significant error rate. Without it, a step fails on a network error; with it, long steps (like complex initializations) can complete in the face of some network errors during RecvTensor.
Disables TCP connection sharing when opening a new RPC channel.
Setting num_channels_per_target > 0 allows the use of multiple channels to communicate with the same target. This can improve the aggregate throughput on high-speed links (e.g. 100G) where a single connection is not sufficient to maximize link utilization. Note that a single RPC only goes over a single channel, so this only helps when multiple transfers to the same target overlap in time.
For serializing and restoring the state of ReaderBase, see reader_base.h for details.
Use of the fields below may vary by implementation. For example the buf_ptr and num_bytes may be set only for local operations and not sent on the wire, or only sent on the wire in one direction.
Used as request type in: grpc.WorkerService.RecvBuf
Used at server side to find the correct BufRendezvous.
Arbitrary string identifying a BufRendezvous entry.
Size of value expected, must agree with BufRendezvous entry.
When RDMA is in use, address of destination field on client.
Optional information on client-side device locality.
Optional information on server-side device locality.
Optional, implementation-specific data.
For annotating timeline and device incarnation check.
Optional, for annotating the timeline.
Depending on the RPC system in use, it may be necessary to set this id to detect resends of RPCs where the server is not aware that the prior RPC failed.
Incarnation number of the source device, used to detect worker failures.
Extra data needed on a non-RDMA RecvBufResponse.
Use of the fields below may vary by implementation. Comments give intended use.
Used as response type in: grpc.WorkerService.RecvBuf
Address of source field on server.
Byte length of buf_ptr field, if set.
True if value is 'dead' like a tensor.
Optional, implementation-specific data.
Optional, for timeline.
Whether the receiver should send a MarkRecvFinishedRequest to the sender to ack the message.
Used as request type in: grpc.WorkerService.RecvTensor
The step in which the tensor will be produced. REQUIRED: This must eventually correspond to the `step_id` passed into a RunGraph call on the same WorkerService.
A key identifying the channel to receive tensors from. A RecvTensor request retrieves one tensor from the channel, but multiple tensors can be sent and received over the same channel with multiple RecvTensor requests. See rendezvous.h for details.
If true, use an out-of-band DMA mechanism to transfer the received tensor.
Optional information on client-side device locality.
Optional information on server-side device locality.
Optional information needed by the RPC subsystem.
Unique identifier for this request. Every RecvTensorRequest must have a unique request_id, and retried RecvTensorRequests must have the same request_id. If request_id is zero, retry detection and response cache are disabled. Retried RecvTensorRequests are problematic because a RecvTensor with no corresponding sender will wait forever, and the tensor may have been delivered to a previous retry. Workers use request_ids to reject retried RecvTensor requests instead of waiting forever.
Used as response type in: grpc.WorkerService.RecvTensor
The tensor as a proto.
If true, this tensor was the output of a dead node, and the content is invalid.
The time at which tensor was available and started to be returned.
Optional additional information about how to receive the tensor, e.g. in the event that `RecvTensorRequest.dma_ok` was true.
Whether the receiver should send a MarkRecvFinishedRequest to the sender to ack the message.
Used as request type in: grpc.WorkerService.RegisterGraph
Subgraphs are scoped within one session.
Set to true if `CreateWorkerSession` was called for `session_handle`.
"graph_def" has the subgraph of nodes for this worker, with each node having its device_name filled in.
True iff the graph (before partitioning) contains control flow nodes. As of 01/11/2015, this is no longer set by clients.
Configuration options for the session in which this graph was created.
Field(s) used by TensorFlow Debugger (tfdbg).
If graph_def contains any collective ops this must be a positive integer used to coordinate execution with other graphs. All graphs in a distributed execution with the same collective_graph_key will coordinate to use the same step_id concurrently so that BufRendezvous entries will make the correct values accessible.
ConfigProto from the session in which this graph was created. Contains additional parameters beyond graph_options, including the name of the requested executor.
Used as response type in: grpc.WorkerService.RegisterGraph
If the registration succeeds, returns an opaque graph_handle to the master. The master calls RunGraph with graph_handle to compute different steps.
RegisteredGradient stores a gradient function that is registered in the gradients library and used in the ops of a function in the function library. Unlike GradientDef, these gradients are identified by op type, and not directly linked to any function.
Used in:
The gradient function's name.
The gradient function's registered op type.
Used in:
The name of the registered saver/restore function.
Unique auto-generated name of the object.
Used as request type in: grpc.MasterService.ReleaseCallable
Used as field type in:
REQUIRED: session_handle must be returned by a CreateSession call to the same master service.
REQUIRED: handle must be returned by a MakeCallable call to the same master service.
Used as response type in: grpc.MasterService.ReleaseCallable
Used as field type in:
(message has no fields)
Options for remote profiler session manager. Next ID: 6
Options for each local profiler.
List of servers to profile. Supported formats: host:port.
Unix timestamp of when the session was started.
Maximum time (in milliseconds) a profiling session manager waits for all profilers to finish after issuing gRPC request. If value is 0, session continues until interrupted. Otherwise, value must be greater than profiler_options.duration_ms.
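The relationship between the session limit and the per-profiler duration can be checked up front. A minimal sketch; `validate_session_duration` is a hypothetical helper and the parameter names mirror the fields mentioned above.

```python
def validate_session_duration(max_session_duration_ms: int,
                              duration_ms: int) -> None:
    """Raise ValueError if the session limit cannot cover the profile."""
    if max_session_duration_ms == 0:
        return  # 0 means the session continues until interrupted
    if max_session_duration_ms <= duration_ms:
        raise ValueError(
            "max_session_duration_ms must be greater than "
            "profiler_options.duration_ms")
```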
Start of profiling is delayed by this much (in milliseconds).
Used in:
Reset() allows misbehaving or slow sessions to be aborted and closed, and causes their resources eventually to be released. Reset() does not wait for the computations in old sessions to cease; it merely starts the process of tearing them down. However, if a new session is started after a Reset(), the new session is isolated from changes that old sessions (started prior to the Reset()) may continue to make to resources, provided all those resources are in containers listed in "containers". Old sessions may continue to have side-effects on resources not in containers listed in "containers", and thus may affect future sessions' results in ways that are hard to predict. Thus, if well-defined behavior is desired, it is recommended that all containers be listed in "containers". Similarly, if a device_filter is specified, results may be hard to predict.
Used as request type in: grpc.MasterService.Reset
Used as field type in:
A list of container names, which may be empty. If 'container' is not empty, releases resources in the given containers in all devices. If 'container' is empty, releases resources in the default container in all devices.
When any filters are present, only devices that match the filters will be reset. Each filter can be partially specified, e.g. "/job:ps" "/job:worker/replica:3", etc.
Used as response type in: grpc.MasterService.Reset
Used as field type in:
(message has no fields)
Protocol buffer representing a handle to a tensorflow resource. Handles are not valid across executions, but can be serialized back and forth from within a single run.
Used in:
Unique name for the device containing the resource.
Container in which this resource is placed.
Unique name of this resource.
Hash code for the type of the resource. Is only valid in the same device and in the same execution.
For debug-only, the name of the type pointed to by this handle, if available.
Data types and shapes for the underlying resource.
Protocol buffer representing a pair of (data type, tensor shape).
Used in: ,
Graph rewriting is experimental and subject to change, not covered by any API stability guarantees.
Used in:
CPU Conversion settings between NHWC and NCHW.
Optimize tensor layouts (default is ON) e.g. This will try to use NCHW layout on GPU which is faster.
Fold constants (default is ON) Statically infer the value of tensors when possible, and materialize the result using constants.
Shape optimizations (default is ON) Simplify computations made on shapes.
Remapping (default is ON) Remap subgraphs onto more efficient implementations.
Common subgraph elimination (default is ON) e.g. Simplify arithmetic ops; merge ops with same value (like constants).
Arithmetic optimizations (default is ON) e.g. Simplify arithmetic ops; merge ops with same value (like constants).
Control dependency optimizations (default is ON). Remove redundant control dependencies, which may enable other optimization.
Loop optimizations (default is ON).
Function optimizations (default is ON).
Strips debug-related nodes from the graph (off by default).
If true, don't remove unnecessary ops from the graph
Try to allocate some independent Op outputs contiguously in order to merge or eliminate downstream Ops (off by default).
Force small ops onto the CPU (default is OFF).
Enable the swap of kernel implementations based on the device placement (default is ON).
Optimize data types for CUDA (default is OFF). This will try to use float16 on GPU which is faster. Note that this can change the numerical stability of the graph and may require the use of loss scaling to maintain model convergence.
Optimize data types for oneDNN (default is OFF). This will try to use bfloat16 on CPUs, which is faster. Note that this can change the numerical stability of the graph. Note: this is deprecated. It is replaced by auto_mixed_precision_onednn_bfloat16
Optimize data types for oneDNN (default is OFF). This will try to use bfloat16 on CPUs, which is faster. Note that this can change the numerical stability of the graph. Note: this is equivalent to the deprecated option auto_mixed_precision_mkl
Emulate a model using data type float16 on CPU (default is OFF). This will try to emulate the float16 inputs and outputs of an operator on CPU to have better correlation with float16 on GPU; however the computation in the operator is based on float32. Note that this can change the numerical stability of the graph.
Disable the entire meta optimizer (off by default).
Optimizers registered by plugin (default is ON)
Conditional code motion (default is ON).
Controls how many times we run the optimizers in meta optimizer (default is twice).
The minimum number of nodes in a graph to optimize. For smaller graphs, optimization is skipped. 0 means the system picks an appropriate number. < 0 means do not skip optimization.
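The three regimes of this threshold (positive, zero, negative) can be sketched as follows. The `system_default` value of 4 is a placeholder assumption, not the number the runtime actually picks, and `should_skip_optimization` is a hypothetical helper.

```python
def should_skip_optimization(num_nodes: int,
                             min_graph_nodes: int,
                             system_default: int = 4) -> bool:
    """Decide whether a graph is too small to be worth optimizing.

    `system_default` is a placeholder for the system-chosen threshold.
    """
    if min_graph_nodes < 0:
        return False  # negative: never skip optimization
    threshold = system_default if min_graph_nodes == 0 else min_graph_nodes
    return num_nodes < threshold
```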
Disable optimizations that assume compressed tensors. Note that this flag is experimental and may be removed in the future.
Disable folding quantization emulation ops such as FakeQuantWithMinMax* and QuantizeAndDequantize*. Some compilers (e.g. the TF-to-tflite converter) have to extract quantization configs (e.g. min/max range, number of bits, and per-channel) from the quantization emulation ops. Note that this flag is experimental and may be removed in the future. See b/174138564 for more details.
Configures memory optimization passes through the meta-optimizer. Has no effect on manually requested memory optimization passes in the optimizers field.
A node name scope for node names which are valid outputs of recomputations. Inputs to nodes that match this scope may be recomputed (subject either to manual annotation of those input nodes or to manual annotation and heuristics depending on memory_optimization), but the nodes themselves will not be recomputed. This matches any sub-scopes as well, meaning the scope can appear not just as a top-level scope. For example, if the value is "gradients/", the default, it will match node name "gradients/foo", "foo/gradients/bar", but not "foo_gradients/"
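The sub-scope matching rule can be reproduced from the examples given above. This is a minimal sketch of one rule that is consistent with those three examples (the scope must start the name or follow a "/"); `in_recompute_scope` is a hypothetical helper and the real matcher may differ.

```python
def in_recompute_scope(node_name: str, scope: str = "gradients/") -> bool:
    """True if `scope` appears as a top-level scope or as a sub-scope."""
    return node_name.startswith(scope) or ("/" + scope) in node_name
```

Applied to the names from the description, it accepts "gradients/foo" and "foo/gradients/bar" and rejects "foo_gradients/".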
Maximum number of milliseconds to spend optimizing a single graph before timing out. If less than or equal to 0 (default value) the optimizer will never time out.
Configures AutoParallel optimization passes either through the meta-optimizer or when manually specified through the optimizers field.
If true, any optimization pass failing will cause the MetaOptimizer to stop with an error. By default, or when set to false, failing passes are skipped silently.
If non-empty, will use this as an alternative way to specify a list of optimizations to turn on and the order of the optimizations (replacing the meta-optimizer). Of the RewriterConfig options, only the AutoParallel configuration options (the auto_parallel field) apply to manually requested optimization passes ("autoparallel"). Memory optimization passes ("memory") invoked here are not configurable (in contrast to memory optimization passes through the meta-optimizer) and act only on manual op annotations. Custom optimizers (see custom_optimizers) that are not part of this schedule will be run afterwards, in the order that they were specified.
list of CustomGraphOptimizers to apply.
VerifierConfig specifying the verifiers to be run after every optimizer.
VerifierConfig specifying the verifiers to be run at the end, after all optimizers have run.
Enum for layout conversion between NCHW and NHWC on CPU. Default is OFF.
Used in:
Message to describe custom graph optimizer and its parameters
Used in:
Used in:
The default setting (SCHEDULING and SWAPPING HEURISTICS only)
Disabled in the meta-optimizer.
Driven by manual op-level annotations.
Swapping heuristic will move a tensor from the GPU to the CPU and move it back when needed to reduce peak memory usage.
Recomputation heuristics will recompute ops (such as Relu activation) during backprop instead of storing them, reducing peak memory usage.
Scheduling will split big ops such as AddN and try to enforce a schedule of the new computations that decreases peak memory usage.
Use any combination of swapping and recomputation heuristics.
Enum controlling the number of times to run optimizers. The default is to run them twice.
Used in:
Used in:
Enable some aggressive optimizations that use assumptions that TF graphs may break. For example, assume the shape of a placeholder matches its actual feed.
Run MLIR pass if there's one implemented in TFG, do nothing otherwise. I.e., if there's no corresponding TFG pass, it's an OFF. This is supposed to be mapped with `ON` and there's no `AGGRESSIVE` in MLIR pass now.
Run both MLIR and Grappler passes consecutively and MLIR pass will come first.
Used as request type in: grpc.MasterService.RunCallable
Used as field type in:
REQUIRED: session_handle must be returned by a CreateSession call to the same master service.
REQUIRED: handle must be returned by a MakeCallable call to the same master service.
Values of the tensors passed as arguments to the callable, in the order defined in the CallableOptions.feed field passed to MakeCallable.
Unique identifier for this request. Every RunCallableRequest must have a unique request_id, and retried RunCallableRequest must have the same request_id. If request_id is zero, retry detection is disabled.
Used as response type in: grpc.MasterService.RunCallable
Used as field type in:
Values of the tensors returned by the callable, in the order defined in the CallableOptions.fetch field passed to MakeCallable.
Returned metadata if requested in the options.
Run-specific items such as arguments to the test / benchmark.
Used in:
Environment variables used to run the test/benchmark.
Used as request type in: grpc.WorkerService.RunGraph
session_handle is the master-generated unique id for this session. If session_handle is non-empty, it must be the same as used when registering the graph. If it is empty, a single global namespace is used to search for the graph_handle.
Set to true if `CreateWorkerSession` was called for `session_handle`.
REQUIRED: graph_handle must be returned by a RegisterGraph call to the same WorkerService.
A unique ID to distinguish different runs of the same graph. The master generates a global unique `step_id` to distinguish different runs of the graph computation. Subgraphs communicate (e.g., send/recv ops) with each other using `step_id` to distinguish tensors generated by different runs.
Options for this step.
Runs the graph. Sends the tensors in "send" into the graph before the run and fetches the keys into `RunGraphResponse.recv` after the run.
True if the RunGraphRequest is a partial run request.
True if this is the last partial run request in a sequence of requests.
If true then some errors, e.g., execution errors that have long error messages, may return an OK RunGraphResponse with the actual error saved in the status_code/status_error_message fields of the response body. This is a workaround since the RPC subsystem may truncate long metadata messages.
Unique identifier for this request. Every RunGraphRequest must have a unique request_id, and retried RunGraphRequests must have the same request_id. If request_id is zero, retry detection is disabled. Retried RunGraphRequests are problematic because they may issue a RecvTensor that will have no corresponding sender and will wait forever. Workers use request_ids to reject retried RunGraph requests instead of waiting forever.
Used as response type in: grpc.WorkerService.RunGraph
A list of tensors corresponding to those requested by `RunGraphRequest.recv_key`.
If the request asked for execution stats, the cost graph, or the partition graphs, these are returned here. TODO(suharshs): Package these in a RunMetadata instead.
If store_errors_in_response_body is true in the request, then optionally the server may return an OK status for the RPC and fill the true status into the fields below, to allow for messages that are too long to fit in metadata.
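On the client side, this workaround means an OK RPC status is not enough: the body must be inspected too. A minimal sketch, assuming a status code of 0 means OK; `ResponseBody`, `StepError` and `raise_if_error_in_body` are hypothetical stand-ins, with field names taken from the description of store_errors_in_response_body.

```python
from dataclasses import dataclass


@dataclass
class ResponseBody:
    """Hypothetical stand-in for the status fields of a response."""
    status_code: int = 0            # assumption: 0 means OK
    status_error_message: str = ""


class StepError(RuntimeError):
    """Raised when the real error was carried in the response body."""


def raise_if_error_in_body(response: ResponseBody) -> None:
    # The RPC itself returned OK; the true status lives in the body.
    if response.status_code != 0:
        raise StepError(
            f"[{response.status_code}] {response.status_error_message}")
```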
Metadata output (i.e., non-Tensor) for a single Run() call.
Used in: ,
Statistics traced for this step. Populated if tracing is turned on via the "RunOptions" proto. EXPERIMENTAL: The format and set of events may change in future versions.
The cost graph for the computation defined by the run call.
Graphs of the partitions executed by executors.
This is only populated for graphs that are run as functions in TensorFlow V2. There will be an entry below for each function that is traced. The main use cases of the post_optimization_graph and the partition_graphs is to give the caller insight into the graphs that were actually run by the runtime. Additional information (such as those in step_stats) will match these graphs. We also include the pre_optimization_graph since it is usually easier to read, and is helpful in situations where the caller wants to get a high level idea of what the built graph looks like (since the various graph optimization passes might change the structure of the graph significantly).
Metadata about the session.
Used in:
TODO(nareshmodi): Include some sort of function/cache-key identifier?
Options for a single Run() call.
Used in: ,
Time to wait for operation to complete in milliseconds.
The thread pool to use, if session_inter_op_thread_pool is configured. To use the caller thread, set this to -1. This executes Session::Run() on the caller thread and thus avoids a context switch. Using the caller thread to execute Session::Run() should be done ONLY for simple graphs, where the overhead of an additional context switch is comparable with the overhead of Session::Run().
Whether the partition graph(s) executed by the executor(s) should be outputted via RunMetadata.
EXPERIMENTAL. Options used to initialize DebuggerState, if enabled.
When enabled, causes tensor allocation information to be included in the error message when the Run() call fails because the allocator ran out of memory (OOM). Enabling this option can slow down the Run() call.
Everything inside Experimental is subject to change and is not subject to API stability guarantees in https://www.tensorflow.org/guide/version_compat.
Used in:
If non-zero, declares that this graph is going to use collective ops and must synchronize step_ids with any other graph with this same group_key value (in a distributed computation where tasks run disjoint graphs).
If true, then operations (using the inter-op pool) across all session::run() calls will be centrally scheduled, optimizing for (median and tail) latency. Consider using this option for CPU-bound workloads like inference.
Options for run handler thread pool.
Used in:
Priority of the request. The run handler thread pool will schedule ops based on the priority number. The larger number means higher priority.
TODO(pbar) Turn this into a TraceOptions proto which allows tracing to be controlled in a more orthogonal manner?
Used in:
Used as request type in: grpc.MasterService.RunStep
Used as field type in:
REQUIRED: session_handle must be returned by a CreateSession call to the same master service.
Tensors to be fed in the step. Each feed is a named tensor.
Fetches. A list of tensor names. The caller expects a tensor to be returned for each fetch[i] (see RunStepResponse.tensor). The order of specified fetches does not change the execution order.
Target Nodes. A list of node names. The named nodes will be run to but their outputs will not be fetched.
Options for the run call.
Partial run handle (optional). If specified, this will be a partial run execution, run up to the specified fetches.
If true then some errors, e.g., execution errors that have long error messages, may return an OK RunStepResponse with the actual error saved in the status_code/status_error_message fields of the response body. This is a workaround since the RPC subsystem may truncate long metadata messages.
Unique identifier for this request. Every RunStepRequest must have a unique request_id, and retried RunStepRequest must have the same request_id. If request_id is zero, retry detection is disabled.
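The request_id rule above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a server remembers one response per non-zero request_id; `RetryCache` and `handle` are hypothetical names and are not part of the master service.

```python
from typing import Callable, Dict


class RetryCache:
    """Hypothetical cache keyed by request_id (illustration only)."""

    def __init__(self) -> None:
        self._responses: Dict[int, str] = {}

    def handle(self, request_id: int, run_step: Callable[[], str]) -> str:
        # A zero request_id disables retry detection: always execute.
        if request_id == 0:
            return run_step()
        # A retried request carries the same id, so the stored response is reused.
        if request_id not in self._responses:
            self._responses[request_id] = run_step()
        return self._responses[request_id]
```

Under this assumption, a retry with the same non-zero id does not execute the step a second time, while requests with id zero are always executed.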
Used as response type in: grpc.MasterService.RunStep
Used as field type in:
NOTE: The order of the returned tensors may or may not match the fetch order specified in RunStepRequest.
Returned metadata if requested in the options.
If store_errors_in_response_body is true in the request, then optionally the server may return an OK status for the RPC and fill the true status into the fields below, to allow for messages that are too long to fit in metadata.
Used in:
Name of the full variable of which this is a slice.
Shape of the full variable.
Offset of this variable into the full variable.
Shape of this variable.
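The relationship between the full shape, the offset, and the slice shape can be checked with a small sketch. The function name `slice_fits` is hypothetical, and the bounds rule (offset plus slice size must not exceed the full size in every dimension) is an assumption drawn from the field descriptions above.

```python
from typing import Sequence


def slice_fits(full_shape: Sequence[int],
               var_offset: Sequence[int],
               var_shape: Sequence[int]) -> bool:
    """Return True if the slice lies entirely inside the full variable."""
    # All three lists need one entry per dimension.
    if not (len(full_shape) == len(var_offset) == len(var_shape)):
        return False
    return all(
        off >= 0 and size >= 0 and off + size <= full
        for full, off, size in zip(full_shape, var_offset, var_shape)
    )
```

For example, a [5, 4] slice at offset [5, 0] fits inside a [10, 4] variable, while the same slice at offset [6, 0] does not.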
Used in:
Node ids of concrete functions for saving and loading from a checkpoint. These functions save and restore directly from tensors.
A SavedAsset points to an asset in the MetaGraph. When bound to a function this object evaluates to a tensor with the absolute filename. Users should not depend on a particular part of the filename to remain stable (e.g. basename could be changed).
Used in:
Index into `MetaGraphDef.asset_file_def[]` that describes the Asset. Only the field `AssetFileDef.filename` is used. Other fields, such as `AssetFileDef.tensor_info`, MUST be ignored.
Used in:
Identifies a SavedConcreteFunction.
A sequence of unique strings, one per Tensor argument.
The prefix of `argument_keywords` which may be identified by position.
The spec of the function that this ConcreteFunction is traced from. This allows the ConcreteFunction to be called with nest structure inputs. This field may not be populated. If this field is absent, the concrete function can only be called with flat inputs. TODO(b/169361281): support calling saved ConcreteFunction with structured inputs in C++ SavedModel API.
Stores low-level information about a concrete function. Referenced in either a SavedFunction or a SavedBareConcreteFunction.
Used in:
Input in canonicalized form that was received to create this concrete function.
Output that was the return value of this function after replacing all Tensors with TensorSpecs. This can be an arbitrary nested function and will be used to reconstruct the full structure from pure tensors.
Used in:
An Operation name for a ConstantOp in this SavedObjectGraph's MetaGraph.
A function with multiple signatures, possibly with non-Tensor arguments.
Used in:
SavedModel is the high level serialization format for TensorFlow Models. See [todo: doc links, similar to session_bundle] for more information.
The schema version of the SavedModel instance. Used for versioning when making future changes to the specification/implementation. Initial value at release will be 1.
One or more MetaGraphs.
Used in:
Objects which this object depends on: named edges in the dependency graph. Note: All kinds of SavedObject may have children, except "constant" and "captured_tensor".
Ordered list of dependencies that must be loaded before this object. SavedModel loads with the bottom-up approach, by first creating all objects (in the order defined by the dependencies), then connecting the edges.
Slot variables owned by this object. This describes the three-way (optimizer, variable, slot variable) relationship; none of the three depend on the others directly. Note: currently only valid if kind == "user_object".
Stores the functions used to save and restore this object. At most one of `saveable_objects` or `registered_saver` is defined for each SavedObject. See the comment below for the difference between SaveableObject and registered savers.
The name of the registered class of the form "{package}.{class_name}". This field is used to search for the registered class at loading time.
The user-generated proto storing metadata for this object, to be passed to the registered class's _deserialize_from_proto method when this object is loaded from the SavedModel.
String name of the registered saver. At most one of `saveable_objects` or `registered_saver` is defined for each SavedObject.
Used in:
Flattened list of objects in the object graph. The position of the object in this list indicates its id. Nodes[0] is considered the root node.
Information about captures and output structures in concrete functions. Referenced from SavedBareConcreteFunction and SavedFunction.
A SavedResource represents a TF object that holds state during its lifetime. An object of this type can have a reference to a create_resource() function and an initialize() function.
Used in:
A device specification indicating a required placement for the resource creation function, e.g. "CPU". An empty string allows the user to select a device.
Saved tensor slice: it stores the name of the tensors, the slice, and the raw data.
Used in: ,
Name of the tensor that this slice belongs to. This must be identical to the name used to encode the key for this record.
Extent of the slice. Must have one entry for each of the dimension of the tensor that this slice belongs to.
The raw data of the slice is stored as a TensorProto. Only raw data are stored (we don't fill in fields such as dtype or tensor_shape).
Metadata describing the set of slices of the same tensor saved in a checkpoint file.
Used in:
Name of the tensor.
Shape of the tensor
Type of the tensor
Explicit list of slices saved in the checkpoint file.
Metadata describing the set of tensor slices saved in a checkpoint file. It is always stored at the beginning of each checkpoint file.
Used in: ,
Each SavedSliceMeta describes the slices for one tensor.
Compatibility version of this checkpoint. See core/public/version.h for version history.
Each record in a v3 checkpoint file is a serialized SavedTensorSlices message.
This is only present at the first item of each checkpoint file and serves as a table of contents, listing all the tensor slices saved in this file.
This exists in all but the first item of each checkpoint file.
A SavedUserObject is an object (in the object-oriented language of the TensorFlow program) of some user- or framework-defined class other than those handled specifically by the other kinds of SavedObjects. This object cannot be evaluated as a tensor, and therefore cannot be bound to an input of a function.
Used in:
Corresponds to a registration of the type to use in the loading program.
Version information from the producer of this SavedUserObject.
Metadata for deserializing this object. Deprecated! At the time of deprecation, Keras was the only user of this field, and its saving and loading code will be updated shortly. Please save your application-specific metadata to a separate file.
Represents a Variable that is initialized by loading the contents from the checkpoint.
Used in:
List of component variables for a distributed variable. When this field is non-empty, the SavedVariable will be assumed to be a distributed variable defined by the components listed here. This is only supported by experimental loaders at the moment.
Protocol buffer representing the configuration of a Saver.
Used in:
The name of the tensor in which to specify the filename when saving or restoring a model checkpoint.
The operation to run when saving a model checkpoint.
The operation to run when restoring a model checkpoint.
Maximum number of checkpoints to keep. If 0, no checkpoints are deleted.
Shard the save files, one per device that has Variable nodes.
How often to keep an additional checkpoint. If not specified, only the last "max_to_keep" checkpoints are kept; if specified, in addition to keeping the last "max_to_keep" checkpoints, an additional checkpoint will be kept for every n hours of training.
A version number that identifies a different on-disk checkpoint format. Usually, each subclass of BaseSaverBuilder works with a particular version/format. However, it is possible that the same builder may be upgraded to support a newer checkpoint format in the future.
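The retention rule described for "max_to_keep" and the every-n-hours option can be sketched as follows. This is a simplified model, not the Saver's implementation: the function `retained` is hypothetical, timestamps are hours since training started, and a value of 0 for the hourly option is assumed to mean "not specified".

```python
from typing import List, Tuple


def retained(checkpoints: List[Tuple[float, str]],
             max_to_keep: int,
             keep_every_n_hours: float = 0.0) -> List[str]:
    """checkpoints: (hours since training start, name), oldest first."""
    if max_to_keep == 0:
        # 0 means no checkpoints are deleted.
        return [name for _, name in checkpoints]
    recent = checkpoints[-max_to_keep:]
    older = checkpoints[:-max_to_keep]
    kept: List[str] = []
    if keep_every_n_hours > 0:
        next_keep = keep_every_n_hours
        for hours, name in older:
            # Keep the first checkpoint at or past each n-hour boundary.
            if hours >= next_keep:
                kept.append(name)
                next_keep += keep_every_n_hours
    return kept + [name for _, name in recent]
```

With checkpoints at 0.5, 1.2, 1.8, 2.1 and 2.5 hours, keeping the last two plus one per hour retains the 1.2-hour checkpoint in addition to the two most recent ones.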
Used in:
Internal legacy format.
Deprecated format: tf.Saver() which works with tensorflow::table::Table.
Current format: more efficient.
Used in:
If present, only perform optimization for these ops.
Represents a serialized tf.dtypes.Dtype
Defines the configuration of a single TensorFlow server.
Used in: , ,
The cluster of which this server is a member.
The name of the job of which this server is a member. NOTE(mrry): The `cluster` field must contain a `JobDef` with a `name` field that matches this name.
The task index of this server in its job. NOTE: The `cluster` field must contain a `JobDef` with a matching `name` and a mapping in its `tasks` field for this index.
The default configuration for sessions that run on this server.
The protocol to be used by this server. Acceptable values include: "grpc", "grpc+verbs".
The server port. If not set, then we identify the port from the job_name.
Device filters for remote tasks in the cluster. NOTE: This is an experimental feature and only effective in TensorFlow 2.x.
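The two NOTEs on the job name and task index amount to a membership check against the cluster. A minimal sketch, using a plain nested dict as an assumed stand-in for the cluster definition (the helper name and the addresses are hypothetical):

```python
from typing import Dict

# Assumed stand-in for the cluster: job name -> {task index -> address}.
Cluster = Dict[str, Dict[int, str]]


def server_in_cluster(cluster: Cluster, job_name: str, task_index: int) -> bool:
    """True if the cluster has a job with this name and a task at this index."""
    tasks = cluster.get(job_name)
    return tasks is not None and task_index in tasks


example_cluster: Cluster = {
    "worker": {0: "worker0.example.com:2222", 1: "worker1.example.com:2222"},
    "ps": {0: "ps0.example.com:2222"},
}
```

Here `server_in_cluster(example_cluster, "worker", 1)` holds, while a task index of 2 or a job name absent from the cluster fails the check.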
Description of the session when an op is run.
Used in: ,
Protocol buffer used for logging session state.
Used in:
This checkpoint_path contains both the path and filename.
Used in:
Metadata about the session. This can be used by the runtime and the Ops for debugging, monitoring, etc. The (name, version) tuple is expected to be a unique identifier for sessions within the same process. NOTE: This is currently used and propagated only by the direct session.
Used in: ,
The version is optional. If set, needs to be >= 0.
SignatureDef defines the signature of a computation supported by a TensorFlow graph. For example, a model with two loss computations, sharing a single input, might have the following signature_def map, in a MetaGraphDef message. Note that across the two SignatureDefs "loss_A" and "loss_B", the input key, output key, and method_name are identical, and will be used by system(s) that implement or rely upon this particular loss method. The output tensor names differ, demonstrating how different outputs can exist for the same method.
signature_def {
  key: "loss_A"
  value {
    inputs {
      key: "input"
      value {
        name: "input:0"
        dtype: DT_STRING
        tensor_shape: ...
      }
    }
    outputs {
      key: "loss_output"
      value {
        name: "loss_output_A:0"
        dtype: DT_FLOAT
        tensor_shape: ...
      }
    }
    method_name: "some/package/compute_loss"
  }
  ...
}
signature_def {
  key: "loss_B"
  value {
    inputs {
      key: "input"
      value {
        name: "input:0"
        dtype: DT_STRING
        tensor_shape: ...
      }
    }
    outputs {
      key: "loss_output"
      value {
        name: "loss_output_B:0"
        dtype: DT_FLOAT
        tensor_shape: ...
      }
    }
    method_name: "some/package/compute_loss"
  }
  ...
}
Used in:
Named input parameters.
Named output parameters.
Extensible method_name information enabling third-party users to mark a SignatureDef as supporting a particular method. This enables producers and consumers of SignatureDefs, e.g. a model definition library and a serving library to have a clear hand-off regarding the semantics of a computation. Note that multiple SignatureDefs in a single MetaGraphDef may have the same method_name. This is commonly used to support multi-headed computation, where a single graph computation may return multiple results.
Used in:
Content of a source file involved in the execution of the debugged TensorFlow program.
Used in:
Path to the file.
Name of the host on which the file is located.
Line-by-line content of the file.
A stack frame with ID.
Used in:
A unique ID for the stack frame: A UUID-like string.
Stack frame, i.e., a frame of a stack trace, containing information regarding the file name, line number, function name, code content of the line, and column number (if available).
Used in:
Used in: , ,
`StructuredValue` represents a dynamically typed value representing various data structures that are inspired by Python data structures typically used in TensorFlow functions as inputs and outputs. For example when saving a Layer there may be a `training` argument. If the user passes a boolean True/False, that switches between two concrete TensorFlow functions. In order to switch between them in the same way after loading the SavedModel, we need to represent "True" and "False". A more advanced example might be a function which takes a list of dictionaries mapping from strings to Tensors. In order to map from user-specified arguments `[{"a": tf.constant(1.)}, {"q": tf.constant(3.)}]` after load to the right saved TensorFlow function, we need to represent the nested structure and the strings, recording that we have a trace for anything matching `[{"a": tf.TensorSpec(None, tf.float32)}, {"q": tf.TensorSpec([], tf.float64)}]` as an example. Likewise functions may return nested structures of Tensors, for example returning a dictionary mapping from strings to Tensors. In order for the loaded function to return the same structure we need to serialize it. This is an ergonomic aid for working with loaded SavedModels, not a promise to serialize all possible function signatures. For example we do not expect to pickle generic Python objects, and ideally we'd stay language-agnostic.
Used in: , , , , , , ,
The kind of value.
Represents None.
Represents a double-precision floating-point value (a Python `float`).
Represents a signed integer value, limited to 64 bits. Larger values from Python's arbitrary-precision integers are unsupported.
Represents a string of Unicode characters stored in a Python `str`. In Python 3, this is exactly what type `str` is. In Python 2, this is the UTF-8 encoding of the characters. For strings with ASCII characters only (as often used in TensorFlow code) there is effectively no difference between the language versions. The obsolescent `unicode` type of Python 2 is not supported here.
Represents a boolean value.
Represents a TensorShape.
Represents an enum value for dtype.
Represents a value for tf.TensorSpec.
Represents a value for tf.TypeSpec.
Represents a value for tf.BoundedTensorSpec.
Represents a list of `Value`.
Represents a tuple of `Value`.
Represents a dict `Value`.
Represents Python's namedtuple.
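How a plain Python value would map onto the kinds listed above can be sketched as below. The returned labels are illustrative strings rather than field names from the proto, `value_kind` is a hypothetical helper, and the tensor-related kinds are left out to keep the sketch free of TensorFlow imports.

```python
from collections import namedtuple
from typing import Any

INT64_MIN, INT64_MAX = -(2 ** 63), 2 ** 63 - 1


def value_kind(value: Any) -> str:
    """Return an illustrative label for the kind that would hold `value`."""
    if value is None:
        return "none"
    if isinstance(value, bool):  # must precede int: bool subclasses int
        return "bool"
    if isinstance(value, int):
        if not INT64_MIN <= value <= INT64_MAX:
            raise ValueError("integer does not fit in 64 bits")
        return "int64"
    if isinstance(value, float):
        return "float64"
    if isinstance(value, str):
        return "string"
    if isinstance(value, tuple):
        # namedtuple instances are tuples that carry a _fields attribute.
        return "namedtuple" if hasattr(value, "_fields") else "tuple"
    if isinstance(value, list):
        return "list"
    if isinstance(value, dict):
        return "dict"
    raise TypeError(f"unsupported type: {type(value).__name__}")


Point = namedtuple("Point", ["x", "y"])  # example type for illustration
```

The ordering matters in two places: `bool` is tested before `int`, and namedtuples are distinguished from plain tuples. Integers outside the signed 64-bit range are rejected, matching the stated limit.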
A Summary is a set of named values to be displayed by the visualizer. Summaries are produced regularly during training, as controlled by the "summary_interval_secs" attribute of the training operation. Summaries are also produced at the end of an evaluation.
Used in:
Set of values for the summary.
Used in:
Sample rate of the audio in Hz.
Number of channels of audio.
Length of the audio in frames (samples per channel).
Encoded audio data and its associated RFC 2045 content type (e.g. "audio/wav").
Used in:
Dimensions of the image.
Valid colorspace values are:
1 - grayscale
2 - grayscale + alpha
3 - RGB
4 - RGBA
5 - DIGITAL_YUV
6 - BGRA
Image data in encoded format. All image formats supported by image_codec::CoderUtil can be stored here.
Used in:
This field is deprecated and will not be set.
Tag name for the data. Used by TensorBoard plugins to organize data. Tags are often organized by scope (which contains slashes to convey hierarchy). For example: foo/bar/0
Contains metadata on the summary value such as which plugins may use it. Take note that many summary values may lack a metadata field. This is because the FileWriter only keeps a metadata object on the first summary value with a certain tag for each tag. TensorBoard then remembers which tags are associated with which plugins. This saves space.
Value associated with the tag.
Metadata associated with a series of Summary data
Hint on how plugins should process the data in this series. Supported values include "scalar", "histogram", "image", "audio"
A SummaryMetadata encapsulates information on which plugins are able to make use of a certain summary value.
Used in:
Data that associates a summary with a certain plugin.
Display name for viewing in TensorBoard.
Longform readable description of the summary sequence. Markdown supported.
Class of data stored in this time series. Required for compatibility with TensorBoard's generic data facilities (`DataProvider`, et al.). This value imposes constraints on the dtype and shape of the corresponding tensor values. See `DataClass` docs for details.
Used in:
The name of the plugin this data pertains to.
The content to store for the plugin. The best practice is for this to be a binary serialized protocol buffer.
A serialization of TPUExecutable. Only includes fields necessary to load and execute a program on a worker node.
The shapes of the inputs and outputs.
Dynamic output indices indicate which outputs have dynamic dimensions.
For each resource variable output, what was the index of the corresponding input and was it updated? The indices are sorted by input order.
The shapes of the outputs when represented as Tensors. These may not match the output_shape values because we may flatten tensors to avoid excess padding.
Optional session module for passing XLA computations between TPUCompileOp and TPUExecuteOp. This is needed to support the --xla_dump_hlo_snapshots flag.
The physical device ids assigned to the replicated cores.
Used in:
Used in:
Metadata for a data transfer between device and host.
Used in:
Channel identifier assigned by compiler and used in host commands.
Direction of the transfer operation.
Channel identifier provided by XLA client.
Shape of the data to be transferred (including layout).
Address of the device buffer in HBM (byte offset).
Original data type for this host transfer before X64 rewrite.
If this host transfer is a split X64 transfer, specifies whether this transfer is for lower bits.
The name of host side command handler.
Used in:
For logging the metadata output for a single session.run() call.
Used in:
Tag name associated with this metadata.
Byte-encoded version of the `RunMetadata` proto in order to allow lazy deserialization.
Defines the device filters for a remote task.
Used in:
Defines a connection between two tensors in a `GraphDef`.
Used in:
A tensor name. The value of this tensor will be substituted for the tensor named in `to_tensor`.
A tensor name. The value of this tensor will be bound to the value of the tensor named in `from_tensor`.
Available modes for extracting debugging information from a Tensor. TODO(cais): Document the detailed column names and semantics in a separate markdown file once the implementation settles.
Used in: ,
Only records what tensors are computed, eagerly or in graphs. No information regarding the value of the tensor is available.
A minimalist health summary for float-type tensors. Contains information only about the presence/absence of pathological values including Infinity and NaN. Applicable only to float dtypes.
A concise health summary for float-type tensors. Contains more information than CURT_HEALTH. Infinity and NaN are treated differently. Applicable only to float and integer dtypes.
A detailed health summary. Contains further detailed information than `CONCISE_HEALTH`. Information about device, dtype and shape are included. Counts for various types of values (Infinity, NaN, negative, zero, positive) are included. Applicable to float, integer and boolean dtypes.
Provides full runtime shape information, up to a maximum rank, beyond which the dimension sizes are truncated.
Full numeric summary. Including device, dtype, shape, counts of various types of values (Infinity, NaN, negative, zero, positive), and summary statistics (minimum, maximum, mean and variance). Applicable to float, integer and boolean dtypes.
Full tensor value.
Reduce the elements of a tensor to a rank-1 tensor of shape [3], in which
- the 1st element is -inf if any element of the tensor is -inf, or zero otherwise.
- the 2nd element is +inf if any element of the tensor is +inf, or zero otherwise.
- the 3rd element is nan if any element of the tensor is nan, or zero otherwise.
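The three-slot reduction described in the last mode can be written out directly. This is a pure-Python sketch over a flat list of floats; the function name is hypothetical and it does not call the debugger's own implementation.

```python
import math
from typing import Iterable, List


def reduce_inf_nan_three_slots(values: Iterable[float]) -> List[float]:
    """Return [-inf or 0, +inf or 0, nan or 0] for the given elements."""
    out = [0.0, 0.0, 0.0]
    for v in values:
        if math.isnan(v):
            out[2] = math.nan
        elif v == -math.inf:
            out[0] = -math.inf
        elif v == math.inf:
            out[1] = math.inf
    return out
```

A tensor with only finite values reduces to three zeros, so a non-zero slot signals which kind of pathological value is present.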
Used in: , ,
Data type of tensor elements
Shape of the tensor.
Information about the size and allocator used for the data
Information about a Tensor necessary for feeding or retrieval.
Used in: , ,
For dense `Tensor`s, the name of the tensor in the graph.
There are many possible encodings of sparse matrices (https://en.wikipedia.org/wiki/Sparse_matrix). Currently, TensorFlow uses only the COO encoding. This is supported and documented in the SparseTensor Python class.
Generic encoding for CompositeTensors.
The static shape should be recorded here, to the extent that it can be known in advance. In the case of a SparseTensor, this field describes the logical shape of the represented tensor (aka dense_shape).
Generic encoding for composite tensors.
Used in:
The serialized TypeSpec for the composite tensor.
A TensorInfo for each flattened component tensor.
For sparse tensors, the COO encoding stores a triple of values, indices, and shape.
Used in:
The shape of the values Tensor is [?]. Its dtype must be the dtype of the SparseTensor as a whole, given in the enclosing TensorInfo.
The indices Tensor must have dtype int64 and shape [?, ?].
The dynamic logical shape represented by the SparseTensor is recorded in the Tensor referenced here. It must have dtype int64 and shape [?].
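The COO triple of values, indices, and dense shape can be illustrated by expanding it into a dense matrix. The sketch below is restricted to the 2-D case and uses plain lists; `coo_to_dense_2d` is a hypothetical helper, not a TensorFlow function.

```python
from typing import List, Sequence


def coo_to_dense_2d(values: Sequence[float],
                    indices: Sequence[Sequence[int]],
                    dense_shape: Sequence[int]) -> List[List[float]]:
    """Expand a 2-D COO triple into a dense row-major matrix."""
    if len(dense_shape) != 2:
        raise ValueError("this sketch handles 2-D tensors only")
    if len(values) != len(indices):
        raise ValueError("need exactly one index row per value")
    rows, cols = dense_shape
    dense = [[0.0] * cols for _ in range(rows)]
    # Each row of `indices` gives the coordinates of the matching value.
    for value, (r, c) in zip(values, indices):
        dense[r][c] = value
    return dense
```

For instance, values [1.0, 2.0] at indices [[0, 0], [1, 2]] with dense shape [2, 3] give a matrix whose only non-zero entries are at those two coordinates.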
Protocol buffer representing a tensor.
Used in: , , , , , , , , , , , , , , , , , , , , , , , , , ,
Shape of the tensor. TODO(touts): sort out the 0-rank issues.
Version number. In version 0, if the "repeated xxx" representations contain only one element, that element is repeated to fill the shape. This makes it easy to represent a constant Tensor with a single value.
Serialized raw tensor content from either Tensor::AsProtoTensorContent or memcpy in tensorflow::grpc::EncodeTensorToByteBuffer. This representation can be used for all tensor types. The purpose of this representation is to reduce serialization overhead during RPC call by avoiding serialization of many repeated small items.
DT_HALF, DT_BFLOAT16. Note that since protobuf has no int16 type, we'll have some pointless zero padding for each value here.
DT_FLOAT.
DT_DOUBLE.
DT_INT32, DT_INT16, DT_UINT16, DT_INT8, DT_UINT8.
DT_STRING
DT_COMPLEX64. scomplex_val(2*i) and scomplex_val(2*i+1) are real and imaginary parts of i-th single precision complex.
DT_INT64
DT_BOOL
DT_COMPLEX128. dcomplex_val(2*i) and dcomplex_val(2*i+1) are real and imaginary parts of i-th double precision complex.
DT_RESOURCE
DT_VARIANT
DT_UINT32
DT_UINT64
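Two conventions from the field descriptions above lend themselves to a short sketch: a single repeated value filling the whole shape, and complex numbers stored as interleaved real and imaginary parts. Both helpers are hypothetical and operate on plain lists. Rejecting any other length mismatch is an assumption of this sketch, not a statement about the runtime.

```python
from math import prod
from typing import List, Sequence


def expand_values(values: Sequence[float], shape: Sequence[int]) -> List[float]:
    """Expand a repeated-value field to the full element count of `shape`."""
    count = prod(shape)  # prod of an empty shape is 1 (a scalar)
    if len(values) == 1:
        # A single element is repeated to fill the shape.
        return list(values) * count
    if len(values) != count:
        raise ValueError("value count does not match the shape")
    return list(values)


def decode_complex(interleaved: Sequence[float]) -> List[complex]:
    """Pair entries 2*i and 2*i+1 as real and imaginary parts of element i."""
    if len(interleaved) % 2:
        raise ValueError("expected an even number of entries")
    return [complex(interleaved[2 * i], interleaved[2 * i + 1])
            for i in range(len(interleaved) // 2)]
```

A constant 2 x 3 tensor therefore needs only one stored value, and a list of four floats decodes to two complex elements.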
Dimensions of a tensor.
Used in: , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , ,
Dimensions of the tensor, such as {"input", 30}, {"output", 40} for a 30 x 40 2D tensor. If an entry has size -1, this corresponds to a dimension of unknown size. The names are optional. The order of entries in "dim" matters: It indicates the layout of the values in the tensor in-memory representation. The first entry in "dim" is the outermost dimension used to layout the values, the last entry is the innermost dimension. This matches the in-memory layout of RowMajor Eigen tensors. If "dim.size()" > 0, "unknown_rank" must be false.
If true, the number of dimensions in the shape is unknown. If true, "dim.size()" must be 0.
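The layout rule (first entry outermost, last entry innermost) is the usual row-major order. As a sketch, the flat offset of an element can be computed from its index and the dimension sizes; `row_major_offset` is a hypothetical helper and assumes all sizes are known (no -1 entries).

```python
from typing import Sequence


def row_major_offset(index: Sequence[int], dims: Sequence[int]) -> int:
    """Flat offset of `index` in a row-major tensor with sizes `dims`."""
    if len(index) != len(dims):
        raise ValueError("index rank must match the number of dimensions")
    offset = 0
    for i, size in zip(index, dims):
        if not 0 <= i < size:
            raise IndexError("index out of range")
        # Moving one step in an outer dimension skips `size` inner elements.
        offset = offset * size + i
    return offset
```

For the 30 x 40 tensor used as the example above, element [1, 2] sits at offset 1 * 40 + 2 = 42.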
One dimension of the tensor.
Used in:
Size of the tensor in that dimension. This value must be >= -1, where -1 is reserved to mean an "unknown" dimension. Certain wrappers that work with TensorShapeProto may fail at runtime when deserializing a TensorShapeProto containing a dim value of -1.
Optional name of the tensor dimension.
Can only be interpreted if you know the corresponding TensorShape.
Used in: , ,
Extent of the slice in all tensor dimensions. Must have one entry for each of the dimension of the tensor that this slice belongs to. The order of sizes is the same as the order of dimensions in the TensorShape.
Extent of the slice in one dimension.
Either both attributes or neither must be set. When neither is set, the extent covers all data in that dimension.
Used in:
Start index of the slice, starting at 0.
Length of the slice: if the length is missing or -1 we will interpret this as "everything in this dimension". We use "oneof" to preserve information about whether the length is present without changing the serialization format from the prior proto2 version of this proto.
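The extent rules can be sketched as a function that resolves each per-dimension extent against a known shape. The representation is an assumption made for illustration: each extent is either `None` (no attribute set) or a `(start, length)` tuple, and a length of -1 is treated as the whole dimension.

```python
from typing import List, Optional, Sequence, Tuple

Extent = Optional[Tuple[int, int]]  # (start, length), or None when unset


def resolve_extents(extents: Sequence[Extent],
                    shape: Sequence[int]) -> List[Tuple[int, int]]:
    """Turn per-dimension extents into concrete (start, length) pairs."""
    if len(extents) != len(shape):
        raise ValueError("need one extent per tensor dimension")
    resolved: List[Tuple[int, int]] = []
    for extent, size in zip(extents, shape):
        # Unset, or length -1: everything in this dimension.
        if extent is None or extent[1] == -1:
            resolved.append((0, size))
            continue
        start, length = extent
        if start < 0 or length < 0 or start + length > size:
            raise ValueError("extent falls outside the dimension")
        resolved.append((start, length))
    return resolved
```

This also shows why a slice can only be interpreted together with the corresponding shape: the "everything" case has no concrete length until the dimension size is known.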
A protobuf to represent tf.TensorSpec.
Used in:
Tensor Tracer Report proto gives information about the trace including:
- TensorTracerConfig: version, device, num replicas, trace mode.
- Graphdef, e.g., list of operations, tensors
- TracedTensorDef:
  * Name of the tensor
  * Tracepoint name if provided.
  * Index of the tensor in the compact cache if traced.
  * Explanation for why the tensor is traced or not.
TensorFlow graph.
A map from tensor name to its TracedTensorDef.
The fingerprint of the TensorTracerReport (fingerprint calculation excludes this field and graphdef).
The function_name passed to the function_callback that produced this TensorTracerReport
The index of the last stack frame where the stack traces for all output operations in the graph have the same value.
List of names of output tensors of the function being traced.
Information about the number of tensors traced and skipped.
Used in:
Tensor tracer version, e.g. hostcall, outside compilation.
Traced device, CPU, TPU...
Trace mode, norm, summary, full-trace.
Number of cores, e.g. TPU cores, in the system.
Number of hosts, e.g. compute nodes in the system.
Keep submode as string for backward compatibility.
Keep num cores per host for backward compatibility.
Id of the included cores, if a subset of cores are traced.
The names of the signatures corresponding to the cache indices.
Used in:
Name of the tensor as it appears in the TF graph.
Cache index of the tensor. This may be different than topological index.
If trace points are provided, corresponding tracepoint name of the tensor. Trace points are placed on the edges (tensors) in the tensorflow graph, and they force tensor tracer to trace the corresponding tensor. Tracepoints can be added using the programmatic interface function tensor_tracer.tensor_tracepoint(tensor, trace_point_name). This will add a trace point with the given trace_point_name for the given tensor. If a trace_point is provided for the tensor, trace_point name will be used for the rest of the analysis instead of tensor names. One can use trace_point_name's to compare two models with arbitrary tensor names by providing the same trace point name for the tensors that are comparable.
Whether the tensor is traced or not.
Detailed explanation why the tensor is traced or not.
Detailed stack of operation
Used in:
Function names from stack
Line in stack
Filenames from stack
Line number in file from stack
Used in:
The total number of tensors in the function.
The number of traced tensors in the function.
Counts of traced tensors by op type.
The number of tensors added by Tensor Tracer.
The output of one benchmark / test run. Each run contains a list of tests or benchmarks, stored as BenchmarkEntry messages. This message should be emitted by the reporter (which runs the test / BM in a subprocess and then reads the emitted BenchmarkEntry messages, usually from a serialized json file, finally collecting them along with additional information about the test run).
The target of the run, e.g.: //tensorflow/core:kernels_adjust_contrast_op_benchmark_test
The list of tests or benchmarks in this run.
The configuration of the build (compiled opt? with cuda? any copts?)
The commit id (git hash or changelist)
The time the run started (in seconds of UTC time since Unix epoch)
The amount of time the total run took (wall time in seconds)
Machine-specific parameters (Platform and CPU info)
Run-specific parameters (arguments, etc)
Benchmark target identifier.
Used for differentiating between continuous and debug builds. Must be one of:
* cbuild: results from continuous build.
* presubmit: results from oneshot requests.
* culprit: results from culprit finder rerun.
TensorFlow version this benchmark runs against. This can be either set to full version or just the major version.
The type of benchmark.
Used in:
Fallback for protos written before Type was introduced.
Used in: ,
Used in:
The input value might be already fixed at the compilation time. This value may or may not be present.
Used in:
Whether the buffer stores dynamically padded data: in that case, actual concrete dimensions need to be stored after the buffer.
Represent device information from different runtimes.
Used in:
Used in:
The number of threads in the pool. 0 means the system picks a value based on where this option proto is used (see the declaration of the specific field for more info).
The global name of the threadpool. If empty, then the threadpool is made and used according to the scope it's in, e.g., for a session threadpool, it is used by that session only. If non-empty, then:
- a global threadpool associated with this name is looked up or created. This allows, for example, sharing one threadpool across many sessions (e.g., like the default behavior, if inter_op_parallelism_threads is not configured), but still partitioning into a large and small pool.
- if the threadpool for this global_name already exists, then it is an error if the existing pool was created using a different num_threads value than is specified on this call.
- threadpools created this way are never garbage collected.
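The lookup-or-create behavior for named pools can be sketched with a small registry. `Pool` and `PoolRegistry` are hypothetical stand-ins used only to show the rules above; no real threads are created.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Pool:
    num_threads: int
    global_name: str = ""


class PoolRegistry:
    """Hypothetical registry mirroring the global_name rules (illustration)."""

    def __init__(self) -> None:
        # Named pools live for the lifetime of the registry (never collected).
        self._global: Dict[str, Pool] = {}

    def get(self, num_threads: int, global_name: str = "") -> Pool:
        if not global_name:
            # Empty name: a private pool scoped to the caller.
            return Pool(num_threads)
        existing = self._global.get(global_name)
        if existing is None:
            existing = Pool(num_threads, global_name)
            self._global[global_name] = existing
        elif existing.num_threads != num_threads:
            raise ValueError(
                f"pool {global_name!r} already exists with "
                f"{existing.num_threads} threads"
            )
        return existing
```

Two requests for the same name and thread count share one pool, which is what allows several sessions to use a common "large" or "small" pool, while a request with a conflicting thread count is rejected.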
Used in:
Required format for the tool output. It should be one of "json", "proto", "raw", etc. If not specified (backward compatible), the default format is used, i.e. most tools use json format.
Whether to save the result directly to the repository or pass it back to the caller. Defaults to false for backward compatibility.
Used in:
Length of the trace to be taken, in seconds.
If true, capture step profile locally in each worker. Currently unimplemented.
If true, capture kernel events from each worker.
If true, capture extended profiling events from TensorFlow process.
If true, capture GPU profiling events locally on each machine. Currently unimplemented.
If true, collect sampled profile events. Currently unimplemented.
Out-of-band request to configure distributed tracing.
Used as request type in: grpc.WorkerService.Tracing
Used as response type in: grpc.WorkerService.Tracing
(message has no fields)
Used in:
Objects which this object depends on.
Serialized data specific to this object.
Slot variables owned by this object.
The registered saver used to save this object. If this saver is not present when loading the checkpoint, then loading will fail.
Whether this object has checkpoint values or descendants with checkpoint values. This is computed at save time to avoid traversing the entire object graph proto when restoring (which also has to traverse the live object graph).
Used in: ,
An index into `TrackableObjectGraph.nodes`, indicating the object being referenced.
A user-provided name for the edge.
Used in:
A name for the Tensor. Simple variables have only one `SerializedTensor` named "VARIABLE_VALUE" by convention. This value may be restored on object creation as an optimization.
The full name of the variable/tensor, if applicable. Used to allow name-based loading of checkpoints which were saved using an object-based API. Should match the checkpoint key which would have been assigned by tf.train.Saver.
The generated name of the Tensor in the checkpoint.
An index into `TrackableObjectGraph.nodes`, indicating the variable object this slot was created for.
The name of the slot (e.g. "m"/"v").
An index into `TrackableObjectGraph.nodes`, indicating the `Object` with the value of the slot variable.
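Because slot variable references store indices into `TrackableObjectGraph.nodes` rather than the objects themselves, resolving them is a matter of list lookups. The sketch below uses plain dataclasses as stand-ins for the proto messages; the field names (`original_variable_node_id`, `slot_name`, `slot_variable_node_id`) and the `label` attribute are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SlotRef:
    # Assumed field names for the three values described above.
    original_variable_node_id: int  # index of the variable the slot was created for
    slot_name: str                  # e.g. "m" or "v"
    slot_variable_node_id: int      # index of the object holding the slot's value


@dataclass
class Node:
    label: str  # hypothetical human-readable tag, not part of the proto
    slot_variables: List[SlotRef] = field(default_factory=list)


def resolve_slots(nodes: List[Node]) -> List[Tuple[str, str, str, str]]:
    """Return (owner, slot name, original variable, slot variable) labels."""
    resolved = []
    for owner in nodes:
        for ref in owner.slot_variables:
            resolved.append((
                owner.label,
                ref.slot_name,
                nodes[ref.original_variable_node_id].label,
                nodes[ref.slot_variable_node_id].label,
            ))
    return resolved
```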
Represents a Python tuple.
Used in:
Represents a tf.TypeSpec
The value returned by TypeSpec._serialize().
The name of the TypeSpec class.
- If type_spec_class == REGISTERED_TYPE_SPEC, the TypeSpec class is the one registered under this name. For types registered outside core TensorFlow by an add-on library, that library must be loaded before this value can be deserialized by nested_structure_coder.
- If type_spec_class specifies a particular TypeSpec class, this field is redundant with the type_spec_class enum, and is only used for error reporting in older binaries that do not know the type_spec_class enum.
The number of flat tensor components required by this TypeSpec.
Used in:
tf.SparseTensorSpec
tf.IndexedSlicesSpec
tf.RaggedTensorSpec
tf.TensorArraySpec
tf.data.DatasetSpec
IteratorSpec from data/ops/iterator_ops.py
tf.OptionalSpec
PerReplicaSpec from distribute/values.py
tf.VariableSpec
RowPartitionSpec from ragged/row_partition.py
The type registered as type_spec_class_name.
Subclasses of tf.ExtensionType
Describes the dimension numbers for Convolution op. Corresponds to ::mlir::mhlo::ConvDimensionNumbersAttr.
The dimension that represents batch in the input.
The dimension that represents features in the input.
The dimensions that represent the spatial dimensions in the input. The length must be rank - 2, where rank is the tensor rank of the Convolution op.
The dimension that represents input features in the kernel (rhs).
The dimension that represents output features in the kernel (rhs).
The dimensions that represent the spatial dimensions in the kernel (rhs). The length must be rank - 2, where rank is the tensor rank of the Convolution op.
The dimension that represents batch in the output.
The dimension that represents features in the output.
The dimensions that represent the spatial dimensions in the output. The length must be rank - 2, where rank is the tensor rank of the Convolution op.
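The "rank - 2" constraint follows from each tensor having exactly two non-spatial dimensions (batch and feature for the input and output, input and output feature for the kernel). A minimal validation sketch, assuming a plain dataclass with hypothetical field names rather than the generated proto class:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ConvDims:
    # Hypothetical field names mirroring the descriptions above.
    input_batch: int
    input_feature: int
    input_spatial: List[int]
    kernel_input_feature: int
    kernel_output_feature: int
    kernel_spatial: List[int]
    output_batch: int
    output_feature: int
    output_spatial: List[int]


def check_spatial_lengths(dims: ConvDims, rank: int) -> None:
    """Raise ValueError unless every spatial list has exactly rank - 2 entries."""
    expected = rank - 2
    for name in ("input_spatial", "kernel_spatial", "output_spatial"):
        actual = len(getattr(dims, name))
        if actual != expected:
            raise ValueError(
                f"{name} has {actual} entries, expected rank - 2 = {expected}"
            )


# A 2-D convolution over rank-4 tensors: NHWC input/output, HWIO kernel.
nhwc = ConvDims(
    input_batch=0, input_feature=3, input_spatial=[1, 2],
    kernel_input_feature=2, kernel_output_feature=3, kernel_spatial=[0, 1],
    output_batch=0, output_feature=3, output_spatial=[1, 2],
)
check_spatial_lengths(nhwc, rank=4)  # passes silently
```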
Protocol buffer representing the values in ControlFlowContext.
Value names that have been seen in this context.
Value names referenced by but external to this context.
Used in:
Indicates how a distributed variable will be aggregated.
`NONE`: This is the default, giving an error if you use a variable-update operation with multiple replicas.
`SUM`: Add the updates across replicas.
`MEAN`: Take the arithmetic mean ("average") of the updates across replicas.
`ONLY_FIRST_REPLICA`: This is for when every replica is performing the same update, but we only want to perform the update once. Used, e.g., for the global step counter.
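The four aggregation modes can be pictured as different reductions over the per-replica updates. The following is a conceptual sketch on plain Python floats, not how a distribution strategy actually applies updates; the `Aggregation` enum and `aggregate` function are illustrative names.

```python
from enum import Enum
from typing import Sequence


class Aggregation(Enum):
    NONE = 0
    SUM = 1
    MEAN = 2
    ONLY_FIRST_REPLICA = 3


def aggregate(updates: Sequence[float], mode: Aggregation) -> float:
    """Combine one update per replica according to the aggregation mode."""
    if mode is Aggregation.NONE:
        # Default: updating from more than one replica is an error.
        if len(updates) > 1:
            raise ValueError("NONE aggregation does not allow multiple replicas")
        return updates[0]
    if mode is Aggregation.SUM:
        return sum(updates)
    if mode is Aggregation.MEAN:
        return sum(updates) / len(updates)
    # ONLY_FIRST_REPLICA: every replica computes the same update; apply it once.
    return updates[0]
```

With `ONLY_FIRST_REPLICA`, a global step counter incremented by 1 on each of four replicas advances by 1 rather than 4.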
Protocol buffer representing a Variable.
Name of the variable tensor.
Name of the tensor holding the variable's initial value.
Name of the initializer op.
Name of the snapshot tensor.
Support for saving variables as slices of a larger variable.
Whether to represent this as a ResourceVariable.
Whether this variable should be trained.
Indicates when a distributed variable will be synced.
Indicates how a distributed variable will be aggregated.
Indicates when a distributed variable will be synced.
`AUTO`: Indicates that the synchronization will be determined by the current `DistributionStrategy` (e.g., with `MirroredStrategy` this would be `ON_WRITE`).
`NONE`: Indicates that there will only be one copy of the variable, so there is no need to sync.
`ON_WRITE`: Indicates that the variable will be updated across devices every time it is written.
`ON_READ`: Indicates that the variable will be aggregated across devices when it is read (e.g., when checkpointing or when evaluating an op that uses the variable).
Protocol buffer representing the serialization format of DT_VARIANT tensors.
Used in:
Name of the type of objects being serialized.
Portions of the object that are not Tensors.
Tensors contained within objects being serialized.
The config for graph verifiers.
Used in:
Deadline for completion of all verification, i.e., all verifiers that are toggled ON must complete execution within this time.
Perform structural validation on a tensorflow graph. Default is OFF.
Used in:
Version information for a piece of serialized data. There are different types of versions for each type of data (GraphDef, etc.), but they all have the same common shape described here. Each consumer has "consumer" and "min_producer" versions (specified elsewhere). A consumer is allowed to consume this data if:
- producer >= min_producer
- consumer >= min_consumer
- consumer not in bad_consumers
The version of the code that produced this data.
Any consumer below this version is not allowed to consume this data.
Specific consumer versions which are disallowed (e.g. due to bugs).
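The three compatibility conditions combine into a single predicate. In this sketch, `producer`, `min_consumer`, and `bad_consumers` come from the serialized data, while `consumer` and `min_producer` belong to the binary reading it; the function name `can_consume` is an illustrative choice.

```python
from typing import Iterable


def can_consume(
    producer: int,
    min_consumer: int,
    bad_consumers: Iterable[int],
    consumer: int,
    min_producer: int,
) -> bool:
    """Return True if a consumer at the given version may read the data."""
    return (
        producer >= min_producer            # data is not too old for this consumer
        and consumer >= min_consumer        # consumer is not too old for this data
        and consumer not in set(bad_consumers)  # consumer is not explicitly banned
    )
```

For example, data written with `producer=24`, `min_consumer=12`, and `bad_consumers=[15]` is readable by a consumer at version 20 whose `min_producer` is 21, but not by a consumer at version 15 or version 11.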
Used in:
Protocol buffer representing a WhileContext object.
Used in:
Name of the context.
The number of iterations allowed to run in parallel.
Whether backprop is enabled for this while loop.
Whether GPU-CPU memory swap is enabled for this loop.
Name of the pivot tensor.
Name of the pivot_for_pred tensor.
Name of the pivot_for_body tensor.
List of names for exit tensors.
List of names for enter tensors.
Values and external values in control flow context.
Optional name of the maximum_iterations tensor.
Contexts contained inside this context (e.g. nested whiles).
Current health status of a worker.
Used in:
By default a worker is healthy.
Worker has been instructed to shut down after a timeout.
Indicates the behavior of the worker when an internal error or shutdown signal is received.
Used in:
Listeners listening for auto clustering events get messages of this type. Next ID: 4
The value of GlobalJitLevel, as determined by `GetGlobalJitLevelForGraph`. This determines if global auto-clustering is enabled.
Whether --tf_xla_cpu_global_jit is enabled in TF_XLA_FLAGS.
Summarizes the results of auto-clustering a TensorFlow graph. Next ID: 5
Used in:
The number of nodes in the graph that are not inside an XLA cluster.
The number of nodes in the graph that are in an XLA cluster.
All of the XLA clusters in the TF graph.
A histogram of the TF operations that were not clustered.
Describes a single XLA cluster. Next ID: 4
Used in:
The number of nodes in the cluster.
A histogram of the TF operations in this cluster.
Represents a single element in a histogram of ops ("op" as in "TensorFlow operation"). Next ID: 3
The TensorFlow operation (e.g., MatMul, Add).
The number of times this occurs.
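The auto-clustering summary, its per-cluster entries, and the op histograms fit together as nested counts. The sketch below builds such a summary from a list of `(op, cluster)` pairs using `collections.Counter`; the dictionary keys are hypothetical names chosen to echo the descriptions above, not the proto's field names.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def summarize_clustering(nodes: Iterable[Tuple[str, Optional[str]]]) -> Dict:
    """nodes: (op name, cluster name or None if the node is not clustered)."""
    unclustered = Counter()
    per_cluster: Dict[str, Counter] = {}
    for op, cluster in nodes:
        if cluster is None:
            unclustered[op] += 1
        else:
            per_cluster.setdefault(cluster, Counter())[op] += 1
    return {
        "unclustered_node_count": sum(unclustered.values()),
        "clustered_node_count": sum(sum(h.values()) for h in per_cluster.values()),
        "clusters": {
            name: {"size": sum(hist.values()), "op_histogram": dict(hist)}
            for name, hist in per_cluster.items()
        },
        "unclustered_op_histogram": dict(unclustered),
    }
```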
Used in:
Listeners listening for JIT compilation events get messages of this type. Each instance of XlaJitCompilationActivity corresponds to a single compilation of a single XLA cluster. E.g. if a graph has two clusters, A and B, and A is compiled 5 times and B is compiled 2 times then we will generate 7 instances of XlaJitCompilationActivity. Next ID: 6
The number of times this cluster has been compiled.
Microseconds spent in the individual compilation being reported.
Total microseconds spent in (re-)compiling this cluster so far.
Whether a persistent compilation cache entry was used.
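The "one message per compilation" rule means the per-cluster count and cumulative time grow with each event. A small sketch of that bookkeeping follows, with hypothetical class and field names standing in for the proto message; with cluster A compiled 5 times and cluster B compiled 2 times it produces 7 events, matching the example in the description.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class JitCompilationEvent:
    # Hypothetical names for the values described above.
    cluster_name: str
    compile_count: int               # times this cluster has been compiled so far
    compile_time_us: int             # microseconds spent in this compilation
    cumulative_compile_time_us: int  # total microseconds for this cluster so far
    used_persistent_cache: bool


class CompilationLog:
    """Emits one event per compilation of a cluster."""

    def __init__(self) -> None:
        self.events: List[JitCompilationEvent] = []
        self._count: Dict[str, int] = {}
        self._total_us: Dict[str, int] = {}

    def record(self, cluster_name: str, compile_time_us: int,
               used_persistent_cache: bool = False) -> JitCompilationEvent:
        self._count[cluster_name] = self._count.get(cluster_name, 0) + 1
        self._total_us[cluster_name] = (
            self._total_us.get(cluster_name, 0) + compile_time_us
        )
        event = JitCompilationEvent(
            cluster_name,
            self._count[cluster_name],
            compile_time_us,
            self._total_us[cluster_name],
            used_persistent_cache,
        )
        self.events.append(event)
        return event
```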
Used for logging situations seen in TensorFlow models being optimized that are known to not perform well with XLA. Next ID: 3
Information such as which node was the problem.
Next ID: 6
Used in:
Represents an entry in the XLA compile cache.
Used to uniquely identify this entry in its persisted representation.
The computation (HLO) that compilation was done for. It is correlated to the input TF graph so we can use it to fingerprint the compiled binary. We serialize this rather than the input graphdef because it provides a stronger guarantee over what bindings are needed between the HLO and calling TF graph.
The raw bytes of the executable.
Represents the cache key used for persistence.
Used in: