Eager Service defines a TensorFlow service that executes operations eagerly on a set of local devices, on behalf of a remote Eager executor. The service implementation keeps track of the various clients and devices it has access to, and allows the client to enqueue ops on any device that it is able to access and to schedule data transfers from/to any of the peers. A client can generate multiple contexts to be able to independently execute operations, but cannot share data between contexts.
NOTE: Even though contexts generated by clients should be independent, the lower-level TensorFlow execution engine is not, so they might share some data (e.g. a Device's ResourceMgr).
Closes the context. No calls to other methods using the existing context ID are valid after this.
(message has no fields)
This initializes the worker, informing it about the other workers in the cluster and exchanging authentication tokens which will be used in all other RPCs to detect whether the worker has restarted.
Identifies the full cluster, and this particular worker's position within.
Whether the ops on the worker should be executed synchronously or asynchronously. By default, ops are executed synchronously.
Number of seconds to keep the context alive. If more than keep_alive_secs seconds have passed since a particular context was last communicated with, it will be garbage collected.
This is the version for all the ops that will be enqueued by the client.
Device attributes in the cluster
The ID of the created context. This is usually a randomly generated number that will be used to identify the context in all future requests to the service. Contexts are not persisted through server restarts. It is essential that both ends use this ID for selecting a rendezvous to get everything to match.
The view ID of the context.
For a multi device function, if false, eagerly copy all remote inputs to the default function device; if true, lazily copy remote inputs to their target devices after function instantiation to avoid redundant copies.
List of devices that are locally accessible to the worker.
This takes a list of Execute and DeleteTensorHandle operations and enqueues (in async mode) or executes (in sync mode) them on the remote server. All outputs of ops which were not explicitly deleted with DeleteTensorHandle entries will be assumed to be alive and are usable by future calls to Enqueue.
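The handle lifetime rule described above can be illustrated with a toy model. This is only a sketch in plain Python, not TensorFlow code: the `ToyWorker` class, the tuple-based queue items, and the lambda "ops" are all hypothetical stand-ins. It assumes outputs are keyed by (op ID, output number) and stay alive across calls until a delete item removes them.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWorker:
    """Hypothetical stand-in for a remote worker's tensor-handle table."""

    # Maps (op_id, output_num) -> value produced by that op output.
    handles: dict = field(default_factory=dict)

    def enqueue(self, items):
        """Process a list of ("execute", ...) and ("delete", ...) items in order."""
        for item in items:
            kind = item[0]
            if kind == "execute":
                _, op_id, fn, inputs = item
                # Inputs refer to outputs of earlier ops that are still alive.
                args = [self.handles[key] for key in inputs]
                for output_num, value in enumerate(fn(*args)):
                    self.handles[(op_id, output_num)] = value
            elif kind == "delete":
                _, op_id, output_num = item
                del self.handles[(op_id, output_num)]
            else:
                raise ValueError(f"unknown queue item kind: {kind!r}")


worker = ToyWorker()
worker.enqueue([
    ("execute", 1, lambda: (2, 3), []),                        # two outputs
    ("execute", 2, lambda a, b: (a + b,), [(1, 0), (1, 1)]),   # 2 + 3
    ("delete", 1, 0),                                          # release output 0 of op 1
])
# Output 1 of op 1 was never deleted, so a later call can still use it.
worker.enqueue([("execute", 3, lambda s, b: (s * b,), [(2, 0), (1, 1)])])
```

In this toy model a later call that referenced the deleted handle (1, 0) would fail with a KeyError, which mirrors the idea that only handles not explicitly deleted remain usable.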
Contexts are always created with a deadline; if no RPCs arrive within that deadline, the context is garbage collected. KeepAlive calls can be used to delay this. KeepAlive can also be used to validate the existence of a context ID on a remote eager worker. If the context exists on the remote worker, the call returns the same ID and the current context view ID. This is useful for checking whether the remote worker (potentially with the same task name and hostname/port) has been replaced with a new process.
If the requested context_id is on the remote host, set the context view ID.
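A minimal sketch of the keep-alive behaviour, assuming a simple in-memory registry. `ToyContextRegistry`, its method names, the initial view ID of 0, and the explicit `now` argument (used instead of a real clock so the example is deterministic) are all illustrative assumptions, not part of the service.

```python
import random


class ToyContextRegistry:
    """Hypothetical in-memory model of context deadlines and KeepAlive."""

    def __init__(self):
        self._contexts = {}  # context_id -> dict with view_id, deadline info

    def create(self, keep_alive_secs, now):
        # The docs say the ID is usually a randomly generated number.
        context_id = random.getrandbits(64)
        self._contexts[context_id] = {
            "view_id": 0,  # assumed starting value
            "keep_alive_secs": keep_alive_secs,
            "last_seen": now,
        }
        return context_id

    def collect_garbage(self, now):
        """Drop every context that has been silent longer than its deadline."""
        expired = [
            cid for cid, ctx in self._contexts.items()
            if now - ctx["last_seen"] > ctx["keep_alive_secs"]
        ]
        for cid in expired:
            del self._contexts[cid]

    def keep_alive(self, context_id, now):
        """Return (context_id, view_id) and refresh the deadline, or None if gone."""
        self.collect_garbage(now)
        ctx = self._contexts.get(context_id)
        if ctx is None:
            return None
        ctx["last_seen"] = now
        return context_id, ctx["view_id"]


registry = ToyContextRegistry()
cid = registry.create(keep_alive_secs=10, now=0.0)
first = registry.keep_alive(cid, now=8.0)    # within the deadline
second = registry.keep_alive(cid, now=17.0)  # 9 s since the last call
third = registry.keep_alive(cid, now=30.0)   # 13 s of silence: collected
```

A client that gets back no context (here `None`) can conclude that the worker no longer holds it, for example because the process was replaced.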
This takes an Eager operation and executes it in async mode on the remote server. Unlike EnqueueRequest, ops/functions sent through this type of request are allowed to execute in parallel, and no ordering is preserved by the RPC stream or executor. This request type should only be used for executing component functions. Ordering of component functions should be enforced by their corresponding main functions. The runtime ensures the following invariants for component functions (CFs) and their main functions (MFs):
(1) MF1 -> MF2 ==> CF1 -> CF2 ("->" indicates order of execution);
(2) MF1 || MF2 ==> CF1 || CF2 ("||" indicates possible parallel execution);
(3) For CF1 and CF2 that come from the same MF, CF1 || CF2.
For executing ops/main functions, use Enqueue or StreamingEnqueue instead for correct ordering.
The output indices of its parent function.
A streaming version of Enqueue. The current server implementation sends one response per received request. The benefit of using a streaming version is that subsequent requests can be sent without waiting for a response to the previous request. This synchronization is required in the regular Enqueue call because gRPC does not guarantee that request order is preserved.
This updates the eager context on an existing worker when updating the set of servers in a distributed eager cluster.
Identifies the full cluster, and this particular worker's position within.
Device attributes in the cluster. If this field is empty, it indicates that this is a simple update request that only increments the cluster view ID and does not require changes to the workers it connects to.
The ID of the context to be updated. A context with the specified ID must already exist on the recipient server of this request.
The view ID of the context, which should be contiguously incremented when updating the same context.
List of devices that are locally accessible to the worker.
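The update rules above can be sketched as a small validation routine. This is a toy model under stated assumptions: `ToyContext` is hypothetical, "contiguously incremented" is read here as "exactly one more than the current view ID", and the device name string is only an illustrative placeholder.

```python
class ToyContext:
    """Hypothetical model of a worker-side context that accepts updates."""

    def __init__(self, context_id, view_id=0):
        self.context_id = context_id
        self.view_id = view_id
        self.device_attributes = []

    def update(self, context_id, new_view_id, device_attributes):
        # The context must already exist on the recipient.
        if context_id != self.context_id:
            raise KeyError(f"no context with ID {context_id}")
        # Assumed reading of "contiguously incremented": no gaps, no repeats.
        if new_view_id != self.view_id + 1:
            raise ValueError(
                f"expected view ID {self.view_id + 1}, got {new_view_id}"
            )
        self.view_id = new_view_id
        # Empty attributes mean a simple update that only bumps the view ID.
        if device_attributes:
            self.device_attributes = list(device_attributes)


ctx = ToyContext(context_id=42)
ctx.update(42, 1, ["/job:worker/task:0/device:CPU:0"])  # cluster change
ctx.update(42, 2, [])                                   # simple update

try:
    ctx.update(42, 4, [])  # skips view ID 3
    rejected = False
except ValueError:
    rejected = True
```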
Takes a set of op IDs and waits until those ops are done. Returns any error in the stream so far.
Ids to wait on. If empty, wait on everything currently pending.
TODO(nareshmodi): Consider adding NodeExecStats here to be able to propagate some stats.
(message has no fields)
Cleans up the step state of a multi-device function (e.g. tensors buffered by a `Send` op but not picked up by its corresponding `Recv` op).
Used in:
Used as request type in: EagerService.Enqueue, EagerService.StreamingEnqueue
Used as response type in: EagerService.Enqueue, EagerService.StreamingEnqueue
A single operation response for every item in the request.
A proto representation of an eager operation.
Used in: ,
A unique identifier for the operation. Set by the client so that the client can uniquely identify the outputs of the scheduled operation. In the initial implementation, sending duplicate IDs has undefined behaviour, but additional constraints may be placed upon this in the future.
Control Operation IDs that will be respected when ops are re-ordered by async execution. If async execution (+ op re-ordering) is not enabled, this should have no effect.
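One way to picture how control operation IDs constrain re-ordering is to treat them as dependencies and compute an order that respects them. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the `execution_order` helper and the example op IDs are hypothetical, and this is not how the TensorFlow executor is implemented.

```python
from graphlib import TopologicalSorter


def execution_order(control_op_ids):
    """Return one valid order for {op_id: [IDs of ops that must run first]}."""
    # TopologicalSorter expects a mapping from node to its predecessors,
    # which matches "op -> control ops that must complete before it".
    return list(TopologicalSorter(control_op_ids).static_order())


# Op 2 is controlled by op 1; op 3 is controlled by ops 1 and 2.
ops = {3: [1, 2], 2: [1], 1: []}
order = execution_order(ops)
```

Any order an async executor picks would need to keep each op after its control ops; when no re-ordering happens, the constraint is trivially satisfied by the enqueue order.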
Indicates whether the op is a component of a multi-device function.
Set when is_component_function is true. It's initially generated when we create a FunctionLibraryRuntime::Options (negative value) and used to create a Rendezvous for function execution. All components of a multi-device function should use the same step id to make sure that they can communicate through Send/Recv ops.
Indicates whether the op is a function.
Used in:
Used in:
The remote executor should be able to handle either executing ops directly, or releasing any unused tensor handles, since the tensor lifetime is maintained by the client.
Takes a FunctionDef and makes it enqueueable on the remote worker.
A remote executor is created to execute ops/functions asynchronously enqueued in a streaming call. A request with this item type waits for pending nodes to finish on the remote executor and reports the status.
Used in:
`shape` and `tensor` cannot be set in the same response. Shapes of output tensors for creating remote TensorHandles.
Optional. If set, represents the output devices of a function.
Output tensors of a remote function. Set when Operation.id is invalid.
Used in:
If true, it means that function_def is produced by graph partition during multi-device function instantiation.
All necessary FunctionDefs and GradientDefs to expand `function_def`. When is_component_function is true, `function_def` could be a nested function, since some nodes in its parent's function body could be replaced with a new function by the graph optimization passes. No need to add FunctionDefs here to the function cache in EagerContext since they won't be executed as KernelAndDevices.
Used in: , ,
The ID of the operation that produced this tensor.
The index into the outputs of the operation that produced this tensor.
Device where the tensor is located. Cannot be empty. For multi-device functions, it's the default device passed to placer.
Device of the operation producing this tensor. Can be empty if the operation producing this tensor is a multi-device function.
Tensor type.
Optional data types and shapes of a remote resource variable.
Used in:
Send a packed TensorHandle to a remote worker.
Used in:
Op id of the remote packed TensorHandle.
Used in:
Used in:
Device where the tensor is produced.
Used in:
All remote tensors are identified by <Op ID, Output num>. To mimic this situation when directly sending tensors, we include an "artificial" op ID (which would have corresponded to the _Recv op when not using SendTensor).
The index within the repeated field is the output number that will help uniquely identify (along with the above op_id) the particular tensor.
The device on which the tensors should be resident.
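The <Op ID, Output num> addressing used for directly sent tensors can be sketched as follows. This is an illustrative assumption about how a receiver might index the tensors; `register_sent_tensors`, the op ID 7, and the list values are hypothetical.

```python
def register_sent_tensors(handles, op_id, tensors):
    """Store each sent tensor under (op_id, index within the repeated field)."""
    for output_num, tensor in enumerate(tensors):
        handles[(op_id, output_num)] = tensor
    return handles


# The "artificial" op ID 7 stands in for the _Recv op that would otherwise exist.
table = register_sent_tensors({}, op_id=7, tensors=[[1.0, 2.0], [3.0]])
```

With this keying, a later operation can refer to the second sent tensor as (7, 1), the same way it would refer to an output of any executed op.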
Used in:
(message has no fields)