Proto commits in tensorflow/runtime

The Protocol Buffers files changed in these 14 commits:

Commit:b5e3d97
Author:TFRT team
Committer:Copybara-Service

Clean up TFRT distributed_runtime.
PiperOrigin-RevId: 490026485

The documentation is generated from this commit.

Commit:079fe32
Author:Haoyu Zhang
Committer:Copybara-Service

Support multi-client initialization. The process typically involves 3 rounds of RPCs, with the first two rounds almost the same as in single-client initialization:
1. The leader task collects remote device info from all other tasks.
2. The leader task creates distributed contexts on all other tasks, and gets back remote ready chains from them.
3. The leader task broadcasts all remote chains to other tasks.
PiperOrigin-RevId: 353711143
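
The three rounds in this commit message can be modeled roughly as follows. This is a minimal in-process sketch, not TFRT code: `Task`, `get_device_info`, `create_context`, `receive_chains`, and `multi_client_init` are hypothetical names, and plain method calls stand in for the RPCs.

```python
# Hypothetical model of the three RPC rounds; none of these names
# come from the TFRT codebase, and method calls stand in for RPCs.
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    devices: list
    context_created: bool = False
    known_chains: dict = field(default_factory=dict)

    def get_device_info(self):
        # Round 1 handler: report this task's local devices.
        return list(self.devices)

    def create_context(self, all_devices):
        # Round 2 handler: create a context and return a ready chain token.
        self.context_created = True
        return f"ready:{self.name}"

    def receive_chains(self, chains):
        # Round 3 handler: store the chains broadcast by the leader.
        self.known_chains = dict(chains)


def multi_client_init(leader, others):
    # Round 1: the leader collects device info from all other tasks.
    devices = {t.name: t.get_device_info() for t in others}
    # Round 2: the leader creates contexts on all other tasks and
    # gets back one ready chain from each.
    chains = {t.name: t.create_context(devices) for t in others}
    # Round 3: the leader broadcasts all chains to the other tasks.
    for t in others:
        t.receive_chains(chains)
    return devices, chains
```

The `leader` argument is kept only to make the roles explicit; in this sketch the leader does not call itself.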

Commit:592aa21
Author:Ayush Dubey
Committer:Copybara-Service

Add remote op execution support.
PiperOrigin-RevId: 347458002

Commit:2c51c89
Author:Haoyu Zhang
Committer:Copybara-Service

Propagate remote device information in distributed initialization.
PiperOrigin-RevId: 346924443

Commit:7c7b841
Author:TFRT team
Committer:Copybara-Service

- Implement kernels for sending and receiving bytes. These kernels simply wrap the underlying fabric communicator.
- Implement serialization & deserialization of DHT.
PiperOrigin-RevId: 346331191

Commit:c7a5316
Author:Haoyu Zhang
Committer:Copybara-Service

Add garbage collection and keep-alive mechanisms to distributed TFRT. Distributed contexts created by remote clients can leak if the remote clients are disconnected. To avoid this, we add a garbage collection mechanism so that servers periodically check for and delete contexts that have been inactive for a certain period of time. The GC timeout is 600 seconds by default and is configurable in ServerContextConfiguration. The last access time of a distributed context is updated automatically by any RPC that carries its context_id. The client also sends KeepAlive messages periodically to make sure the remote contexts stay alive. The KeepAlive interval is half of the GC timeout (i.e., 300 seconds by default).
PiperOrigin-RevId: 345709415
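
The interplay between the GC timeout and the KeepAlive interval can be illustrated with a small sketch. It is an assumption-laden model rather than the TFRT implementation: `ContextRegistry`, `touch`, `collect_garbage`, and `keep_alive_interval_s` are hypothetical names, and the clock is passed in explicitly so the behavior is deterministic.

```python
# Hypothetical model of server-side context GC; names are illustrative only.
class ContextRegistry:
    def __init__(self, gc_timeout_s=600.0):
        # 600 seconds is the default GC timeout stated in the commit message.
        self.gc_timeout_s = gc_timeout_s
        self._last_access = {}

    def touch(self, context_id, now):
        # Any RPC carrying context_id (including KeepAlive) refreshes it.
        self._last_access[context_id] = now

    def collect_garbage(self, now):
        # Delete contexts that have been inactive longer than the timeout.
        stale = [cid for cid, last in self._last_access.items()
                 if now - last > self.gc_timeout_s]
        for cid in stale:
            del self._last_access[cid]
        return stale

    def keep_alive_interval_s(self):
        # The client pings at half of the GC timeout.
        return self.gc_timeout_s / 2
```

Because the client refreshes each context at half the timeout, a connected client's context should not age past the limit, while a disconnected client's context is deleted on a later GC pass.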

Commit:1afb012
Author:Bramandia Ramadhana
Committer:Copybara-Service

Adds support for remote chain handling:
- Introduces RemoteChainManager, which is a container holding a single chain for every host.
- Introduces kernels that use RemoteChainManager to get/set the chain for a particular host.
- Introduces a test kernel that creates a RemoteChainManager. In production, there will be a separate kernel that reads RemoteChainManager from ExecutionContext; that kernel will come in a later commit.
- Adds an InitializeContext method to RemoteClient. In this commit, it simply initializes the remote chain. There is a TODO to implement this more elaborately to initialize the context.
- DistributedContext now manages a set of ready chains. During initialization, it calls InitializeContext on all participants to retrieve the ready chains.
- RemoteChainManager is initialized with a set of ready chains.
PiperOrigin-RevId: 342895156
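
The "one chain per host" container described in this commit can be pictured with a very small sketch. Only the class name comes from the commit message; the method names `get_chain` and `set_chain`, the constructor signature, and the string chain values are assumptions made for illustration, and the real class is part of the C++ runtime.

```python
# Illustrative stand-in for a per-host chain container.
# Method names and the string "chains" are assumptions, not the TFRT API.
class RemoteChainManager:
    def __init__(self, ready_chains):
        # Initialized with the set of ready chains, keyed by host.
        self._chains = dict(ready_chains)

    def get_chain(self, host):
        # Return the single current chain for this host.
        return self._chains[host]

    def set_chain(self, host, chain):
        # Replace the chain for this host, e.g. after a remote operation.
        self._chains[host] = chain
```

Holding exactly one chain per host means each update overwrites the previous value, so a reader always sees the most recent chain for that host.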

Commit:d488a3c
Author:Bramandia Ramadhana
Committer:Copybara-Service

Changed the fields of the RemoteObjectId proto to the correct types. This was missed in a previous commit.
PiperOrigin-RevId: 342680061

Commit:6eff110
Author:Haoyu Zhang
Committer:Copybara-Service

Support remote context initialization in distributed TFRT. Add functions to create and close `DistributedContext`s on remote workers. These functions should be used by the "master" node at the beginning/end of distributed execution. If there are errors, the done callback will be invoked with an llvm::Error (one error) or ErrorCollection (multiple errors).
PiperOrigin-RevId: 341953241
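
The error-reporting convention in this commit (a single error versus a collection of errors passed to the done callback) can be sketched as below. This is a Python analogy under stated assumptions, not the C++ API: `create_remote_contexts` and this `ErrorCollection` class are hypothetical, and workers are modeled as callables that raise on failure.

```python
# Hypothetical analogy for the done-callback error convention.
class ErrorCollection(Exception):
    """Stand-in for a container of multiple errors."""

    def __init__(self, errors):
        super().__init__(f"{len(errors)} errors")
        self.errors = list(errors)


def create_remote_contexts(workers, done):
    # Try every worker, gathering failures instead of stopping early.
    errors = []
    for create in workers.values():
        try:
            create()
        except Exception as exc:
            errors.append(exc)
    if not errors:
        done(None)                     # success: no error
    elif len(errors) == 1:
        done(errors[0])                # one failure: pass the single error
    else:
        done(ErrorCollection(errors))  # several failures: pass a collection
```

Collecting all failures before invoking the callback lets the caller see every worker that failed rather than only the first one.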

Commit:8d5b096
Author:Haoyu Zhang
Committer:Copybara-Service

Define cluster configuration as protos in preparation for remote initialization. The main purposes of the proto definitions are:
* Better serialization/deserialization support, as these configurations will be used as part of remote distributed context initialization;
* Better cross-language support. In the future the API layers in Python should be able to directly construct these configurations for the runtime.
PiperOrigin-RevId: 341691961

Commit:c996b81
Author:TFRT team
Committer:chuanhaozhuge

Internal change.
PiperOrigin-RevId: 340714785

Commit:bd1e47b
Author:Bramandia Ramadhana
Committer:Copybara-Service

Adds support for Remote Object deletion:
- Adds DeleteRemoteObjects method to RemoteObjectManager
- Adds DeleteRemoteObjectsAsync method to RemoteClient
PiperOrigin-RevId: 340279592

Commit:7bffa4e
Author:Haoyu Zhang
Committer:Copybara-Service

Modularize distributed TFRT.
PiperOrigin-RevId: 339919426

Commit:1bef0ea
Author:TFRT team
Committer:chuanhao

Internal change.
PiperOrigin-RevId: 308308798

The documentation is generated from this commit.