Proto commits in learning-at-home/hivemind

The Protocol Buffers files changed in these 60 commits:

Commit:11a0260
Author:Max Ryabinin
Committer:Max Ryabinin

Add support for quantization with bitsandbytes (#490) * Add support for quantization with bitsandbytes * Extend the compression benchmark * Add a test for blockwise compression * Add a note to README about bitsandbytes * Install bitsandbytes in tests as well * Verify outputs consistently in test_moe.py (to make the test less flaky) * Pass device="cpu" in test_background_server_identity_path This ensures that the server can actually launch in a GPU-enabled environment: otherwise initializing the CUDA context in a parent process prevents it * Filter bitsandbytes warnings (cherry picked from commit 131f82c97ea67510d552bb7a68138ad27cbfa5d4)

The documentation is generated from this commit.

Commit:c2a53d0
Author:Alexander Borzunov
Committer:Max Ryabinin

Remove libp2p handlers when ConnectionHandler, DHT, and DecentralizedAverager are shut down (#501) (cherry picked from commit 3267fc7ab5417025583ecb292ce4d4c5e00779e6)

Commit:94dcbf0
Author:Pavel Samygin
Committer:Max Ryabinin

metadata type changed to bytes (#491) Type of metadata field in Expert Request/Response changed to more native type `bytes` and some compatibility fixes are done to the tests to fit different `torch` versions (cherry picked from commit fe7a4ef042a5ef902d3c0112c76dc1015fae4a15)

Commit:8374ab5
Author:justheuristic
Committer:Max Ryabinin

Handle errors in Runtime (#489) - fix edge case where expert requests with 3.99-4MB payload would fail due to max message size (due to serialization overhead) - recover from errors in the Runtime, propagate them to the corresponding tasks - previously, a failing function would terminate the entire server - which was a major pain for me personally :) - failure to process a request will now trigger P2PHandlerError instead of P2PDaemonError (cuz it does not kill the daemon) - allow optional metadata in ExpertRequest / ExpertResponse for extendability [todo: validate it vs. @mryab ] Co-authored-by: Max Ryabinin <mryabinin0@gmail.com> Co-authored-by: Pavel Samygin <samygin@phystech.edu> (cherry picked from commit ef0b842baf28bcaf00d09ee017ebecc1040f87f0)
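
The 3.99-4MB edge case described above comes from a size check that ignores serialization overhead. The following is a minimal sketch of the idea, not the hivemind implementation: the 4 MB limit, the 64 KB reserve, and both function names are assumptions chosen for illustration.

```python
from typing import List

MAX_UNARY_PAYLOAD_SIZE = 4 * 1024 * 1024  # assumed message size limit (4 MB)
SERIALIZATION_RESERVE = 64 * 1024         # hypothetical headroom for protobuf framing


def fits_in_single_message(payload_size: int,
                           limit: int = MAX_UNARY_PAYLOAD_SIZE,
                           reserve: int = SERIALIZATION_RESERVE) -> bool:
    """Return True only if the payload plus reserved overhead stays within the limit."""
    return payload_size + reserve <= limit


def split_payload(payload: bytes,
                  limit: int = MAX_UNARY_PAYLOAD_SIZE,
                  reserve: int = SERIALIZATION_RESERVE) -> List[bytes]:
    """Split a payload into chunks that each leave `reserve` bytes of headroom."""
    chunk_size = limit - reserve
    if chunk_size <= 0:
        raise ValueError("reserve must be smaller than the message size limit")
    return [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]
```

With these assumed constants, a 3.99 MB payload is rejected for a single message and would be sent in chunks instead, while small payloads still go through in one message.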

Commit:131f82c
Author:Max Ryabinin
Committer:GitHub

Add support for quantization with bitsandbytes (#490) * Add support for quantization with bitsandbytes * Extend the compression benchmark * Add a test for blockwise compression * Add a note to README about bitsandbytes * Install bitsandbytes in tests as well * Verify outputs consistently in test_moe.py (to make the test less flaky) * Pass device="cpu" in test_background_server_identity_path This ensures that the server can actually launch in a GPU-enabled environment: otherwise initializing the CUDA context in a parent process prevents it * Filter bitsandbytes warnings

The documentation is generated from this commit.

Commit:3267fc7
Author:Alexander Borzunov
Committer:GitHub

Remove libp2p handlers when ConnectionHandler, DHT, and DecentralizedAverager are shut down (#501)

Commit:fe7a4ef
Author:Pavel Samygin
Committer:GitHub

metadata type changed to bytes (#491) Type of metadata field in Expert Request/Response changed to more native type `bytes` and some compatibility fixes are done to the tests to fit different `torch` versions

Commit:ef0b842
Author:justheuristic
Committer:GitHub

Handle errors in Runtime (#489) - fix edge case where expert requests with 3.99-4MB payload would fail due to max message size (due to serialization overhead) - recover from errors in the Runtime, propagate them to the corresponding tasks - previously, a failing function would terminate the entire server - which was a major pain for me personally :) - failure to process a request will now trigger P2PHandlerError instead of P2PDaemonError (cuz it does not kill the daemon) - allow optional metadata in ExpertRequest / ExpertResponse for extendability [todo: validate it vs. @mryab ] Co-authored-by: Max Ryabinin <mryabinin0@gmail.com> Co-authored-by: Pavel Samygin <samygin@phystech.edu>

Commit:712e428
Author:Max Ryabinin
Committer:GitHub

Remove gRPC services and grpcio requirement (#485) * Remove services from .proto files * Remove grpcio from requirements

Commit:724cdfe
Author:GreenFatGuy
Committer:GitHub

Convert hivemind.server to libp2p backend (#470) Switch hivemind MoE from gRPC to libp2p. This allows serving experts from behind NAT / firewalls and improves performance under latency. Changes: - RemoteExpert (and MoEs) now communicate to servers via libp2p - Got rid of listen_on parameters in hivemind.Server and CLI tools - ConnectionHandlers now use load balancing for better performance (see benchmarks in the corresponding PR) - updated docs & tests Co-authored-by: Denis Mazur <denismazur8@gmail.com> Co-authored-by: Alexander Borzunov <hxrussia@gmail.com> Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>

Commit:106ae3d
Author:Pavel Samygin

Merge branch 'master' into server-p2p

Commit:97deaee
Author:Alexander Borzunov
Committer:GitHub

Generate new private key if identity file doesn't exist (#473)

Commit:5eab59b
Author:Pavel Samygin

add balanced rpc handlers for connection handler

Commit:4a9bc92
Author:justheuristic
Committer:GitHub

Implement weights as part of the allreduce protocol, not matchmaking (#384) * implement parts as part of the allreduce protocol, not matchmaking * remove metadata field from AveragingData (unused) Co-authored-by: Alexander Borzunov <borzunov.alexander@gmail.com> Co-authored-by: Alexander Borzunov <hxrussia@gmail.com>

Commit:836192e
Author:Max Ryabinin
Committer:Max Ryabinin

WIP

Commit:4890a75
Author:Denis Mazur
Committer:GitHub

Optimize unary handlers with persistent connections to P2P daemon (#328) This PR implements unary handlers over a persistent daemon connection and other minor improvements.

Commit:b058e6e
Author:Denis Mazur

Merge branch 'master' into unary-handlers

Commit:96c59b9
Author:Denis Mazur

fix suggestions Co-authored-by: Denis Mazur <denismazur8@gmail.com> Co-authored-by: Alexander Borzunov <hxrussia@gmail.com>

Commit:1925723
Author:Denis Mazur

update protobufs

Commit:8322485
Author:Denis Mazur

update protobufs

Commit:0774937
Author:Alexander Borzunov
Committer:GitHub

Refactor naming and serialization for PeerIDs (#339) This PR follows #323 and does the remaining mass refactors: 1. Rename `Endpoint` to `PeerID` in averager (+ related variable names) 2. Rename the `P2P.id` field to `P2P.peer_id` (because the local peer ID is stored in the `.peer_id` fields in all other classes) 3. Serialize `PeerID`s as `bytes` instead of Base58 string 4. Remove `JoinRequest.peer_id` and `AveragingData.peer_id` fields (they duplicate `context.remote_id`) 5. Remove the `DecentralizedAveraging` gRPC interface (not used anymore)

Commit:86d01c8
Author:Denis Mazur

add cancellation support for unary handlers

Commit:58887d2
Author:Denis Mazur
Committer:GitHub

Refactor daemon control protocol (#333)

Commit:70c61c4
Author:Denis Mazur

fix nits

Commit:9c59a4d
Author:Denis Mazur

reimplement daemon connection protocol

Commit:0d67284
Author:Alexander Borzunov
Committer:GitHub

Fix deadlocks in DecentralizedAverager and MPFuture (#331) This PR does the following: 1. Fix a possible deadlock in DecentralizedAverager.rpc_join_group(). 2. Fix a possible deadlock related to corrupted MPFuture state after killing child processes. 3. Add -v flag to pytest in CI. Co-authored-by: justheuristic <justheuristic@gmail.com>

Commit:91c88f8
Author:Aleksandr Borzunov

Add comments to protobufs

Commit:868be1b
Author:Aleksandr Borzunov

Ensure group key equality in rpc_join_group()

Commit:81d115a
Author:Denis Mazur
Committer:GitHub

style: remote line Co-authored-by: Alexander Borzunov <hxrussia@gmail.com>

Commit:864cb3e
Author:Denis Mazur

pass remote id to handler context

Commit:a476e3c
Author:Denis Mazur

update call id type

Commit:03ad71a
Author:Denis Mazur

fixup: restore rpcerror proto message

Commit:3f7c2e3
Author:Denis Mazur

add unary handler support to p2p control class

Commit:fb48133
Author:Alexander Borzunov
Committer:GitHub

Implement protobuf-based stream handlers over libp2p backend (#318) This PR implements protobuf-based stream handlers over libp2p backend (including unary-stream, stream-unary, and stream-stream). Similarly to gRPC, they can be used through the Servicer interface.

Commit:4a33d1b
Author:Alexander Borzunov
Committer:GitHub

Rename Endpoint to PeerID in DHT (#313) This PR follows #296, removes importing PeerID as Endpoint in dht/{node,protocol,routing}.py and related tests, and performs a number of replacements like Endpoint -> PeerID and endpoint -> peer_id.

Commit:0be1512
Author:Alexander Borzunov
Committer:GitHub

Convert DHT to libp2p backend (#296) This PR changes DHT to operate over the p2p daemon (instead of gRPC) using libp2p PeerIDs and Multiaddrs (instead of raw IP:port endpoints). Co-authored-by: Ilya Kobelev <ilya.kobellev@gmail.com>

Commit:0a0e290
Author:justheuristic
Committer:GitHub

Add per-tensor compression, make all-reduce faster and more flexible (#272) * extract core components of all-reduce as TensorPartContainer / TensorPartition * ensure that compression happens asynchronously and with background threads (per chunk) * update AllReduceRunner for new partitioning * per-tensor compression in TensorPartContainer * minimize memory allocation (e.g. use iterator in update_tensors) * update DecentralizedAverager to use new AllReduceProtocol Tests: * test that partitioning recovers the original tensors * test partitioning edge cases (e.g. empty tensors) * test that partitioning is indeed asynchronous * test new all-reduce protocol in separate file * test asyncio utility functions * benchmark performance under limited bandwidth (see PR discussion) Co-authored-by: mponty <heapnhash@gmail.com> Co-authored-by: Max Ryabinin <mryabinin0@gmail.com> Co-authored-by: Michael Diskin <yhn112@users.noreply.github.com> Co-authored-by: Alexander Borzunov <hxrussia@gmail.com>
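
The partitioning tests listed above check that splitting tensors into parts and reassembling them recovers the originals, including empty tensors. A rough sketch of that round trip follows, using plain Python lists in place of tensors; `split_into_parts` and `restore_tensors` are hypothetical names, not the TensorPartContainer API.

```python
from typing import List, Sequence


def split_into_parts(tensors: Sequence[Sequence[float]], part_size: int) -> List[List[float]]:
    """Flatten all tensors into one stream and cut it into fixed-size parts."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    flat = [x for tensor in tensors for x in tensor]
    return [flat[i:i + part_size] for i in range(0, len(flat), part_size)]


def restore_tensors(parts: Sequence[Sequence[float]], lengths: Sequence[int]) -> List[List[float]]:
    """Reassemble the original tensors from parts, given each tensor's length."""
    flat = [x for part in parts for x in part]
    tensors, offset = [], 0
    for length in lengths:
        tensors.append(flat[offset:offset + length])
        offset += length
    return tensors
```

Note that a part may span the boundary between two tensors, which is why the original lengths must be kept to restore them.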

Commit:aea7a38
Author:MaximKsh
Committer:GitHub

Add initial support for connecting via libp2p (#238) * added hivemind.P2P that wraps go-libp2p * moved pythonic libp2p daemon bindings to hivemind.p2p * implemented add_unary/stream_handler API for p2p communication * added configuration options for NAT traversal and circuit relays * added functionality tests for hivemind.P2P Co-authored-by: Maxim Kashirin <ksh.max@gmail.com> Co-authored-by: Denis Mazur <denismazur8@gmail.com> Co-authored-by: Ilya Kobelev <ilya.kobellev@gmail.com> Co-authored-by: Alexey Bukhtiyarov <a.bukhtiyarov@yandex.ru> Co-authored-by: Alexander Borzunov <borzunov.alexander@gmail.com> Co-authored-by: Max Ryabinin <mryabinin0@gmail.com> Co-authored-by: Michael Diskin <yhn112@users.noreply.github.com>

Commit:971c142
Author:Aleksandr Borzunov
Committer:GitHub

Implement authorization for a moderated Hivemind network (#255) This PR implements the protocol from #253.

Commit:0080028
Author:mponty
Committer:GitHub

Add uniform compression (#202) * Add uniform compression to 8 bit * Change lookup computation of `uint8_uniform_buckets_encode` for more stable training with int8 compression refactor `quantile_encode_approx` * Add UNIFORM_8BIT case to `test_tensor_compression` * Fix possible bug with size of lookup in `deserialize_torch_tensor`
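
As a generic illustration of uniform 8-bit compression, the sketch below maps each value to one of 256 evenly spaced levels between the minimum and maximum. It is an assumption-laden example in pure Python, not the `uint8_uniform_buckets_encode` logic from the commit, and the function names are hypothetical.

```python
from typing import List, Tuple


def uniform_8bit_encode(values: List[float]) -> Tuple[bytes, float, float]:
    """Quantize floats to one byte each; returns (codes, offset, scale)."""
    if not values:
        return b"", 0.0, 1.0
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 if hi > lo else 1.0  # avoid division by zero for constant input
    codes = bytes(min(255, max(0, round((v - lo) / scale))) for v in values)
    return codes, lo, scale


def uniform_8bit_decode(codes: bytes, lo: float, scale: float) -> List[float]:
    """Map each byte back to the value at the corresponding level."""
    return [lo + code * scale for code in codes]
```

Each value costs one byte instead of four, and the reconstruction error is bounded by half of `scale`.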

Commit:3b5ce78
Author:MaximKsh
Committer:justheuristic

Py libp2p bindings (#193) * #183 p2p daemon pybinding * #183 rename py bindings dir, fix imports and migrate tests * #183 move pb to hivemind.proto * #183 fix p2p tests * #183 remove config.py, move constants to classes * add docstrings and minor fixes

Commit:916c3db
Author:Max Ryabinin
Committer:GitHub

Move compression-related code to hivemind.utils.compression (#213) * Move compression-related code to hivemind.utils.compression * Remove copies during deserialization, silence warning

Commit:053c7c7
Author:justheuristic
Committer:GitHub

Disentangle DecentralizedAverager components, add weights (#217) This PR changes the internal logic of DecentralizedAverager to make matchmaking code independent of allreduce and vice versa. - Matchmaking now returns GroupInfo (before: it returned AllreduceRunner) - Matchmaking no longer stores AllReduce parameters - Matchmaking no longer owns averaged_tensors - Matchmaking no longer handles load balancing - Removed group_key_seed (duplicate of group_id) - throughput and client_mode is now allgathered via data_for_gather - AllReduceRunner now accepts optional peer-wise weights - Added test for weighted averaging - Fixed a minor bug: when encountering an internal error, averager attempts to warn its groupmates. Previously, it would send warning to peers even if these peers can't accept incoming requests. This caused fabulous error messages. - load_balance_peers is now ran in executor to avoid latency issues Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>
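
The peer-wise weights mentioned above can be pictured as a weighted mean over the peers' tensors. This is a minimal sketch with plain lists standing in for tensors; `weighted_average` is a hypothetical helper, not part of AllReduceRunner.

```python
from typing import List, Sequence


def weighted_average(peer_tensors: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    """Average equally shaped tensors, scaling each peer's contribution by its weight."""
    if len(peer_tensors) != len(weights):
        raise ValueError("need exactly one weight per peer")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    length = len(peer_tensors[0])
    return [
        sum(w * tensor[i] for tensor, w in zip(peer_tensors, weights)) / total
        for i in range(length)
    ]
```

With equal weights this reduces to the ordinary mean; a peer with weight 0 does not affect the result.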

Commit:b9c02ac
Author:Vsevolod-pl
Committer:GitHub

Add quantile compression (#182) * Implemented quantile compression * Implemented test for quantile compression * Named most of magic constants * Renamed test_vector_compression to test_tensor_compression (because functions are serialize_tensor and deserialize_tensor) * Implemented benchmark for different compression types Co-authored-by: justheuristic <justheuristic@gmail.com>

Commit:edf9327
Author:foksly
Committer:GitHub

Support client-only participants in AllReduceProtocol (#176) Resolves #147

Commit:1d1252c
Author:justheuristic
Committer:GitHub

Load state from peers in DecentralizedAverager (#154) * Implemented DecentralizedAverager.load_state_from_peers that attempts to load the training state from another averager * The donor averager is chosen in the order from latest successfully updated to earliest * The definition of state can be extended by the user (by inheriting from DecentralizedAverager) * calling __del__ on a partially created MPFuture will no longer cause error (edge case from albert) * DecentralizedAverager now supports manually getting/setting current group bits (used as dht key) Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>

Commit:10917b2
Author:justheuristic
Committer:GitHub

Averager: update group keys after every step, infer nbits dynamically (#141)

Commit:8466d72
Author:justheuristic
Committer:GitHub

Add Averager load balancing and public endpoints (#140) * implement LP load balancing * averager now relies on DHT to get public endpoint * scipy to requirements Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>

Commit:c36b5b1
Author:justheuristic
Committer:GitHub

Add DHT peer validation, add DHT.get_visible_address, add blacklist for unresponsive peers (#137) Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>

Commit:e159605
Author:justheuristic
Committer:GitHub

Address averaging corner cases, add benchmark_averaging.py, chunk averaged tensors, fix DHTNode get (#134) Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>

Commit:eb93789
Author:justheuristic
Committer:justheuristic

Implement averaging parameters over DHT (2/3) Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>

Commit:0595f4a
Author:justheuristic
Committer:GitHub

Group AllReduce protocol (#119) This is the first part of #115 that implements averaging tensors in a (pre-determined) group of peers Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>

Commit:a59fa70
Author:justheuristic
Committer:GitHub

Use namedtuples for DHT values (#110) * add test for beam search * add tests for find_best_experts and batch_find_best_experts * storage: use named tuples * switch to namedtuples * update storage tests Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>

Commit:9f9c4ac
Author:justheuristic
Committer:GitHub

Faster beam search through DHT sub-keys (#107) * Added support for dictionary-like DHT keys * New beam search based on dictionary new prefixes * DHTNode now uses a more careful caching policy on store. If a value was rejected, the node will request new value to update its cache * Add tests for dictionary value types (storage, protocol, node) * LocalStorage is moved to a separate file and generalized for new value types * Fixed minor bug where DHTProtocol.store claimed to return None but didn't * More tests

Commit:fde83bb
Author:Vsevolod-pl
Committer:GitHub

Implemented float16 compression (#106) * Implemented float16 compression * Update hivemind/utils/grpc.py Co-authored-by: justheuristic <justheuristic@gmail.com> Co-authored-by: justheuristic <justheuristic@gmail.com>
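
To show what float16 compression buys, the sketch below packs values as IEEE 754 half-precision floats with the standard `struct` module, halving the size relative to float32. It is an illustration only, not the code from hivemind/utils/grpc.py; the helper names and the clamping to the float16 range are choices made for this example.

```python
import struct
from typing import List

FP16_MAX = 65504.0  # largest finite float16 value


def to_float16_bytes(values: List[float]) -> bytes:
    """Clamp values to the float16 range and pack them at 2 bytes each."""
    clamped = [max(-FP16_MAX, min(FP16_MAX, v)) for v in values]
    return struct.pack(f"<{len(clamped)}e", *clamped)


def from_float16_bytes(data: bytes) -> List[float]:
    """Unpack little-endian float16 values back into Python floats."""
    count = len(data) // 2
    return list(struct.unpack(f"<{count}e", data))
```

Values that are exactly representable in float16 survive the round trip unchanged; others are rounded to the nearest representable value.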

Commit:d4d9da9
Author:Vsevolod-pl
Committer:GitHub

Tensor compression (part1) (#102) * Implemented tensor compression * Test compression argument passing * Fixed naming error * Fixed argument error * Fixed gradient error * Implemented tensor compression in RemoteExpert * Fixed typo error * Fixed TypeError: 'generator' object is not subscriptable * Test torch error fix * Implemented tensor compression in connection handler (server response) * Fixed error * CircleCI fix * Fixed order * Removed not implemented compression type * missing \n * TODO * use iterators & change schema * more reasonable schema names * typo * add compression to MoE.py * Implemented more efficient vector compression * Fixed errors in deserialize_tensors * Created test for vector compression * Fixed dtype error in deserialize_tensor * Fixed error in serialize_tensor * Fixed error in deserialize_tensor * Deleted typo * Fixed TypeError in reshape in deserialize_torch_tensor * Fixed wrong shape in deserialize_torch_tensor * Fixed error on changing tensor shape * Experimentally found out the alpha for test_vector_compression * Update hivemind/utils/grpc.py Co-authored-by: justheuristic <justheuristic@gmail.com> * Update hivemind/utils/grpc.py Co-authored-by: justheuristic <justheuristic@gmail.com> * Update hivemind/utils/grpc.py Co-authored-by: justheuristic <justheuristic@gmail.com> * Update hivemind/utils/grpc.py Co-authored-by: justheuristic <justheuristic@gmail.com> * Update hivemind/utils/grpc.py Co-authored-by: justheuristic <justheuristic@gmail.com> * Changed dtype of compressed float32 tensor to compressed_float32 * Update hivemind/server/connection_handler.py Co-authored-by: justheuristic <justheuristic@gmail.com> * Update hivemind/server/connection_handler.py Co-authored-by: justheuristic <justheuristic@gmail.com> * compressed_tensor -> serialized_tensor * Incremented version Co-authored-by: justheuristic <justheuristic@gmail.com>

Commit:e7840e3
Author:Max Ryabinin
Committer:GitHub

Compile protobuf in setup.py (#85) * Compile protobuf in setup.py * Update circleci pipelines * Update RTD pipeline * Refactor custom build_ext into install and develop

Commit:535318e
Author:justheuristic
Committer:GitHub

Simplify & explain hivemind.dht.DHT (#78) * unused import * remove loop.run_forever * * explain how DHT stores stuff (hivemind/dht/__init__.py) * implement more efficient DHT.first_k_active (use get_many) * change uid_delimiter into a DHT instance property * hivemind.DHT is now thread-safe on client side * implement all mpfuture methods * get_experts: allow returning future * rollback run_in_background * switch back to GLOBAL_EXECUTOR (tests should fail) * instantiate executor in each process to avoid os locks * rollback to minimize diff * typo * traverse_dht can now be cancelled * node.store_many and node.get_many can now be cancelled * address review by mryab@ * address review by mryab@ * address review by mryab@ * address review by mryab@ * address review by mryab@

Commit:f1565ef
Author:Max Ryabinin
Committer:GitHub

GRPC connection handlers (#61) * Add connection handlers with grpc * Implement grpc for client/server * Cache channel, get rid of warnings * Awaitable interactions with TaskPool * [personal] parallel gRPC handlers (#69) * spawn multiple connection handlers * remove reserve_port * minor: make preset_minimalistic actually care about num_batches_per_client * RemoteExpert can no longer be pickled * fix broken moe.py after changes in RemoteModuleCall * write-through * fix moe.py (broken by changes in _RemoteModuleCall * fix a bug where connection handlers set ready flag prematurely * fix wrong gradient type if not all experts survived * connection_handler now sets ready only when actually ready * create stub in a lazy manner * rollback changes to DHT * Update TODO, remove message limits * Connection is gone 🦀🦀🦀 * Cleanup * Switch to absolute imports (#70) Co-authored-by: xtinkt <ant.sinitsin@gmail.com> Co-authored-by: justheuristic <justheuristic@gmail.com>

Commit:8bded39
Author:justheuristic
Committer:GitHub

[part 2] grpc-based dht (#51) * add option to share keys new peers that *should* be sharing them (improves data availability if a lot of peers join) * * added option to NOT refresh table (new default) * initial dht crawl is no longer blocking * sphinx-friendly excape char * pass cache params to kademliaprotocol * typo * rpc congestion * rpc congestion * rpc congestion * add max concurrent rpc * add max concurrent rpc * fix bug in welcome protocol: previously dht peers always considered each other "new nodes" and sent EVERYTHING to EVERYONE on each rpc call. Now DHT nodes will only request store on .store call OR when a new peer knocks on their DHT * increase to 128 concurrent rpc * await dht traversal in bootstrap * await dht traversal in bootstrap * minor comment * rename TensorProto -> TensorDescriptor to avoid name conflicts with protobuf * add grpc requirements (break tests for now) * add grpc requirements (break tests for now) * add grpc requirements (break tests for now) * minor bugfix: always add peer to nodes requested for ping * [this breaks tests] * implement DHTProtocol via gRPC * DHTProtocol now stores bytes only (not enforcing msgpack) * add grpc requirements * update node.py to grpc DHTProtocol * reminder to implement congestion * reminder to implement congestion * format * temporary patch: adapt bulk RPCs to individual search * temporary patch: adapt bulk RPCs to individual search * remane KademliaProtocol -> DHTProtocol (rationale: no longer kademlia compliant) * semicolon * pep * pep * pep * KademliaProtocol -> DHTProtocol * update tests for new eviction policy (do not evict the same node twice) * init aio in node constructor * rename KademliaProtocol => DHTProtocol everywhere * minor sphinx formatting fix * partially update test_dht * test: typo fix * test: typo fix * update test_dht for new dht interface * compile grpc from master * compile grpc from master * compile grpc from master * add umsgpack to requirements * cache compiled grpcio * ensure umsgpack version compatibility * remove unused imports from dht folder * update schemes * update schemes * review * review * ensure_future => create_task Co-authored-by: xtinkt <ant.sinitsin@gmail.com>