The Protocol Buffers (.proto) files changed in the following 60 commits:
Commit: | 11a0260 | |
---|---|---|
Author: | Max Ryabinin | |
Committer: | Max Ryabinin |
Add support for quantization with bitsandbytes (#490)

* Add support for quantization with bitsandbytes
* Extend the compression benchmark
* Add a test for blockwise compression
* Add a note to README about bitsandbytes
* Install bitsandbytes in tests as well
* Verify outputs consistently in test_moe.py (to make the test less flaky)
* Pass device="cpu" in test_background_server_identity_path. This ensures that the server can actually launch in a GPU-enabled environment: otherwise, initializing the CUDA context in a parent process prevents it
* Filter bitsandbytes warnings

(cherry picked from commit 131f82c97ea67510d552bb7a68138ad27cbfa5d4)
The documentation is generated from this commit.
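The commit above adds blockwise 8-bit quantization (via bitsandbytes) as a compression option for tensors sent over the wire. As a rough illustration of the underlying idea only, and not hivemind's or bitsandbytes' actual implementation, here is a minimal pure-PyTorch sketch of blockwise absmax int8 quantization; the function names are hypothetical:

```python
import torch

def blockwise_quantize(tensor: torch.Tensor, block_size: int = 4096):
    """Quantize to int8 with one absmax scale per block (illustrative sketch)."""
    flat = tensor.flatten().float()
    pad = (-flat.numel()) % block_size
    flat = torch.cat([flat, flat.new_zeros(pad)])      # pad to a whole number of blocks
    blocks = flat.view(-1, block_size)
    scales = blocks.abs().max(dim=1, keepdim=True).values.clamp_min(1e-8)
    quantized = torch.round(blocks / scales * 127).to(torch.int8)
    return quantized, scales, tensor.shape, pad

def blockwise_dequantize(quantized, scales, shape, pad):
    flat = (quantized.float() / 127 * scales).flatten()
    return (flat[:-pad] if pad else flat).view(shape)

x = torch.randn(1000, 37)
q, s, shape, pad = blockwise_quantize(x)
assert torch.allclose(x, blockwise_dequantize(q, s, shape, pad), atol=5e-2)
```

Per-block scales keep the quantization error proportional to each block's own magnitude, which is why blockwise schemes tolerate outliers better than a single global scale.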
Commit: | c2a53d0 | |
---|---|---|
Author: | Alexander Borzunov | |
Committer: | Max Ryabinin |
Remove libp2p handlers when ConnectionHandler, DHT, and DecentralizedAverager are shut down (#501) (cherry picked from commit 3267fc7ab5417025583ecb292ce4d4c5e00779e6)
Commit: | 94dcbf0 | |
---|---|---|
Author: | Pavel Samygin | |
Committer: | Max Ryabinin |
metadata type changed to bytes (#491)

The type of the metadata field in ExpertRequest/ExpertResponse was changed to the more native type `bytes`, and some compatibility fixes were made in the tests to support different `torch` versions.

(cherry picked from commit fe7a4ef042a5ef902d3c0112c76dc1015fae4a15)
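Since the field is now raw `bytes`, callers serialize whatever structure they need on their own. A minimal sketch of that pattern, using stdlib JSON purely for illustration (the helper names are hypothetical, and the commented-out request construction only mirrors the message described above):

```python
import json

def pack_metadata(meta: dict) -> bytes:
    # Any serialization works here (JSON, MessagePack, ...) since the proto field is plain bytes.
    return json.dumps(meta).encode("utf-8")

def unpack_metadata(blob: bytes) -> dict:
    return json.loads(blob.decode("utf-8")) if blob else {}

payload = pack_metadata({"schema_version": 1, "compression": "NONE"})
# e.g. runtime_pb2.ExpertRequest(uid="expert.0", metadata=payload)  # hypothetical usage
assert unpack_metadata(payload)["schema_version"] == 1
```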
Commit: | 8374ab5 | |
---|---|---|
Author: | justheuristic | |
Committer: | Max Ryabinin |
Handle errors in Runtime (#489)

- Fix an edge case where expert requests with a 3.99-4 MB payload would fail due to the max message size (because of serialization overhead)
- Recover from errors in the Runtime and propagate them to the corresponding tasks; previously, a failing function would terminate the entire server, which was a major pain for me personally :)
- Failure to process a request will now trigger P2PHandlerError instead of P2PDaemonError (because it does not kill the daemon)
- Allow optional metadata in ExpertRequest / ExpertResponse for extendability [todo: validate it vs. @mryab ]

Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>
Co-authored-by: Pavel Samygin <samygin@phystech.edu>

(cherry picked from commit ef0b842baf28bcaf00d09ee017ebecc1040f87f0)
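The 3.99-4 MB edge case arises because the limit applies to the serialized message, i.e. the tensor payload plus protobuf field and framing overhead, so a payload that fits the raw limit can still overflow once wrapped. A hedged sketch of the general mitigation, splitting a payload into chunks that leave headroom for that overhead (the constants and helper name below are illustrative, not hivemind's actual values):

```python
MAX_MESSAGE_SIZE = 4 * 1024 * 1024      # illustrative transport limit (4 MiB)
OVERHEAD_BUDGET = 64 * 1024             # assumed headroom for proto fields / framing
CHUNK_SIZE = MAX_MESSAGE_SIZE - OVERHEAD_BUDGET

def split_into_chunks(payload: bytes, chunk_size: int = CHUNK_SIZE):
    """Yield consecutive slices small enough to serialize safely."""
    for offset in range(0, len(payload), chunk_size):
        yield payload[offset:offset + chunk_size]

payload = bytes(4_190_000)              # ~3.99 MiB: under the raw limit, over it after overhead
chunks = list(split_into_chunks(payload))
assert all(len(c) <= CHUNK_SIZE for c in chunks) and b"".join(chunks) == payload
```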
Commit: | 131f82c | |
---|---|---|
Author: | Max Ryabinin | |
Committer: | GitHub |
Add support for quantization with bitsandbytes (#490)

* Add support for quantization with bitsandbytes
* Extend the compression benchmark
* Add a test for blockwise compression
* Add a note to README about bitsandbytes
* Install bitsandbytes in tests as well
* Verify outputs consistently in test_moe.py (to make the test less flaky)
* Pass device="cpu" in test_background_server_identity_path. This ensures that the server can actually launch in a GPU-enabled environment: otherwise, initializing the CUDA context in a parent process prevents it
* Filter bitsandbytes warnings
The documentation is generated from this commit.
Commit: | 3267fc7 | |
---|---|---|
Author: | Alexander Borzunov | |
Committer: | GitHub |
Remove libp2p handlers when ConnectionHandler, DHT, and DecentralizedAverager are shut down (#501)
Commit: | fe7a4ef | |
---|---|---|
Author: | Pavel Samygin | |
Committer: | GitHub |
metadata type changed to bytes (#491)

The type of the metadata field in ExpertRequest/ExpertResponse was changed to the more native type `bytes`, and some compatibility fixes were made in the tests to support different `torch` versions.
Commit: | ef0b842 | |
---|---|---|
Author: | justheuristic | |
Committer: | GitHub |
Handle errors in Runtime (#489)

- Fix an edge case where expert requests with a 3.99-4 MB payload would fail due to the max message size (because of serialization overhead)
- Recover from errors in the Runtime and propagate them to the corresponding tasks; previously, a failing function would terminate the entire server, which was a major pain for me personally :)
- Failure to process a request will now trigger P2PHandlerError instead of P2PDaemonError (because it does not kill the daemon)
- Allow optional metadata in ExpertRequest / ExpertResponse for extendability [todo: validate it vs. @mryab ]

Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>
Co-authored-by: Pavel Samygin <samygin@phystech.edu>
Commit: | 712e428 | |
---|---|---|
Author: | Max Ryabinin | |
Committer: | GitHub |
Remove gRPC services and grpcio requirement (#485)

* Remove services from .proto files
* Remove grpcio from requirements
Commit: | 724cdfe | |
---|---|---|
Author: | GreenFatGuy | |
Committer: | GitHub |
Convert hivemind.server to libp2p backend (#470)

Switch hivemind MoE from gRPC to libp2p. This allows serving experts from behind NATs / firewalls and improves performance under latency. Changes:

- RemoteExpert (and MoEs) now communicate with servers via libp2p
- Got rid of the listen_on parameters in hivemind.Server and the CLI tools
- ConnectionHandlers now use load balancing for better performance (see benchmarks in the corresponding PR)
- Updated docs & tests

Co-authored-by: Denis Mazur <denismazur8@gmail.com>
Co-authored-by: Alexander Borzunov <hxrussia@gmail.com>
Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>
Commit: | 106ae3d | |
---|---|---|
Author: | Pavel Samygin |
Merge branch 'master' into server-p2p
Commit: | 97deaee | |
---|---|---|
Author: | Alexander Borzunov | |
Committer: | GitHub |
Generate new private key if identity file doesn't exist (#473)
Commit: | 5eab59b | |
---|---|---|
Author: | Pavel Samygin |
add balanced rpc handlers for connection handler
Commit: | 4a9bc92 | |
---|---|---|
Author: | justheuristic | |
Committer: | GitHub |
Implement weights as part of the allreduce protocol, not matchmaking (#384)

* implement weights as part of the allreduce protocol, not matchmaking
* remove metadata field from AveragingData (unused)

Co-authored-by: Alexander Borzunov <borzunov.alexander@gmail.com>
Co-authored-by: Alexander Borzunov <hxrussia@gmail.com>
Commit: | 836192e | |
---|---|---|
Author: | Max Ryabinin | |
Committer: | Max Ryabinin |
WIP
Commit: | 4890a75 | |
---|---|---|
Author: | Denis Mazur | |
Committer: | GitHub |
Optimize unary handlers with persistent connections to P2P daemon (#328) This PR implements unary handlers over a persistent daemon connection and other minor improvements.
Commit: | b058e6e | |
---|---|---|
Author: | Denis Mazur |
Merge branch 'master' into unary-handlers
Commit: | 96c59b9 | |
---|---|---|
Author: | Denis Mazur |
fix suggestions Co-authored-by: Denis Mazur <denismazur8@gmail.com> Co-authored-by: Alexander Borzunov <hxrussia@gmail.com>
Commit: | 1925723 | |
---|---|---|
Author: | Denis Mazur |
update protobufs
Commit: | 8322485 | |
---|---|---|
Author: | Denis Mazur |
update protobufs
Commit: | 0774937 | |
---|---|---|
Author: | Alexander Borzunov | |
Committer: | GitHub |
Refactor naming and serialization for PeerIDs (#339)

This PR follows #323 and does the remaining mass refactors:

1. Rename `Endpoint` to `PeerID` in averager (+ related variable names)
2. Rename the `P2P.id` field to `P2P.peer_id` (because the local peer ID is stored in the `.peer_id` fields in all other classes)
3. Serialize `PeerID`s as `bytes` instead of Base58 strings
4. Remove the `JoinRequest.peer_id` and `AveragingData.peer_id` fields (they duplicate `context.remote_id`)
5. Remove the `DecentralizedAveraging` gRPC interface (not used anymore)
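Item 3 above swaps the human-readable Base58 form for raw bytes on the wire, which is smaller and skips an encode/decode round trip. A small sketch of the two representations, assuming the third-party `base58` package and a made-up multihash-style value:

```python
import base58  # third-party package, used here only to show the two encodings

raw_peer_id = bytes.fromhex("1220" + "ab" * 32)       # made-up: 0x12 0x20 prefix + 32-byte digest
b58_peer_id = base58.b58encode(raw_peer_id).decode()  # human-readable form, starts with "Qm"

# After the refactor, the raw bytes (not the Base58 string) are what gets serialized.
assert base58.b58decode(b58_peer_id) == raw_peer_id
print(len(raw_peer_id), len(b58_peer_id))             # 34 bytes vs. ~46 characters
```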
Commit: | 86d01c8 | |
---|---|---|
Author: | Denis Mazur |
add cancellation support for unary handlers
Commit: | 58887d2 | |
---|---|---|
Author: | Denis Mazur | |
Committer: | GitHub |
Refactor daemon control protocol (#333)
Commit: | 70c61c4 | |
---|---|---|
Author: | Denis Mazur |
fix nits
Commit: | 9c59a4d | |
---|---|---|
Author: | Denis Mazur |
reimplement daemon connection protocol
Commit: | 0d67284 | |
---|---|---|
Author: | Alexander Borzunov | |
Committer: | GitHub |
Fix deadlocks in DecentralizedAverager and MPFuture (#331)

This PR does the following:

1. Fix a possible deadlock in DecentralizedAverager.rpc_join_group().
2. Fix a possible deadlock related to corrupted MPFuture state after killing child processes.
3. Add -v flag to pytest in CI.

Co-authored-by: justheuristic <justheuristic@gmail.com>
Commit: | 91c88f8 | |
---|---|---|
Author: | Aleksandr Borzunov |
Add comments to protobufs
Commit: | 868be1b | |
---|---|---|
Author: | Aleksandr Borzunov |
Ensure group key equality in rpc_join_group()
Commit: | 81d115a | |
---|---|---|
Author: | Denis Mazur | |
Committer: | GitHub |
style: remote line Co-authored-by: Alexander Borzunov <hxrussia@gmail.com>
Commit: | 864cb3e | |
---|---|---|
Author: | Denis Mazur |
pass remote id to handler context
Commit: | a476e3c | |
---|---|---|
Author: | Denis Mazur |
update call id type
Commit: | 03ad71a | |
---|---|---|
Author: | Denis Mazur |
fixup: restore rpcerror proto message
Commit: | 3f7c2e3 | |
---|---|---|
Author: | Denis Mazur |
add unary handler support to p2p control class
Commit: | fb48133 | |
---|---|---|
Author: | Alexander Borzunov | |
Committer: | GitHub |
Implement protobuf-based stream handlers over libp2p backend (#318) This PR implements protobuf-based stream handlers over libp2p backend (including unary-stream, stream-unary, and stream-stream). Similarly to gRPC, they can be used through the Servicer interface.
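Like gRPC, these handlers come in four shapes depending on whether the request and the response are a single message or a stream. A language-level sketch of the four signatures using plain async iterators (the class and function names are illustrative, not hivemind's actual Servicer API):

```python
from typing import AsyncIterator

class Request: ...
class Response: ...

async def unary_unary(request: Request) -> Response:
    return Response()                      # one message in, one message out

async def unary_stream(request: Request) -> AsyncIterator[Response]:
    yield Response()                       # one message in, a stream of messages out

async def stream_unary(requests: AsyncIterator[Request]) -> Response:
    async for _ in requests:               # a stream in, one message out
        pass
    return Response()

async def stream_stream(requests: AsyncIterator[Request]) -> AsyncIterator[Response]:
    async for _ in requests:               # a stream in, a stream out
        yield Response()
```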
Commit: | 4a33d1b | |
---|---|---|
Author: | Alexander Borzunov | |
Committer: | GitHub |
Rename Endpoint to PeerID in DHT (#313) This PR follows #296, removes importing PeerID as Endpoint in dht/{node,protocol,routing}.py and related tests, and performs a number of replacements like Endpoint -> PeerID and endpoint -> peer_id.
Commit: | 0be1512 | |
---|---|---|
Author: | Alexander Borzunov | |
Committer: | GitHub |
Convert DHT to libp2p backend (#296) This PR changes DHT to operate over the p2p daemon (instead of gRPC) using libp2p PeerIDs and Multiaddrs (instead of raw IP:port endpoints). Co-authored-by: Ilya Kobelev <ilya.kobellev@gmail.com>
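After this change, peers are identified by a PeerID and reached through multiaddresses instead of raw `IP:port` endpoints. A small sketch of what that difference looks like (the addresses and peer ID below are made up):

```python
# Before: a raw endpoint, tied to a single transport and carrying no identity.
old_endpoint = "192.0.2.10:31337"

# After: self-describing multiaddrs plus a cryptographic peer identity (all values made up).
peer_id = "QmExamplePeerID"
maddrs = [
    f"/ip4/192.0.2.10/tcp/31337/p2p/{peer_id}",
    f"/ip4/192.0.2.10/udp/31337/quic/p2p/{peer_id}",  # the same peer over another transport
]

# A multiaddr is just a typed path of protocol/value pairs:
parts = maddrs[0].strip("/").split("/")
print(dict(zip(parts[::2], parts[1::2])))  # {'ip4': '192.0.2.10', 'tcp': '31337', 'p2p': 'QmExamplePeerID'}
```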
Commit: | 0a0e290 | |
---|---|---|
Author: | justheuristic | |
Committer: | GitHub |
Add per-tensor compression, make all-reduce faster and more flexible (#272)

* extract core components of all-reduce as TensorPartContainer / TensorPartition
* ensure that compression happens asynchronously and with background threads (per chunk)
* update AllReduceRunner for new partitioning
* per-tensor compression in TensorPartContainer
* minimize memory allocation (e.g. use iterator in update_tensors)
* update DecentralizedAverager to use new AllReduceProtocol

Tests:

* test that partitioning recovers the original tensors
* test partitioning edge cases (e.g. empty tensors)
* test that partitioning is indeed asynchronous
* test new all-reduce protocol in separate file
* test asyncio utility functions
* benchmark performance under limited bandwidth (see PR discussion)

Co-authored-by: mponty <heapnhash@gmail.com>
Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>
Co-authored-by: Michael Diskin <yhn112@users.noreply.github.com>
Co-authored-by: Alexander Borzunov <hxrussia@gmail.com>
Commit: | aea7a38 | |
---|---|---|
Author: | MaximKsh | |
Committer: | GitHub |
Add initial support for connecting via libp2p (#238)

* added hivemind.P2P that wraps go-libp2p
* moved pythonic libp2p daemon bindings to hivemind.p2p
* implemented add_unary/stream_handler API for p2p communication
* added configuration options for NAT traversal and circuit relays
* added functionality tests for hivemind.P2P

Co-authored-by: Maxim Kashirin <ksh.max@gmail.com>
Co-authored-by: Denis Mazur <denismazur8@gmail.com>
Co-authored-by: Ilya Kobelev <ilya.kobellev@gmail.com>
Co-authored-by: Alexey Bukhtiyarov <a.bukhtiyarov@yandex.ru>
Co-authored-by: Alexander Borzunov <borzunov.alexander@gmail.com>
Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>
Co-authored-by: Michael Diskin <yhn112@users.noreply.github.com>
Commit: | 971c142 | |
---|---|---|
Author: | Aleksandr Borzunov | |
Committer: | GitHub |
Implement authorization for a moderated Hivemind network (#255) This PR implements the protocol from #253.
Commit: | 0080028 | |
---|---|---|
Author: | mponty | |
Committer: | GitHub |
Add uniform compression (#202)

* Add uniform compression to 8 bit
* Change lookup computation of `uint8_uniform_buckets_encode` for more stable training with int8 compression; refactor `quantile_encode_approx`
* Add UNIFORM_8BIT case to `test_tensor_compression`
* Fix possible bug with size of lookup in `deserialize_torch_tensor`
Commit: | 3b5ce78 | |
---|---|---|
Author: | MaximKsh | |
Committer: | justheuristic |
Py libp2p bindings (#193)

* #183 p2p daemon pybinding
* #183 rename py bindings dir, fix imports and migrate tests
* #183 move pb to hivemind.proto
* #183 fix p2p tests
* #183 remove config.py, move constants to classes
* add docstrings and minor fixes
Commit: | 916c3db | |
---|---|---|
Author: | Max Ryabinin | |
Committer: | GitHub |
Move compression-related code to hivemind.utils.compression (#213)

* Move compression-related code to hivemind.utils.compression
* Remove copies during deserialization, silence warning
Commit: | 053c7c7 | |
---|---|---|
Author: | justheuristic | |
Committer: | GitHub |
Disentangle DecentralizedAverager components, add weights (#217)

This PR changes the internal logic of DecentralizedAverager to make the matchmaking code independent of allreduce and vice versa.

- Matchmaking now returns GroupInfo (before: it returned AllreduceRunner)
- Matchmaking no longer stores AllReduce parameters
- Matchmaking no longer owns averaged_tensors
- Matchmaking no longer handles load balancing
- Removed group_key_seed (duplicate of group_id)
- throughput and client_mode are now allgathered via data_for_gather
- AllReduceRunner now accepts optional peer-wise weights
- Added a test for weighted averaging
- Fixed a minor bug: when encountering an internal error, the averager attempts to warn its groupmates. Previously, it would send warnings to peers even if those peers couldn't accept incoming requests. This caused fabulous error messages.
- load_balance_peers is now run in an executor to avoid latency issues

Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>
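The peer-wise weights mentioned above turn plain averaging into a weighted mean, e.g. proportional to each peer's local batch size. A minimal sketch of just the arithmetic in PyTorch (not the actual AllReduceRunner code):

```python
import torch

# Each peer contributes a tensor and a scalar weight (e.g. its local batch size).
peer_tensors = [torch.full((3,), 1.0), torch.full((3,), 2.0), torch.full((3,), 4.0)]
peer_weights = torch.tensor([1.0, 1.0, 2.0])

stacked = torch.stack(peer_tensors)                              # shape: (num_peers, ...)
weighted_mean = (stacked * peer_weights[:, None]).sum(0) / peer_weights.sum()
print(weighted_mean)  # tensor([2.75, 2.75, 2.75]) == (1*1 + 2*1 + 4*2) / 4
```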
Commit: | b9c02ac | |
---|---|---|
Author: | Vsevolod-pl | |
Committer: | GitHub |
Add quantile compression (#182)

* Implemented quantile compression
* Implemented test for quantile compression
* Named most of the magic constants
* Renamed test_vector_compression to test_tensor_compression (because the functions are serialize_tensor and deserialize_tensor)
* Implemented benchmark for different compression types

Co-authored-by: justheuristic <justheuristic@gmail.com>
Commit: | edf9327 | |
---|---|---|
Author: | foksly | |
Committer: | GitHub |
Support client-only participants in AllReduceProtocol (#176) Resolves #147
Commit: | 1d1252c | |
---|---|---|
Author: | justheuristic | |
Committer: | GitHub |
Load state from peers in DecentralizedAverager (#154)

* Implemented DecentralizedAverager.load_state_from_peers, which attempts to load the training state from another averager
* The donor averager is chosen in order from the latest successfully updated to the earliest
* The definition of state can be extended by the user (by inheriting from DecentralizedAverager)
* Calling __del__ on a partially created MPFuture will no longer cause an error (edge case from albert)
* DecentralizedAverager now supports manually getting/setting the current group bits (used as a DHT key)

Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>
Commit: | 10917b2 | |
---|---|---|
Author: | justheuristic | |
Committer: | GitHub |
Averager: update group keys after every step, infer nbits dynamically (#141)
Commit: | 8466d72 | |
---|---|---|
Author: | justheuristic | |
Committer: | GitHub |
Add Averager load balancing and public endpoints (#140)

* implement LP load balancing
* averager now relies on DHT to get public endpoint
* scipy to requirements

Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>
Commit: | c36b5b1 | |
---|---|---|
Author: | justheuristic | |
Committer: | GitHub |
Add DHT peer validation, add DHT.get_visible_address, add blacklist for unresponsive peers (#137) Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>
Commit: | e159605 | |
---|---|---|
Author: | justheuristic | |
Committer: | GitHub |
Address averaging corner cases, add benchmark_averaging.py, chunk averaged tensors, fix DHTNode get (#134) Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>
Commit: | eb93789 | |
---|---|---|
Author: | justheuristic | |
Committer: | justheuristic |
Implement averaging parameters over DHT (2/3) Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>
Commit: | 0595f4a | |
---|---|---|
Author: | justheuristic | |
Committer: | GitHub |
Group AllReduce protocol (#119) This is the first part of #115 that implements averaging tensors in a (pre-determined) group of peers Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>
Commit: | a59fa70 | |
---|---|---|
Author: | justheuristic | |
Committer: | GitHub |
Use namedtuples for DHT values (#110)

* add test for beam search
* add tests for find_best_experts and batch_find_best_experts
* storage: use named tuples
* switch to namedtuples
* update storage tests

Co-authored-by: Max Ryabinin <mryabinin0@gmail.com>
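The "use named tuples" items above replace anonymous (value, expiration) pairs with named fields, which makes the storage code self-documenting. A tiny sketch of the pattern (the record and field names are illustrative, not the exact hivemind classes):

```python
from collections import namedtuple
import time

# Illustrative record: a stored DHT value together with its expiration time.
ValueWithExpiration = namedtuple("ValueWithExpiration", ["value", "expiration_time"])

record = ValueWithExpiration(value=b"some payload", expiration_time=time.time() + 60)

# Named access instead of positional indexing keeps call sites readable:
if record.expiration_time > time.time():
    print("still fresh:", record.value)
```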
Commit: | 9f9c4ac | |
---|---|---|
Author: | justheuristic | |
Committer: | GitHub |
Faster beam search through DHT sub-keys (#107)

* Added support for dictionary-like DHT keys
* New beam search based on dictionary prefixes
* DHTNode now uses a more careful caching policy on store: if a value was rejected, the node will request a new value to update its cache
* Add tests for dictionary value types (storage, protocol, node)
* LocalStorage is moved to a separate file and generalized for new value types
* Fixed a minor bug where DHTProtocol.store claimed to return None but didn't
* More tests
Commit: | fde83bb | |
---|---|---|
Author: | Vsevolod-pl | |
Committer: | GitHub |
Implemented float16 compression (#106)

* Implemented float16 compression
* Update hivemind/utils/grpc.py

Co-authored-by: justheuristic <justheuristic@gmail.com>
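Float16 compression halves the wire size of a float32 tensor at the cost of precision. A minimal sketch of the round trip in plain PyTorch/NumPy (hypothetical helper names, not the serialize_tensor code itself):

```python
import numpy as np
import torch

def compress_fp16(tensor: torch.Tensor) -> bytes:
    # 4 bytes per element (float32) -> 2 bytes per element (float16)
    return tensor.detach().cpu().to(torch.float16).numpy().tobytes()

def decompress_fp16(blob: bytes, shape) -> torch.Tensor:
    array = np.frombuffer(blob, dtype=np.float16).reshape(shape)
    return torch.from_numpy(array.astype(np.float32))

x = torch.randn(128, 64)
restored = decompress_fp16(compress_fp16(x), x.shape)
assert restored.shape == x.shape and torch.allclose(x, restored, atol=1e-2)
```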
Commit: | d4d9da9 | |
---|---|---|
Author: | Vsevolod-pl | |
Committer: | GitHub |
Tensor compression (part1) (#102)

* Implemented tensor compression
* Test compression argument passing
* Fixed naming error
* Fixed argument error
* Fixed gradient error
* Implemented tensor compression in RemoteExpert
* Fixed typo error
* Fixed TypeError: 'generator' object is not subscriptable
* Test torch error fix
* Implemented tensor compression in connection handler (server response)
* Fixed error
* CircleCI fix
* Fixed order
* Removed not implemented compression type
* missing \n
* TODO
* use iterators & change schema
* more reasonable schema names
* typo
* add compression to MoE.py
* Implemented more efficient vector compression
* Fixed errors in deserialize_tensors
* Created test for vector compression
* Fixed dtype error in deserialize_tensor
* Fixed error in serialize_tensor
* Fixed error in deserialize_tensor
* Deleted typo
* Fixed TypeError in reshape in deserialize_torch_tensor
* Fixed wrong shape in deserialize_torch_tensor
* Fixed error on changing tensor shape
* Experimentally found out the alpha for test_vector_compression
* Update hivemind/utils/grpc.py (Co-authored-by: justheuristic <justheuristic@gmail.com>)
* Update hivemind/utils/grpc.py (Co-authored-by: justheuristic <justheuristic@gmail.com>)
* Update hivemind/utils/grpc.py (Co-authored-by: justheuristic <justheuristic@gmail.com>)
* Update hivemind/utils/grpc.py (Co-authored-by: justheuristic <justheuristic@gmail.com>)
* Update hivemind/utils/grpc.py (Co-authored-by: justheuristic <justheuristic@gmail.com>)
* Changed dtype of compressed float32 tensor to compressed_float32
* Update hivemind/server/connection_handler.py (Co-authored-by: justheuristic <justheuristic@gmail.com>)
* Update hivemind/server/connection_handler.py (Co-authored-by: justheuristic <justheuristic@gmail.com>)
* compressed_tensor -> serialized_tensor
* Incremented version

Co-authored-by: justheuristic <justheuristic@gmail.com>
Commit: | e7840e3 | |
---|---|---|
Author: | Max Ryabinin | |
Committer: | GitHub |
Compile protobuf in setup.py (#85)

* Compile protobuf in setup.py
* Update circleci pipelines
* Update RTD pipeline
* Refactor custom build_ext into install and develop
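Compiling the .proto files during installation means the generated `*_pb2.py` modules never have to be checked into the repository. A hedged sketch of the idea using grpcio-tools (the paths, package name, and choice of build hook are illustrative; the actual setup.py may differ):

```python
import glob
import os
from setuptools import setup
from setuptools.command.build_py import build_py

class BuildPyWithProtos(build_py):
    """Compile proto/*.proto into *_pb2.py before the normal build step (illustrative)."""

    def run(self):
        from grpc_tools import protoc  # provided by the grpcio-tools package
        proto_dir = os.path.join(os.path.dirname(__file__), "proto")
        for proto_file in glob.glob(os.path.join(proto_dir, "*.proto")):
            protoc.main([
                "grpc_tools.protoc",
                f"-I{proto_dir}",
                f"--python_out={proto_dir}",
                proto_file,
            ])
        super().run()

setup(name="example-package", cmdclass={"build_py": BuildPyWithProtos})
```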
Commit: | 535318e | |
---|---|---|
Author: | justheuristic | |
Committer: | GitHub |
Simplify & explain hivemind.dht.DHT (#78)

* unused import
* remove loop.run_forever
* explain how DHT stores stuff (hivemind/dht/__init__.py)
* implement more efficient DHT.first_k_active (use get_many)
* change uid_delimiter into a DHT instance property
* hivemind.DHT is now thread-safe on client side
* implement all mpfuture methods
* get_experts: allow returning future
* rollback run_in_background
* switch back to GLOBAL_EXECUTOR (tests should fail)
* instantiate executor in each process to avoid os locks
* rollback to minimize diff
* typo
* traverse_dht can now be cancelled
* node.store_many and node.get_many can now be cancelled
* address review by mryab@
* address review by mryab@
* address review by mryab@
* address review by mryab@
* address review by mryab@
Commit: | f1565ef | |
---|---|---|
Author: | Max Ryabinin | |
Committer: | GitHub |
GRPC connection handlers (#61)

* Add connection handlers with grpc
* Implement grpc for client/server
* Cache channel, get rid of warnings
* Awaitable interactions with TaskPool
* [personal] parallel gRPC handlers (#69)
* spawn multiple connection handlers
* remove reserve_port
* minor: make preset_minimalistic actually care about num_batches_per_client
* RemoteExpert can no longer be pickled
* fix broken moe.py after changes in RemoteModuleCall
* write-through
* fix moe.py (broken by changes in _RemoteModuleCall)
* fix a bug where connection handlers set the ready flag prematurely
* fix wrong gradient type if not all experts survived
* connection_handler now sets ready only when actually ready
* create stub in a lazy manner
* rollback changes to DHT
* Update TODO, remove message limits
* Connection is gone 🦀🦀🦀
* Cleanup
* Switch to absolute imports (#70)

Co-authored-by: xtinkt <ant.sinitsin@gmail.com>
Co-authored-by: justheuristic <justheuristic@gmail.com>
Commit: | 8bded39 | |
---|---|---|
Author: | justheuristic | |
Committer: | GitHub |
[part 2] grpc-based dht (#51)

* add option to share keys with new peers that *should* be sharing them (improves data availability if a lot of peers join)
* added option to NOT refresh table (new default)
* initial dht crawl is no longer blocking
* sphinx-friendly escape char
* pass cache params to kademliaprotocol
* typo
* rpc congestion
* rpc congestion
* rpc congestion
* add max concurrent rpc
* add max concurrent rpc
* fix bug in welcome protocol: previously, dht peers always considered each other "new nodes" and sent EVERYTHING to EVERYONE on each rpc call. Now DHT nodes will only request store on a .store call OR when a new peer knocks on their DHT
* increase to 128 concurrent rpc
* await dht traversal in bootstrap
* await dht traversal in bootstrap
* minor comment
* rename TensorProto -> TensorDescriptor to avoid name conflicts with protobuf
* add grpc requirements (break tests for now)
* add grpc requirements (break tests for now)
* add grpc requirements (break tests for now)
* minor bugfix: always add peer to nodes requested for ping
* [this breaks tests]
* implement DHTProtocol via gRPC
* DHTProtocol now stores bytes only (not enforcing msgpack)
* add grpc requirements
* update node.py to grpc DHTProtocol
* reminder to implement congestion
* reminder to implement congestion
* format
* temporary patch: adapt bulk RPCs to individual search
* temporary patch: adapt bulk RPCs to individual search
* rename KademliaProtocol -> DHTProtocol (rationale: no longer kademlia compliant)
* semicolon
* pep
* pep
* pep
* KademliaProtocol -> DHTProtocol
* update tests for new eviction policy (do not evict the same node twice)
* init aio in node constructor
* rename KademliaProtocol => DHTProtocol everywhere
* minor sphinx formatting fix
* partially update test_dht
* test: typo fix
* test: typo fix
* update test_dht for new dht interface
* compile grpc from master
* compile grpc from master
* compile grpc from master
* add umsgpack to requirements
* cache compiled grpcio
* ensure umsgpack version compatibility
* remove unused imports from dht folder
* update schemes
* update schemes
* review
* review
* ensure_future => create_task

Co-authored-by: xtinkt <ant.sinitsin@gmail.com>