Proto commits in exo-explore/exo

The Protocol Buffers files changed in these 32 commits:

Commit:9954ce8
Author:Alex Cheema

fix treating token as a list

The documentation is generated from this commit.

Commit:218c1e7
Author:Alex Cheema

Merge branch 'main' into runners2

Commit:5460529
Author:Pranav Veldurthi

Merge Latest

Commit:db010d5
Author:Alex Cheema

distributed tracing

Commit:c9ded9b
Author:Alex Cheema
Committer:Alex Cheema

optimise networking, remove bloat

Commit:f55a53a
Author:Alex Cheema
Committer:Alex Cheema

one token at a time

Commit:f5efbe1
Author:Nel Nibcord
Committer:Nel Nibcord

Initial distributed evaluation implementation

Commit:9283f6d
Author:Nel Nibcord
Committer:Nel Nibcord

Correct loss propagation so we can see the actual loss instead of just the requestor shard's loss

Commit:75c8650
Author:Nel Nibcord
Committer:Nel Nibcord

Naive network-propagated loss implementation on MLX

Commit:8368568
Author:Nel Nibcord
Committer:Nel Nibcord

WIP: Training works on mlx
Still debugging some tinygrad stuff, and fixing comms

Commit:5534424
Author:Alex Cheema

use double for flops protobuf

Commit:b8c4b46
Author:Alex Cheema

prioritise network interfaces and display in the tui

Commit:ca0caad
Author:Pranav Veldurthi

Image to image generation

Commit:6b28ef0
Author:Pranav Veldurthi

Stable stable diffusion mlx

Commit:8b71d57
Author:Nel Nibcord

Removed inference state entirely
For tinygrad, this means caching start_pos in StatefulModel

Commit:82cce44
Author:Nel Nibcord
Committer:Nel Nibcord

Some initial inference engine refactors for enabling training
Only on MLX for now, breaks Tinygrad (doesn't fulfill interface yet)

Commit:de2f6d2
Author:Alex Cheema

implement a health check for peers and discovery should only return healthy peers

Commit:acc94b5
Author:Varshith

chatgpt api integration

Commit:2084784
Author:Alex Cheema

per-request kv cache, remove all explicit reset functionality as it wasnt used. fixes #67

Commit:1475c73
Author:Alex Cheema

fix inference_state serialization. related: #40 #44 #45

Commit:4b592f9
Author:Alex Cheema

exo topology visualisation that shows the topology of the network, device capabilities and the currently active node using opaque statuses. fixes #36. ready for #33

Commit:54c9860
Author:Alex Cheema

more robust grpc discovery with asyncio and proper error handling, add flops to device capabilities. fixes #23 and progress on #33

Commit:8a35fd8
Author:Alex Cheema

support chatgpt api endpoint fron any node #24

Commit:03ba31c
Author:Alex Cheema
Committer:Alex Cheema

wip state

Commit:e17905e
Author:Alex Cheema

add global reset

Commit:d2184f5
Author:Alex Cheema

keep track of already visited peers in global operations: collect_topology

Commit:5bbde22
Author:Alex Cheema

move everything under exo module

Commit:b01f69b
Author:Alex Cheema

add support for multiple concurrent requests with request ids

Commit:36b8456
Author:Alex Cheema
Committer:Alex Cheema

collect global topology with local peer visibility, ring memory weighted partitioning strategy

Commit:6c8c9ee
Author:Alex Cheema

topology with partitioning strategy

Commit:563dcb5
Author:Alex Cheema

mlx sharded implementation with example of distributed inference

Commit:a21f59f
Author:Alex Cheema

scaffolding for networking, inference and orchestration