Proto commits in autonomi-ai/nos

The Protocol Buffers files changed in these 19 commits:

Commit:81047c4

Deployed 2761f7b with MkDocs version: 1.6.0

The documentation is generated from this commit.

Commit:397747c
Author:Sudeep Pillai
Committer:GitHub

Support async gRPC handling with scaled replicas (#494)

## Summary

This PR adds support for async gRPC handling with scaled replicas. Individual requests are dispatched to one of many actor pools, and the response is fetched by the client asynchronously. We also introduce ActorPools with task throttling so that tasks can be scaled and awaited through the async gRPC service.

Other additions / fixes:

- Support loading models with a fixed number of replicas on init
- Improved `ModelResource` handling for GPU/CPU memory resources
- All `noop` methods are now CPU-only models
- Refactored the gRPC server tests to re-use the serve fixture
- Fixed the HTTP client API implementation to use normalized model-ids
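
The dispatch pattern above can be sketched with plain asyncio: requests round-robin across replicas while a semaphore caps the number of in-flight tasks. This is only an illustration of the throttling idea, not the nos implementation; `ThrottledPool` and the stand-in replica coroutines are hypothetical (nos dispatches to actor pools).

```python
import asyncio
import itertools


class ThrottledPool:
    """Illustrative pool: round-robin dispatch with a cap on in-flight tasks."""

    def __init__(self, replicas, max_in_flight: int = 8):
        self._replicas = itertools.cycle(replicas)
        self._sem = asyncio.Semaphore(max_in_flight)

    async def submit(self, request):
        # Block here once `max_in_flight` tasks are already pending.
        async with self._sem:
            replica = next(self._replicas)
            return await replica(request)


async def main():
    async def replica(request):  # stand-in for a model replica
        await asyncio.sleep(0.01)  # pretend inference latency
        return f"ok:{request}"

    pool = ThrottledPool([replica, replica], max_in_flight=4)
    print(await asyncio.gather(*(pool.submit(i) for i in range(8))))


asyncio.run(main())
```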


Commit:1de4539
Author:Sudeep Pillai
Committer:GitHub

Support for custom model resource limits and init args/kwargs (#492)

Commit:b4acd2a
Author:Sudeep Pillai
Committer:GitHub

Support model streaming responses with new `Stream` gRPC service (#469)

## Summary

This PR adds a new `Stream` gRPC service that allows streaming responses from the server. This is useful for models that produce a stream of outputs such as text, video frames, or audio chunks. The `Stream` service is implemented as a uni-directional output stream with a streaming response.

New models:

- `llama2_chat`: a simple chatbot model that uses the `transformers` `AutoModelForCausalLM` with a `chat` submethod. New models include `meta-llama/Llama-2-7b-chat-hf`, `HuggingFaceH4/zephyr-7b-beta`, and `HuggingFaceH4/tiny-random-LlamaForCausalLM` (for testing purposes).

Other changes:

- Renamed all exceptions to drop the `Nos` prefix
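
The server side of such a uni-directional stream is a gRPC handler that yields one response per produced chunk. The sketch below assumes stubs generated from `nos_service.proto`; the module, service, and message names (`nos_service_pb2`, `StreamServicer`, `StreamResponse`) and the `run_model_stream` helper are illustrative, not the actual generated API.

```python
from concurrent import futures

import grpc

import nos_service_pb2 as pb            # assumed generated module
import nos_service_pb2_grpc as pb_grpc  # assumed generated module


class StreamServicer(pb_grpc.StreamServicer):  # assumed service name
    def Stream(self, request, context):
        # Server-streaming RPC: yield a response per chunk of model output.
        for chunk in run_model_stream(request):  # hypothetical model call
            yield pb.StreamResponse(payload=chunk)


server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
pb_grpc.add_StreamServicer_to_server(StreamServicer(), server)
server.add_insecure_port("[::]:50051")
server.start()
server.wait_for_termination()
```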

Commit:0666ee8
Author:Sudeep Pillai
Committer:GitHub

Improved whisper transcription implementation with increased batch size (#430)

- Support file uploads with remote-path inference
- Removed the `transcribe_file_blob` method, since the new `transcribe` method handles both small and large files

Commit:08c2e9c
Author:Sudeep Pillai
Committer:GitHub

SkyPilot integration docs with a running example (#422)

## Summary

Documentation for starting the nos server on a GCP instance with SkyPilot.

- Pending: a demo showcasing how developers can incrementally update their implementations with the client + remote server.

## Related issues

Resolves #406
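
For reference, launching a GPU VM and a server process with SkyPilot's Python API looks roughly like the sketch below. The `setup`/`run` commands and the cluster name are placeholders, not the documented nos recipe.

```python
import sky

# Hypothetical task: provision a GCP GPU VM, install deps, start the server.
task = sky.Task(
    setup="pip install torch-nos",  # placeholder install step
    run="nos serve",                # placeholder server command
)
task.set_resources(sky.Resources(cloud=sky.GCP(), accelerators="T4:1"))

sky.launch(task, cluster_name="nos-server")  # provisions, runs setup, then run
```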

Commit:3d8b614
Author:Sudeep Pillai
Committer:GitHub

Refactor gRPC and ModelHandles to support multiple method handles (#412)

## Summary

This PR makes a significant update to the core API: it removes the need to specify a `TaskType` in every `Run()` call, and it supports multiple methods on any wrapped model.

**Key features:**

- Updated `Run()` to remove the need to specify `TaskType`s:

```python
# Old `Run()` implementation
client.Run(task=TaskType.IMAGE_EMBEDDING, model_name="openai/clip")

# New `Run()` implementation (wraps `__call__`)
client.Run("yolox/yolox-medium", inputs={...})

# New `Run()` implementation with custom methods (wraps `encode_image`)
client.Run("openai/clip", inputs={...}, method="encode_image")
```

- Patch `ModelHandle` with additional methods that may be implemented (`forward`, `encode_text`, `encode_image`). This allows us to remotely execute model methods with a familiar API. In addition, `ModelHandlePartial` objects allow async submissions of specific methods:

```python
spec: ModelSpec = hub.load_spec("openai/clip")
handle: ModelHandle = manager.load(spec)

# New methods automatically patched
img_embeddings = handle.encode_image(...)
txt_embeddings = handle.encode_text(...)

# ModelHandlePartials allow us to make async `submit()` calls:
# Async submission of image encoding
handle.encode_image.submit(...)
```

- `ModelHandlePartial` now enables the following: `handle.encode_image` is actually a `ModelHandlePartial` that wraps `handle.__call__(..., _method="encode_image")` through its `__call__` method, and `handle.encode_image.submit` wraps `handle.submit(..., _method="encode_image")`.
- The `Module` interface now supports multiple methods that are dynamically patched and automatically inspected.
- `FunctionSignature` now supports multiple signature methods and metadata (tasks and model resources are now defined here).

**Tests:**

- Added checks in `test_hub.py` to test for model info and metadata
- Fully functional `ModelHandlePartial` implementation with multiple method signatures; added tests for `ModelHandle` and `ModelHandlePartial`s calling custom methods, including `__call__` and `submit()`
- Added tests for multiple signatures and default signatures/methods
- Added/updated working tests for `hub`, `common`, `executor` and `manager`

## Related issues

Resolves #366 #368 #369 #370 #362
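
The `ModelHandlePartial` routing described above boils down to a small amount of attribute patching. The toy re-implementation below mirrors the described behavior but is not the nos code: each exposed method name gets a partial whose `__call__` and `submit` forward to the handle with `_method` set.

```python
class ModelHandlePartial:
    """Routes `handle.<method>(...)` and `handle.<method>.submit(...)`."""

    def __init__(self, handle, method: str):
        self.handle, self.method = handle, method

    def __call__(self, **inputs):
        return self.handle(_method=self.method, **inputs)

    def submit(self, **inputs):
        return self.handle.submit(_method=self.method, **inputs)


class ModelHandle:
    EXPOSED_METHODS = ("encode_image", "encode_text")  # illustrative list

    def __init__(self, model):
        self.model = model
        # Patch one partial per exposed method, as the PR describes.
        for name in self.EXPOSED_METHODS:
            setattr(self, name, ModelHandlePartial(self, name))

    def __call__(self, _method: str, **inputs):
        # nos would execute this remotely; here we call the model directly.
        return getattr(self.model, _method)(**inputs)

    def submit(self, _method: str, **inputs):
        # Stand-in for async submission; nos would return a future/handle.
        return f"submitted:{_method}"


class ToyCLIP:  # stand-in model
    def encode_image(self, images):
        return [[0.0, 0.1, 0.2] for _ in images]


handle = ModelHandle(ToyCLIP())
print(handle.encode_image(images=["a.png"]))         # routed via __call__
print(handle.encode_image.submit(images=["a.png"]))  # routed via submit
```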

Commit:74bf39f
Author:Sudeep Pillai
Committer:GitHub

Discord bot with NOS fine-tuning API (#314)

## Summary

- Adds a Discord bot with a Dockerfile, docker-compose.yml, and requirements for the Discord app
- New Discord training bot using the new fine-tuning API for SDv2 LoRA
- Added tests for the fine-tuning API

Commit:5007e83
Author:Sudeep Pillai
Committer:GitHub

SDv2 Dreambooth LoRA fine-tuning API (#312)

## Summary

- Support for LoRA-based fine-tuning of Stable Diffusion via Dreambooth
- Added LoRA Dreambooth-based inference with `attn_procs` swapping
- Added a test training service
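
At inference time, the `attn_procs` swap amounts to loading the trained LoRA attention processors into the pipeline's UNet. A minimal sketch with `diffusers` follows; the paths and prompt are placeholders, and `load_attn_procs` is the LoRA-era diffusers API, not necessarily what nos calls internally.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

# Swap in the fine-tuned LoRA attention processors produced by training.
pipe.unet.load_attn_procs("path/to/lora_weights")  # placeholder path

image = pipe("a photo of sks dog in a bucket").images[0]
image.save("out.png")
```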

Commit:0f9a1f8
Author:Sudeep Pillai
Committer:Sudeep Pillai

Shared memory transport with docker-host permissions fixes

Commit:b096677
Author:Sudeep Pillai
Committer:Sudeep Pillai

Adding inference benchmark notebook

Commit:18105df
Author:Sudeep Pillai
Committer:Sudeep Pillai

Shared memory transport for gRPC

This PR adds a new transport for gRPC that uses shared memory for communication between the client and server. This transport is intended for inference workloads where the client and server are on the same machine, and is implemented using the shared-memory primitives in the `multiprocessing` module.

- Adds a new `shm.py` module in `nos/common` with shared-memory primitives
- Adds tests for the new transport, gated by a new `NOS_SHM_ENABLED` env var
- Updates volume mounts for all docker-compose files
- Updates the Dockerfile with `NOS_SHM_ENABLED=1` enabled by default
- Updates `nos_service.proto` to register/unregister the `SHM` transport
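
The zero-copy handoff rests on the stdlib primitives the commit mentions. A minimal sketch of the idea, with the register/unregister RPC plumbing omitted:

```python
import numpy as np
from multiprocessing import shared_memory

# "Client" side: place the input tensor in a named shared-memory segment.
arr = np.random.rand(480, 640, 3).astype(np.float32)
shm = shared_memory.SharedMemory(create=True, size=arr.nbytes)
np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf)[:] = arr

# "Server" side: attach by name and read without copying over the wire.
peer = shared_memory.SharedMemory(name=shm.name)
view = np.ndarray(arr.shape, dtype=arr.dtype, buffer=peer.buf)
assert np.allclose(view, arr)

peer.close()
shm.close()
shm.unlink()  # free the segment once both sides are done
```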

Commit:98f38c4
Author:Sudeep Pillai
Committer:Sudeep Pillai

Refactored prediction API with `ModelSpec` and `TensorSpec` signatures

Common:

- Refactored `ModelSpec` into `nos.common` for use in client-side model-handle inspection.
- Introduces `TensorSpec`, `ImageSpec` and `EmbeddingSpec` for hinting image and tensor shapes/dtypes for various prediction inputs and outputs.
- Added several shape validations to `TensorSpec`, `ImageSpec` and `EmbeddingSpec` (including tests).
- Moved to a new `TaskSpec` definition with various tasks specified, including `object_detection_2d`, `image_classification`, `image_segmentation_2d`, `image_generation`, `text_embedding`, and `image_embedding`.

Proto (`nos_service.proto`):

- Added struct definitions for `GetModelInfo()`: `ModelInfo`, `ModelListRequest`, `ModelListResponse`

Server-side:

- Refactored the server to use the new `ModelSpec` and `FunctionSignature` with a cleaner interface for a future custom model registry (see `test_common_spec.py`)

Client-side:

- Adds `GetModelInfo()` to `nos.client` for retrieving model-specific function signatures.
- New `InferenceModule` to simplify model inference:

```python
>>> client = InferenceClient()
>>> model = client.Module(task=TaskType.IMAGE_EMBEDDING, model_name="openai/clip-vit-base-patch32")
>>> predictions = model(**inputs)
```
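
The shape-validation idea behind `TensorSpec` can be shown with a toy spec; the real `nos.common` definitions differ in detail:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

import numpy as np


@dataclass
class TensorSpec:
    """Toy spec: a shape tuple where None marks a free dimension."""

    shape: Tuple[Optional[int], ...]
    dtype: str = "float32"

    def validate(self, array: np.ndarray) -> None:
        if array.dtype != np.dtype(self.dtype):
            raise ValueError(f"dtype mismatch: {array.dtype} != {self.dtype}")
        if array.ndim != len(self.shape):
            raise ValueError(f"rank mismatch: {array.shape} vs {self.shape}")
        for got, want in zip(array.shape, self.shape):
            if want is not None and got != want:
                raise ValueError(f"shape mismatch: {array.shape} vs {self.shape}")


spec = TensorSpec(shape=(None, 512))           # embedding-like spec
spec.validate(np.zeros((8, 512), np.float32))  # passes; wrong shapes raise
```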

Commit:d1dd301
Author:Sudeep Pillai
Committer:Sudeep Pillai

Client-side `WaitForServer()` and `IsHealthy()`

Implements wait-for-server and is-healthy status checks so clients can more easily set up and wait for the initialized backend server.

- Added `GetServiceInfo()` for server version info
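
A wait-for-server helper is essentially a bounded polling loop over the health check. A sketch under the assumption of a client exposing `IsHealthy()`; the error type caught and the backoff interval are illustrative:

```python
import time


def wait_for_server(client, timeout: float = 60.0, interval: float = 0.5) -> None:
    """Poll client.IsHealthy() until it succeeds or the deadline passes."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if client.IsHealthy():
                return
        except ConnectionError:  # transport error type is an assumption
            pass  # server not up yet; keep polling
        time.sleep(interval)
    raise TimeoutError(f"server not healthy after {timeout:.0f}s")
```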

Commit:de4d6b2
Author:Sudeep Pillai
Committer:Sudeep Pillai

Updates to README, ROADMAP and fixes to gRPC e2e tests

- Added ROADMAP to docs
- Added nos service stubs for `GetModelInfo()`
- Removed unnecessary imports in `nos/client/grpc.py`
- Refactored `nos/test/conftest.py` to fix `e2e` gRPC errors

Commit:5c9ae21
Author:Sudeep Pillai
Committer:Sudeep Pillai

Refactor `nos/server` and `nos/client` modules for better extensibility.

- Fixed gRPC protocol errors for python3.8 and python3.7 interop
- `nos/server`:
    - `docker`: implements the `DockerRuntime` runtime daemon
    - `runtime`: implements the `InferenceServiceRuntime`
    - `service`: implements the gRPC `InferenceService` server
- `nos/client` is now a package with submodules `grpc` and `http`

Commit:e3b1240
Author:Sudeep Pillai
Committer:Sudeep Pillai

Dockerized inference runtime with model-multiplexing (#71)

- `InferenceService`: moved to synchronous gRPC
- Refactored `server/runtime.py` and `server/service.py`: `server/runtime.py` is the dockerized runtime for the inference service, and `server/service.py` implements the gRPC inference-service logic
- Added `pytest.mark.benchmark` and `pytest.mark.e2e` markers for benchmarking and end-to-end tests
- Updated the nos service proto with `num_replicas` (removed `min_replicas`, `max_replicas`)
- Fixed default gRPC payload size limits (client/server-side)

Other fixes/improvements:

- Improved docker CLI UX with start/stop/status
- Removed benchmark CLI arguments and `deploy`: models are now loaded dynamically instead
- `nos docker start --gpu` does a few things:
    - Spins up a docker container with the specified image (`autonomi-ai/nos:latest-gpu` or `latest-cpu`)
    - Runs `nos-grpc-server` as the command/entrypoint for the container so that the inference server is fully containerized
- The `InferenceRuntime` class in `nos/experimental/grpc/client.py` uses the singleton DockerExecutor to start containers in daemon mode if they are not already running
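
What `nos docker start --gpu` does, per the description above, maps closely onto the Docker SDK for Python. A sketch follows; the image tag and command follow the commit text, but the port mapping and other details are illustrative.

```python
import docker
from docker.types import DeviceRequest

client = docker.from_env()
container = client.containers.run(
    "autonomi-ai/nos:latest-gpu",
    command="nos-grpc-server",   # entrypoint named in the commit
    detach=True,                 # daemon mode, like the CLI
    ports={"50051/tcp": 50051},  # illustrative gRPC port mapping
    device_requests=[DeviceRequest(count=-1, capabilities=[["gpu"]])],  # --gpus all
)
print(container.short_id, container.status)
```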

Commit:55c0a83
Author:Sudeep Pillai
Committer:Sudeep Pillai

Experimental gRPC-based inference server

- gRPC inference server and client
- New proto definitions for the gRPC service
- Custom protoc compiler step to dynamically generate the pb2 and pb2_grpc files
- New `grpc-serve` subcommand to serve/predict a model via gRPC
- Updated Dockerfile and docker-compose with the nos install
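
Dynamically generating the `pb2`/`pb2_grpc` files boils down to invoking protoc from Python via `grpcio-tools`. A sketch with placeholder paths:

```python
import os

from grpc_tools import protoc

os.makedirs("gen", exist_ok=True)
ret = protoc.main([
    "protoc",                  # argv[0] placeholder
    "-I", "nos/proto",         # placeholder include dir
    "--python_out=gen",        # emits nos_service_pb2.py
    "--grpc_python_out=gen",   # emits nos_service_pb2_grpc.py
    "nos/proto/nos_service.proto",
])
assert ret == 0, "protoc compilation failed"
```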

Commit:df518ea
Author:Sudeep Pillai
Committer:Sudeep Pillai

Improved gRPC server-side inferencing with a client-side context manager