The Protocol Buffers files changed in the following 19 commits:
| Commit: | 81047c4 |
| --- | --- |
| Author: | |
Deployed 2761f7b with MkDocs version: 1.6.0
The documentation is generated from this commit.
| Commit: | 397747c |
| --- | --- |
| Author: | Sudeep Pillai |
| Committer: | GitHub |
Support async gRPC handling with scaled-replicas (#494)

## Summary

This PR adds support for async gRPC handling with scaled replicas. Individual requests are dispatched to one of many actor pools, and the response is fetched by the client asynchronously. We also introduce ActorPools with task throttling to make sure we can scale and await on tasks using the async gRPC service (the throttling idea is sketched below).

Other additions / fixes:

- Support loading models with a fixed number of replicas on init
- Improved `ModelResource` handling for gpu/cpu memory resources
- All `noop` methods are now CPU-only models
- Refactored gRPC server tests to re-use the serve fixture
- Fixed the HTTP client API implementation to use normalized model-ids

## Checks

- [ ] `make lint`: I've run `make lint` to lint the changes in this PR.
- [ ] `make test`: I've made sure the tests (`make test-cpu` or `make test`) are passing.
- Additional tests:
    - [ ] Benchmark tests (when contributing new models)
    - [ ] GPU/HW tests
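For reference, the task-throttling pattern behind such an actor pool can be sketched in plain asyncio; the `ThrottledPool` class, the pool size, and the `infer` body below are illustrative assumptions, not the NOS `ActorPool` internals.

```python
# Sketch of task throttling for async request dispatch: a semaphore caps
# in-flight tasks, and responses are awaited as they complete.
# ThrottledPool and infer() are hypothetical stand-ins, not NOS code.
import asyncio


class ThrottledPool:
    def __init__(self, max_in_flight: int = 4):
        self._sem = asyncio.Semaphore(max_in_flight)

    async def submit(self, coro_fn, *args):
        async with self._sem:  # throttle: wait for a free slot
            return await coro_fn(*args)


async def infer(request_id: int) -> str:
    await asyncio.sleep(0.01)  # stand-in for model execution
    return f"response-{request_id}"


async def main():
    pool = ThrottledPool(max_in_flight=2)
    tasks = [asyncio.create_task(pool.submit(infer, i)) for i in range(8)]
    for done in asyncio.as_completed(tasks):
        print(await done)  # responses are fetched asynchronously


asyncio.run(main())
```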
| Commit: | 1de4539 |
| --- | --- |
| Author: | Sudeep Pillai |
| Committer: | GitHub |
Support for custom model resource limits and init args/kwargs (#492)
| Commit: | b4acd2a |
| --- | --- |
| Author: | Sudeep Pillai |
| Committer: | GitHub |
Support model streaming responses with new `Stream` gRPC service (#469)

This PR adds a new `Stream` gRPC service that allows for streaming responses from the server. This is useful for models that produce a stream of outputs such as text, video frames, or audio chunks. The `Stream` service is implemented as a uni-directional output stream with a streaming response (see the sketch below).

New models:

- `llama2_chat` - a simple chatbot model that uses the `transformers` `AutoModelForCausalLM` with a `chat` submethod. New models include: `meta-llama/Llama-2-7b-chat-hf`, `HuggingFaceH4/zephyr-7b-beta`, `HuggingFaceH4/tiny-random-LlamaForCausalLM` (for testing purposes).

Other changes:

- renamed all exceptions to drop the `Nos` prefix

## Checks

- [ ] `make lint`: I've run `make lint` to lint the changes in this PR.
- [ ] `make test`: I've made sure the tests (`make test-cpu` or `make test`) are passing.
- Additional tests:
    - [ ] Benchmark tests (when contributing new models)
    - [ ] GPU/HW tests
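For reference, the uni-directional streaming shape can be sketched without the generated gRPC stubs; the `StreamResponse` dataclass and the word-splitting loop below are illustrative stand-ins for the real protobuf messages and incremental model output.

```python
# Sketch of a uni-directional output stream: the handler yields one response
# per generated chunk, and the client consumes responses by iteration.
# StreamResponse is a hypothetical stand-in for the generated protobuf message.
from dataclasses import dataclass
from typing import Iterator


@dataclass
class StreamResponse:
    payload: str


def stream_handler(prompt: str) -> Iterator[StreamResponse]:
    """Server-side handler: yield one response per output chunk."""
    for token in prompt.split():  # stand-in for incremental model output
        yield StreamResponse(payload=token)


# Client side: a streaming RPC is consumed by iterating over responses.
for response in stream_handler("streamed tokens arrive one at a time"):
    print(response.payload)
```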
| Commit: | 0666ee8 |
| --- | --- |
| Author: | Sudeep Pillai |
| Committer: | GitHub |
Improved whisper transcription implementation with increased batch size (#430)

- Support file uploads with remote-path inference
- Removed the `transcribe_file_blob` method, since the new `transcribe` method handles both small and large files easily

## Checks

- [x] `make lint`: I've run `make lint` to lint the changes in this PR.
- [ ] `make test`: I've made sure the tests (`make test-cpu` or `make test`) are passing.
- Additional tests:
    - [ ] Benchmark tests (when contributing new models)
    - [ ] GPU/HW tests
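The NOS internals aren't reproduced here, but chunked, batched Whisper transcription in the `transformers` style generally looks like the following sketch; the model id, chunk length, batch size, and file path are illustrative choices, not the values used by NOS.

```python
# Sketch of chunked + batched Whisper transcription with transformers.
# All parameter values here are illustrative assumptions.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",
    chunk_length_s=30,  # split long audio files into 30s chunks
    batch_size=8,       # transcribe chunks in batches for throughput
)
result = asr("path/to/audio.wav")  # placeholder path
print(result["text"])
```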
| Commit: | 08c2e9c |
| --- | --- |
| Author: | Sudeep Pillai |
| Committer: | GitHub |
SkyPilot integration docs with running example (#422)

## Summary

Documentation for starting the nos server on a GCP instance with SkyPilot.

- pending: a demo showcasing how developers can incrementally update their implementations with client + remote-server.

## Related issues

Resolves #406

## Checks

- [x] `make lint`: I've run `make lint` to lint the changes in this PR.
- [x] `make test`: I've made sure the tests (`make test-cpu` or `make test`) are passing.
- Additional tests:
    - [ ] Benchmark tests (when contributing new models)
    - [ ] GPU/HW tests
| Commit: | 3d8b614 |
| --- | --- |
| Author: | Sudeep Pillai |
| Committer: | GitHub |
Refactor gRPC and ModelHandles to support multiple method handles (#412)

## Summary

This PR makes a significant update to the core API that removes the need to specify `TaskType`s in every `Run()` call and also supports multiple methods for any wrapped model.

**Key Features:**

- Updated `Run()` to remove the need to specify `TaskType`s:

```python
# Old `Run()` implementation
client.Run(task=TaskType.IMAGE_EMBEDDING, model_name="openai/clip")

# New `Run()` implementation (wraps `__call__`)
client.Run("yolox/yolox-medium", inputs={...})

# New `Run()` implementation with custom methods (wraps `encode_image`)
client.Run("openai/clip", inputs={...}, method="encode_image")
```

- Patch `ModelHandle` with additional methods that may be implemented (`forward`, `encode_text`, `encode_image`). This allows us to remotely execute model methods with a familiar API. In addition, we support `ModelHandlePartial` objects to also allow for async submissions of specific methods.

```python
spec: ModelSpec = hub.load_spec("openai/clip")
handle: ModelHandle = manager.load(spec)

# New methods automatically patched
img_embeddings = handle.encode_image(...)
txt_embeddings = handle.encode_text(...)

# ModelHandlePartials allow us to make async `submit()` calls via:
# Async submission of image encoding
handle.encode_image.submit(...)
```

- `ModelHandlePartial` now enables the following (see the sketch after this list):
    - `handle.encode_image` is actually a `ModelHandlePartial` that wraps `handle.__call__(..., _method="encode_image")` through its `__call__` method.
    - `handle.encode_image.submit` wraps the `handle.submit(..., _method="encode_image")` method.
- The `Module` interface now supports multiple methods that are dynamically patched and automatically inspected.
- `FunctionSignature` now supports multiple signature methods and metadata (tasks and model resources are now defined here).

**Tests**

- Added checks in `test_hub.py` to test for model info and metadata
- Fully functional `ModelHandlePartial` implementation with multiple method signatures
- Added tests for `ModelHandle` and `ModelHandlePartial`s to call custom methods, including `__call__` and `submit()`
- Added tests for multiple signatures and default signatures/methods
- Added/updated working tests for `hub`, `common`, `executor` and `manager`

## Related issues

Resolves #366 #368 #369 #370 #362

## Checks

- [x] `make lint`: I've run `make lint` to lint the changes in this PR.
- [x] `make test`: I've made sure the tests (`make test-cpu` or `make test`) are passing.
- Additional tests:
    - [ ] Benchmark tests (when contributing new models)
    - [ ] GPU/HW tests
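The dispatch mechanism described above (attribute access returning a method-bound partial that routes through `__call__`) can be sketched in a few lines; `Handle` and `HandlePartial` below are hypothetical stand-ins, not the NOS implementation.

```python
# Sketch of the ModelHandlePartial idea: unknown attributes become callables
# with the method name bound, and .submit() routes the same way.
# Handle and HandlePartial are hypothetical stand-ins for the NOS classes.
class HandlePartial:
    def __init__(self, handle, method: str):
        self._handle, self._method = handle, method

    def __call__(self, **inputs):
        # Route through the handle's __call__ with the bound method name.
        return self._handle(_method=self._method, **inputs)

    def submit(self, **inputs):
        # Async-style submission; tagged tuple for illustration only.
        return ("submitted", self._method, inputs)


class Handle:
    def __call__(self, _method: str = "__call__", **inputs):
        return (_method, inputs)  # stand-in for remote execution

    def __getattr__(self, name: str) -> HandlePartial:
        # Any unknown attribute becomes a method-bound partial.
        return HandlePartial(self, name)


h = Handle()
print(h.encode_image(x=1))         # ('encode_image', {'x': 1})
print(h.encode_image.submit(x=1))  # ('submitted', 'encode_image', {'x': 1})
```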
| Commit: | 74bf39f |
| --- | --- |
| Author: | Sudeep Pillai |
| Committer: | GitHub |
Discord bot with NOS fine-tuning API (#314)

## Summary

- adds a discord bot with a Dockerfile, docker-compose.yml and requirements for the discord app
- new discord training bot with a new fine-tuning API for SDv2 LoRA
- added tests for the fine-tuning API

## Checks

- [x] `make lint`: I've run `make lint` to lint the changes in this PR.
- [x] `make test`: I've made sure the tests (`make test-cpu` or `make test`) are passing.
- Additional tests:
    - [ ] Benchmark tests (when contributing new models)
    - [ ] GPU/HW tests
| Commit: | 5007e83 |
| --- | --- |
| Author: | Sudeep Pillai |
| Committer: | GitHub |
SDv2 Dreambooth LoRA fine-tuning API (#312)

## Summary

- support for LoRA-based fine-tuning of stable-diffusion via dreambooth
- added LoRA dreambooth-based inference with attn_procs swapping (sketched below)
- added a test training service

## Checks

- [x] `make lint`: I've run `make lint` to lint the changes in this PR.
- [x] `make test`: I've made sure the tests (`make test-cpu` or `make test`) are passing.
- Additional tests:
    - [ ] Benchmark tests (when contributing new models)
    - [ ] GPU/HW tests
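For context, attn_procs swapping with a dreambooth-LoRA checkpoint follows the `diffusers` pattern below; the model id, weights path, and prompt are illustrative assumptions, not NOS defaults.

```python
# Sketch of LoRA attention-processor swapping with diffusers, the API used
# by dreambooth-LoRA checkpoints of this era. Paths/ids are illustrative.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

# Swap in the fine-tuned LoRA attention processors.
pipe.unet.load_attn_procs("path/to/lora/weights")  # placeholder path

image = pipe("a photo of sks dog in a bucket").images[0]
```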
| Commit: | 0f9a1f8 |
| --- | --- |
| Author: | Sudeep Pillai |
| Committer: | Sudeep Pillai |
Shared memory transport with docker-host permission fixes
| Commit: | b096677 |
| --- | --- |
| Author: | Sudeep Pillai |
| Committer: | Sudeep Pillai |
Adding inference benchmark notebook
| Commit: | 18105df |
| --- | --- |
| Author: | Sudeep Pillai |
| Committer: | Sudeep Pillai |
Shared memory transport for gRPC

This PR adds a new transport for gRPC that uses shared memory for communication between the client and server. This transport is intended for inference workloads where the client and server are on the same machine. It is implemented using the shared memory primitives in the multiprocessing module (see the sketch below).

- Adds a new `shm.py` module in `nos/common` with shared memory primitives
- Adds tests for the new transport with a new `NOS_SHM_ENABLED` env var
- Updates volume mounts for all docker-compose files
- Updated Dockerfile with `NOS_SHM_ENABLED=1` enabled by default
- Updates `nos_service.proto` to register/unregister the `SHM` transport
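The underlying primitive is the standard library's `multiprocessing.shared_memory`; the following minimal sketch (the array size and producer/consumer split are illustrative, not the contents of `nos/common/shm.py`) shows the zero-copy handoff such a transport builds on.

```python
# Sketch of the shared-memory handoff: the producer writes a tensor into a
# named segment; a consumer attaches by name and reads it with zero copies.
import numpy as np
from multiprocessing import shared_memory

# Producer: allocate a named segment and write an array into it.
arr = np.arange(8, dtype=np.float32)
shm = shared_memory.SharedMemory(create=True, size=arr.nbytes)
np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf)[:] = arr

# Consumer (normally another process): attach by name, zero-copy view.
peer = shared_memory.SharedMemory(name=shm.name)
view = np.ndarray(arr.shape, dtype=arr.dtype, buffer=peer.buf)
print(view)  # [0. 1. 2. 3. 4. 5. 6. 7.]

peer.close()
shm.close()
shm.unlink()  # free the segment once all handles are closed
```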
| Commit: | 98f38c4 |
| --- | --- |
| Author: | Sudeep Pillai |
| Committer: | Sudeep Pillai |
Refactored prediction API with `ModelSpec` and `TensorSpec` signatures

**Common:**

- Refactor `ModelSpec` into `nos.common` for use in client-side model-handle inspection.
- Introduces `TensorSpec`, `ImageSpec` and `EmbeddingSpec` for hinting image and tensor shapes/dtypes for various prediction inputs and outputs.
- Added several shape validations to `TensorSpec`, `ImageSpec` and `EmbeddingSpec`, including tests (a validation sketch follows below).
- Move to a new `TaskSpec` definition with various tasks specified, including `object_detection_2d`, `image_classification`, `image_segmentation_2d`, `image_generation`, `text_embedding`, `image_embedding`.

**Proto (`nos_service.proto`):**

- Added various struct definitions for `GetModelInfo()`: `ModelInfo`, `ModelListRequest`, `ModelListResponse`

**Server-side:**

- Refactored the server to use the new `ModelSpec` and `FunctionSignature` with a cleaner interface for a future custom model-registry (see `test_common_spec.py`)

**Client-side:**

- Adds `GetModelInfo()` to `nos.client` for retrieving model-specific function signatures.
- New `InferenceModule` to simplify model inference:

```python
>>> client = InferenceClient()
>>> model = client.Module(task=TaskType.IMAGE_EMBEDDING, model_name="openai/clip-vit-base-patch32")
>>> predictions = model(**inputs)
```
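As an aside, a `TensorSpec`-style shape/dtype hint with validation can be sketched as below; the field names and `validate` semantics are assumptions for illustration, not the actual `nos.common` definitions.

```python
# Sketch of a TensorSpec-style hint: a declared shape (None = dynamic dim)
# and dtype, with a validation check against a concrete tensor shape.
# Class and field names are hypothetical stand-ins for nos.common specs.
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TensorSpecSketch:
    shape: Tuple[Optional[int], ...]  # None marks a dynamic dimension
    dtype: str

    def validate(self, shape: Tuple[int, ...], dtype: str) -> bool:
        if dtype != self.dtype or len(shape) != len(self.shape):
            return False
        return all(s is None or s == t for s, t in zip(self.shape, shape))


embedding = TensorSpecSketch(shape=(None, 512), dtype="float32")
print(embedding.validate((1, 512), "float32"))  # True
print(embedding.validate((1, 256), "float32"))  # False (wrong embedding dim)
```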
| Commit: | d1dd301 |
| --- | --- |
| Author: | Sudeep Pillai |
| Committer: | Sudeep Pillai |
Client-side `WaitForServer()` and `IsHealthy()`

Implements wait-for-server and is-healthy status checks for clients, to simplify setting up and waiting for the initialized backend server (usage sketched below).

- Added `GetServiceInfo()` for server version info
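A minimal usage sketch, assuming the `InferenceClient` name shown in other commits on this page; the import path and constructor arguments are assumptions.

```python
# Sketch of the client-side readiness checks named in this commit.
# The import path and zero-argument constructor are assumptions.
from nos.client import InferenceClient

client = InferenceClient()
client.WaitForServer()     # block until the backend server is initialized
assert client.IsHealthy()  # verify the health-check RPC succeeds
```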
| Commit: | de4d6b2 |
| --- | --- |
| Author: | Sudeep Pillai |
| Committer: | Sudeep Pillai |
Updates to README, ROADMAP and fixes to gRPC e2e tests

- added ROADMAP to docs
- added nos service stubs for `GetModelInfo()`
- removed unnecessary imports in `nos/client/grpc.py`
- refactored `nos/test/conftest.py` to fix `e2e` gRPC errors
| Commit: | 5c9ae21 |
| --- | --- |
| Author: | Sudeep Pillai |
| Committer: | Sudeep Pillai |
Refactor `nos/server` and `nos/client` modules for better extensibility.

- fixed gRPC protocol errors for python3.8 and python3.7 interop
- `nos/server`:
    - `docker`: implements the `DockerRuntime` runtime daemon
    - `runtime`: implements the `InferenceServiceRuntime`
    - `service`: implements the gRPC `InferenceService` server
- `nos/client` is now a package with submodules `grpc` and `http`.
| Commit: | e3b1240 |
| --- | --- |
| Author: | Sudeep Pillai |
| Committer: | Sudeep Pillai |
Dockerized inference runtime with model-multiplexing #71

- `InferenceService`: moved to synchronous gRPC
- refactored `server/runtime.py` and `server/service.py`: `server/runtime.py` is the dockerized runtime for the inference service, and `server/service.py` implements the gRPC inference-service logic
- added `pytest.mark.benchmark` and `pytest.mark.e2e` markers for benchmarking and end-to-end tests
- updated the nos service proto with `num_replicas` (removed `min_replicas`, `max_replicas`)
- fixed default gRPC payload size limits (client/server-side)

Other fixes/improvements:

- Improved docker CLI UX with start/stop/status
- removed benchmark CLI arguments and `deploy`: dynamically load models instead
- `nos docker start --gpu` does a few things (see the sketch after this list):
    - Spins up a docker container with the specified image (`autonomi-ai/nos:latest-gpu` or `latest-cpu`).
    - Runs `nos-grpc-server` as the command/entrypoint for the container so that the inference server is fully containerized.
    - The `InferenceRuntime` class in `nos/experimental/grpc/client.py` uses the singleton DockerExecutor to start containers in daemon mode if they are not already running.
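The "start in daemon mode if not already running" behavior can be sketched with the docker SDK for Python; the container name, image tag, and GPU device request below are illustrative, not the exact `DockerExecutor` logic.

```python
# Sketch of idempotent daemon-mode container startup with docker-py.
# Container name and image tag are illustrative assumptions.
import docker

client = docker.from_env()
name, image = "nos-grpc-server", "autonomi-ai/nos:latest-gpu"

# Only start the runtime container if one isn't already running.
if not client.containers.list(filters={"name": name}):
    client.containers.run(
        image,
        name=name,
        detach=True,  # daemon mode
        device_requests=[
            docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])
        ],
    )
```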
| Commit: | 55c0a83 |
| --- | --- |
| Author: | Sudeep Pillai |
| Committer: | Sudeep Pillai |
Experimental gRPC-based inference server

- gRPC inference server and client
- new proto definitions for the gRPC service
- custom protoc compiler step to dynamically generate pb2 and pb2_grpc files (sketched below)
- new grpc-serve subcommand to serve/predict a model via gRPC
- updated Dockerfile and docker-compose with the nos install
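Dynamically generating `pb2`/`pb2_grpc` files at runtime typically wraps `grpc_tools.protoc`, as in this sketch; the include and output paths are illustrative, not the repository layout.

```python
# Sketch of compiling a .proto at runtime with grpc_tools (paths illustrative).
from grpc_tools import protoc

rc = protoc.main([
    "protoc",                          # argv[0] placeholder, ignored
    "-I", "proto",                     # import path containing the .proto
    "--python_out", "generated",       # emit *_pb2.py
    "--grpc_python_out", "generated",  # emit *_pb2_grpc.py
    "proto/nos_service.proto",
])
assert rc == 0, "protoc failed"
```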
| Commit: | df518ea |
| --- | --- |
| Author: | Sudeep Pillai |
| Committer: | Sudeep Pillai |
Improved gRPC server-side inferencing with a client-side context manager (via contextlib)
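A minimal sketch of a contextlib-based client wrapper, assuming a plain insecure channel; the address and stub wiring are illustrative, not the actual NOS client.

```python
# Sketch of a contextlib-based gRPC client: open a channel on entry,
# close it on exit. The address is an illustrative default.
from contextlib import contextmanager

import grpc


@contextmanager
def inference_client(address: str = "localhost:50051"):
    channel = grpc.insecure_channel(address)
    try:
        yield channel  # in practice, a stub built on this channel
    finally:
        channel.close()

# Usage (stub class name is hypothetical here):
# with inference_client() as channel:
#     stub = InferenceServiceStub(channel)
#     ...
```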