The Protocol Buffers files changed in these 34 commits:
Commit: | 508a947 | |
---|---|---|
Author: | Lihao Ran | |
Committer: | Lihao Ran |
Allow users to specify whether or not to add bos token
The documentation is generated from this commit.
Commit: | 572ff6d | |
---|---|---|
Author: | Lihao Ran | |
Committer: | Lihao Ran |
Allow users to specify whether or not to add bos token
The documentation is generated from this commit.
Commit: | 082c0ac | |
---|---|---|
Author: | Aman Gupta | |
Committer: | GitHub |
Supporting Multi-LoRA inferencing via JetStream server (#221)
Supporting Multi-LoRA inferencing via JetStream server following [LLM Inference gateway API protocols](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/003-model-server-protocol#inference-api-protocol).
- Implemented an adapter_tensorstore to load, store, manage and unload the adapter weights
- Added and exposed [required metrics](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/003-model-server-protocol#metrics-reporting) at prometheus endpoint
- Added multi_lora_decoding service with corresponding APIs as per the [requirement](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/003-model-server-protocol#inference-api-protocol).
- Implemented single LoRA functionality support.
Commit: | 9d19631 | |
---|---|---|
Author: | gpolovets1 | |
Committer: | GitHub |
Added new HuggingFaceTokenizer to token_utils and updated TokenizerParameters to include tokenizer_type and access_token as additional metadata to store. (#229)
Commit: | 55b6604 | |
---|---|---|
Author: | George | |
Committer: | George |
Added new HuggingFaceTokenizer to token_utils and updated TokenizerParameters to include tokenizer_type and access_token as additional metadata to store.
Commit: | 5f679a9 | |
---|---|---|
Author: | Aman Gupta |
- Created separate adapter_tensorstore for each engine. - Implemented unapply lora from base_params - Fixed some comments from the PR
Commit: | 26b1f37 | |
---|---|---|
Author: | Aman Gupta |
Merging main to amangu-lora.
Commit: | eb74d86 | |
---|---|---|
Author: | Aman Gupta |
Refactoring part-2.
Commit: | e4d875a | |
---|---|---|
Author: | Aman Gupta |
Refactoring and cleaning of the JetStream server code.
Commit: | 1d6b456 | |
---|---|---|
Author: | Yijia | |
Committer: | GitHub |
Revert accidental change - back to #216 This reverts commit 00dc5a61fcad66846cb45e449d5078c6424c970e, reversing changes made to 951b3ef8d329e419289af1df8cdada6983073605. Co-authored-by: Yijia J <yijiaj@google.com>
Commit: | 80bfefc | |
---|---|---|
Author: | Lumosis | |
Committer: | GitHub |
Add multi-sampling functionality (#215)
Commit: | 0d464da | |
---|---|---|
Author: | Lihao Ran | |
Committer: | Lihao Ran |
test multi-sampling
Commit: | 3c6fcbd | |
---|---|---|
Author: | aman2930 |
1) Implemented a new Service API proto to align with OpenAI completion API (https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/docs/proposals/003-model-server-protocol/README.md#inference-api-protocol), & .
2) Added a flag to explicitly run the JetStream server with these APIs when . Else only expose older Decode() & HealthCheck() APIs of the JetStream Server.
3) Fixed a bug in the adapter_tensorstore while converting jnp_array and np_array.
4) Added a which made requests to the new APIs (v1/load_lora_adapter, v1/unload_lora_adapter, v1/models, v1/completions)
Commit: | fb88eca | |
---|---|---|
Author: | aman2930 |
1) Implemented adapter_tensorstore module to store and manage the adapters. Its functionality includes loading and unloading adapters between CPU RAM and HBM. It also follows an LRU policy to evict an adapter if a new load_adapter request comes up. Currently it only stores the adapter as separate tensors (lora_a and lora_b). Calculation of lora_b x lora_a is done in prefill() and generate() during a decode request. Adapter_tensorstore can be configured with a max_limit on HBM and RAM.
2) Added functionality to load from a catalog file at the start of the server. If no file is given, it will just load the base params. Loading from the catalog file is done on CPU RAM. After that, based on incoming requests, those params are moved to HBM or evicted from it.
3) Some proto updates to take only a single path for each adapter; that path is expected to have an adapter_config.json and Orbax format weights in the 0/items folder.
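The LRU movement between CPU RAM and HBM described in commit fb88eca can be illustrated with a minimal Python sketch. This is not the JetStream adapter_tensorstore code: the class name `AdapterStoreSketch`, its methods, and the use of abstract size units for the two limits are all assumptions made for illustration. In this sketch an adapter evicted from HBM is demoted to CPU RAM, and an adapter evicted from CPU RAM is dropped.

```python
from collections import OrderedDict


class AdapterStoreSketch:
    """Hypothetical two-tier store; not the JetStream adapter_tensorstore API."""

    def __init__(self, hbm_limit, ram_limit):
        self.limits = {"hbm": hbm_limit, "ram": ram_limit}
        # Each tier maps adapter_id -> size; the first key is the least recently used.
        self.tiers = {"hbm": OrderedDict(), "ram": OrderedDict()}

    def _insert(self, tier, adapter_id, size):
        """Add an adapter to a tier, evicting LRU entries until it fits."""
        if size > self.limits[tier]:
            raise ValueError(f"{adapter_id} cannot fit in {tier}")
        entries = self.tiers[tier]
        evicted = []
        while sum(entries.values()) + size > self.limits[tier]:
            evicted.append(entries.popitem(last=False))
        entries[adapter_id] = size
        return evicted

    def load(self, adapter_id, size):
        """Register an adapter in CPU RAM, as when reading a catalog at start-up."""
        self._insert("ram", adapter_id, size)

    def use(self, adapter_id):
        """Make the adapter resident in HBM; return the tier it was found in."""
        hbm, ram = self.tiers["hbm"], self.tiers["ram"]
        if adapter_id in hbm:
            hbm.move_to_end(adapter_id)  # mark as most recently used
            return "hbm"
        size = ram[adapter_id]  # KeyError if the adapter was never loaded
        evicted = self._insert("hbm", adapter_id, size)
        del ram[adapter_id]
        for victim, victim_size in evicted:
            self._insert("ram", victim, victim_size)  # demote to CPU RAM
        return "ram"


store = AdapterStoreSketch(hbm_limit=2, ram_limit=4)
for name in ("a", "b", "c"):
    store.load(name, size=1)
store.use("a")
store.use("b")
store.use("a")  # "a" becomes the most recently used adapter in HBM
store.use("c")  # HBM is full, so the least recently used adapter "b" is demoted
```

After the last call, HBM holds "a" and "c" while "b" sits in CPU RAM again. A real store would move tensors between devices rather than track sizes only.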
Commit: | ef073f8 | |
---|---|---|
Author: | jetstream authors | |
Committer: | Vipan Nalla |
fixing decode. PiperOrigin-RevId: 720397779
Commit: | cfb987b | |
---|---|---|
Author: | wyzhang | |
Committer: | Vipan Nalla |
Revert past 2 commits which accidentally deletes the code due to copybara issue (#167) * Revert "Reverts 6a3579056f307fed3428102df5823a7ff7cebdc6" This reverts commit b459cc1f297a8564e9c6f14346ad5ef41e2d68c6. * Revert "fixing decode." This reverts commit 6a3579056f307fed3428102df5823a7ff7cebdc6.
Commit: | 91ab2e1 | |
---|---|---|
Author: | jetstream authors | |
Committer: | Vipan Nalla |
internal change PiperOrigin-RevId: 720730187
Commit: | 405a3d5 | |
---|---|---|
Author: | Yijia | |
Committer: | Vipan Nalla |
Revert "internal change" (#169) This reverts commit 4c7838ac69db15a17f540406787c6b4dbc692b03.
Commit: | a49c0a4 | |
---|---|---|
Author: | Yijia | |
Committer: | GitHub |
Revert "internal change" (#169) This reverts commit 4c7838ac69db15a17f540406787c6b4dbc692b03.
Commit: | 4c7838a | |
---|---|---|
Author: | jetstream authors | |
Committer: | jetstream authors |
internal change PiperOrigin-RevId: 720730187
Commit: | e8439b7 | |
---|---|---|
Author: | wyzhang | |
Committer: | GitHub |
Revert past 2 commits which accidentally deletes the code due to copybara issue (#167) * Revert "Reverts 6a3579056f307fed3428102df5823a7ff7cebdc6" This reverts commit b459cc1f297a8564e9c6f14346ad5ef41e2d68c6. * Revert "fixing decode." This reverts commit 6a3579056f307fed3428102df5823a7ff7cebdc6.
Commit: | 6a35790 | |
---|---|---|
Author: | jetstream authors | |
Committer: | jetstream authors |
fixing decode. PiperOrigin-RevId: 720397779
Commit: | 7426ea7 | |
---|---|---|
Author: | aman2930 |
1) Added MultiAdapterManager service proto along with the methods ListAdapters, LoadAdapter and UnloadAdapter.
2) Driver which is holding list of all loaded base-parameters is now storing the list of LoRA-updated parameters for each loaded LoRA. Implemented methods for loading, unloading and listing LoRA adapters into the Driver object. Original base model params are intact and saved into the params dictionary with key .
3) Created a proxy-client to make MultiAdapterManager service requests to JetStream server.
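The bookkeeping described in commit 7426ea7, where base parameters stay intact next to LoRA-updated copies, might look roughly like the Python sketch below. It is an illustration under assumptions, not the Driver implementation: `ParamRegistrySketch`, its method names, the `scale` argument, and the placeholder `BASE_KEY` are hypothetical (the key used in the commit is not shown in the message), and a single small weight matrix stands in for the full parameter tree. The update uses the usual LoRA form, base weight plus scale times lora_b x lora_a.

```python
BASE_KEY = "__base__"  # placeholder only; the commit's actual key is not shown


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


class ParamRegistrySketch:
    """Hypothetical registry keeping base weights and LoRA-updated copies."""

    def __init__(self, base_weight):
        self.params = {BASE_KEY: base_weight}

    def load_adapter(self, adapter_id, lora_a, lora_b, scale=1.0):
        # Build a new matrix so the base weights are never modified in place.
        delta = matmul(lora_b, lora_a)
        base = self.params[BASE_KEY]
        self.params[adapter_id] = [
            [w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(base, delta)
        ]

    def unload_adapter(self, adapter_id):
        if adapter_id == BASE_KEY:
            raise ValueError("base parameters cannot be unloaded")
        del self.params[adapter_id]

    def list_adapters(self):
        return [key for key in self.params if key != BASE_KEY]


base = [[1.0, 0.0], [0.0, 1.0]]
registry = ParamRegistrySketch(base)
# lora_b is 2x1 and lora_a is 1x2, so lora_b x lora_a is a rank-1 2x2 update.
registry.load_adapter("adapter-1", lora_a=[[3.0, 4.0]], lora_b=[[1.0], [2.0]])
merged = registry.params["adapter-1"]
```

Storing a merged copy per adapter trades memory for simplicity, since every loaded adapter then occupies as much space as the weights it modifies.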
Commit: | 973647d | |
---|---|---|
Author: | Yijia | |
Committer: | GitHub |
Revert "Internal refactor" (#156) This reverts commit 8e18e7fd1db4ee271fa677eec86b0b90a3822c95. Co-authored-by: Yijia J <yijiaj@google.com>
Commit: | 8e18e7f | |
---|---|---|
Author: | jetstream authors | |
Committer: | Yijia J |
Internal refactor PiperOrigin-RevId: 706772024
Commit: | d681995 | |
---|---|---|
Author: | Brendan Slabe | |
Committer: | GitHub |
Various request time metrics (#121) * first commit * nit * fmt * description tweak * added more metrics * nit * nit * default metadata values * move `new_request.metadata.transfer_start_time = time.perf_counter()` * avoid NoneType * NoneType * set transfer_end_time and fmt * camel case -> snake case * description update * change descriptions * fmt * logs * better logs * changed timings * observing queue duration metric * buckets in sorted order * buckets not in sorted order * corrected times * number of output tokens * move prefill_start_time, enable debug, maybe correct len for num tokens in detokenize * fmt * correct lengths of output tokens based on debug * debug transfer queue time * remove log * removed logs, almost final * nits * readd log * change logs * reomve log * condence * improve test coverage * revert _abort_or_raise deletion * start_time mandatory * undo * nit * updated buckets * added 'jetstream_time_per_request' * nit * add 'jetstream_wait_time_per_request' * nit * missing .metadata * lint * change order of params * changed metric description * Add metadata field to proto * update proto * tweak generated file * tweak generated file * update proto * pylint * generate protos * change start time assignment * .value * CopyFrom * change definition of queue duration metric * Increase test coverage * fixed assertions * fmt * incorrect prefill time * Add license statements * Protobuf Python Version * fmt * pylint
Commit: | 3946afa | |
---|---|---|
Author: | Brendan Slabe | |
Committer: | GitHub |
Makefile (#125) * first commit * changed unit_tests.yaml * generate-protos * better generate-protos logic * append -> prepend * more make targets
Commit: | 46c152f | |
---|---|---|
Author: | Zijun Zhou | |
Committer: | GitHub |
Cleanup orchestrator proto (#112) * Cleanup orchestrator proto * Update JetStream based on proto cleanup
Commit: | 0c56aac | |
---|---|---|
Author: | vivianrwu | |
Committer: | GitHub |
Add healthcheck support for JetStream (#90) * Add healthcheck support for JetStream * fix indentation * fix pylint unit test * use pyink to reformat generated protos
Commit: | 01c5a03 | |
---|---|---|
Author: | Zijun Zhou | |
Committer: | GitHub |
Update JetStream grpc proto to support I/O with text and token ids (#78) * Update JetStream grpc proto to support I/O with text and token ids * Update orchestrator and token utils to support text and token I/O * Add and update unit tests * Fix prometheus duplicate metrics issue * add shortuuid dep * Update docstring * Add client tokenization mode * Update client side I/O handling * latest pylint fix
Commit: | 0dbb2a5 | |
---|---|---|
Author: | Zijun Zhou | |
Committer: | Junwei Yang |
Align Tokenizer in JetStream (#40) * Align Tokenizer in JetStream * Update requirements with pytest dep * Remove mix_decode unit test
Commit: | a0df320 | |
---|---|---|
Author: | Zijun Zhou | |
Committer: | GitHub |
Align Tokenizer in JetStream (#40) * Align Tokenizer in JetStream * Update requirements with pytest dep * Remove mix_decode unit test
Commit: | 90b2a9d | |
---|---|---|
Author: | Zijun Zhou | |
Committer: | GitHub |
Support JetStream MaxText user guide (#28)
Commit: | 6f55565 | |
---|---|---|
Author: | Zijun Zhou | |
Committer: | Zijun Zhou |
JetStream init version Co-authored-by: Sholto Douglas <sholto@google.com> Co-authored-by: Zijun Zhou <zijunzhou@google.com>