These 68 commits changed the Protocol Buffers files:
| Commit: | abf9f39 | |
|---|---|---|
| Author: | Magdy Saleh | |
| Committer: | GitHub | |
Fix stella model and client <> embedding compatibility (#717)
The documentation is generated from this commit.
| Commit: | 2af302d | |
|---|---|---|
| Author: | Ajinkya Tejankar | |
| Committer: | GitHub | |
Fix decoder_input_details bug (#705)
| Commit: | 0204c2e | |
|---|---|---|
| Author: | Travis Addair | |
| Committer: | GitHub | |
Record number of skipped tokens in the response (#681)
| Commit: | 864e8c4 | |
|---|---|---|
| Author: | Travis Addair | |
WIP: better n
| Commit: | ebd8d0d | |
|---|---|---|
| Author: | Travis Addair | |
| Committer: | GitHub | |
Fix `frequency_penalty` and `presence_penalty` (#672)
| Commit: | 6c5ca67 | |
|---|---|---|
| Author: | Travis Addair | |
| Committer: | GitHub | |
Chunked prefill (#653)
| Commit: | d1ad0fb | |
|---|---|---|
| Author: | Magdy Saleh | |
| Committer: | GitHub | |
Revert "Speed up NER inference (#598)" This reverts commit 19301c744693f4156cca72914c94a389716f2adf.
| Commit: | 19301c7 | |
|---|---|---|
| Author: | Magdy Saleh | |
| Committer: | GitHub | |
Speed up NER inference (#598)
| Commit: | 3a74c25 | |
|---|---|---|
| Author: | Magdy Saleh | |
| Committer: | GitHub | |
add new agnostic health endpoint (#588) Co-authored-by: Travis Addair <tgaddair@gmail.com>
| Commit: | 20c544f | |
|---|---|---|
| Author: | Travis Addair | |
| Committer: | GitHub | |
Add Llava Next (VLM) (#586)
| Commit: | 1d2b514 | |
|---|---|---|
| Author: | Travis Addair | |
| Committer: | GitHub | |
Add prefix caching (#581)
| Commit: | efa4bff | |
|---|---|---|
| Author: | Magdy Saleh | |
| Committer: | GitHub | |
Move NER output formatting to server (#561)
| Commit: | 07addea | |
|---|---|---|
| Author: | Travis Addair | |
| Committer: | GitHub | |
Fix: short circuit download, load, offload for preloaded adapters (#552)
| Commit: | 59631a0 | |
|---|---|---|
| Author: | Travis Addair | |
| Committer: | GitHub | |
Apply chat template in router to properly validate input length (#538)
| Commit: | 452ac73 | |
|---|---|---|
| Author: | Travis Addair | |
| Committer: | GitHub | |
Tokenize inputs in router (#548)
| Commit: | 5a7a1be | |
|---|---|---|
| Author: | Travis Addair | |
| Committer: | GitHub | |
Move kv cache allocation to router to ensure correct block allocation (#545)
| Commit: | a3ad209 | |
|---|---|---|
| Author: | Magdy Saleh | |
| Committer: | GitHub | |
Lorax NER (#531)
| Commit: | e8f3d33 | |
|---|---|---|
| Author: | Travis Addair | |
| Committer: | GitHub | |
Add support for batching to embedder models (#503) Co-authored-by: Magdy Saleh <magdy@predibase.com>
| Commit: | e37549e | |
|---|---|---|
| Author: | Magdy Saleh | |
| Committer: | GitHub | |
Embedder Service v0 with FlashBert (#385)
| Commit: | b24dc3e | |
|---|---|---|
| Author: | Magdy Saleh | |
some error handling
| Commit: | b79841a | |
|---|---|---|
| Author: | Magdy Saleh | |
merge
| Commit: | f66953e | |
|---|---|---|
| Author: | Magdy Saleh | |
v0 of bert inner works
| Commit: | 5581ee8 | |
|---|---|---|
| Author: | Travis Addair | |
| Committer: | GitHub | |
Added support for Medusa speculative decoding adapters (#372)
| Commit: | 56ee215 | |
|---|---|---|
| Author: | Magdy Saleh | |
test
| Commit: | 2d9a270 | |
|---|---|---|
| Author: | Jonas Schroeder | |
| Committer: | GitHub | |
Add support for returning alternative tokens (#297)
| Commit: | 6dea404 | |
|---|---|---|
| Author: | Travis Addair | |
| Committer: | GitHub | |
Enforce adapters cannot be loaded past `--adapter-memory-fraction` (#306)
| Commit: | e51f078 | |
|---|---|---|
| Author: | Travis Addair | |
| Committer: | GitHub | |
Generate to `max_total_tokens` during warmup (#286)
| Commit: | 309618c | |
|---|---|---|
| Author: | Travis Addair | |
| Committer: | GitHub | |
Added Outlines logits processor for JSON schema validation (#224) Co-authored-by: Jeffrey Tang <jeff@predibase.com>
| Commit: | 3b4c973 | |
|---|---|---|
| Author: | Travis Addair | |
| Committer: | GitHub | |
Merge multiple LoRA adapters per request (linear, TIES, DARE) (#212)
| Commit: | 440a6ef | |
|---|---|---|
| Author: | Magdy Saleh | |
some progress
| Commit: | a90d443 | |
|---|---|---|
| Author: | Travis Addair | |
| Committer: | GitHub | |
OpenAI v1 Chat Completions API (#171)
| Commit: | d88ffed | |
|---|---|---|
| Author: | Travis Addair | |
| Committer: | GitHub | |
Added `prompt_tokens` to the response (#165)
| Commit: | 143f303 | |
|---|---|---|
| Author: | Magdy Saleh | |
| Committer: | GitHub | |
Add predibase as a source for adapters (#125)
| Commit: | b8a4032 | |
|---|---|---|
| Author: | Infernaught | |
| Committer: | GitHub | |
Rename tgi and text-generation to lorax in rust (#19)
| Commit: | 39c1918 | |
|---|---|---|
| Author: | Infernaught | |
Change TextGenerationService to LoraxService
| Commit: | c7cfbd9 | |
|---|---|---|
| Author: | Travis Addair | |
| Committer: | GitHub | |
Implement continuous multi-adapter batching (#2)
| Commit: | 3b9fdfd | |
|---|---|---|
| Author: | Geoffrey Angus | |
| Committer: | Travis Addair | |
Enable Mistral (#13)

* get flash attn v2 working with llama-2
* first working mistral e2e
* warnings about adapter loading
* add window size to router
* DAL working
* pr revision
| Commit: | ef82137 | |
|---|---|---|
| Author: | Geoffrey Angus | |
| Committer: | Travis Addair | |
make dynamic adapter loading compatible with s3 change; todo: verify static adapter loading
| Commit: | e05e0ed | |
|---|---|---|
| Author: | Geoffrey Angus | |
| Committer: | Travis Addair | |
add error handling for bad download_weight calls
| Commit: | 9d3e63f | |
|---|---|---|
| Author: | Geoffrey Angus | |
| Committer: | Travis Addair | |
wip: adds adapter hot-loading in Python server; TODO: add queue changeover logic
| Commit: | fe80f53 | |
|---|---|---|
| Author: | OlivierDehaene | |
| Committer: | GitHub | |
feat(server): auto max_batch_total_tokens for flash att models (#630)
| Commit: | e74bd41 | |
|---|---|---|
| Author: | OlivierDehaene | |
| Committer: | GitHub | |
feat(server): add paged attention to flash models (#516) Closes #478
| Commit: | 895c5f1 | |
|---|---|---|
| Author: | OlivierDehaene | |
| Committer: | GitHub | |
feat(server): only compute prefill logprobs when asked (#406) Close #288
| Commit: | 218c9ad | |
|---|---|---|
| Author: | OlivierDehaene | |
| Committer: | GitHub | |
feat: decrease IPC proto size (#367) Closes #307 #308
| Commit: | db2b4e0 | |
|---|---|---|
| Author: | Nicolas Patry | |
| Committer: | GitHub | |
feat(router): new healthcheck that skips the queue (#244) Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com> Co-authored-by: OlivierDehaene <olivier@huggingface.co>
| Commit: | ebc74d5 | |
|---|---|---|
| Author: | OlivierDehaene | |
| Committer: | GitHub | |
feat(router): use number of tokens in batch as input for dynamic batching (#226) Co-authored-by: Nick Hill <nickhill@us.ibm.com>
| Commit: | 343437c | |
|---|---|---|
| Author: | OlivierDehaene | |
| Committer: | GitHub | |
feat(router): add device and dtype info (#215)
| Commit: | 9987960 | |
|---|---|---|
| Author: | OlivierDehaene | |
| Committer: | GitHub | |
feat(router): make router input validation optional (#164)
| Commit: | 610bb1f | |
|---|---|---|
| Author: | OlivierDehaene | |
| Committer: | GitHub | |
feat(benchmark): tui based benchmarking tool (#149)
| Commit: | f000068 | |
|---|---|---|
| Author: | OlivierDehaene | |
| Committer: | GitHub | |
feat(server): clear cache on error (#143)
| Commit: | b49dbf2 | |
|---|---|---|
| Author: | OlivierDehaene | |
| Committer: | GitHub | |
fix(server): use server tokenizer as gt (#128)
| Commit: | 1a2d682 | |
|---|---|---|
| Author: | OlivierDehaene | |
| Committer: | GitHub | |
feat: support typical sampling (#114) closes #112
| Commit: | 9b8ea6a | |
|---|---|---|
| Author: | OlivierDehaene | |
| Committer: | GitHub | |
feat(server): add logits watermark (#90)
| Commit: | 0ac184c | |
|---|---|---|
| Author: | OlivierDehaene | |
| Committer: | GitHub | |
feat(server): add special token bool (#85)
| Commit: | 20c3c59 | |
|---|---|---|
| Author: | OlivierDehaene | |
| Committer: | GitHub | |
feat(router): refactor API and add openAPI schemas (#53)
| Commit: | 313194f | |
|---|---|---|
| Author: | OlivierDehaene | |
| Committer: | GitHub | |
feat(server): support repetition penalty (#47)
| Commit: | 017a2a8 | |
|---|---|---|
| Author: | OlivierDehaene | |
| Committer: | GitHub | |
feat: Add token streaming using ServerSideEvents support (#41)
| Commit: | 54fec93 | |
|---|---|---|
| Author: | OlivierDehaene | |
| Committer: | GitHub | |
fix(server): fix seeding with multiple shards (#44)
| Commit: | 4f9ac67 | |
|---|---|---|
| Author: | OlivierDehaene | |
| Committer: | GitHub | |
Revert "feat: Add token streaming using ServerSideEvents support" (#40) Reverts huggingface/text-generation-inference#36
| Commit: | 7fbfbb0 | |
|---|---|---|
| Author: | OlivierDehaene | |
| Committer: | GitHub | |
feat: Add token streaming using ServerSideEvents support (#36)

Add token streaming using ServerSideEvents (SSE).

The signature of the SSE events is:

```rust
struct Details {
    finish_reason: String,
    generated_tokens: u32,
    seed: Option<u64>,
}

struct StreamResponse {
    token: Token,
    generated_text: Option<String>,
    details: Option<Details>,
}

struct ErrorResponse {
    error: String,
}
```
| Commit: | cd298bc | |
|---|---|---|
| Author: | OlivierDehaene | |
| Committer: | GitHub | |
feat: Support sampling seeding (#37) Co-authored-by: Yannic Kilcher <yk@users.noreply.github.com>
| Commit: | 32a2530 | |
|---|---|---|
| Author: | OlivierDehaene | |
| Committer: | GitHub | |
feat: Return logprobs (#8)
| Commit: | 718096f | |
|---|---|---|
| Author: | OlivierDehaene | |
| Committer: | GitHub | |
feat: Support stop sequences (#7)
| Commit: | 427d7cc | |
|---|---|---|
| Author: | OlivierDehaene | |
feat(server): Support AutoModelForSeq2SeqLM
| Commit: | c5665f5 | |
|---|---|---|
| Author: | OlivierDehaene | |
feat(server): Support generic AutoModelForCausalLM
| Commit: | f16f2f5 | |
|---|---|---|
| Author: | Olivier Dehaene | |
| Committer: | OlivierDehaene | |
v0.1.0
| Commit: | 4c693e6 | |
|---|---|---|
| Author: | Olivier Dehaene | |
Refactored gRPC interface Added validation logic
| Commit: | 295831a | |
|---|---|---|
| Author: | Olivier Dehaene | |
Init