The Protocol Buffers files changed in the following 68 commits:
- `abf9f39` (author: Magdy Saleh, committer: GitHub): Fix stella model and client <> embedding compatibility (#717). The documentation is generated from this commit.
- `2af302d` (author: Ajinkya Tejankar, committer: GitHub): Fix decoder_input_details bug (#705)
- `0204c2e` (author: Travis Addair, committer: GitHub): Record number of skipped tokens in the response (#681)
- `864e8c4` (author: Travis Addair): WIP: better n
- `ebd8d0d` (author: Travis Addair, committer: GitHub): Fix `frequency_penalty` and `presence_penalty` (#672)
- `6c5ca67` (author: Travis Addair, committer: GitHub): Chunked prefill (#653)
- `d1ad0fb` (author: Magdy Saleh, committer: GitHub): Revert "Speed up NER inference (#598)". This reverts commit 19301c744693f4156cca72914c94a389716f2adf.
- `19301c7` (author: Magdy Saleh, committer: GitHub): Speed up NER inference (#598)
- `3a74c25` (author: Magdy Saleh, committer: GitHub): add new agnostic health endpoint (#588). Co-authored-by: Travis Addair <tgaddair@gmail.com>
- `20c544f` (author: Travis Addair, committer: GitHub): Add Llava Next (VLM) (#586)
- `1d2b514` (author: Travis Addair, committer: GitHub): Add prefix caching (#581)
- `efa4bff` (author: Magdy Saleh, committer: GitHub): Move NER output formatting to server (#561)
- `07addea` (author: Travis Addair, committer: GitHub): Fix: short circuit download, load, offload for preloaded adapters (#552)
- `59631a0` (author: Travis Addair, committer: GitHub): Apply chat template in router to properly validate input length (#538)
- `452ac73` (author: Travis Addair, committer: GitHub): Tokenize inputs in router (#548)
- `5a7a1be` (author: Travis Addair, committer: GitHub): Move kv cache allocation to router to ensure correct block allocation (#545)
- `a3ad209` (author: Magdy Saleh, committer: GitHub): Lorax NER (#531)
- `e8f3d33` (author: Travis Addair, committer: GitHub): Add support for batching to embedder models (#503). Co-authored-by: Magdy Saleh <magdy@predibase.com>
- `e37549e` (author: Magdy Saleh, committer: GitHub): Embedder Service v0 with FlashBert (#385)
- `b24dc3e` (author: Magdy Saleh): some error handling
- `b79841a` (author: Magdy Saleh): merge
- `f66953e` (author: Magdy Saleh): v0 of bert inner works
- `5581ee8` (author: Travis Addair, committer: GitHub): Added support for Medusa speculative decoding adapters (#372)
- `56ee215` (author: Magdy Saleh): test
- `2d9a270` (author: Jonas Schroeder, committer: GitHub): Add support for returning alternative tokens (#297)
- `6dea404` (author: Travis Addair, committer: GitHub): Enforce adapters cannot be loaded past `--adapter-memory-fraction` (#306)
- `e51f078` (author: Travis Addair, committer: GitHub): Generate to `max_total_tokens` during warmup (#286)
- `309618c` (author: Travis Addair, committer: GitHub): Added Outlines logits processor for JSON schema validation (#224). Co-authored-by: Jeffrey Tang <jeff@predibase.com>
- `3b4c973` (author: Travis Addair, committer: GitHub): Merge multiple LoRA adapters per request (linear, TIES, DARE) (#212)
- `440a6ef` (author: Magdy Saleh): some progress
- `a90d443` (author: Travis Addair, committer: GitHub): OpenAI v1 Chat Completions API (#171)
- `d88ffed` (author: Travis Addair, committer: GitHub): Added `prompt_tokens` to the response (#165)
- `143f303` (author: Magdy Saleh, committer: GitHub): Add predibase as a source for adapters (#125)
- `b8a4032` (author: Infernaught, committer: GitHub): Rename tgi and text-generation to lorax in rust (#19)
- `39c1918` (author: Infernaught): Change TextGenerationService to LoraxService
- `c7cfbd9` (author: Travis Addair, committer: GitHub): Implement continuous multi-adapter batching (#2)
- `3b9fdfd` (author: Geoffrey Angus, committer: Travis Addair): Enable Mistral (#13): get flash attn v2 working with llama-2; first working mistral e2e; warnings about adapter loading; add window size to router; DAL working; pr revision
- `ef82137` (author: Geoffrey Angus, committer: Travis Addair): make dynamic adapter loading compatible with s3 change; todo: verify static adapter loading
- `e05e0ed` (author: Geoffrey Angus, committer: Travis Addair): add error handling for bad download_weight calls
- `9d3e63f` (author: Geoffrey Angus, committer: Travis Addair): wip: adds adapter hot-loading in Python server; TODO: add queue changeover logic
- `fe80f53` (author: OlivierDehaene, committer: GitHub): feat(server): auto max_batch_total_tokens for flash att models (#630)
- `e74bd41` (author: OlivierDehaene, committer: GitHub): feat(server): add paged attention to flash models (#516). Closes #478
- `895c5f1` (author: OlivierDehaene, committer: GitHub): feat(server): only compute prefill logprobs when asked (#406). Closes #288
- `218c9ad` (author: OlivierDehaene, committer: GitHub): feat: decrease IPC proto size (#367). Closes #307, #308
- `db2b4e0` (author: Nicolas Patry, committer: GitHub): feat(router): new healthcheck that skips the queue (#244). Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com> and OlivierDehaene <olivier@huggingface.co>
- `ebc74d5` (author: OlivierDehaene, committer: GitHub): feat(router): use number of tokens in batch as input for dynamic batching (#226). Co-authored-by: Nick Hill <nickhill@us.ibm.com>
- `343437c` (author: OlivierDehaene, committer: GitHub): feat(router): add device and dtype info (#215)
- `9987960` (author: OlivierDehaene, committer: GitHub): feat(router): make router input validation optional (#164)
- `610bb1f` (author: OlivierDehaene, committer: GitHub): feat(benchmark): tui based benchmarking tool (#149)
- `f000068` (author: OlivierDehaene, committer: GitHub): feat(server): clear cache on error (#143)
- `b49dbf2` (author: OlivierDehaene, committer: GitHub): fix(server): use server tokenizer as gt (#128)
- `1a2d682` (author: OlivierDehaene, committer: GitHub): feat: support typical sampling (#114). Closes #112
- `9b8ea6a` (author: OlivierDehaene, committer: GitHub): feat(server): add logits watermark (#90)
- `0ac184c` (author: OlivierDehaene, committer: GitHub): feat(server): add special token bool (#85)
- `20c3c59` (author: OlivierDehaene, committer: GitHub): feat(router): refactor API and add openAPI schemas (#53)
- `313194f` (author: OlivierDehaene, committer: GitHub): feat(server): support repetition penalty (#47)
- `017a2a8` (author: OlivierDehaene, committer: GitHub): feat: Add token streaming using ServerSideEvents support (#41)
- `54fec93` (author: OlivierDehaene, committer: GitHub): fix(server): fix seeding with multiple shards (#44)
- `4f9ac67` (author: OlivierDehaene, committer: GitHub): Revert "feat: Add token streaming using ServerSideEvents support" (#40). Reverts huggingface/text-generation-inference#36
- `7fbfbb0` (author: OlivierDehaene, committer: GitHub): feat: Add token streaming using ServerSideEvents (SSE) support (#36). The signature of the SSE events is:

  ```rust
  struct Details {
      finish_reason: String,
      generated_tokens: u32,
      seed: Option<u64>,
  }

  struct StreamResponse {
      token: Token,
      generated_text: Option<String>,
      details: Option<Details>,
  }

  struct ErrorResponse {
      error: String,
  }
  ```
- `cd298bc` (author: OlivierDehaene, committer: GitHub): feat: Support sampling seeding (#37). Co-authored-by: Yannic Kilcher <yk@users.noreply.github.com>
- `32a2530` (author: OlivierDehaene, committer: GitHub): feat: Return logprobs (#8)
- `718096f` (author: OlivierDehaene, committer: GitHub): feat: Support stop sequences (#7)
- `427d7cc` (author: OlivierDehaene): feat(server): Support AutoModelForSeq2SeqLM
- `c5665f5` (author: OlivierDehaene): feat(server): Support generic AutoModelForCausalLM
- `f16f2f5` (author: Olivier Dehaene, committer: OlivierDehaene): v0.1.0
- `4c693e6` (author: Olivier Dehaene): Refactored gRPC interface; added validation logic
- `295831a` (author: Olivier Dehaene): Init