The Protocol Buffers files changed in the following 28 commits:
Commit: | fe80f53 | |
---|---|---|
Author: | OlivierDehaene | |
Committer: | GitHub |
feat(server): auto max_batch_total_tokens for flash att models (#630)
The documentation is generated from this commit.
Commit: | e74bd41 | |
---|---|---|
Author: | OlivierDehaene | |
Committer: | GitHub |
feat(server): add paged attention to flash models (#516) Closes #478
Commit: | 895c5f1 | |
---|---|---|
Author: | OlivierDehaene | |
Committer: | GitHub |
feat(server): only compute prefill logprobs when asked (#406) Close #288
Commit: | 218c9ad | |
---|---|---|
Author: | OlivierDehaene | |
Committer: | GitHub |
feat: decrease IPC proto size (#367) Closes #307 #308
Commit: | db2b4e0 | |
---|---|---|
Author: | Nicolas Patry | |
Committer: | GitHub |
feat(router): new healthcheck that skips the queue (#244)

Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
Commit: | ebc74d5 | |
---|---|---|
Author: | OlivierDehaene | |
Committer: | GitHub |
feat(router): use number of tokens in batch as input for dynamic batching (#226)

Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Commit: | 343437c | |
---|---|---|
Author: | OlivierDehaene | |
Committer: | GitHub |
feat(router): add device and dtype info (#215)
Commit: | 9987960 | |
---|---|---|
Author: | OlivierDehaene | |
Committer: | GitHub |
feat(router): make router input validation optional (#164)
Commit: | 610bb1f | |
---|---|---|
Author: | OlivierDehaene | |
Committer: | GitHub |
feat(benchmark): tui based benchmarking tool (#149)
Commit: | f000068 | |
---|---|---|
Author: | OlivierDehaene | |
Committer: | GitHub |
feat(server): clear cache on error (#143)
Commit: | b49dbf2 | |
---|---|---|
Author: | OlivierDehaene | |
Committer: | GitHub |
fix(server): use server tokenizer as gt (#128)
Commit: | 1a2d682 | |
---|---|---|
Author: | OlivierDehaene | |
Committer: | GitHub |
feat: support typical sampling (#114) closes #112
Commit: | 9b8ea6a | |
---|---|---|
Author: | OlivierDehaene | |
Committer: | GitHub |
feat(server): add logits watermark (#90)
Commit: | 0ac184c | |
---|---|---|
Author: | OlivierDehaene | |
Committer: | GitHub |
feat(server): add special token bool (#85)
Commit: | 20c3c59 | |
---|---|---|
Author: | OlivierDehaene | |
Committer: | GitHub |
feat(router): refactor API and add openAPI schemas (#53)
Commit: | 313194f | |
---|---|---|
Author: | OlivierDehaene | |
Committer: | GitHub |
feat(server): support repetition penalty (#47)
Commit: | 017a2a8 | |
---|---|---|
Author: | OlivierDehaene | |
Committer: | GitHub |
feat: Add token streaming using ServerSideEvents support (#41)
Commit: | 54fec93 | |
---|---|---|
Author: | OlivierDehaene | |
Committer: | GitHub |
fix(server): fix seeding with multiple shards (#44)
Commit: | 4f9ac67 | |
---|---|---|
Author: | OlivierDehaene | |
Committer: | GitHub |
Revert "feat: Add token streaming using ServerSideEvents support" (#40) Reverts huggingface/text-generation-inference#36
Commit: | 7fbfbb0 | |
---|---|---|
Author: | OlivierDehaene | |
Committer: | GitHub |
feat: Add token streaming using ServerSideEvents support (#36)

Add token streaming using ServerSideEvents (SSE). The signature of the SSE events is:

```rust
struct Details {
    finish_reason: String,
    generated_tokens: u32,
    seed: Option<u64>,
}

struct StreamResponse {
    token: Token,
    generated_text: Option<String>,
    details: Option<Details>,
}

struct ErrorResponse {
    error: String,
}
```
Commit: | cd298bc | |
---|---|---|
Author: | OlivierDehaene | |
Committer: | GitHub |
feat: Support sampling seeding (#37)

Co-authored-by: Yannic Kilcher <yk@users.noreply.github.com>
Commit: | 32a2530 | |
---|---|---|
Author: | OlivierDehaene | |
Committer: | GitHub |
feat: Return logprobs (#8)
Commit: | 718096f | |
---|---|---|
Author: | OlivierDehaene | |
Committer: | GitHub |
feat: Support stop sequences (#7)
Commit: | 427d7cc | |
---|---|---|
Author: | OlivierDehaene |
feat(server): Support AutoModelForSeq2SeqLM
Commit: | c5665f5 | |
---|---|---|
Author: | OlivierDehaene |
feat(server): Support generic AutoModelForCausalLM
Commit: | f16f2f5 | |
---|---|---|
Author: | Olivier Dehaene | |
Committer: | OlivierDehaene |
v0.1.0
Commit: | 4c693e6 | |
---|---|---|
Author: | Olivier Dehaene |
Refactored gRPC interface. Added validation logic.
Commit: | 295831a | |
---|---|---|
Author: | Olivier Dehaene |
Init