Proto commits in Preemo-Inc/text-generation-inference

These 28 commits changed the Protocol Buffers files:

Commit:fe80f53
Author:OlivierDehaene
Committer:GitHub

feat(server): auto max_batch_total_tokens for flash att models (#630)

The documentation is generated from this commit.

Commit:e74bd41
Author:OlivierDehaene
Committer:GitHub

feat(server): add paged attention to flash models (#516)
Closes #478

Commit:895c5f1
Author:OlivierDehaene
Committer:GitHub

feat(server): only compute prefill logprobs when asked (#406)
Close #288

Commit:218c9ad
Author:OlivierDehaene
Committer:GitHub

feat: decrease IPC proto size (#367)
Closes #307 #308

Commit:db2b4e0
Author:Nicolas Patry
Committer:GitHub

feat(router): new healthcheck that skips the queue (#244)
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>

Commit:ebc74d5
Author:OlivierDehaene
Committer:GitHub

feat(router): use number of tokens in batch as input for dynamic batching (#226)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>

Commit:343437c
Author:OlivierDehaene
Committer:GitHub

feat(router): add device and dtype info (#215)

Commit:9987960
Author:OlivierDehaene
Committer:GitHub

feat(router): make router input validation optional (#164)

Commit:610bb1f
Author:OlivierDehaene
Committer:GitHub

feat(benchmark): tui based benchmarking tool (#149)

Commit:f000068
Author:OlivierDehaene
Committer:GitHub

feat(server): clear cache on error (#143)

Commit:b49dbf2
Author:OlivierDehaene
Committer:GitHub

fix(server): use server tokenizer as gt (#128)

Commit:1a2d682
Author:OlivierDehaene
Committer:GitHub

feat: support typical sampling (#114)
closes #112

Commit:9b8ea6a
Author:OlivierDehaene
Committer:GitHub

feat(server): add logits watermark (#90)

Commit:0ac184c
Author:OlivierDehaene
Committer:GitHub

feat(server): add special token bool (#85)

Commit:20c3c59
Author:OlivierDehaene
Committer:GitHub

feat(router): refactor API and add openAPI schemas (#53)

Commit:313194f
Author:OlivierDehaene
Committer:GitHub

feat(server): support repetition penalty (#47)

Commit:017a2a8
Author:OlivierDehaene
Committer:GitHub

feat: Add token streaming using ServerSideEvents support (#41)

Commit:54fec93
Author:OlivierDehaene
Committer:GitHub

fix(server): fix seeding with multiple shards (#44)

Commit:4f9ac67
Author:OlivierDehaene
Committer:GitHub

Revert "feat: Add token streaming using ServerSideEvents support" (#40)
Reverts huggingface/text-generation-inference#36

Commit:7fbfbb0
Author:OlivierDehaene
Committer:GitHub

feat: Add token streaming using ServerSideEvents support (#36)
Add token streaming using ServerSideEvents (SSE). The signature of the SSE events is:

```rust
struct Details {
    finish_reason: String,
    generated_tokens: u32,
    seed: Option<u64>,
}

struct StreamResponse {
    token: Token,
    generated_text: Option<String>,
    details: Option<Details>,
}

struct ErrorResponse {
    error: String,
}
```

Commit:cd298bc
Author:OlivierDehaene
Committer:GitHub

feat: Support sampling seeding (#37)
Co-authored-by: Yannic Kilcher <yk@users.noreply.github.com>

Commit:32a2530
Author:OlivierDehaene
Committer:GitHub

feat: Return logprobs (#8)

Commit:718096f
Author:OlivierDehaene
Committer:GitHub

feat: Support stop sequences (#7)

Commit:427d7cc
Author:OlivierDehaene

feat(server): Support AutoModelForSeq2SeqLM

Commit:c5665f5
Author:OlivierDehaene

feat(server): Support generic AutoModelForCausalLM

Commit:f16f2f5
Author:Olivier Dehaene
Committer:OlivierDehaene

v0.1.0

Commit:4c693e6
Author:Olivier Dehaene

Refactored gRPC interface
Added validation logic

Commit:295831a
Author:Olivier Dehaene

Init