Proto commits in predibase/lorax

These are the 68 commits in which the Protocol Buffers (`.proto`) files changed:

Commit:abf9f39
Author:Magdy Saleh
Committer:GitHub

Fix stella model and client <> embedding compatibility (#717)

The documentation is generated from this commit.

Commit:2af302d
Author:Ajinkya Tejankar
Committer:GitHub

Fix decoder_input_details bug (#705)

Commit:0204c2e
Author:Travis Addair
Committer:GitHub

Record number of skipped tokens in the response (#681)

Commit:864e8c4
Author:Travis Addair

WIP: better n

Commit:ebd8d0d
Author:Travis Addair
Committer:GitHub

Fix `frequency_penalty` and `presence_penalty` (#672)
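As a point of reference for this fix, the standard OpenAI-style formulation of these two penalties subtracts `frequency_penalty * count` and a flat `presence_penalty` for any token that has already appeared. This is a minimal sketch of that convention, not necessarily the exact logic patched in the commit; `apply_penalties` is a hypothetical helper name.

```python
import numpy as np

def apply_penalties(logits, token_counts, frequency_penalty, presence_penalty):
    """OpenAI-style sampling penalties (illustrative sketch).

    Subtracts frequency_penalty * occurrence_count from each token's logit,
    plus presence_penalty once for every token that has appeared at least once.
    """
    logits = np.asarray(logits, dtype=float).copy()
    counts = np.asarray(token_counts, dtype=float)
    logits -= frequency_penalty * counts          # scales with repetition count
    logits -= presence_penalty * (counts > 0)     # flat penalty for any appearance
    return logits

# Token 1 has appeared twice: 1.0 - 0.5*2 - 0.5 = -0.5
print(apply_penalties([1.0, 1.0], [0, 2], 0.5, 0.5))  # → [ 1.  -0.5]
```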

Commit:6c5ca67
Author:Travis Addair
Committer:GitHub

Chunked prefill (#653)

Commit:d1ad0fb
Author:Magdy Saleh
Committer:GitHub

Revert "Speed up NER inference (#598)" This reverts commit 19301c744693f4156cca72914c94a389716f2adf.

Commit:19301c7
Author:Magdy Saleh
Committer:GitHub

Speed up NER inference (#598)

Commit:3a74c25
Author:Magdy Saleh
Committer:GitHub

add new agnostic health endpoint (#588) Co-authored-by: Travis Addair <tgaddair@gmail.com>

Commit:20c544f
Author:Travis Addair
Committer:GitHub

Add Llava Next (VLM) (#586)

Commit:1d2b514
Author:Travis Addair
Committer:GitHub

Add prefix caching (#581)

Commit:efa4bff
Author:Magdy Saleh
Committer:GitHub

Move NER output formatting to server (#561)

Commit:07addea
Author:Travis Addair
Committer:GitHub

Fix: short circuit download, load, offload for preloaded adapters (#552)

Commit:59631a0
Author:Travis Addair
Committer:GitHub

Apply chat template in router to properly validate input length (#538)

Commit:452ac73
Author:Travis Addair
Committer:GitHub

Tokenize inputs in router (#548)

Commit:5a7a1be
Author:Travis Addair
Committer:GitHub

Move kv cache allocation to router to ensure correct block allocation (#545)

Commit:a3ad209
Author:Magdy Saleh
Committer:GitHub

Lorax NER (#531)

Commit:e8f3d33
Author:Travis Addair
Committer:GitHub

Add support for batching to embedder models (#503) Co-authored-by: Magdy Saleh <magdy@predibase.com>

Commit:e37549e
Author:Magdy Saleh
Committer:GitHub

Embedder Service v0 with FlashBert (#385)

Commit:b24dc3e
Author:Magdy Saleh

some error handling

Commit:b79841a
Author:Magdy Saleh

merge

Commit:f66953e
Author:Magdy Saleh

v0 of bert inner works

Commit:5581ee8
Author:Travis Addair
Committer:GitHub

Added support for Medusa speculative decoding adapters (#372)

Commit:56ee215
Author:Magdy Saleh

test

Commit:2d9a270
Author:Jonas Schroeder
Committer:GitHub

Add support for returning alternative tokens (#297)

Commit:6dea404
Author:Travis Addair
Committer:GitHub

Enforce adapters cannot be loaded past `--adapter-memory-fraction` (#306)

Commit:e51f078
Author:Travis Addair
Committer:GitHub

Generate to `max_total_tokens` during warmup (#286)

Commit:309618c
Author:Travis Addair
Committer:GitHub

Added Outlines logits processor for JSON schema validation (#224) Co-authored-by: Jeffrey Tang <jeff@predibase.com>

Commit:3b4c973
Author:Travis Addair
Committer:GitHub

Merge multiple LoRA adapters per request (linear, TIES, DARE) (#212)
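Of the strategies named here, the linear one can be sketched as a weighted sum of per-adapter delta weights. This is a minimal illustration under that assumption, not the repository's implementation; `linear_merge` is a hypothetical helper, and TIES/DARE add sign-resolution and random dropping steps not shown.

```python
import numpy as np

def linear_merge(deltas, weights):
    """Linearly combine per-adapter delta matrices with user-supplied weights."""
    assert len(deltas) == len(weights)
    out = np.zeros_like(np.asarray(deltas[0], dtype=float))
    for delta, weight in zip(deltas, weights):
        out += weight * np.asarray(delta, dtype=float)
    return out

# Two adapters merged at equal weight: 0.5*I + 0.5*(2I) = 1.5*I
merged = linear_merge([np.eye(2), 2 * np.eye(2)], [0.5, 0.5])
print(merged)
```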

Commit:440a6ef
Author:Magdy Saleh

some progress

Commit:a90d443
Author:Travis Addair
Committer:GitHub

OpenAI v1 Chat Completions API (#171)

Commit:d88ffed
Author:Travis Addair
Committer:GitHub

Added `prompt_tokens` to the response (#165)

Commit:143f303
Author:Magdy Saleh
Committer:GitHub

Add predibase as a source for adapters (#125)

Commit:b8a4032
Author:Infernaught
Committer:GitHub

Rename tgi and text-generation to lorax in rust (#19)

Commit:39c1918
Author:Infernaught

Change TextGenerationService to LoraxService

Commit:c7cfbd9
Author:Travis Addair
Committer:GitHub

Implement continuous multi-adapter batching (#2)

Commit:3b9fdfd
Author:Geoffrey Angus
Committer:Travis Addair

Enable Mistral (#13)

- get flash attn v2 working with llama-2
- first working mistral e2e
- warnings about adapter loading
- add window size to router
- DAL working
- pr revision

Commit:ef82137
Author:Geoffrey Angus
Committer:Travis Addair

make dynamic adapter loading compatible with s3 change; todo: verify static adapter loading

Commit:e05e0ed
Author:Geoffrey Angus
Committer:Travis Addair

add error handling for bad download_weight calls

Commit:9d3e63f
Author:Geoffrey Angus
Committer:Travis Addair

wip: adds adapter hot-loading in Python server; TODO: add queue changeover logic

Commit:fe80f53
Author:OlivierDehaene
Committer:GitHub

feat(server): auto max_batch_total_tokens for flash att models (#630)

Commit:e74bd41
Author:OlivierDehaene
Committer:GitHub

feat(server): add paged attention to flash models (#516) Closes #478

Commit:895c5f1
Author:OlivierDehaene
Committer:GitHub

feat(server): only compute prefill logprobs when asked (#406) Close #288

Commit:218c9ad
Author:OlivierDehaene
Committer:GitHub

feat: decrease IPC proto size (#367) Closes #307 #308

Commit:db2b4e0
Author:Nicolas Patry
Committer:GitHub

feat(router): new healthcheck that skips the queue (#244) Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com> Co-authored-by: OlivierDehaene <olivier@huggingface.co>

Commit:ebc74d5
Author:OlivierDehaene
Committer:GitHub

feat(router): use number of tokens in batch as input for dynamic batching (#226) Co-authored-by: Nick Hill <nickhill@us.ibm.com>

Commit:343437c
Author:OlivierDehaene
Committer:GitHub

feat(router): add device and dtype info (#215)

Commit:9987960
Author:OlivierDehaene
Committer:GitHub

feat(router): make router input validation optional (#164)

Commit:610bb1f
Author:OlivierDehaene
Committer:GitHub

feat(benchmark): tui based benchmarking tool (#149)

Commit:f000068
Author:OlivierDehaene
Committer:GitHub

feat(server): clear cache on error (#143)

Commit:b49dbf2
Author:OlivierDehaene
Committer:GitHub

fix(server): use server tokenizer as gt (#128)

Commit:1a2d682
Author:OlivierDehaene
Committer:GitHub

feat: support typical sampling (#114) closes #112

Commit:9b8ea6a
Author:OlivierDehaene
Committer:GitHub

feat(server): add logits watermark (#90)

Commit:0ac184c
Author:OlivierDehaene
Committer:GitHub

feat(server): add special token bool (#85)

Commit:20c3c59
Author:OlivierDehaene
Committer:GitHub

feat(router): refactor API and add openAPI schemas (#53)

Commit:313194f
Author:OlivierDehaene
Committer:GitHub

feat(server): support repetition penalty (#47)

Commit:017a2a8
Author:OlivierDehaene
Committer:GitHub

feat: Add token streaming using ServerSideEvents support (#41)

Commit:54fec93
Author:OlivierDehaene
Committer:GitHub

fix(server): fix seeding with multiple shards (#44)

Commit:4f9ac67
Author:OlivierDehaene
Committer:GitHub

Revert "feat: Add token streaming using ServerSideEvents support" (#40) Reverts huggingface/text-generation-inference#36

Commit:7fbfbb0
Author:OlivierDehaene
Committer:GitHub

feat: Add token streaming using ServerSideEvents support (#36) Add token streaming using ServerSideEvents (SSE). The signature of the SSE events is:

```rust
struct Details {
    finish_reason: String,
    generated_tokens: u32,
    seed: Option<u64>,
}

struct StreamResponse {
    token: Token,
    generated_text: Option<String>,
    details: Option<Details>,
}

struct ErrorResponse {
    error: String,
}
```
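On the wire, each `StreamResponse` from the commit message above arrives as one SSE `data:` line carrying a JSON payload. This is a minimal client-side parsing sketch; the `Token` fields shown are illustrative assumptions (the commit does not list them), and `parse_sse_event` is a hypothetical helper.

```python
import json

# Hypothetical SSE line; top-level keys mirror the StreamResponse struct,
# while the token's inner fields are assumed for illustration.
sse_line = (
    'data: {"token": {"id": 42, "text": " world", "logprob": -0.1, "special": false}, '
    '"generated_text": null, "details": null}'
)

def parse_sse_event(line: str) -> dict:
    """Strip the SSE 'data: ' prefix and decode the JSON payload."""
    assert line.startswith("data: ")
    return json.loads(line[len("data: "):])

event = parse_sse_event(sse_line)
print(event["token"]["text"])  # → " world"
```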

Commit:cd298bc
Author:OlivierDehaene
Committer:GitHub

feat: Support sampling seeding (#37) Co-authored-by: Yannic Kilcher <yk@users.noreply.github.com>

Commit:32a2530
Author:OlivierDehaene
Committer:GitHub

feat: Return logprobs (#8)

Commit:718096f
Author:OlivierDehaene
Committer:GitHub

feat: Support stop sequences (#7)

Commit:427d7cc
Author:OlivierDehaene

feat(server): Support AutoModelForSeq2SeqLM

Commit:c5665f5
Author:OlivierDehaene

feat(server): Support generic AutoModelForCausalLM

Commit:f16f2f5
Author:Olivier Dehaene
Committer:OlivierDehaene

v0.1.0

Commit:4c693e6
Author:Olivier Dehaene

Refactored gRPC interface. Added validation logic.

Commit:295831a
Author:Olivier Dehaene

Init