Proto commits in predibase/lorax

These are the 68 commits in which the Protocol Buffers (`.proto`) files changed:

Commit:abf9f39
Author:Magdy Saleh
Committer:GitHub

Fix stella model and client <> embedding compatibility (#717)

The documentation is generated from this commit.

Commit:2af302d
Author:Ajinkya Tejankar
Committer:GitHub

Fix decoder_input_details bug (#705)

Commit:0204c2e
Author:Travis Addair
Committer:GitHub

Record number of skipped tokens in the response (#681)

Commit:864e8c4
Author:Travis Addair

WIP: better n

Commit:ebd8d0d
Author:Travis Addair
Committer:GitHub

Fix `frequency_penalty` and `presence_penalty` (#672)
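As a point of reference for this fix, the standard OpenAI-style formulation of these two penalties subtracts `frequency_penalty * count` and a flat `presence_penalty` for any token that has already appeared. This is a minimal sketch of that convention, not necessarily the exact logic patched in the commit; `apply_penalties` is a hypothetical helper name.

```python
import numpy as np

def apply_penalties(logits, token_counts, frequency_penalty, presence_penalty):
    """OpenAI-style sampling penalties (illustrative sketch).

    Subtracts frequency_penalty * occurrence_count from each token's logit,
    plus presence_penalty once for every token that has appeared at least once.
    """
    logits = np.asarray(logits, dtype=float).copy()
    counts = np.asarray(token_counts, dtype=float)
    logits -= frequency_penalty * counts          # scales with repetition count
    logits -= presence_penalty * (counts > 0)     # flat penalty for any appearance
    return logits

# Token 1 has appeared twice: 1.0 - 0.5*2 - 0.5 = -0.5
print(apply_penalties([1.0, 1.0], [0, 2], 0.5, 0.5))  # → [ 1.  -0.5]
```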

Commit:6c5ca67
Author:Travis Addair
Committer:GitHub

Chunked prefill (#653)

Commit:d1ad0fb
Author:Magdy Saleh
Committer:GitHub

Revert "Speed up NER inference (#598)" This reverts commit 19301c744693f4156cca72914c94a389716f2adf.

Commit:19301c7
Author:Magdy Saleh
Committer:GitHub

Speed up NER inference (#598)

Commit:3a74c25
Author:Magdy Saleh
Committer:GitHub

add new agnostic health endpoint (#588) Co-authored-by: Travis Addair <tgaddair@gmail.com>

Commit:20c544f
Author:Travis Addair
Committer:GitHub

Add Llava Next (VLM) (#586)

Commit:1d2b514
Author:Travis Addair
Committer:GitHub

Add prefix caching (#581)

Commit:efa4bff
Author:Magdy Saleh
Committer:GitHub

Move NER output formatting to server (#561)

Commit:07addea
Author:Travis Addair
Committer:GitHub

Fix: short circuit download, load, offload for preloaded adapters (#552)

Commit:59631a0
Author:Travis Addair
Committer:GitHub

Apply chat template in router to properly validate input length (#538)

Commit:452ac73
Author:Travis Addair
Committer:GitHub

Tokenize inputs in router (#548)

Commit:5a7a1be
Author:Travis Addair
Committer:GitHub

Move kv cache allocation to router to ensure correct block allocation (#545)

Commit:a3ad209
Author:Magdy Saleh
Committer:GitHub

Lorax NER (#531)

Commit:e8f3d33
Author:Travis Addair
Committer:GitHub

Add support for batching to embedder models (#503) Co-authored-by: Magdy Saleh <magdy@predibase.com>

Commit:e37549e
Author:Magdy Saleh
Committer:GitHub

Embedder Service v0 with FlashBert (#385)

Commit:b24dc3e
Author:Magdy Saleh

some error handling

Commit:b79841a
Author:Magdy Saleh

merge

Commit:f66953e
Author:Magdy Saleh

v0 of bert inner works

Commit:5581ee8
Author:Travis Addair
Committer:GitHub

Added support for Medusa speculative decoding adapters (#372)

Commit:56ee215
Author:Magdy Saleh

test

Commit:2d9a270
Author:Jonas Schroeder
Committer:GitHub

Add support for returning alternative tokens (#297)

Commit:6dea404
Author:Travis Addair
Committer:GitHub

Enforce adapters cannot be loaded past `--adapter-memory-fraction` (#306)

Commit:e51f078
Author:Travis Addair
Committer:GitHub

Generate to `max_total_tokens` during warmup (#286)

Commit:309618c
Author:Travis Addair
Committer:GitHub

Added Outlines logits processor for JSON schema validation (#224) Co-authored-by: Jeffrey Tang <jeff@predibase.com>

Commit:3b4c973
Author:Travis Addair
Committer:GitHub

Merge multiple LoRA adapters per request (linear, TIES, DARE) (#212)
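Of the strategies named here, the linear one can be sketched as a weighted sum of per-adapter delta weights. This is a minimal illustration under that assumption, not the repository's implementation; `linear_merge` is a hypothetical helper, and TIES/DARE add sign-resolution and random dropping steps not shown.

```python
import numpy as np

def linear_merge(deltas, weights):
    """Linearly combine per-adapter delta matrices with user-supplied weights."""
    assert len(deltas) == len(weights)
    out = np.zeros_like(np.asarray(deltas[0], dtype=float))
    for delta, weight in zip(deltas, weights):
        out += weight * np.asarray(delta, dtype=float)
    return out

# Two adapters merged at equal weight: 0.5*I + 0.5*(2I) = 1.5*I
merged = linear_merge([np.eye(2), 2 * np.eye(2)], [0.5, 0.5])
print(merged)
```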

Commit:440a6ef
Author:Magdy Saleh

some progress

Commit:a90d443
Author:Travis Addair
Committer:GitHub

OpenAI v1 Chat Completions API (#171)

Commit:d88ffed
Author:Travis Addair
Committer:GitHub

Added `prompt_tokens` to the response (#165)

Commit:143f303
Author:Magdy Saleh
Committer:GitHub

Add predibase as a source for adapters (#125)

Commit:b8a4032
Author:Infernaught
Committer:GitHub

Rename tgi and text-generation to lorax in rust (#19)

Commit:39c1918
Author:Infernaught

Change TextGenerationService to LoraxService

Commit:c7cfbd9
Author:Travis Addair
Committer:GitHub

Implement continuous multi-adapter batching (#2)

Commit:3b9fdfd
Author:Geoffrey Angus
Committer:Travis Addair

Enable Mistral (#13)

- get flash attn v2 working with llama-2
- first working mistral e2e
- warnings about adapter loading
- add window size to router
- DAL working
- pr revision

Commit:ef82137
Author:Geoffrey Angus
Committer:Travis Addair

make dynamic adapter loading compatible with s3 change; todo: verify static adapter loading

Commit:e05e0ed
Author:Geoffrey Angus
Committer:Travis Addair

add error handling for bad download_weight calls

Commit:9d3e63f
Author:Geoffrey Angus
Committer:Travis Addair

wip: adds adapter hot-loading in Python server; TODO: add queue changeover logic

Commit:fe80f53
Author:OlivierDehaene
Committer:GitHub

feat(server): auto max_batch_total_tokens for flash att models (#630)

Commit:e74bd41
Author:OlivierDehaene
Committer:GitHub

feat(server): add paged attention to flash models (#516) Closes #478

Commit:895c5f1
Author:OlivierDehaene
Committer:GitHub

feat(server): only compute prefill logprobs when asked (#406) Close #288

Commit:218c9ad
Author:OlivierDehaene
Committer:GitHub

feat: decrease IPC proto size (#367) Closes #307 #308

Commit:db2b4e0
Author:Nicolas Patry
Committer:GitHub

feat(router): new healthcheck that skips the queue (#244) Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com> Co-authored-by: OlivierDehaene <olivier@huggingface.co>

Commit:ebc74d5
Author:OlivierDehaene
Committer:GitHub

feat(router): use number of tokens in batch as input for dynamic batching (#226) Co-authored-by: Nick Hill <nickhill@us.ibm.com>

Commit:343437c
Author:OlivierDehaene
Committer:GitHub

feat(router): add device and dtype info (#215)

Commit:9987960
Author:OlivierDehaene
Committer:GitHub

feat(router): make router input validation optional (#164)

Commit:610bb1f
Author:OlivierDehaene
Committer:GitHub

feat(benchmark): tui based benchmarking tool (#149)

Commit:f000068
Author:OlivierDehaene
Committer:GitHub

feat(server): clear cache on error (#143)

Commit:b49dbf2
Author:OlivierDehaene
Committer:GitHub

fix(server): use server tokenizer as gt (#128)

Commit:1a2d682
Author:OlivierDehaene
Committer:GitHub

feat: support typical sampling (#114) closes #112

Commit:9b8ea6a
Author:OlivierDehaene
Committer:GitHub

feat(server): add logits watermark (#90)

Commit:0ac184c
Author:OlivierDehaene
Committer:GitHub

feat(server): add special token bool (#85)

Commit:20c3c59
Author:OlivierDehaene
Committer:GitHub

feat(router): refactor API and add openAPI schemas (#53)

Commit:313194f
Author:OlivierDehaene
Committer:GitHub

feat(server): support repetition penalty (#47)

Commit:017a2a8
Author:OlivierDehaene
Committer:GitHub

feat: Add token streaming using ServerSideEvents support (#41)

Commit:54fec93
Author:OlivierDehaene
Committer:GitHub

fix(server): fix seeding with multiple shards (#44)

Commit:4f9ac67
Author:OlivierDehaene
Committer:GitHub

Revert "feat: Add token streaming using ServerSideEvents support" (#40) Reverts huggingface/text-generation-inference#36

Commit:7fbfbb0
Author:OlivierDehaene
Committer:GitHub

feat: Add token streaming using ServerSideEvents support (#36) Add token streaming using ServerSideEvents (SSE). The signature of the SSE events is:

```rust
struct Details {
    finish_reason: String,
    generated_tokens: u32,
    seed: Option<u64>,
}

struct StreamResponse {
    token: Token,
    generated_text: Option<String>,
    details: Option<Details>,
}

struct ErrorResponse {
    error: String,
}
```
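On the wire, each `StreamResponse` from the commit message above arrives as one SSE `data:` line carrying a JSON payload. This is a minimal client-side parsing sketch; the `Token` fields shown are illustrative assumptions (the commit does not list them), and `parse_sse_event` is a hypothetical helper.

```python
import json

# Hypothetical SSE line; top-level keys mirror the StreamResponse struct,
# while the token's inner fields are assumed for illustration.
sse_line = (
    'data: {"token": {"id": 42, "text": " world", "logprob": -0.1, "special": false}, '
    '"generated_text": null, "details": null}'
)

def parse_sse_event(line: str) -> dict:
    """Strip the SSE 'data: ' prefix and decode the JSON payload."""
    assert line.startswith("data: ")
    return json.loads(line[len("data: "):])

event = parse_sse_event(sse_line)
print(event["token"]["text"])  # → " world"
```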

Commit:cd298bc
Author:OlivierDehaene
Committer:GitHub

feat: Support sampling seeding (#37) Co-authored-by: Yannic Kilcher <yk@users.noreply.github.com>

Commit:32a2530
Author:OlivierDehaene
Committer:GitHub

feat: Return logprobs (#8)

Commit:718096f
Author:OlivierDehaene
Committer:GitHub

feat: Support stop sequences (#7)

Commit:427d7cc
Author:OlivierDehaene

feat(server): Support AutoModelForSeq2SeqLM

Commit:c5665f5
Author:OlivierDehaene

feat(server): Support generic AutoModelForCausalLM

Commit:f16f2f5
Author:Olivier Dehaene
Committer:OlivierDehaene

v0.1.0

Commit:4c693e6
Author:Olivier Dehaene

Refactored gRPC interface. Added validation logic.

Commit:295831a
Author:Olivier Dehaene

Init